Show HN: WorldBrain – full text, local search of your browsing history (worldbrain.io)
229 points by _mlxl on Aug 12, 2018 | 69 comments



One of the reasons I keep coming back to Firefox is that Firefox seems to already be doing this. At least it searches all parts of the URL and the title, so I'm 99% successful at getting the right URL in the awesomebar when I'm looking for something.

Chrome is just horrible when it comes to this, and I can never get back to previous pages when searching via the addressbar.


On launch this was one of our banner features in Chrome -- full text search over your browsing history. I wrote some of the code. I believe we (Google's Chrome team) implemented (and contributed back) SQLite's full text search support exactly to make this feature.

I don't know why it was removed (after my time) but I do know it had a lot of problems.

1) Most users were unaware of the feature, and it's hard to find a way for them to discover it.

2) The index costs a lot of disk space, which makes #1 worse. There's a whole bunch of tradeoffs around how much storage to use vs how much history to keep. Pages that self-reload can cause the index to bloat endlessly (a real bug we had).

3) Having an index of pages locally is not sufficient to make a useful search engine. There's a lot of ranking involved in making google search good. Similarly we tried to show "snippets" of page text that showed why we showed the results we did and that itself requires a lot of effort to be useful.

#2 and #3 are just bugs that can be fixed with more effort, but it's hard to motivate that effort in the presence of #1.

(FWIW, the sibling comment about how it was removed to favor google.com searches isn't plausible to me, that's not how the Chrome team works.)
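For anyone curious what full text search over history looks like with SQLite, here is a minimal sketch using Python's built-in sqlite3 module. It assumes your SQLite build includes the FTS5 extension (most do, but not all), and the table name `history`, the column layout and both function names are made up for illustration. It is not the code Chrome shipped. It also shows the two things point 3 calls hard: ranking (here just SQLite's stock `bm25`) and snippets.

```python
import sqlite3


def build_history_index(pages):
    """Create an in-memory FTS5 index from (url, title, body) tuples.

    Assumes the SQLite library behind Python was compiled with FTS5.
    """
    conn = sqlite3.connect(":memory:")
    # url is stored but not tokenized; title and body are searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE history USING fts5(url UNINDEXED, title, body)"
    )
    conn.executemany(
        "INSERT INTO history (url, title, body) VALUES (?, ?, ?)", pages
    )
    return conn


def search_history(conn, query, limit=5):
    """Return (url, title, snippet) rows, best match first."""
    sql = """
        SELECT url, title, snippet(history, 2, '[', ']', '...', 8)
        FROM history
        WHERE history MATCH ?
        ORDER BY bm25(history)  -- lower bm25 score means a better match
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()


if __name__ == "__main__":
    conn = build_history_index([
        ("https://a.example/fts", "SQLite notes",
         "full text search with sqlite virtual tables"),
        ("https://b.example/cats", "Cats", "pictures of cats"),
    ])
    for row in search_history(conn, "sqlite"):
        print(row)
```

Even in this toy form the index stores a second copy of every page's text, which is the disk space tradeoff from point 2.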


I'd be happy if it did it just for bookmarked pages. I have often wanted that in bookmark search and it could be incorporated directly into the bookmark searchbar.


Me too! In fact, when I think about it, bookmarks are such a nebulous thing. It's like I want them to do two diametrically opposed jobs.

On the one hand, when I have a bookmark for a site like HN, I want it to give me a quick way to get to the front page and see the latest posts so I can read and discuss them. In this case, the bookmark is serving like a pointer to an address whose contents are subject to change.

On the other hand, I bookmark pages with specific information I want to keep so I can refer back to it later. This is where bookmarks often fail me. Websites are transient by nature and bookmarks are extremely vulnerable to link rot. What I think I'd really want is a bookmark that saves an archive of the page so that I'll always have access to the information in its original form, even if the page changes or the site goes down. In this case, the bookmark is functioning more like a constant than a variable.

I'm aware that browsers such as Safari have the ability to save a page as a web archive file which includes all the data needed to render the page in its original form. The problem is that this feature completely punts on the issue of managing a set of these web archives, delegating that task to the Finder. I want a UI that is more integrated into the browser. If I bookmark a site it should save and manage the archive automatically. Full text search on all my bookmark archived pages should be built into the address bar. Perhaps in order to avoid the conflict between the two uses of bookmarks described above there could be a difference between favourites and bookmarks.

I don't know, does anyone else have any thoughts about this stuff?


Oli here from the team developing Memex.

thanks for speaking my mind here :)

We built Memex with that combination of use cases in mind. So right now you can already full-text search your bookmarks, and filter by time, domain and tags. On the mid-term roadmap, we plan to enable full-HTML/text snapshots of visited pages, both locally and on-demand. For the latter we are potentially working with the Internet Archive.


Agree with #3. I built a similar tool with Rails/Elasticsearch/React a couple of years back. It never returned the most relevant results on top. I realized Elasticsearch could only do that if I added back-links for ranking, and to add back-links (I could be wrong) I would have to crawl the entire internet.


Chrome does this deliberately because they want you to do another google search instead of looking at your history.


One of the other responses here was from a (former?) chrome dev and they said the feature was implemented, had a lot of issues that could be eventually fixed but cost too much dev time to be worthwhile, and nobody used the feature anyways. While the dev said he didn't know why it got dropped, it was likely the lack of use and excess cost.


Wow, that is eye-opening to me!


Citation?


Well just try it yourself, it’s really that simple.

On the other hand, Chromium (the open-source version of Chrome) does this too so maybe it’s simply a UX decision rather than an evil plot to send more data to Google, though I personally believe that it’s the latter…


Sadly, even in Firefox now there seems to be no way to make the address bar never do a search :(.


Well, if you make DuckDuckGo your default search engine, you only get search term suggestions; no search is actually executed unless you choose one of the suggested terms or hit enter with a non-URL in the bar. Not sure if there is anything negative about this feature.


I already have it set to not show any search suggestions, since if I wanted that I would use the search box. I want it to not send the data anywhere if I hit enter (by accident, because I pasted what I thought was going to be a domain name but turned out to be something else) and it isn't a possible domain name or IP address. Not super high impact (accidentally pasting a password would, I assume, generate a DNS lookup anyway), but it feels like a completely gratuitous invasion of privacy when I have already indicated in multiple ways that I do not want to search from the address bar.


After searching my history for years using the awesomebar, I am pretty confident Firefox also keeps track of the queries themselves, and the choice you make in relation to the query.

Say there's two websites I visit regularly with similar names? If I usually load one after typing three characters, but load the other one after typing four characters, Firefox will present the former first in the first case, and the latter first in the second case.


Firefox does indeed keep track of the queries. They're located in the places.sqlite file in your profile folder, in the table moz_inputhistory.
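If you want to look at that table yourself, here is a rough sketch in Python. It assumes `moz_inputhistory` has the columns `place_id`, `input` and `use_count`, and that `place_id` points at `moz_places.id`; check your own file's schema before relying on that. The function name `adaptive_matches` is made up, and the query is only an illustration of the behaviour described above, not Firefox's actual ranking query. Run it against a copy of places.sqlite, since Firefox keeps the live file locked while it is running.

```python
import sqlite3


def adaptive_matches(conn, typed, limit=5):
    """List URLs previously picked for inputs starting with `typed`.

    Rows come back as (url, input, use_count), most-used first.
    Note: % and _ in `typed` act as LIKE wildcards in this sketch.
    """
    sql = """
        SELECT p.url, i.input, i.use_count
        FROM moz_inputhistory AS i
        JOIN moz_places AS p ON p.id = i.place_id
        WHERE i.input LIKE ? || '%'
        ORDER BY i.use_count DESC
        LIMIT ?
    """
    return conn.execute(sql, (typed, limit)).fetchall()


# Usage, assuming you copied the file out of your profile folder first:
#   conn = sqlite3.connect("places-copy.sqlite")
#   for url, typed, count in adaptive_matches(conn, "new"):
#       print(count, typed, url)
```

With two similarly named sites, the one you usually pick after three characters would carry the higher `use_count` for that three-character input, and the other site would lead for the four-character input.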


In Chrome if you open the history and hit "show full history," you get a search bar that works better for history.


But you can’t search inside the content. It just searches the titles.


Does Firefox search IN the content of the pages you visited?

I can hardly imagine it, and in my test just now it did not work.


Where's "show full history"? I don't see anything resembling that, nor any obscure dropdowns or popouts that might be hiding it.


The "show full history" menu option in History menu they mentioned is Mac only. On Windows, you can see the "show full history" option by holding the Go Back button at top left (hover your mouse on it you'll see a tooltip suggesting this).

The sad part is that this option just opens the built-in History page, just like pressing Ctrl + H, which I believe is not what we wanted at all.

I think Chrome just redefined "full" while it regularly deletes history entries older than three months.


Ah, right, Linux has "show full history" in the Back button menu (which can also be opened by right-clicking).

And yeah, ^H isn't fun(/useful).


⌘+Y On the mac. It's the bottom option in the history menu.


The one thing I do miss in Firefox, though, is the chronological listing. I find my memory of the rough time, and sequence of events leading to an item, is usually stronger than my memory of the specific content.


If you press "view" on the top right of the history sidebar, and select "by last visited," it'll be in chronological order.


Good tip, but I still cannot search for a page to get to that timestamp and then scroll up/down in time to find something.


Hey, thanks for that! Might not be quite at Chrome's level, but a heck of an improvement over the grouped version.


No matter what firefox does well, it isn't going to keep converts as long as their default background is so blindingly white.

I still don't understand why firefox refuses to slightly gray out their background by default. Chrome puts less strain on my eyes and that's far more important than anything firefox does well.

Every time I try to stick with firefox, my eyes get strained and eventually I go back to chrome.


I’ve tried to find a solution to bookmark searching for a while. I’ve never found a product I liked or trusted. Lately I’ve been manually adding bookmarks to a custom Google search engine. I’m considering building an extension that will add them directly or sync Chrome bookmarks. I figure Google already knows what I’ve searched, so I feel much less sketchy about it.

Would this extension be interesting to anyone? It would be very simple, open source, and have no middle man. It would send links directly to a google CSE via their API.


I would have privacy concerns, because I'm not using Google all the time for my searches and there is no need to give them even more data.

The only solution I'd accept is one where all data is stored and indexed locally and there are good guarantees that deleted entries are actually deleted and/or wiped from the indices.


I’m not suggesting that you have to use google for the originating search. But you would need to use CSE to index any content you wanted to search for later. Arguably you could do this with a new google account if you’re concerned.


A few years ago I would have been excited about this, but I personally won't be giving Google any more data unless I'm forced to. I'd love a self-hosted and local-first option, but either way I wish you good luck on your potential project.


This product appears to index your bookmarks, did you see that?


Yup. I’ve read about the product but haven’t tried it. I’ll likely check it out.

My issue with these services (and maybe this one is different) is that I have to run 3rd party software and extensions, and I never know how long these companies/products will stay around.


Oli here from the team building Memex.

Yeah as mentioned the tool allows you to do bookmark full-text search. It's open-source, so it will stay no matter what. We build for resilience and for us as the WorldBrain.io company not needing to stick around in order for the service to survive. We see ourselves as the stewards of this tool, not the sole benefactor or proprietor.

Hope it will be helpful to you :)


I get the point of this, but me personally, I prefer to have aspects of me forgotten/gone rather than remembered, stored and searchable in the future. Yes, it's true that, about once every two months, I am looking for something that I swear I came across on the Internet at some point. However, the rest of the time, I'm able to re-find it just by doing another search, whether on search-engine-of-choice, or a search box on particular-website (e.g. socnet, stackoverflow, reddit, github, hacker news...).


Extension source code -> https://github.com/worldbrain/Memex


To think that the classic Opera browser used to do this within the search bar... man, good times those.


this project has been around for a while, see also some interesting comments: https://news.ycombinator.com/item?id=13427360


Looks like I can mark my Softwarerecs question as answered. hahah.

https://softwarerecs.stackexchange.com/q/46270/16751

While I've been waiting for this, I've been using 'Export History (2.2)' browser extension in Chrome. This saves your history to CSV or JSON.


OH! Not so fast. After a test drive, I think the Results page needs to have a button to use the same search on the web if no (useful) results are found.


Nice! I hacked something together to do this 12-odd years ago. I had something to save the pages I visit to a folder, and then a desktop search tool, dtSearch, that I bought to index them. It was invaluable when I really needed it, but too awkward to be really useful. Often it ended up being easier to find the page again with Google if I remembered something unique about it.

This extension looks like it could finally make it convenient enough to be more commonly useful, and privacy focused enough that I'm willing to try it. Great work! Are you planning to charge something for this in the future, or for extra features? I would definitely be willing to pay for it.


Hey eloff,

Oli here, from the team developing Memex. Memex is open-source, so the browser extension will always stay free to use.

What we will charge for are some of the services that require us to host stuff, like backups, multi-device syncing, API calls etc. We will run it as a completely modular pricing model, where you can upgrade only those features you need. We don't like the usual 3-tier model, where you have to upgrade to the 'monster mega plan' in order to just get one feature :) But you can also completely self-host that, as we will make the server software open-source as well.

Hope Memex can be useful to you. We are running a crowdfund to support its development, where we offer some good discount on the future features in return: worldbrain.io/pricing

Let me know if I can be of more help.


What about Evernote, doesn't it already do this? And how's the search quality versus this extension?


Seems like the Falcon extension has not gotten any updates in the last year or two:

https://github.com/lengstrom/falcon


The landing page emphasizes the word "focussed", with double S's, which didn't look right to me. Apparently both "focused" and "focussed" are acceptable, with the single S version "focused" being highly preferred [0].

[0] http://www.future-perfect.co.uk/grammar-tip/is-it-focussed-o...


TIL, was about to say the same thing.


A nice idea, but an open source browser plugin with all local storage would fit my needs better.

I prototyped something roughly like this several years ago. I wrote a simple Firefox plugin that communicated with a locally running server written in Clojure, with a ClojureScript web app for browsing that used the same server backend. I stopped working on it because services like Evernote do a better job, at the loss of some privacy.

Edit: I didn’t intend to imply that Evernote reads or uses user data.



The description is horrible on their github page, do you know if this integrates your browser history into recoll or what exactly?


Good news :) This tool is open-source, and runs fully locally (except, as pointed out the feature that enables you to share quotes, because for that stuff you unfortunately need a server, still)

We custom built a search technology on IndexedDB and Dexie.js, which is capable of indexing around 5 years of your personal web-research locally in the browser.


> A nice idea, but an open source browser plugin with all local storage would fit my needs better.

This seems like it's local search. Are you saying you wish it was open source? The only cloud feature seems to be something about highlighting.


It would be great to be able to change the search engine keyword from 'w' to something else. My brain has hardwired that to the Wikipedia search.

Otherwise, I think this is super!


This seems a very interesting project. I'll see how this works in practice.

It starts to get icky when you notice the devs have a business plan to outsource the indexing to the cloud, but there's a commitment to keep the servers parts open source too, so that you can self host.


Oli here from the team building Memex.

Yeah indeed, having stuff in the cloud is not ideal when it comes to privacy and centralisation.

One of our core values is privacy and data ownership. So we do our best to make our business not dependent on (analysing and selling) your data, and instead provide you with service value you're willing to pay for. We are built with interoperability in mind, which will allow you to switch providers of Memex and Memex Cloud without friction, in case there are breaches of trust, or simply better service.

We follow these values by currently building for offline first usage, where your data is locally indexed and searchable primarily. With our search technology you'll be able to get up to 5 years of your research done in the browser.

For the cloud part, it is unfortunately not yet possible to do performant search on encrypted data, otherwise it would not be such a big issue to have your index in the cloud. Equally unfortunate is that replicating all your data on all nodes, as opposed to having a central point to query, comes with a lot of drawbacks, especially when we are looking at phone usage. There it is really not practical, so there is a need to have some sort of cloud. UX is still very important. Most people can't be bothered with the drawbacks of decentralised and distributed systems (yet). We hope to make that switch in multiple smaller steps that guide (non-technical) users through a smooth transition to a Memex system that is as distributed as possible. (Check out Dat https://datproject.org/, a technology we will likely use to make that first step possible)

And as you already noted, this stuff will be self-hostable. We see ourselves as a service provider first and want to serve people who can't/don't want to run their own server. A bit like the Wordpress model.

You can read more about our approaches to running this business in our vision post: worldbrain.io/vision


Thanks for the long response.

I was wondering about mobile usage too, and agree that having a server somewhere available for queries is the best solution. Since this deals with such private data, having a possibility for self-hosting is the correct solution.


Wouldn't it be a better alternative to use an autosave plugin for the bookmarks and use a local indexer like DocFetcher or OpenSemanticSearch through the browser? Then you can also search other resources and files.


"Search every word of every website you visited"

That will most likely return a lot of hits for sites I didn't like and would like to forget about.


But a normal web search already has this trait, which is presumably how you ended up on the garbage site in the first place. If you eventually found your result, a more narrow search can help you re-find it a few months down the road.


Oli here, from the team developing Memex.

Yeah indeed, just full-text search can let you end up with a lot of garbage. This is why it is so important, that you can search for various other "vague memories" to narrow down your search.

What you often remember about an article is stuff like: Did I bookmark it? When did I visit it? Did I like/share/cite it on social media? You can already filter by time, tags, domains and bookmarks, and soon also by whether you liked/shared it or even saw it in your newsfeed, or on a friend's wall, on Twitter and Facebook.

We gradually expand it so you can search with as much of your associative memories as possible.
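To make the "vague memories" idea concrete, here is a small Python sketch of combining a full-text match with filters for domain, time, tag and bookmark status. Everything in it (the `Visit` record, the `search` function, the parameter names) is hypothetical and written for illustration. It is a plain linear scan in memory, not how Memex is implemented.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class Visit:
    """One visited page (hypothetical record layout)."""
    url: str
    text: str
    visited: datetime
    bookmarked: bool = False
    tags: set = field(default_factory=set)


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(visits, query, *, domain=None, since=None, tag=None,
           bookmarked_only=False):
    """Return visits containing every query word, newest first.

    Each optional filter narrows the result set further.
    """
    terms = tokenize(query)
    hits = []
    for v in visits:
        if not terms <= tokenize(v.text):
            continue  # at least one query word is missing from the page
        if domain and urlparse(v.url).hostname != domain:
            continue
        if since and v.visited < since:
            continue
        if tag and tag not in v.tags:
            continue
        if bookmarked_only and not v.bookmarked:
            continue
        hits.append(v)
    return sorted(hits, key=lambda v: v.visited, reverse=True)
```

Calling `search(visits, "full text", domain="news.example", bookmarked_only=True)` would keep only bookmarked pages on that host that contain both words, which is the kind of narrowing that cuts the garbage out of a pure full-text result list.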


I was imagining this for a long time. Wondering how much more space/energy Chrome uses with this extension enabled.

Any research on that?


https://worldbrain.io/#faq

Info on RAM/CPU/Storage, but no info on energy pressure.


Sounds like the very first version of Google Desktop Search before they turned it into a widget engine.


Nice, now you just need to enable a feature to search anyone else's browsing history


Oli here from the team building Memex :)

Good news: that's the plan! worldbrain.io/vision_deck


Sadly there is a lot to improve before it is useful. I had horrible results with the searches.


I've been building my own for a while. Will definitely give this a go!


for researching on a topic and bundling them.


How do you solve the problem of bans / "not a robot" captchas after too many requests to a site while indexing?



