If you haven't heard of Brave Goggles (https://github.com/brave/goggles-quickstart) I highly recommend checking it out. Just being able to create the search index is a massive task, so being able to apply rules server-side to their "expanded recall set" will give you what most people building search engines want, which is to control the algorithm. We weren't able to do that until now since applying rules client-side doesn't work well on a small search result set.
Seems like you're burying the lead a bit since your "Basic Usage" involves running some Docker instance for some reason and you don't need to do that just to try it out?
It looks like Goggles are just text files hosted on GitHub or GitLab and you can try them out with Brave's search engine without installing anything. Some to try:
Kagi is a weird beast. I'd like to use it but I also don't understand how searches are private if I have to login. Not understanding that is definitely on me but I feel like it should be a frequent enough question that they try to make the answer obvious.
There's not much to understand about it, you just have to take their word for it that they're not storing anything. There's no way for the clients to enforce that, and I can't figure out any model that they could use to have a paid search engine that can guarantee privacy with zero trust. (Edit: a sibling points out Mullvad's payment model would help towards guaranteeing pseudonymity)
I pay for Kagi less as a privacy thing and more because it's a better search engine and consistently gets me better results than Google does.
This is an absolutely valid concern. The answer here, unfortunate as it is, is trust. There's probably things they can do here to remove the need for an actual identity (Mullvad's payment model comes to mind), but I suspect that takes more legal/financial infrastructure than they have access to at the moment.
PS: I am not a lawyer, but for what it's worth, if you _pay_ for Kagi and they advertise their privacy policy as a feature you pay for (see https://kagi.com/privacy), one could reasonably sue for false advertising or demand a refund for not providing a promised service. This would make any pushback subject to consumer protection laws in addition to whatever nascent privacy law they may be subject to.
I thought Brave was just a web browser with built-in adblock, but after your comment I decided to look it up on Wikipedia. Holy moly, what a nightmare.
It's by far the best browser out there. Based on Chromium. Doesn't allow websites to abuse its technical capabilities (i.e. share your data via 3rd party ad servers). You can tip content creators. You can partake in the community and all the things Brave is building, or just use the browser in its default state. If you want a crypto + search laughing stock, check out https://coinmarketcap.com/currencies/presearch/ .
Same. I wonder what the web would look like now if not for Mozilla fighting for open web standards.
I guess I don't really have to wonder. I could just factory reset any Pixel phone, swipe to the news feed, and click on any story. It's literally hard to find the content (drivel) between the advertisements and screen hijacking bullshit.
yeah, multiple times i have asked people "why isn't brave a fork of firefox" and they say "it's a pita" and you need a quick and easy way to customize, blah blah blah.
then you have half a dozen firefox forks running absolutely fine. each one has its good and bad, but the thing is, if these 1-2 man outfits can fork and release their own firefox forks with additional features/customizations, why can't brave? or do they not want to do the actual leg work and only want to take praise for "doing best", eh
sorry for not responding earlier..... my initial question still remains.
there is a "The benefits of moving from Muon to Chromium" so if you replace the word chromium with firefox, do you think any line would be wrong or a lie?
extensions are available in firefox just as easily so that sounds like a lame excuse.
I don't see how it's a scam or ponzi scheme. It's completely opt in. You don't put in any money yourself. There is no referral nonsense.
It's just a little token they pay you with should you decide to turn on ads. It's actually a rather interesting cryptocurrency given that it's actually being used for something.
BAT is even traded at Binance. At the height of the cryptocurrency bull market, it was worth almost 2 US dollars.
You can define some basic rules & it'll go out and crawl those particular sites. Or use one that someone else has built. It can also sync with your Chrome/Firefox bookmarks. Would love feedback from folks who get a chance to use it !
It's interesting that this uses a distributed P2P index. That's a very good idea and one of the things that has held me back from even thinking about trying to build my own tech-focused search engine.
One thing I was hoping to see in the FAQ was how they prevent rogue nodes from inserting spam or other kinds of mischief into the public index.
I hesitate to mention it, because I had a bad experience with them and it involves "crypto," which is always a hot-button, but https://presearch.com is also playing in the distributed search engine space, and their approach actually seems reasonably well thought out <https://www.presearch.io/vision.pdf>. Unfortunately that paper does not seem to address your rogue nodes observation, which makes me doubt their success beyond their availability issues when I tried it. All of that is above and beyond the "download this closed source binary, trust us!" which noped me right out from considering running a node
I doubt that it is a problem yet, but if I were in charge of building a solution to said problem I would try to build a distributed trust system where bad nodes could be flagged and that flag spread to the rest of the network. Those that trust your node would lower the trust ranking of the flagged nodes: the more flags against them, the lower the ranking the rogue node would get.
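A minimal sketch of that flag-and-lower-the-ranking idea, assuming a purely hypothetical in-memory model where each node keeps its own trust table. The class name `TrustTable`, the default trust of 1.0, and the 0.2 penalty are all invented for illustration and are not anything YaCy is known to implement.

```python
from collections import defaultdict


class TrustTable:
    """One node's local view of how much it trusts its peers (hypothetical)."""

    def __init__(self, default_trust=1.0, penalty=0.2):
        self.default_trust = default_trust
        self.penalty = penalty
        # peer id -> how much weight we give to flags reported by that peer
        self.reporter_trust = defaultdict(lambda: default_trust)
        # flagged peer id -> set of peers that have flagged it
        self.flags = defaultdict(set)

    def receive_flag(self, reporter, flagged):
        """Record a flag; repeated flags from one reporter count only once."""
        if reporter != flagged:
            self.flags[flagged].add(reporter)

    def trust(self, peer):
        """Trust falls with every flag, weighted by our trust in the reporter."""
        weighted_flags = sum(self.reporter_trust[r] for r in self.flags[peer])
        return max(0.0, self.default_trust - self.penalty * weighted_flags)
```

Flags from reporters you do not trust (weight 0.0) have no effect, which is the "those that trust your node" part of the proposal. The sketch says nothing about how flags travel between nodes or how identities are kept stable.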
Unless node identity were stable, which doesn't match my mental model of "crypto," then flagging Node 0x42e7a28dd454 doesn't prevent the Node owner from just starting a new one. See also: reddit spam accounts
For something as dynamic as "fetch HTML into the index," that actually turns into a really hard problem to solve since there's for sure not one agreed upon "correct" answer, and even with "the html must differ only by ${foo}%" or whatever, the amount a malicious actor would need to alter HTML to achieve bad outcomes is actually comparatively small
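To make the "must differ only by some percent" point concrete, here is a small sketch using Python's standard `difflib`. The sample pages, the helper names, and the 5% tolerance are assumptions chosen for illustration; no real index is known to validate pages this way.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score between two strings."""
    # autojunk=False disables difflib's heuristic that ignores very common
    # characters in long inputs, which would distort the score on HTML.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def within_tolerance(original, candidate, max_diff_percent):
    """True if the candidate differs from the original by at most the given percent."""
    return (1.0 - similarity(original, candidate)) * 100.0 <= max_diff_percent


# Hypothetical page: twenty ordinary paragraphs and one link.
PARAGRAPH = "<p>Some perfectly ordinary paragraph text.</p>"
ORIGINAL = (
    "<html><body>" + PARAGRAPH * 20
    + '<a href="https://example.org/">home</a></body></html>'
)
# A malicious node only has to swap the link target.
TAMPERED = ORIGINAL.replace("https://example.org/", "https://evil.example/")
```

Here `within_tolerance(ORIGINAL, TAMPERED, 5.0)` returns `True`: the swapped link changes only a dozen or so characters out of roughly a thousand, so a percentage threshold waves it through.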
Depending on your requirements, you may be able to do it with a framework of your choice, a simple crawler and Elasticsearch.
I'm basically doing just that for my little side project with Symfony + DomCrawler + Elastic. But i'm only crawling the home page and the results are manually curated.
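For a rough idea of the "home page only" step, here is a sketch in Python rather than the Symfony + DomCrawler stack mentioned above. It only turns already-fetched HTML into a document dict; the field names `url`, `title`, and `body` are assumptions, and fetching the page and sending the dict to Elasticsearch are left out.

```python
from html.parser import HTMLParser


class HomePageExtractor(HTMLParser):
    """Collects the <title> and the visible text of a single page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def build_document(url, html):
    """Turn one page into a dict that could be sent to a search index."""
    parser = HomePageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "body": " ".join(parser.text_parts),
    }
```

Because only one page per site is parsed, there is no link queue or depth tracking to maintain, which is what keeps a manually curated setup this small.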
I use this as a personal knowledge base. Indexed my blog. Indexed a bookmarks export. Indexed a knowledge base. Works well. It also convinced me of the value of a power-user UI.
I keep everything on my home server: photos, music, home videos, movies, downloaded webpages, ebooks, instruction manuals, etc., all shared out over HTTP. Yacy basically gives me a centralized, private search engine for my house.
Example searches:
"Frigidaire manual"
"living room collection:Photos"
"London Philharmonic Orchestra collection:Music"
Of course, having things in an organized hierarchical file system, with good metadata, helps.
I really liked that one time, when recoll brought up more than 10 year old photos I had mostly forgotten about. I had searched the name of a person and a few photos still had exif tags in them from when I had used Picasa back then to tag them with the person's name.
Do you expose things like photos/videos/pdfs through a public web server? I thought about using Yacy for something similar, but wasn't sure about wanting to just leave everything available unauthenticated on the local network.
I've configured my web server so that it's accessible over the Internet, but only if you have a client certificate issued from my certificate authority.
> Adds the current webpage to the local YACY index. This assumes you are running YaCy in localhost:8090.
Is there a way to use this plugin with an instance of Yacy hosted somewhere other than localhost? I tried making a bookmarklet to do something similar, but never ended up getting it working.
They seem to have a configuration option for it: https://github.com/tecoholic/yacy-it/blob/main/options.html#... but that file isn't present in the most recent tag, so it's possible it just needs a release or you'd need to build the extension from source
The recent version supports configuring your own host from the extension's settings page. You don't have to build your own, just install it from the Firefox Add-ons site.
That's what I'm doing - exporting bookmarks, links from my notes (markdown, etc), HN/Reddit upvoted links, starred github projects, etc - and then having YaCy index them.
In theory, this means I can use this as a search engine of things I found interesting / potentially useful.
In practice, I never search it, but that's more a limitation on my workflows than anything else.
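A sketch of the "links from my notes" step, assuming the notes are plain Markdown or text files already read into a string. The regex and the function name `extract_links` are illustrative; the resulting list still has to be handed to YaCy (or any crawler) separately.

```python
import re

# Matches http(s) URLs; stops at whitespace and common Markdown delimiters.
URL_RE = re.compile(r"https?://[^\s<>()\[\]\"']+")


def extract_links(text):
    """Return unique URLs in first-seen order, trimming trailing punctuation."""
    seen = {}
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:!?")
        seen.setdefault(url, None)  # dicts preserve insertion order
    return list(seen)
```

One known limitation of this pattern: URLs that themselves contain parentheses get cut off at the first parenthesis.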
I love the idea of this, but I tried to spin up my own instance and was immediately overwhelmed by the million little knobs and settings for it.
It seems like a lot of fun if you understand all the tuning, but I feel like the current state alienates most users who want to use it in simple scenarios.
The default settings work well enough, but I agree 90% should be hidden behind an advanced settings check box. (I suspect the organization of features is more obvious in German.) There are also lots of other cool things one can do that are not in the interface but arguably should be.
That said, for what it is it is pretty epic already. As a proof of concept it's completely convincing.
There are lots of settings because it's very powerful software. I don't understand the part about being overwhelmed... surely the developers have chosen sane defaults for most things and you can just ignore the ones you don't understand?
That wasn't my experience. YaCy didn't do what I wanted out of the box, so I was just left with 100+ settings that I didn't know how to adjust to get to a desired state.
Recently installed YaCy on my Synology via the Docker image they provide. Already saved about 10 GB of content interesting to me. Now I have a personal search engine. Awesome.
Yeah, if it is just one article or a blog post I crawl at depth 0, and if it is the personal website of someone I always enjoy reading, no matter what they write, I do an infinite crawl on that specific domain.
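The two crawl modes described here can be sketched as a single scope check. This is an illustration of the idea only: `should_fetch` and its parameters are hypothetical names, not YaCy's actual crawl settings. The start page is depth 0 and links found on it are depth 1.

```python
from urllib.parse import urlparse


def should_fetch(start_url, url, depth, max_depth=0):
    """Decide whether a crawler should fetch `url`.

    max_depth=0 fetches only the start page itself.
    max_depth=None means no depth limit, but still only the start page's host.
    """
    same_host = urlparse(url).hostname == urlparse(start_url).hostname
    within_depth = max_depth is None or depth <= max_depth
    return same_host and within_depth
```

With the default `max_depth=0` only the article itself is fetched, while `max_depth=None` follows every link that stays on the same host.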
Off-topic, but how do you like Synology? I'm familiar with one of their units for work, but I'm looking into a new NAS for my home, and I'm trying to decide between Synology or building my own and putting Nextcloud on it.
Also not OP. I've got a Synology 918+ that I've used for years, and as a file store, I'm quite pleased.
I've tried running apps on it, and the ones that are available are decent, but I pretty quickly got to where I needed to SSH in to make certain things happen, and that felt weird for an appliance like this. I added Docker and ran a bunch of stuff on that, and that was kind of a pain. They don't make it easy to update the images and the community's solution is to SSH in and install watchtower to do it.
I'm now just using it for network file storage and running all those services on a Linux box instead.
I thought about just putting the drives in the Linux box, but I did some network testing and the NAS was faster, and it provides a lot of storage-related niceties, so I'm keeping it in the mix. For instance, I recently decided to upgrade the drives to faster, larger ones, and it's been pretty easy.
Thanks! So are you running the first-party Synology Drive, Moments, etc. for file/photo syncing, or do you run something like Nextcloud on your Linux box? Or do you not use software like that?
I'm not using that kind of stuff. Mine is mostly about video with a little sorta-backup for files that don't matter a ton, but I'd still rather not lose.
Not OP, but I've been using a Synology NAS since 2013 and it's a great product. I bought a router from them as well, which is also superb. I think it's a fabulous investment.
Expectations: file/photo sync, media server, ad blocking (Pi-hole). I saw that Synology has first-party apps for most of this (Synology Drive, Moments, Video).
Its photo app is NPM garbage. Sure, I have an ancient ds115j (armv7@800MHz, 256 MB RAM), but I couldn't use it.
But despite its ancient-ness I was able to update it to the latest Synology OS (DSM), which shows how well Synology supports their products.
So be sure to get a version with enough RAM, I would go for 2Gb+ versions in your place, so avoid ds220j and ds218play, look at ds218 (without 'play' suffix) or ds220+. Oops, their site says VMM is supported only on DS220+, so I think you have no choice there, but 220+ has an additional memory slot, which could be handy.
For literally $300[1] you can't do better.
However, there is a way to install DSM on non-Synology hardware, so if you have a desktop PC to run it on I would advise you to test it out.
I used a small Synology NAS from 2012-2019, at which point I replaced it with small linux box because I wanted ZFS. Inability to support ZFS was really the only reason I replaced it; it was still working fine.
Vanilla Ubuntu 18.04 LTS. Every couple of months or so I update all the packages and reboot. That's really all the maintenance I've ever done on it, apart from initial setup. I ought to set it up so that it can email me if a zfs scrub ever detects a problem, but I haven't done that yet.
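A minimal sketch of the "email me if something is wrong" idea, assuming the check is run from cron or a timer after a scrub. It relies on `zpool status -x` printing `all pools are healthy` when nothing is wrong; the function names, subject line, and addresses are placeholders, and actually sending the message (for example via `smtplib`) is left out.

```python
import subprocess
from email.message import EmailMessage


def pools_healthy(status_output):
    """`zpool status -x` prints exactly this line when every pool is fine."""
    return status_output.strip() == "all pools are healthy"


def build_alert(status_output, sender, recipient):
    """Wrap the zpool output in an email message ready to be sent."""
    msg = EmailMessage()
    msg["Subject"] = "ZFS pool problem detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(status_output)
    return msg


def read_zpool_status():
    """Run `zpool status -x` and return everything it printed."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

Anything other than the healthy line, including an error from `zpool` itself, is treated as a problem, which errs on the side of sending a mail.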
Not OP but counter to what everyone else said, I don’t like mine. I bought the cheapest one available when it was on sale. It validated I wanted a NAS but it was too weak. Any usage would be slow and all the apps dragged it down. The apps are nice with how easy it is to install and get working, but if you wanna use it as a server not just NAS… you get what you pay for.
I still use it but plan to build a server this winter and gift the Synology to my father.
I like Synology a lot but mainly use it for storage/backup. It's a very expensive way to host containers IMO. I would look to a Mac Mini for something like that.
Thanks. My main use-cases are file/photo syncing, media server, etc. I saw that Synology has first-party apps for those things, so that would be the main draw for me.
Even as a media server I personally find it too expensive to buy one that can handle 4K transcoding, which is frequently needed for subtitles. I just use Synology to serve the files and run Plex on a separate machine.
I would like to use this. However, in the past when I've tried it I didn't like the results. It would be nice to hear about more competition in the P2P information retrieval (search engine) tech space. YaCy seems to be the only one I've consistently heard about over the years.
Has anyone tried LinkAce? I'd love to hear someone's thoughts on YaCy vs LinkAce.
This is great timing. After looking at YaCy for my Synology NAS a few weeks ago, I looked at some alternatives. I like the look of LinkAce, though it seems to be less popular and I haven't found much on how a setup on a Synology NAS works.
I'd love some advice, I have a massive number of bookmarks across dozens of folders. Something like this is exactly what I'm looking for.
They serve very different purposes. While a search engine can also archive sites, that isn't its only purpose. LinkAce is designed more for bookmarking and archiving sites, akin to a bookmark manager, not as a search engine.
Copernic used to be a great way to do this. Register every search engine you like in the local software, apply rules, search all the web search engines at once. Until they went 100% corporate, it was awesome.
I have about 100,000 PDFs that I want indexed and searchable. They're on a website and I want people to be able to visit the website and search through the PDFs.
YaCy: Decentralized Web Search - https://news.ycombinator.com/item?id=22246732 - Feb 2020 (41 comments)
YaCy: a free distributed search engine - https://news.ycombinator.com/item?id=12433010 - Sept 2016 (24 comments)
YaCy – Peer to Peer Search Engine - https://news.ycombinator.com/item?id=11956268 - June 2016 (3 comments)
YaCy: Decentralized Web Search - https://news.ycombinator.com/item?id=8746883 - Dec 2014 (29 comments)
YaCy takes on Google with open source search engine - https://news.ycombinator.com/item?id=3288586 - Nov 2011 (17 comments)