Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine (vespa.ai)
66 points by martinp on Sept 26, 2017 | 5 comments


There's a lot in there.

Cluster file distribution with BitTorrent: https://github.com/vespa-engine/vespa/tree/master/filedistri...


If someone was familiar with Vespa in 2011, but hasn't had access to it until now, what's new since then?


At Flickr, we worked closely with the Vespa team from 2011 through 2016 on a wide range of advancements:

   * partial document refeeding (i.e. quickly indexing a new field across 20+ billion documents without refeeding everything, while staying online and handling 100M+ free-text queries a day)
   * visual similarity search - check out the tensor ranking features [1] [2]
   * online elasticity - add/remove replicas and shards online. A must when it could take weeks or more to re-feed from scratch. This is non-trivial to make work smoothly at scale.
   * latency / tail latency on complex queries - p90 reduced from 3,000 ms to 30 ms.

This is a major gift to the open-source community: a battle-tested search engine that works reliably, without babysitting, on very large datasets under simultaneous high query and high feed volumes. A huge debt of gratitude to the team in Trondheim and to the Verizon/Oath/Yahoo legal & management teams for making this happen. :+1:

[1] http://docs.vespa.ai/documentation/tensor-intro.html
[2] http://docs.vespa.ai/documentation/tensor-user-guide.html


Not precisely sure where we were in 2011, but I think these are the biggest ones that came after, off the top of my head (i.e. I'm sure to be missing something):

  - Merging content and index clusters into one, making index clusters elastic and auto-recovering on data loss.
  - Fully realtime writes.
  - Support more advanced machine-learned ranking through tensors.
  - Streaming (personal) search supporting a large write rate.
  - Document references.
  - WAND and RANK operators.
  - Rank features over multivalue text fields.
  - Predicate fields.
  - Lots and lots of performance work.


Powers bits of Flickr. Interesting.



