Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine

mkj · on Sept 27, 2017

There's a lot in there.

Cluster file distribution with BitTorrent: https://github.com/vespa-engine/vespa/tree/master/filedistri...

toast0 · on Sept 27, 2017

If someone was familiar with Vespa in 2011 but hasn't had access to it until now, what's new since then?

tedd4u · on Sept 27, 2017

At Flickr, we worked closely with the Vespa team from 2011 through 2016 on a wide range of advancements, including:

   * partial document refeeding (i.e., indexing a new field across 20+ billion documents without refeeding everything, while staying online and serving 100M+ free-text queries a day)
   * visual similarity search: check out the tensor ranking features [1] [2]
   * online elasticity: add or remove replicas and shards while staying online. This is a must when re-feeding from scratch could take weeks or more, and it is non-trivial to make work smoothly at scale.
   * latency and tail latency on complex queries: p90 reduced from 3,000 ms to 30 ms.

This is a major gift to the open-source community: a battle-tested search engine that works reliably, without babysitting, on very large datasets with simultaneously high query and feed volumes. Huge debt of gratitude to the team in Trondheim and to the Verizon/Oath/Yahoo legal and management teams for making this happen. :+1:

[1] http://docs.vespa.ai/documentation/tensor-intro.html
[2] http://docs.vespa.ai/documentation/tensor-user-guide.html

RealJon · on Sept 27, 2017

I'm not sure exactly where we were in 2011, but off the top of my head (so I'm surely missing something), I think these are the biggest changes that came after:

  - Merging content and index clusters into one, making index clusters elastic and auto-recovering on data loss.
  - Fully realtime writes.
  - Support for more advanced machine-learned ranking through tensors.
  - Streaming (personal) search supporting high write rates.
  - Document references.
  - WAND and RANK operators.
  - Rank features over multivalue text fields.
  - Predicate fields.
  - Lots and lots of performance work.

groodt · on Sept 27, 2017

Powers bits of Flickr. Interesting.