
That has a major downside. For binary embeddings, the top 10 results are the same as with fp32, albeit shuffled. However, after the 10th result, I think quality degrades quite a bit. I was planning to add a reranking strategy for binary embeddings. What do you think?



Try this trick that I learned from Cohere:

- Fetch the top 10*k (i.e. 100) results using the Hamming distance
- Rerank by taking the dot product between the query embedding (full precision) and the binary doc embeddings
- Show the top-10 results after re-ranking
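A rough NumPy sketch of that pipeline, in case it helps. It assumes docs are stored as {0,1} bit vectors, the query is binarized by sign, and the {0,1} -> {-1,+1} mapping for scoring; function and variable names are just illustrative:

    import numpy as np

    def binary_rerank(query_emb, binary_doc_embs, k=10, oversample=10):
        """Oversampled Hamming retrieval, then full-precision dot-product rerank.

        query_emb:       (d,) float32 query embedding
        binary_doc_embs: (n, d) array of {0, 1} document embeddings
        """
        # Stage 1: fetch top oversample*k candidates by Hamming distance.
        # Binarize the query the same way the docs were (sign threshold, assumed).
        query_bits = (query_emb > 0).astype(np.uint8)
        hamming = np.count_nonzero(binary_doc_embs != query_bits, axis=1)
        candidates = np.argsort(hamming)[: oversample * k]

        # Stage 2: rerank candidates by dot product between the full-precision
        # query and the binary doc embeddings (mapped from {0,1} to {-1,+1}).
        signed_docs = binary_doc_embs[candidates].astype(np.float32) * 2 - 1
        scores = signed_docs @ query_emb
        reranked = candidates[np.argsort(-scores)]

        # Stage 3: return the top-k after reranking.
        return reranked[:k]

In production you would do stage 1 in your vector index rather than brute force, but the scoring logic is the same.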


This is pretty cool. The dot product would give the unnormalized cosine similarity from a smaller pool. Thank you so much!


Recommend reranking. You basically get full resolution performance for a negligible latency hit. (Unless you need to make two network calls…)

MixedBread supports matryoshka embeddings too so that’s another option to explore on the latency-recall curve.
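For reference, the matryoshka option is basically just truncating to the leading dimensions and re-normalizing; a minimal sketch (the dimension count is arbitrary, and this assumes the model was trained with matryoshka-style loss so the leading dimensions carry most of the signal):

    import numpy as np

    def truncate_matryoshka(embs, dims=256):
        """Keep the first `dims` dimensions and re-normalize to unit length."""
        truncated = embs[..., :dims]
        norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
        return truncated / np.clip(norms, 1e-12, None)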


> Recommend reranking.

Will explore it thoroughly then!

> MixedBread supports matryoshka embeddings too so that’s another option to explore on the latency-recall curve.

Yes, exactly why I went with this model!



