That's a major downgrade. For binary embeddings, the top 10 results are the same...

amitness · 2024-12-26T09:08:25 1735204105

Try this trick that I learned from Cohere:
- Fetch the top 10*k (i.e. 100) results using Hamming distance
- Rerank by taking the dot product between the query embedding (full precision) and the binary doc embeddings
- Show the top 10 results after reranking
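A minimal NumPy sketch of that two-stage search. It assumes sign-based binarization (bit = 1 where a float component is positive) and, for the rescoring step, maps bits to ±1 before the dot product. The function names are hypothetical and this is not Cohere's code; the random vectors only stand in for real embeddings.

```python
import numpy as np


def binarize_and_pack(x: np.ndarray) -> np.ndarray:
    # Sign-based binary quantization: bit = 1 where the component is positive,
    # then pack 8 bits per byte along the last axis.
    return np.packbits((x > 0).astype(np.uint8), axis=-1)


def hamming_distances(query_packed: np.ndarray, docs_packed: np.ndarray) -> np.ndarray:
    # XOR the packed bytes and count the differing bits for each document.
    return np.unpackbits(np.bitwise_xor(docs_packed, query_packed), axis=-1).sum(axis=-1)


def search(query: np.ndarray, docs_packed: np.ndarray, k: int = 10, oversample: int = 10) -> np.ndarray:
    dim = query.shape[-1]
    # Stage 1: cheap Hamming search over binary codes for k * oversample candidates.
    dists = hamming_distances(binarize_and_pack(query), docs_packed)
    n_cand = min(k * oversample, len(dists))
    candidates = np.argpartition(dists, n_cand - 1)[:n_cand]
    # Stage 2: rescore candidates with the full-precision query.
    # Assumption: bits are mapped to {-1, +1}; padding bits beyond `dim` are dropped.
    bits = np.unpackbits(docs_packed[candidates], axis=-1)[:, :dim]
    doc_signs = bits.astype(np.float32) * 2.0 - 1.0
    scores = doc_signs @ query
    # Stage 3: keep the top k after rescoring.
    return candidates[np.argsort(-scores)[:k]]


# Usage with random stand-in embeddings (1000 docs, 64 dims).
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 64))
docs_packed = binarize_and_pack(docs)
top = search(docs[42], docs_packed)  # query with doc 42's own float embedding
```

Oversampling is the knob here: a larger `oversample` gives the rescoring step more chances to recover documents that the coarse Hamming ranking placed too low, at the cost of unpacking more candidates.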

Quizzical4230 · 2024-12-26T13:13:41 1735218821

This is pretty cool. The dot product would give the unnormalized cosine similarity from a smaller pool. Thank you so much!
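A small check of that point, assuming the binary docs are mapped to ±1: every such vector has norm sqrt(dim), so the dot product is the cosine similarity times a constant and both give the same ranking. With raw {0, 1} bits the doc norms depend on how many bits are set, so the equivalence would not hold.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 64
query = rng.standard_normal(dim)  # full-precision query (random stand-in)
doc_signs = np.where(rng.standard_normal((100, dim)) > 0, 1.0, -1.0)  # binary docs as ±1

dot = doc_signs @ query
# Every ±1 vector has norm sqrt(dim), so cosine = dot / (|q| * sqrt(dim)).
cosine = dot / (np.linalg.norm(query) * np.sqrt(dim))
```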

intalentive · 2024-12-25T18:21:39 1735150899

Recommend reranking. You basically get full-resolution performance for a negligible latency hit (unless you need to make two network calls…).

MixedBread supports matryoshka embeddings too so that’s another option to explore on the latency-recall curve.
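Matryoshka-trained models are meant to keep most of their quality when only a prefix of the embedding is kept. A minimal sketch of the two knobs on that curve (prefix length and binary vs float32 storage), assuming a 1024-dim full embedding for illustration; the random vectors only show the mechanics and do not have the matryoshka property themselves, and the supported dimensions should be checked on the model card.

```python
import numpy as np


def truncate_matryoshka(emb: np.ndarray, dim: int) -> np.ndarray:
    # Keep the leading `dim` components and re-normalize to unit length,
    # so dot products remain cosine similarities.
    truncated = emb[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / np.maximum(norms, 1e-12)


def index_size_bytes(n_docs: int, dim: int, binary: bool) -> int:
    # float32: 4 bytes per dimension; packed binary: 1 bit per dimension,
    # rounded up to whole bytes.
    per_doc = (dim + 7) // 8 if binary else 4 * dim
    return n_docs * per_doc


rng = np.random.default_rng(0)
full = rng.standard_normal((5, 1024))  # stand-in for full-size model embeddings
small = truncate_matryoshka(full, 256)
```

Combining a shorter prefix with binary codes multiplies the storage savings; where to stop on the latency-recall curve has to be measured on your own queries.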

Quizzical4230 · 2024-12-26T05:15:58 1735190158

> Recommend reranking.

Will explore it thoroughly then!

> MixedBread supports matryoshka embeddings too so that’s another option to explore on the latency-recall curve.

Yes, that's exactly why I went with this model!