That's a major downgrade. For binary embeddings, the top 10 results are the same as fp32, albeit shuffled. However, after the 10th result, I think quality degrades quite a bit. I was planning to add a reranking strategy for binary embeddings. What do you think?
Try this trick that I learned from Cohere:
- Fetch the top 10*k results (i.e. 100 for k = 10) using Hamming distance
- Rerank them by taking the dot product between the full-precision query embedding and the binary doc embeddings
- Show top-10 results after re-ranking
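
Here is a rough NumPy sketch of those three steps, in case it helps. It is not Cohere's implementation. The function names (`binarize`, `hamming_search`, `rerank`, `search`) are made up for illustration. I'm also assuming the embeddings are binarized by thresholding at 0 and packed 8 dimensions per byte, that the dimension is a multiple of 8, and that the stored bits are mapped to -1/+1 before the dot product.

```python
import numpy as np

def binarize(emb):
    """Threshold at 0 and pack 8 dimensions into each uint8 byte (assumed scheme)."""
    return np.packbits(emb > 0, axis=-1)

def hamming_search(query_bits, doc_bits, n):
    """Return indices of the n docs closest to the query in Hamming distance."""
    xor = np.bitwise_xor(doc_bits, query_bits)        # differing bits, per byte
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)   # popcount per document
    n = min(n, len(dist))
    return np.argsort(dist, kind="stable")[:n]

def rerank(query_fp32, doc_bits, candidates, k):
    """Rescore candidates with the full-precision query against binary docs."""
    bits = np.unpackbits(doc_bits[candidates], axis=-1).astype(np.float32)
    signs = bits * 2.0 - 1.0                          # assumption: {0,1} -> {-1,+1}
    scores = signs @ query_fp32
    order = np.argsort(-scores, kind="stable")[:k]    # highest score first
    return candidates[order], scores[order]

def search(query_fp32, doc_bits, k=10, oversample=10):
    """Fetch k * oversample candidates by Hamming distance, then rerank to top k."""
    candidates = hamming_search(binarize(query_fp32), doc_bits, k * oversample)
    return rerank(query_fp32, doc_bits, candidates, k)
```

Usage would be something like `ids, scores = search(query_embedding, binarize(doc_embeddings))`. The rerank only touches the 100 candidates, so the extra cost on top of the Hamming search should be small, and the fp32 doc embeddings never need to be stored.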