To make LLM applications faster, we need a faster retrieval system. This is where embedding quantization comes in: it is a technique that cuts vector database costs and makes retrieval significantly faster while largely preserving retrieval performance.
Unofficial implementation of binary and scalar embedding quantization for significantly faster and cheaper retrieval, with evaluation of a RAG system using the "SEMALEX" evaluation metric.
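As a rough illustration of the two quantization schemes named above, here is a minimal pure-Python sketch. It is not taken from any of the listed repositories: the function names (`binary_quantize`, `hamming_distance`, `fit_scalar_ranges`, `scalar_quantize`) are hypothetical, and it assumes binary quantization thresholds each dimension at zero while scalar quantization maps each dimension to int8 using min/max ranges learned from a calibration set.

```python
from typing import List, Sequence, Tuple


def binary_quantize(vec: Sequence[float]) -> int:
    """Keep one bit per dimension (1 if the value is > 0), packed into an int."""
    bits = 0
    for value in vec:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two packed binary codes differ."""
    return bin(a ^ b).count("1")


def fit_scalar_ranges(
    calibration: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Compute a per-dimension (min, max) range from calibration embeddings."""
    return [(min(column), max(column)) for column in zip(*calibration)]


def scalar_quantize(
    vec: Sequence[float], ranges: Sequence[Tuple[float, float]]
) -> List[int]:
    """Map each float to one of 256 buckets, returned as int8 values in [-128, 127]."""
    quantized = []
    for value, (low, high) in zip(vec, ranges):
        if high == low:
            # Degenerate dimension: every calibration value was identical.
            quantized.append(0)
            continue
        clipped = min(max(value, low), high)  # clamp out-of-range values
        bucket = round((clipped - low) / (high - low) * 255)
        quantized.append(bucket - 128)
    return quantized
```

Relative to float32 storage, the binary code uses 1 bit per dimension instead of 32 (a 32x reduction), and the int8 code uses 8 bits per dimension (a 4x reduction). In a retrieval pipeline, candidates would typically be ranked by `hamming_distance` on the binary codes first, since XOR plus a bit count is much cheaper than a floating-point dot product.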
Compress and search large embedding datasets with Vectro+, a high-performance Rust toolkit for efficient similarity search and streaming compression.