Highlights
🛠️ Fixed numerous bugs across indexing and retrievers
📏 Integrated RAGAS evaluation metrics
🧩 Added new RAG pipelines/configs
💻 Introduced CLI for indexing: rankify-index
Bug fixes (selected)
- Lucene (BM25): stable JsonCollection wiring, index dir layout, robust load_index.
- Contriever: fixed JSONL/TSV mismatch; chunked embedding generation; float32 normalization; safer serialization & cleanup.
- BGE: correct CLS pooling + L2 normalization; cosine via IndexFlatIP; chunk merge validation.
- ColBERT: deterministic collection.tsv with sequential IDs; original↔sequential ID mappings; TSV verification & better diagnostics; loader-based load.
- ANCE: robust doc-id extraction across fields; consistent FAISS↔docid mapping; safer metadata writer.
- DPR: reliable Pyserini encode/index orchestration; mapping & metadata persisted.
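
The BGE item relies on a general property: once vectors are L2-normalized, their inner product equals their cosine similarity, which is why an inner-product index such as FAISS `IndexFlatIP` can serve cosine search. The sketch below illustrates that property in plain Python. It is not the project's code, and the helper names (`l2_normalize`, `inner_product`, `search`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product, the score an inner-product index computes."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Reference cosine similarity on the raw (unnormalized) vectors."""
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


def search(query, docs, k):
    """Brute-force top-k by inner product over normalized vectors.

    Returns (doc_index, score) pairs sorted by descending score.
    """
    q = l2_normalize(query)
    scored = [(i, inner_product(q, l2_normalize(d))) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]


if __name__ == "__main__":
    query, doc = [3.0, 4.0], [4.0, 3.0]
    ip = inner_product(l2_normalize(query), l2_normalize(doc))
    # The two values agree up to floating-point rounding.
    print(ip, cosine(query, doc))
```

If normalization is skipped on either the indexed vectors or the query, inner-product scores also depend on vector length and stop being cosine scores, so both sides need the same treatment.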
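
The ColBERT item describes writing a collection TSV with sequential IDs while keeping mappings back to the original document IDs. A minimal sketch of that idea follows. It assumes a two-column `id<TAB>text` layout in which line N carries ID N; the function names (`build_collection`, `verify_collection`) are hypothetical, and sorting by original ID is just one way to make the output deterministic.

```python
def build_collection(docs):
    """Assign sequential IDs to (original_id, text) pairs.

    Returns (tsv_text, orig_to_seq, seq_to_orig). Documents are sorted by
    original ID so the output does not depend on input order.
    """
    orig_to_seq = {}
    seq_to_orig = []
    lines = []
    for seq_id, (orig_id, text) in enumerate(sorted(docs, key=lambda d: d[0])):
        if orig_id in orig_to_seq:
            raise ValueError(f"duplicate original id: {orig_id!r}")
        # Collapse tabs and newlines so each document stays on one TSV row.
        clean = " ".join(text.split())
        lines.append(f"{seq_id}\t{clean}\n")
        orig_to_seq[orig_id] = seq_id
        seq_to_orig.append(orig_id)
    return "".join(lines), orig_to_seq, seq_to_orig


def verify_collection(tsv_text):
    """Check that row N has exactly two fields and carries ID N; return the row count."""
    rows = [row for row in tsv_text.split("\n") if row]
    for expected, row in enumerate(rows):
        parts = row.split("\t")
        if len(parts) != 2 or parts[0] != str(expected):
            raise ValueError(f"row {expected}: malformed or out-of-sequence: {row!r}")
    return len(rows)


if __name__ == "__main__":
    docs = [("b7", "second\tdoc"), ("a3", "first doc\nwith newline")]
    tsv, orig_to_seq, seq_to_orig = build_collection(docs)
    print(tsv, orig_to_seq, seq_to_orig, verify_collection(tsv))
```

Retrieval results come back with sequential IDs, so `seq_to_orig` is what translates them to the caller's original IDs.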