Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 5 days ago • 67
Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval Paper • 2604.04734 • Published 10 days ago • 11
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment Paper • 2604.05684 • Published 9 days ago • 9
ConTEB evaluation datasets Collection Evaluation datasets of the ConTEB benchmark. Use "test" split where available, otherwise "validation", otherwise "train". • 8 items • Updated Jun 2, 2025 • 3
Diffusion-Pretrained Dense and Contextual Embeddings Paper • 2602.11151 • Published Feb 11 • 23
Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries Dec 22, 2025 • 9
🦢SWIM-IR Dataset [NAACL'24] Collection 29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs. • 4 items • Updated Mar 31, 2025 • 8
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 165