<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Bio Text Retrieval — Resources</title>
<style>
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap');
:root {
--bg: #0a0a0b;
--surface: #111113;
--surface-2: #18181b;
--border: #27272a;
--border-hover: #3f3f46;
--text: #fafafa;
--text-2: #a1a1aa;
--text-3: #71717a;
--accent: #6ee7b7;
--accent-dim: rgba(110, 231, 183, 0.1);
--accent-2: #67e8f9;
--accent-3: #c4b5fd;
--accent-4: #fda4af;
--accent-5: #fcd34d;
--radius: 12px;
--radius-sm: 8px;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
html { scroll-behavior: smooth; }
body {
font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
background: var(--bg);
color: var(--text);
line-height: 1.6;
-webkit-font-smoothing: antialiased;
}
/* ── NAV ─────────────────────────── */
nav {
position: fixed; top: 0; left: 0; right: 0; z-index: 100;
background: rgba(10, 10, 11, 0.8);
backdrop-filter: blur(20px);
border-bottom: 1px solid var(--border);
padding: 0 2rem;
}
nav .inner {
max-width: 1200px; margin: 0 auto;
display: flex; align-items: center; justify-content: space-between;
height: 56px;
}
nav .logo {
font-weight: 600; font-size: 0.95rem; letter-spacing: -0.02em;
display: flex; align-items: center; gap: 8px;
}
nav .logo span { color: var(--accent); }
nav .links { display: flex; gap: 6px; }
nav .links a {
color: var(--text-2); text-decoration: none; font-size: 0.82rem;
font-weight: 500; padding: 6px 12px; border-radius: 6px;
transition: all 0.15s;
}
nav .links a:hover { color: var(--text); background: var(--surface-2); }
/* ── HERO ─────────────────────────── */
.hero {
padding: 140px 2rem 80px;
text-align: center;
position: relative;
overflow: hidden;
}
.hero::before {
content: '';
position: absolute; top: 60px; left: 50%; transform: translateX(-50%);
width: 600px; height: 400px;
background: radial-gradient(ellipse, rgba(110,231,183,0.06) 0%, transparent 70%);
pointer-events: none;
}
.hero h1 {
font-size: clamp(2.4rem, 5vw, 3.8rem);
font-weight: 700;
letter-spacing: -0.04em;
line-height: 1.1;
margin-bottom: 1rem;
}
.hero h1 em {
font-style: normal;
background: linear-gradient(135deg, var(--accent), var(--accent-2));
-webkit-background-clip: text; -webkit-text-fill-color: transparent;
}
.hero p {
color: var(--text-2); font-size: 1.1rem; max-width: 560px;
margin: 0 auto 2rem; font-weight: 400;
}
.hero .tag-row {
display: flex; gap: 8px; justify-content: center; flex-wrap: wrap;
}
.tag {
font-size: 0.75rem; font-weight: 500; padding: 5px 12px;
border-radius: 100px; border: 1px solid var(--border);
color: var(--text-2); background: var(--surface);
}
/* ── SECTION ──────────────────────── */
section {
max-width: 1200px; margin: 0 auto;
padding: 60px 2rem 0;
}
.section-head {
margin-bottom: 2rem;
}
.section-head h2 {
font-size: 1.5rem; font-weight: 600; letter-spacing: -0.03em;
display: flex; align-items: center; gap: 10px;
}
.section-head h2 .icon {
width: 32px; height: 32px; border-radius: var(--radius-sm);
display: grid; place-items: center; font-size: 1rem;
}
.section-head p {
color: var(--text-3); font-size: 0.9rem; margin-top: 6px;
max-width: 600px;
}
/* ── TIMELINE ─────────────────────── */
.timeline {
position: relative;
padding-left: 32px;
}
.timeline::before {
content: '';
position: absolute; left: 7px; top: 0; bottom: 0; width: 2px;
background: linear-gradient(to bottom, var(--accent), var(--border) 80%, transparent);
}
.timeline-item {
position: relative;
margin-bottom: 2.5rem;
}
.timeline-item::before {
content: '';
position: absolute; left: -29px; top: 8px;
width: 10px; height: 10px; border-radius: 50%;
background: var(--accent);
box-shadow: 0 0 8px rgba(110,231,183,0.4);
}
.timeline-item .year {
font-family: 'JetBrains Mono', monospace;
font-size: 0.75rem; color: var(--accent); font-weight: 500;
margin-bottom: 4px;
}
.timeline-item h3 {
font-size: 1.05rem; font-weight: 600; letter-spacing: -0.02em;
margin-bottom: 4px;
}
.timeline-item h3 a {
color: var(--text); text-decoration: none;
transition: color 0.15s;
}
.timeline-item h3 a:hover { color: var(--accent); }
.timeline-item .desc {
color: var(--text-2); font-size: 0.88rem; line-height: 1.55;
}
.timeline-item .meta {
display: flex; gap: 8px; margin-top: 8px; flex-wrap: wrap;
}
.timeline-item .meta a {
font-size: 0.72rem; padding: 3px 10px; border-radius: 100px;
text-decoration: none; font-weight: 500;
border: 1px solid var(--border); color: var(--text-3);
transition: all 0.15s;
}
.timeline-item .meta a:hover {
border-color: var(--accent); color: var(--accent);
}
/* ── CARD GRID ────────────────────── */
.grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(320px, 1fr));
gap: 16px;
}
.card {
background: var(--surface);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 20px;
transition: border-color 0.2s, transform 0.2s;
text-decoration: none; color: inherit;
display: flex; flex-direction: column;
}
.card:hover {
border-color: var(--border-hover);
transform: translateY(-2px);
}
.card .card-top {
display: flex; justify-content: space-between; align-items: flex-start;
margin-bottom: 10px;
}
.card h3 {
font-size: 0.95rem; font-weight: 600; letter-spacing: -0.01em;
}
.card .badge {
font-size: 0.68rem; font-weight: 600; padding: 3px 9px;
border-radius: 100px; white-space: nowrap; flex-shrink: 0;
}
.badge-model { background: rgba(110,231,183,0.12); color: var(--accent); }
.badge-dataset { background: rgba(103,232,249,0.12); color: var(--accent-2); }
.badge-bench { background: rgba(196,181,253,0.12); color: var(--accent-3); }
.badge-paper { background: rgba(253,164,175,0.12); color: var(--accent-4); }
.badge-training { background: rgba(252,211,77,0.12); color: var(--accent-5); }
.card .desc {
color: var(--text-2); font-size: 0.84rem; flex: 1;
line-height: 1.5;
}
.card .card-footer {
margin-top: 12px; display: flex; gap: 6px; flex-wrap: wrap;
}
.card .pill {
font-family: 'JetBrains Mono', monospace;
font-size: 0.68rem; padding: 3px 8px; border-radius: 4px;
background: var(--surface-2); color: var(--text-3);
}
/* ── BENCHMARK TABLE ─────────────── */
.table-wrap {
overflow-x: auto;
border: 1px solid var(--border);
border-radius: var(--radius);
background: var(--surface);
}
table {
width: 100%; border-collapse: collapse;
font-size: 0.84rem;
}
thead th {
text-align: left; padding: 12px 16px;
font-weight: 600; font-size: 0.78rem; color: var(--text-3);
text-transform: uppercase; letter-spacing: 0.05em;
border-bottom: 1px solid var(--border);
position: sticky; top: 0; background: var(--surface);
}
tbody td {
padding: 11px 16px; border-bottom: 1px solid var(--border);
color: var(--text-2);
}
tbody tr:last-child td { border-bottom: none; }
tbody tr:hover { background: var(--surface-2); }
tbody td:first-child { font-weight: 500; color: var(--text); }
td a {
color: var(--accent); text-decoration: none;
}
td a:hover { text-decoration: underline; }
/* ── LEADERBOARD TABLE ───────────── */
.lb-rank {
font-family: 'JetBrains Mono', monospace;
font-weight: 600; color: var(--accent); font-size: 0.85rem;
}
.lb-score {
font-family: 'JetBrains Mono', monospace;
font-weight: 500; color: var(--accent-5);
}
/* ── RECIPE CARDS ────────────────── */
.recipe-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(360px, 1fr));
gap: 16px;
}
.recipe-card {
background: var(--surface);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 20px; position: relative;
overflow: hidden;
}
.recipe-card::before {
content: '';
position: absolute; top: 0; left: 0; right: 0; height: 3px;
}
.recipe-card:nth-child(1)::before { background: var(--accent); }
.recipe-card:nth-child(2)::before { background: var(--accent-2); }
.recipe-card:nth-child(3)::before { background: var(--accent-3); }
.recipe-card .rank {
font-family: 'JetBrains Mono', monospace;
font-size: 0.72rem; color: var(--text-3); margin-bottom: 6px;
}
.recipe-card h3 { font-size: 1rem; font-weight: 600; margin-bottom: 8px; }
.recipe-card .recipe-desc {
color: var(--text-2); font-size: 0.84rem; line-height: 1.5;
}
.recipe-card .recipe-result {
margin-top: 12px; padding: 10px 14px;
background: var(--surface-2); border-radius: var(--radius-sm);
font-family: 'JetBrains Mono', monospace;
font-size: 0.78rem; color: var(--accent);
}
/* ── PATH SECTION ────────────────── */
.path-list {
counter-reset: step;
}
.path-step {
display: flex; gap: 16px; margin-bottom: 1.5rem;
align-items: flex-start;
}
.path-step .num {
counter-increment: step;
width: 36px; height: 36px; border-radius: 50%;
background: var(--accent-dim);
border: 1px solid rgba(110,231,183,0.2);
display: grid; place-items: center;
font-family: 'JetBrains Mono', monospace;
font-size: 0.82rem; font-weight: 600;
color: var(--accent); flex-shrink: 0;
}
.path-step .content h3 {
font-size: 0.95rem; font-weight: 600;
margin-bottom: 4px;
}
.path-step .content p {
color: var(--text-2); font-size: 0.84rem;
}
.path-step .content a {
color: var(--accent); text-decoration: none;
}
.path-step .content a:hover { text-decoration: underline; }
/* ── FOOTER ───────────────────────── */
footer {
max-width: 1200px; margin: 80px auto 0;
padding: 30px 2rem;
border-top: 1px solid var(--border);
display: flex; justify-content: space-between; align-items: center;
flex-wrap: wrap; gap: 12px;
}
footer p {
color: var(--text-3); font-size: 0.78rem;
}
footer a { color: var(--accent); text-decoration: none; }
footer a:hover { text-decoration: underline; }
/* ── RESPONSIVE ───────────────────── */
@media (max-width: 640px) {
nav .links { display: none; }
.grid, .recipe-grid {
grid-template-columns: 1fr;
}
.hero { padding: 120px 1.5rem 50px; }
section { padding: 40px 1.5rem 0; }
}
</style>
</head>
<body>
<!-- NAV -->
<nav>
<div class="inner">
<div class="logo">🧬 <span>BioRetrieval</span></div>
<div class="links">
<a href="#evolution">Evolution</a>
<a href="#models">Models</a>
<a href="#benchmarks">Benchmarks</a>
<a href="#datasets">Datasets</a>
<a href="#leaderboard">Leaderboard</a>
<a href="#start">Get Started</a>
</div>
</div>
</nav>
<!-- HERO -->
<div class="hero">
<h1>Biomedical <em>Text Retrieval</em></h1>
<p>A curated map of papers, models, datasets, and benchmarks for dense retrieval in the biomedical domain.</p>
<div class="tag-row">
<span class="tag">12 key papers</span>
<span class="tag">10+ models</span>
<span class="tag">9 benchmarks</span>
<span class="tag">6 datasets</span>
</div>
</div>
<!-- EVOLUTION TIMELINE -->
<section id="evolution">
<div class="section-head">
<h2><span class="icon">📜</span> Evolution of BioRetrieval</h2>
<p>From domain pretraining to LLM-based retrievers — key milestones that shaped the field.</p>
</div>
<div class="timeline">
<div class="timeline-item">
<div class="year">2019</div>
<h3><a href="https://arxiv.org/abs/1901.08746">BioBERT</a></h3>
<div class="desc">First BERT continually pretrained on PubMed abstracts + PMC full texts. Showed that domain-adaptive pretraining consistently improves biomedical NER, RE, and QA. The baseline everything else is measured against.</div>
<div class="meta">
<a href="https://hf.co/dmis-lab/biobert-v1.1">🤗 dmis-lab/biobert-v1.1</a>
<a href="https://arxiv.org/abs/1901.08746">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2019</div>
<h3><a href="https://arxiv.org/abs/1903.10676">SciBERT</a></h3>
<div class="desc">BERT pretrained from scratch on 1.14M scientific papers (82% biomedical, 18% CS) with its own scientific vocabulary. Competitive on retrieval and the backbone of SLEDGE-Z (TREC-COVID SOTA).</div>
<div class="meta">
<a href="https://hf.co/allenai/scibert_scivocab_uncased">🤗 allenai/scibert</a>
<a href="https://arxiv.org/abs/1903.10676">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2020</div>
<h3><a href="https://arxiv.org/abs/2007.15779">PubMedBERT</a></h3>
<div class="desc">Showed that pretraining from scratch on PubMed beats continual pretraining from general BERT. Introduced the BLURB benchmark. Became the de facto backbone for biomedical retrieval fine-tuning.</div>
<div class="meta">
<a href="https://hf.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract">🤗 microsoft/PubMedBERT</a>
<a href="https://arxiv.org/abs/2007.15779">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2021</div>
<h3><a href="https://arxiv.org/abs/2010.11784">SapBERT</a></h3>
<div class="desc">Self-alignment pretraining using UMLS ontology + metric learning. SOTA on medical entity linking without task-specific supervision — the go-to for entity disambiguation and concept normalization.</div>
<div class="meta">
<a href="https://hf.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext-mean-token">🤗 cambridgeltl/SapBERT</a>
<a href="https://arxiv.org/abs/2010.11784">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2021</div>
<h3><a href="https://arxiv.org/abs/2104.08663">BEIR Benchmark</a></h3>
<div class="desc">18 datasets, 9 tasks — revealed that dense models trained on MS MARCO generalize poorly to biomedical domains. BM25 often wins out-of-distribution. The standard evaluation framework.</div>
<div class="meta">
<a href="https://github.com/beir-cellar/beir">GitHub</a>
<a href="https://arxiv.org/abs/2104.08663">arXiv · NeurIPS '21</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2022</div>
<h3><a href="https://arxiv.org/abs/2203.15827">BioLinkBERT</a></h3>
<div class="desc">Pretrained by placing hyperlinked documents in the same context window and adding a Document Relation Prediction objective. Excels at multi-hop biomedical reasoning (BioASQ, USMLE-style QA).</div>
<div class="meta">
<a href="https://hf.co/michiyasunaga/BioLinkBERT-large">🤗 michiyasunaga/BioLinkBERT-large</a>
<a href="https://arxiv.org/abs/2203.15827">arXiv · ACL '22</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2023</div>
<h3><a href="https://arxiv.org/abs/2307.00589">MedCPT / BioCPT</a></h3>
<div class="desc">Trained on 255M PubMed user click logs via contrastive learning — zero-shot SOTA on 5 biomedical IR tasks. Released as query/article encoder pair + cross-encoder reranker. Click logs as free supervision.</div>
<div class="meta">
<a href="https://hf.co/ncbi/MedCPT-Query-Encoder">🤗 Query Encoder (382K↓)</a>
<a href="https://hf.co/ncbi/MedCPT-Article-Encoder">🤗 Article Encoder</a>
<a href="https://hf.co/ncbi/MedCPT-Cross-Encoder">🤗 Cross-Encoder</a>
<a href="https://arxiv.org/abs/2307.00589">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2023</div>
<h3><a href="https://arxiv.org/abs/2311.16075">BioLORD-2023</a></h3>
<div class="desc">Grounds biomedical concepts in UMLS definitions via multi-phase contrastive learning + LLM self-distillation + weight averaging. SOTA on MedSTS, MedNLI-S, EHR-Rel-B. Multilingual variants available.</div>
<div class="meta">
<a href="https://hf.co/FremyCompany/BioLORD-2023">🤗 FremyCompany/BioLORD-2023 (147K↓)</a>
<a href="https://arxiv.org/abs/2311.16075">arXiv · EMNLP '23</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2024</div>
<h3><a href="https://arxiv.org/abs/2404.18443">BMRetriever</a></h3>
<div class="desc">LLM-based retriever: unsupervised contrastive pretraining on PubMed/textbooks/StatPearls, then instruction fine-tuning on 11 datasets. 410M model outperforms baselines 11.7× larger. 2B matches 5B+ models.</div>
<div class="meta">
<a href="https://hf.co/BMRetriever/BMRetriever-410M">🤗 410M</a>
<a href="https://hf.co/BMRetriever/BMRetriever-2B">🤗 2B</a>
<a href="https://hf.co/BMRetriever/BMRetriever-7B">🤗 7B</a>
<a href="https://arxiv.org/abs/2404.18443">arXiv · EMNLP '24</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2025</div>
<h3><a href="https://arxiv.org/abs/2511.08029">BiCA</a></h3>
<div class="desc">Citation-aware hard negatives: 2-hop citation graphs from PubMed articles for semantic hard-negative mining. Fine-tunes GTE-small/base with only 20K examples — consistent BEIR + LoTTE gains.</div>
<div class="meta">
<a href="https://github.com/NiravBhattLab/BiCA">GitHub</a>
<a href="https://arxiv.org/abs/2511.08029">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2025</div>
<h3><a href="https://arxiv.org/abs/2507.19407">MedTE + MedTEB</a></h3>
<div class="desc">51-task medical embedding benchmark (classification, clustering, retrieval). MedTE model (GTE-Base fine-tuned on 7 medical corpora) achieves mean 0.578 vs 0.539 next-best. The new comprehensive eval standard.</div>
<div class="meta">
<a href="https://hf.co/MohammadKhodadad/MedTE">🤗 MohammadKhodadad/MedTE</a>
<a href="https://github.com/MohammadKhodadad/MedTEB">GitHub</a>
<a href="https://arxiv.org/abs/2507.19407">arXiv</a>
</div>
</div>
<div class="timeline-item">
<div class="year">2025</div>
<h3><a href="https://arxiv.org/abs/2604.15591">BioHiCL</a></h3>
<div class="desc">Hierarchical MeSH supervision: depth-weighted contrastive loss + LoRA on BGE models. 0.1B model achieves IR Avg 0.543, beating BMRetriever-1B. Best on NFCorpus and SCIDOCS. Current efficiency SOTA.</div>
<div class="meta">
<a href="https://hf.co/LunaLan07/BioHiCL-Base">🤗 BioHiCL-Base</a>
<a href="https://hf.co/LunaLan07/BioHiCL-Large">🤗 BioHiCL-Large</a>
<a href="https://arxiv.org/abs/2604.15591">arXiv</a>
</div>
</div>
</div>
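Most models on this timeline — MedCPT, BMRetriever, BioHiCL — are trained with some variant of in-batch contrastive learning. The NumPy sketch below shows the core InfoNCE objective, where each query's positive document sits at the same batch index and all other documents serve as negatives. This is an illustrative minimal form; the actual papers differ in pooling, temperature, and negative mining.

```python
import numpy as np

def info_nce_loss(q, d, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss: the positive for query i is
    document i; every other row of d acts as an in-batch negative."""
    # L2-normalize so similarities are cosine scores
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature                # (B, B) similarity matrix
    # numerically stable log-softmax over each row
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the target class for row i is column i (the diagonal)
    return -np.mean(np.diag(log_probs))
```

When query and document embeddings are perfectly aligned the loss approaches zero; a mismatched pairing drives it up, which is what pushes paired texts together during training.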
</section>
<!-- MODELS -->
<section id="models">
<div class="section-head">
<h2><span class="icon">🧠</span> Models</h2>
<p>Production-ready retrieval models available on the Hugging Face Hub.</p>
</div>
<div class="grid">
<a class="card" href="https://hf.co/ncbi/MedCPT-Query-Encoder" target="_blank">
<div class="card-top">
<h3>MedCPT (Query + Article)</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">Asymmetric bi-encoder from NCBI, trained on 255M PubMed click logs. Separate query/article encoders + cross-encoder reranker. Zero-shot SOTA on biomedical IR.</div>
<div class="card-footer">
<span class="pill">382K ↓</span>
<span class="pill">BERT-base</span>
<span class="pill">ncbi</span>
</div>
</a>
<a class="card" href="https://hf.co/FremyCompany/BioLORD-2023" target="_blank">
<div class="card-top">
<h3>BioLORD-2023</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">UMLS-grounded sentence embeddings via multi-phase contrastive + LLM distillation + weight averaging. SOTA on clinical STS, entity linking, and concept similarity. Multilingual variants available.</div>
<div class="card-footer">
<span class="pill">147K ↓</span>
<span class="pill">MPNet</span>
<span class="pill">sentence-transformers</span>
</div>
</a>
<a class="card" href="https://hf.co/BMRetriever/BMRetriever-410M" target="_blank">
<div class="card-top">
<h3>BMRetriever (410M–7B)</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">LLM-based retriever family. Instruction-formatted queries + last-token pooling. The 410M variant outperforms baselines up to 11.7× larger; the 2B matches 5B+ models. Sizes: 410M (GPT-NeoX), 2B (Gemma), 7B (Mistral).</div>
<div class="card-footer">
<span class="pill">410M–7B</span>
<span class="pill">MIT</span>
<span class="pill">EMNLP '24</span>
</div>
</a>
<a class="card" href="https://hf.co/LunaLan07/BioHiCL-Base" target="_blank">
<div class="card-top">
<h3>BioHiCL</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">MeSH hierarchy-supervised BGE model. Depth-weighted contrastive loss + LoRA. 0.1B params achieves IR Avg 0.543, beating BMRetriever-1B. Best efficiency/performance ratio.</div>
<div class="card-footer">
<span class="pill">110M</span>
<span class="pill">BERT</span>
<span class="pill">2025</span>
</div>
</a>
<a class="card" href="https://hf.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext-mean-token" target="_blank">
<div class="card-top">
<h3>SapBERT</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">Self-alignment on UMLS synonyms via metric learning. Go-to model for medical entity linking and concept normalization. No task-specific labels needed.</div>
<div class="card-footer">
<span class="pill">457K ↓</span>
<span class="pill">PubMedBERT</span>
<span class="pill">NAACL '21</span>
</div>
</a>
<a class="card" href="https://hf.co/MohammadKhodadad/MedTE" target="_blank">
<div class="card-top">
<h3>MedTE</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">GTE-Base fine-tuned on 7 diverse medical corpora (PubMed, MIMIC-IV, ClinicalTrials, bioRxiv/medRxiv). Mean 0.578 on MedTEB — best medical general-purpose embedding model.</div>
<div class="card-footer">
<span class="pill">~110M</span>
<span class="pill">GTE-Base</span>
<span class="pill">2025</span>
</div>
</a>
<a class="card" href="https://hf.co/michiyasunaga/BioLinkBERT-large" target="_blank">
<div class="card-top">
<h3>BioLinkBERT</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">Document-link pretraining on PubMed hyperlinks. Excels at multi-hop biomedical reasoning — top performer on BioASQ QA and USMLE-style questions.</div>
<div class="card-footer">
<span class="pill">7.3K ↓</span>
<span class="pill">BERT-large</span>
<span class="pill">ACL '22</span>
</div>
</a>
<a class="card" href="https://hf.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract" target="_blank">
<div class="card-top">
<h3>PubMedBERT</h3>
<span class="badge badge-model">Model</span>
</div>
<div class="desc">The backbone model for biomedical fine-tuning. Pretrained from scratch on PubMed — not continued from general BERT. Foundation for MedCPT, SapBERT, and many others.</div>
<div class="card-footer">
<span class="pill">110M</span>
<span class="pill">microsoft</span>
<span class="pill">BLURB</span>
</div>
</a>
</div>
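Most of these encoders collapse per-token embeddings into a single vector — the SapBERT checkpoint above, for instance, is its mean-token variant. A mask-aware mean-pooling sketch (illustrative only; in practice each model ships its own pooling config via sentence-transformers):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings while ignoring padding positions.
    last_hidden_state: (B, T, H), attention_mask: (B, T) of 0/1."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (B, T, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (B, H)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid /0
    return summed / counts
```

Padding tokens contribute nothing to the sentence vector, which matters whenever batches mix short and long inputs.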
</section>
<!-- BENCHMARKS -->
<section id="benchmarks">
<div class="section-head">
<h2><span class="icon">📊</span> Benchmarks</h2>
<p>Standard evaluation suites for biomedical retrieval — use these to measure your models.</p>
</div>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Task</th>
<th>Domain</th>
<th>Scale</th>
<th>Metric</th>
<th>Link</th>
</tr>
</thead>
<tbody>
<tr>
<td>NFCorpus</td>
<td>Ad-hoc search</td>
<td>Nutrition / Medicine</td>
<td>323 queries · 3.6K docs</td>
<td>nDCG@10</td>
<td><a href="https://hf.co/datasets/BeIR/nfcorpus">🤗 BeIR/nfcorpus</a></td>
</tr>
<tr>
<td>TREC-COVID</td>
<td>Ad-hoc retrieval</td>
<td>COVID-19 / CORD-19</td>
<td>50 queries · 171K docs</td>
<td>nDCG@10</td>
<td><a href="https://hf.co/datasets/BeIR/trec-covid">🤗 BeIR/trec-covid</a></td>
</tr>
<tr>
<td>SciFact</td>
<td>Claim verification</td>
<td>Scientific claims</td>
<td>~300 queries · 5K abstracts</td>
<td>nDCG@10</td>
<td><a href="https://hf.co/datasets/BeIR/scifact">🤗 BeIR/scifact</a></td>
</tr>
<tr>
<td>BioASQ</td>
<td>QA retrieval</td>
<td>Biomedical QA</td>
<td>Varies annually</td>
<td>MAP, nDCG</td>
<td><a href="http://participants-area.bioasq.org/">bioasq.org</a></td>
</tr>
<tr>
<td>SCIDOCS</td>
<td>Document similarity</td>
<td>Scientific papers</td>
<td>1K queries · 25K docs</td>
<td>nDCG@10</td>
<td><a href="https://hf.co/datasets/BeIR/scidocs">🤗 BeIR/scidocs</a></td>
</tr>
<tr>
<td>BIOSSES</td>
<td>Sentence similarity</td>
<td>Biomedical</td>
<td>100 sentence pairs</td>
<td>Pearson r</td>
<td><a href="https://hf.co/datasets/tabilab/biosses">🤗 tabilab/biosses</a></td>
</tr>
<tr>
<td>PubMedQA</td>
<td>QA retrieval</td>
<td>PubMed abstracts</td>
<td>1K labeled</td>
<td>Accuracy</td>
<td><a href="https://hf.co/datasets/qiaojin/PubMedQA">🤗 PubMedQA</a></td>
</tr>
<tr>
<td>MedTEB</td>
<td>51 medical tasks</td>
<td>Pan-medical</td>
<td>Comprehensive</td>
<td>Multi-metric</td>
<td><a href="https://github.com/MohammadKhodadad/MedTEB">GitHub</a></td>
</tr>
<tr>
<td>R2MED</td>
<td>Reasoning retrieval</td>
<td>Clinical decision</td>
<td>Multi-type</td>
<td>nDCG@10</td>
<td><a href="https://arxiv.org/abs/2505.14558">arXiv</a></td>
</tr>
</tbody>
</table>
</div>
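Most rows above report nDCG@10. A minimal sketch of the metric assuming linear gain — gain and tie-breaking conventions vary across toolkits (BEIR delegates to pytrec_eval), so treat this as illustrative rather than a drop-in replacement:

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of a ranked relevance list, linear gain."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # log2(rank+1)
    return float((rel * discounts).sum())

def ndcg_at_k(ranked_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by the ideal DCG."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking scores 1.0; pushing relevant documents down the list is penalized logarithmically by rank.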
</section>
<!-- DATASETS -->
<section id="datasets">
<div class="section-head">
<h2><span class="icon">🗂️</span> Training Datasets</h2>
<p>Key corpora and labeled data for training biomedical retrieval models.</p>
</div>
<div class="grid">
<a class="card" href="https://hf.co/datasets/MedRAG/pubmed" target="_blank">
<div class="card-top">
<h3>MedRAG/pubmed</h3>
<span class="badge badge-dataset">Dataset</span>
</div>
<div class="desc">PubMed abstracts corpus — the core pretraining data for biomedical models. Used by BMRetriever, MedTE, and most domain-adapted models.</div>
<div class="card-footer">
<span class="pill">pretraining</span>
<span class="pill">abstracts</span>
</div>
</a>
<a class="card" href="https://hf.co/datasets/MedRAG/textbooks" target="_blank">
<div class="card-top">
<h3>MedRAG/textbooks</h3>
<span class="badge badge-dataset">Dataset</span>
</div>
<div class="desc">Medical textbook passages — high-quality, structured biomedical knowledge. Core fine-tuning data for BMRetriever and RAG applications.</div>
<div class="card-footer">
<span class="pill">fine-tuning</span>
<span class="pill">textbooks</span>
</div>
</a>
<a class="card" href="https://hf.co/datasets/MedRAG/statpearls" target="_blank">
<div class="card-top">
<h3>MedRAG/statpearls</h3>
<span class="badge badge-dataset">Dataset</span>
</div>
<div class="desc">StatPearls clinical reference articles — continuously updated clinical content used for retriever fine-tuning and medical Q&A.</div>
<div class="card-footer">
<span class="pill">clinical</span>
<span class="pill">fine-tuning</span>
</div>
</a>
<a class="card" href="https://hf.co/datasets/BMRetriever/biomed_retrieval_dataset" target="_blank">
<div class="card-top">
<h3>BMRetriever Training Mix</h3>
<span class="badge badge-training">Training</span>
</div>
<div class="desc">11-task instruction mixture for biomedical retrieval fine-tuning — query-document pairs spanning medical QA, entity linking, and scientific claim verification.</div>
<div class="card-footer">
<span class="pill">instruction</span>
<span class="pill">11 tasks</span>
</div>
</a>
<a class="card" href="https://hf.co/datasets/FremyCompany/BioLORD-Dataset" target="_blank">
<div class="card-top">
<h3>BioLORD Dataset</h3>
<span class="badge badge-dataset">Dataset</span>
</div>
<div class="desc">UMLS concept definition pairs for contrastive learning. Powers BioLORD-2023's clinical concept embeddings and medical entity similarity.</div>
<div class="card-footer">
<span class="pill">UMLS</span>
<span class="pill">contrastive</span>
</div>
</a>
<a class="card" href="https://hf.co/datasets/allenai/cord19" target="_blank">
<div class="card-top">
<h3>CORD-19</h3>
<span class="badge badge-dataset">Dataset</span>
</div>
<div class="desc">COVID-19 Open Research Dataset — 400K+ research papers. The corpus behind TREC-COVID, used for pandemic-era retrieval research and benchmarking.</div>
<div class="card-footer">
<span class="pill">400K+ papers</span>
<span class="pill">COVID-19</span>
</div>
</a>
</div>
</section>
<!-- LEADERBOARD -->
<section id="leaderboard">
<div class="section-head">
<h2><span class="icon">🏆</span> Training Recipes Leaderboard</h2>
<p>The strongest published recipes for training a biomedical retriever. Results come from different evaluations, so treat the order as indicative rather than a head-to-head comparison.</p>
</div>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>#</th>
<th>Model</th>
<th>Params</th>
<th>Training Recipe</th>
<th>Best Result</th>
<th>Paper</th>
</tr>
</thead>
<tbody>
<tr>
<td class="lb-rank">1</td>
<td>BioHiCL-Base</td>
<td>0.1B</td>
<td>BGE + MeSH hierarchy contrastive (depth-weighted) + LoRA</td>
<td class="lb-score">IR Avg 0.543, NFCorpus 0.379</td>
<td><a href="https://arxiv.org/abs/2604.15591">2604.15591</a></td>
</tr>
<tr>
<td class="lb-rank">2</td>
<td>BMRetriever-2B</td>
<td>2B</td>
<td>LLM + unsupervised contrastive on PubMed/textbooks + instruction FT</td>
<td class="lb-score">Matches 5B+ across 11 tasks</td>
<td><a href="https://arxiv.org/abs/2404.18443">2404.18443</a></td>
</tr>
<tr>
<td class="lb-rank">3</td>
<td>MedTE</td>
<td>~0.1B</td>
<td>GTE-Base + self-supervised contrastive on 7 medical corpora</td>
<td class="lb-score">MedTEB mean 0.578</td>
<td><a href="https://arxiv.org/abs/2507.19407">2507.19407</a></td>
</tr>
<tr>
<td class="lb-rank">4</td>
<td>BiCA-Base</td>
<td>~0.1B</td>
<td>GTE-Base + 2-hop citation hard negatives, 20K examples</td>
<td class="lb-score">Consistent BEIR + LoTTE ↑</td>
<td><a href="https://arxiv.org/abs/2511.08029">2511.08029</a></td>
</tr>
<tr>
<td class="lb-rank">5</td>
<td>MedCPT</td>
<td>~0.1B</td>
<td>PubMedBERT + 255M click-log contrastive (retriever + reranker)</td>
<td class="lb-score">Zero-shot SOTA on 5 bio IR tasks</td>
<td><a href="https://arxiv.org/abs/2307.00589">2307.00589</a></td>
</tr>
<tr>
<td class="lb-rank">6</td>
<td>BioLORD-2023</td>
<td>~0.1B</td>
<td>PubMedBERT + UMLS definitions contrastive + LLM distillation + WA</td>
<td class="lb-score">SOTA MedSTS, EHR-Rel-B</td>
<td><a href="https://arxiv.org/abs/2311.16075">2311.16075</a></td>
</tr>
</tbody>
</table>
</div>
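Recipe #4 (BiCA) mines hard negatives from the citation graph: documents two hops away from an anchor are topically close but not directly linked, making them plausible hard negatives. A toy sketch of that candidate-mining idea — a hypothetical helper over an adjacency dict, not the authors' code:

```python
def two_hop_hard_negatives(citations, anchor):
    """Candidates reachable in exactly two citation hops from `anchor`.
    `citations` maps a doc id to the set of doc ids it cites."""
    one_hop = citations.get(anchor, set())
    two_hop = set()
    for doc in one_hop:
        two_hop |= citations.get(doc, set())
    # drop the anchor itself and its direct citations (likely positives)
    return two_hop - one_hop - {anchor}
```

In a real pipeline these candidates would still be filtered (e.g. by a relevance check) before being used as negatives in contrastive training.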
</section>
<!-- GET STARTED -->
<section id="start">
<div class="section-head">
<h2><span class="icon">🚀</span> Get Started</h2>
<p>Recommended learning path for biomedical text retrieval.</p>
</div>
<div class="path-list">
<div class="path-step">
<div class="num">1</div>
<div class="content">
<h3>Understand the evaluation landscape</h3>
<p>Read the <a href="https://arxiv.org/abs/2104.08663">BEIR paper</a> to understand why domain generalization is hard. Run BM25 as your baseline on <a href="https://hf.co/datasets/BeIR/nfcorpus">NFCorpus</a> — it's surprisingly competitive and sets a meaningful floor.</p>
</div>
</div>
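A from-scratch Okapi BM25 scorer is small enough to serve as that floor. The sketch below works on pre-tokenized documents and is illustrative only — in practice a library such as Pyserini or rank_bm25 is more convenient and better tested:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc in `docs` for `query_terms`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each term
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores
```

If a dense model can't beat this on NFCorpus, the problem is the model, not the benchmark.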
<div class="path-step">
<div class="num">2</div>
<div class="content">
<h3>Try a zero-shot retriever</h3>
<p>Use <a href="https://hf.co/ncbi/MedCPT-Query-Encoder">MedCPT</a> — the cleanest example of domain-specific contrastive pretraining. Separate query + article encoders make it intuitive. Evaluate on BEIR biomedical subsets.</p>
</div>
</div>
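The MedCPT pattern in miniature: encode queries and articles with their respective encoders, take the [CLS] vector, and rank by dot product. The functions below sketch the pooling and scoring steps with NumPy stand-ins; the commented lines show how the real encoders would be loaded (requires `transformers` and a model download):

```python
import numpy as np

# With the real models (network + `transformers` required):
#   from transformers import AutoTokenizer, AutoModel
#   tok = AutoTokenizer.from_pretrained("ncbi/MedCPT-Query-Encoder")
#   enc = AutoModel.from_pretrained("ncbi/MedCPT-Query-Encoder")
#   out = enc(**tok(["diabetes treatment"], return_tensors="pt"))
#   query_emb = out.last_hidden_state[:, 0, :]   # [CLS] pooling

def cls_pool(last_hidden_state):
    """MedCPT takes the [CLS] token (position 0) as the text embedding."""
    return last_hidden_state[:, 0, :]

def rank_by_dot_product(query_emb, article_embs):
    """MedCPT is trained for (unnormalized) dot-product similarity;
    returns article indices from most to least similar."""
    scores = article_embs @ query_emb
    return np.argsort(-scores)
```

Note the asymmetry: queries go through the query encoder, articles through the article encoder; mixing them up silently degrades results.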
<div class="path-step">
<div class="num">3</div>
<div class="content">
<h3>Scale up with LLM retrievers</h3>
<p>Deploy <a href="https://hf.co/BMRetriever/BMRetriever-410M">BMRetriever-410M</a> for production — it outperforms models up to 11.7× larger. Use instruction-formatted queries with last-token pooling. The eval code is clean and well-documented.</p>
</div>
</div>
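Last-token pooling can be sketched in a few lines: with right-padded batches, the embedding is the hidden state at each sequence's final non-pad position. Illustrative only — BMRetriever's released code defines its own pooling and prompt format:

```python
import numpy as np

def last_token_pool(last_hidden_state, attention_mask):
    """For decoder-style retrievers: take the hidden state of the last
    non-padding token of each sequence (assumes right padding).
    last_hidden_state: (B, T, H), attention_mask: (B, T) of 0/1."""
    last_idx = attention_mask.sum(axis=1) - 1          # (B,) final real token
    batch_idx = np.arange(last_hidden_state.shape[0])
    return last_hidden_state[batch_idx, last_idx]      # (B, H)
```

Unlike [CLS] or mean pooling, this exploits the causal attention of decoder LLMs: only the final token has attended to the entire input.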
<div class="path-step">
<div class="num">4</div>
<div class="content">
<h3>Comprehensive evaluation</h3>
<p>Benchmark on <a href="https://github.com/MohammadKhodadad/MedTEB">MedTEB</a> — 51 medical embedding tasks, much broader than BEIR biomedical subsets alone. This is the new comprehensive standard (2025).</p>
</div>
</div>
<div class="path-step">
<div class="num">5</div>
<div class="content">
<h3>Fine-tune your own retriever</h3>
<p>Use <a href="https://arxiv.org/abs/2511.08029">BiCA's</a> citation-graph hard negatives for cheap, effective training data. Or <a href="https://arxiv.org/abs/2604.15591">BioHiCL's</a> MeSH hierarchy supervision — 0.1B params matching 1B+ models.</p>
</div>
</div>
</div>
</section>
<!-- FOOTER -->
<footer>
<p>Built with 🧬 as a resource hub for biomedical text retrieval research.</p>
<p>All linked resources are maintained by their respective authors. <a href="https://huggingface.co/lvwerra">@lvwerra</a></p>
</footer>
</body>
</html>