embeddinggemma-mimic-infonce
A 300M-parameter sentence embedding model fine-tuned from
google/embeddinggemma-300m on
temporal note pairs from MIMIC-III using a pure InfoNCE temporal contrastive objective.
This is the temporal-only baseline within the project; a hierarchical-loss extension is
available at gaspard-loeillot/embeddinggemma-mimic-hierarchical.
This model was produced as a CS 4701 Practicum in AI project at Cornell University (Spring 2026). It is a research artifact; it is not approved for any clinical use.
TL;DR
| Metric | OpenAI text-embedding-3-small | OpenAI text-embedding-3-large | EmbeddingGemma-300m (vanilla) | InfoNCE fine-tuned (this model) | Hierarchical fine-tuned |
|---|---|---|---|---|---|
| Top-1 note recall | 0.31% | 0.31% | 0.35% | 0.84% | 1.20% |
| Top-5 note recall | 5.14% | 5.45% | 5.99% | 47.17% | 67.13% |
| Top-10 note recall | 6.44% | 6.99% | 7.57% | 66.68% | 84.44% |
| Diagnosis macro-AUROC (top-25 ICD-9) | 0.895 | 0.905 | 0.897 | 0.945 | 0.947 |
| Silhouette by ICD chapter (cosine, k=5000) | -0.054 | -0.045 | -0.053 | -0.057 | -0.066 |
| Silhouette by note category | +0.016 | +0.043 | -0.017 | -0.089 | -0.098 |
| Silhouette delta (cat - chap) | +0.070 | +0.089 | +0.036 | -0.032 | -0.032 |
The bottom three rows are the most informative: every baseline organizes its embedding space more strongly by note category (style) than by ICD chapter (clinical content). After contrastive fine-tuning, the sign of this delta flips: embeddings now organize themselves more strongly by clinical content than by stylistic structure. On note recall this model achieves 47.17% top-5 — close to the 65% Radical Health baseline and ~9x the vanilla EmbeddingGemma starting point — using only the temporal positive signal (no hierarchical labels). The hierarchical extension closes the remaining gap.
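For context, silhouette-by-label numbers like those above can be computed with scikit-learn. The sketch below is illustrative only (it is not the project's evaluation code) and assumes `embeddings`, `icd_chapter`, and `note_category` are NumPy arrays aligned over the same set of notes.

```python
# Illustrative sketch: silhouette under two labelings, and their delta, on a 5,000-note sample.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
sample = rng.choice(len(embeddings), size=5000, replace=False)  # assumes >= 5,000 notes

emb = embeddings[sample]
sil_chapter = silhouette_score(emb, icd_chapter[sample], metric="cosine")
sil_category = silhouette_score(emb, note_category[sample], metric="cosine")

# Positive delta: the space is organized more by note category (style) than by
# ICD chapter (clinical content); negative delta: the reverse.
print(f"delta (cat - chap): {sil_category - sil_chapter:+.3f}")
```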
Intended use
- Patient-record search and retrieval over clinical notes.
- Patient-similarity / cohort discovery.
- Clinical-trajectory analysis.
- A drop-in replacement for general-purpose embedding APIs in research RAG pipelines on EHR-like text.
Out-of-scope use
- Any clinical decision support, diagnostic, or therapeutic application.
- Identifying or re-identifying patients.
- Use on data outside the MIMIC-III DUA without independent ethics approval.
Training details
Base model. google/embeddinggemma-300m (300M parameters; 768-dimensional output via the
SentenceTransformer pooling + dense pipeline).
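A quick, illustrative way to confirm the output dimensionality of the released checkpoint:

```python
from sentence_transformers import SentenceTransformer

m = SentenceTransformer("gaspard-loeillot/embeddinggemma-mimic-infonce")
print(m)                                     # prints the module pipeline (pooling + dense, per the card)
print(m.get_sentence_embedding_dimension())  # 768
```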
Corpus. MIMIC-III v1.4 NOTEEVENTS, restricted to the 10,000-patient Kaggle subset, then further sub-sampled to 500 patients (23,657 temporal note pairs) for training compute. The 500-patient subset reflects the team's realized GPU budget and is a known limitation of the released model.
Loss. Standard InfoNCE temporal contrastive loss with in-batch negatives. For a batch
of B (anchor, positive) note pairs:
```
logits[i, j] = (anchor_i · positive_j) / temperature
loss = cross_entropy(logits, labels=arange(B))
```
The anchor is a patient note at time t, the positive is the same patient's note at
time t+1, and negatives are the other B-1 positives in the batch (notes from
different patients). This is the same temporal contrastive setup used by Radical Health
AI in their MIMIC-III work.
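A minimal PyTorch rendering of this objective (a sketch matching the formula above, not the project's exact training code; the normalization step is an assumption, since the card writes the logits as a plain dot product over a temperature) could look like:

```python
import torch
import torch.nn.functional as F

def infonce_temporal_loss(anchor_emb: torch.Tensor,
                          positive_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """In-batch InfoNCE: row i's positive is column i; the other B-1 columns are negatives.

    anchor_emb:   [B, D] embeddings of notes at time t
    positive_emb: [B, D] embeddings of the same patients' notes at time t+1
    """
    # Normalize so the dot product is cosine similarity (assumption, see lead-in).
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)

    logits = anchor_emb @ positive_emb.T / temperature          # [B, B]
    labels = torch.arange(logits.size(0), device=logits.device)  # positive of row i is column i
    return F.cross_entropy(logits, labels)
```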
Hyperparameters.
- `temperature = 0.07`
- batch size 32, AdamW with `lr = 2e-5`, cosine LR schedule, gradient clipping at L2 norm 1.0
- max sequence length 256 (CPU-fallback constraint; see Compute notes)
- 5 training epochs
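In the sentence-transformers v3+ training API, roughly this configuration maps onto `MultipleNegativesRankingLoss`, whose `scale` plays the role of 1/temperature (1/0.07 ≈ 14.3) applied to cosine similarity. The snippet below is a hedged sketch of that mapping, not the team's actual training script; `train_dataset` is assumed to be a `datasets.Dataset` with anchor/positive note-pair columns.

```python
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("google/embeddinggemma-300m")
model.max_seq_length = 256  # CPU-fallback constraint from the card

# In-batch-negative InfoNCE; scale corresponds to 1 / temperature = 1 / 0.07
loss = MultipleNegativesRankingLoss(model, scale=1 / 0.07)

args = SentenceTransformerTrainingArguments(
    output_dir="embeddinggemma-mimic-infonce",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=2e-5,            # AdamW is the Trainer's default optimizer
    lr_scheduler_type="cosine",
    max_grad_norm=1.0,             # gradient clipping at L2 norm 1.0
)

trainer = SentenceTransformerTrainer(model=model, args=args, loss=loss,
                                     train_dataset=train_dataset)
trainer.train()
```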
Compute notes. Training was performed on Apple MPS / CPU and Google Colab T4. Apple
MPS could not fit the full 512-token training graph for EmbeddingGemma-300m; the team
fell back to CPU at max_length=256. This constraint is reflected in the absolute recall
numbers and is an obvious target for further work.
Usage
```python
from sentence_transformers import SentenceTransformer

m = SentenceTransformer("gaspard-loeillot/embeddinggemma-mimic-infonce")
embeddings = m.encode(["clinical note text..."], normalize_embeddings=True)
```
For retrieval at scale, use FAISS:
```python
import faiss, numpy as np

# corpus_notes: list of note strings; query: a single query string
corpus_emb = m.encode(corpus_notes, normalize_embeddings=True, convert_to_numpy=True).astype("float32")

# Inner product over L2-normalized embeddings is cosine similarity
index = faiss.IndexFlatIP(corpus_emb.shape[1])
index.add(corpus_emb)

query_emb = m.encode([query], normalize_embeddings=True, convert_to_numpy=True).astype("float32")
sims, idx = index.search(query_emb, k=10)  # similarities and corpus indices of the top-10 notes
```
Evaluation methodology
See the hierarchical companion model card for shared methodology details. All five evaluated models share identical inputs, splits, and evaluation code.
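For orientation, a top-k note-recall number of the kind reported in the TL;DR table can be computed as sketched below. This assumes the task is retrieving each patient's next note from the full pool of candidate notes (the exact protocol is defined in the companion card), and the array names are illustrative rather than taken from the project code.

```python
# Hedged sketch of a top-k "next note" recall computation.
import faiss
import numpy as np

# query_emb[i] embeds note_t of pair i; target_idx[i] is the row in corpus_emb holding
# that patient's note_{t+1}. Both embedding arrays are assumed L2-normalized float32.
index = faiss.IndexFlatIP(corpus_emb.shape[1])
index.add(corpus_emb)

k = 5
_, retrieved = index.search(query_emb, k)               # [num_queries, k] corpus indices
hits = (retrieved == target_idx[:, None]).any(axis=1)   # True if the target note is in the top-k
print(f"top-{k} recall: {hits.mean():.2%}")
```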
Limitations
- Trained on 500 patients, not 10,000+. This is the realized compute budget, not the final design intent.
- 256-token context cap. Long notes are truncated.
- Within-patient retrieval gap to hierarchical variant. This model achieves 47% top-5 recall vs. 67% for the hierarchical extension. If retrieval is the primary downstream use, the hierarchical model is preferred.
- Demographic and institutional skew. MIMIC-III is a single ICU at a single tertiary care center over 2001-2012. Generalization outside this distribution is not validated.
- Not certified for any clinical use.
Citation
```bibtex
@misc{shvartsman_lin_loeillot_2026_embeddinggemma_mimic_infonce,
  author       = {Shvartsman, Benjamin and Lin, Timothy and Loeillot, Gaspard},
  title        = {EmbeddingGemma fine-tuned on MIMIC-III with InfoNCE temporal contrastive learning},
  year         = {2026},
  howpublished = {Cornell CS 4701 Practicum in AI project},
  note         = {\url{https://huggingface.co/gaspard-loeillot/embeddinggemma-mimic-infonce}}
}
```
If you use the base model, please also cite EmbeddingGemma:
```bibtex
@article{embedding_gemma_2025,
  title     = {EmbeddingGemma: Powerful and Lightweight Text Representations},
  author    = {Schechter Vera, Henrique and others},
  publisher = {Google DeepMind},
  year      = {2025},
  url       = {https://arxiv.org/abs/2509.20354}
}
```
And, if you fine-tune on similar data, the underlying clinical resource:
Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Scientific Data, 2016. doi:10.1038/sdata.2016.35
Acknowledgements
This project replicates and extends the contrastive fine-tuning approach described by Radical Health AI in "Training a model that understands your notes 7x better than OpenAI" (2025). All errors are our own.