embeddinggemma-mimic-infonce

A 300M-parameter sentence embedding model fine-tuned from google/embeddinggemma-300m on temporal note pairs from MIMIC-III using a pure InfoNCE temporal contrastive objective. This is the temporal-only baseline within the project; a hierarchical-loss extension is available at gaspard-loeillot/embeddinggemma-mimic-hierarchical.

This model was produced as a CS 4701 Practicum in AI project at Cornell University (Spring 2026). It is a research artifact; it is not approved for any clinical use.

TL;DR

| Metric | OpenAI text-embedding-3-small | OpenAI text-embedding-3-large | EmbeddingGemma-300m (vanilla) | InfoNCE fine-tuned (this model) | Hierarchical fine-tuned |
|---|---|---|---|---|---|
| Top-1 note recall | 0.31% | 0.31% | 0.35% | 0.84% | 1.20% |
| Top-5 note recall | 5.14% | 5.45% | 5.99% | 47.17% | 67.13% |
| Top-10 note recall | 6.44% | 6.99% | 7.57% | 66.68% | 84.44% |
| Diagnosis macro-AUROC (top-25 ICD-9) | 0.895 | 0.905 | 0.897 | 0.945 | 0.947 |
| Silhouette by ICD chapter (cosine, k=5000) | -0.054 | -0.045 | -0.053 | -0.057 | -0.066 |
| Silhouette by note category | +0.016 | +0.043 | -0.017 | -0.089 | -0.098 |
| Silhouette delta (cat - chap) | +0.070 | +0.089 | +0.036 | -0.032 | -0.032 |

The bottom three rows are the most informative: every baseline organizes its embedding space more strongly by note category (style) than by ICD chapter (clinical content). After contrastive fine-tuning, the sign of this delta flips, and the embeddings organize more strongly by clinical content than by stylistic structure. On note recall this model achieves 47.17% top-5, close to the 65% Radical Health baseline and roughly 8x the vanilla EmbeddingGemma starting point (5.99%), using only the temporal positive signal (no hierarchical labels). The hierarchical extension closes the remaining gap.
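
To make the silhouette rows concrete, here is a minimal sketch of how such numbers can be computed with scikit-learn. The variable names (texts, icd_chapters, note_categories) are hypothetical, the table's k=5000 is read here as a 5,000-note evaluation sample, and the project's actual evaluation code (shared across all five models) is described in the companion card:

import numpy as np
from sklearn.metrics import silhouette_score
from sentence_transformers import SentenceTransformer

# Assumed inputs: `texts`, `icd_chapters`, `note_categories` are parallel lists
# over a pool of MIMIC-III notes and their labels.
model = SentenceTransformer("gaspard-loeillot/embeddinggemma-mimic-infonce")

rng = np.random.default_rng(0)
idx = rng.choice(len(texts), size=min(5000, len(texts)), replace=False)
emb = model.encode([texts[i] for i in idx], normalize_embeddings=True, convert_to_numpy=True)

# Silhouette under cosine distance for each labeling scheme, then the table's delta row.
sil_chapter = silhouette_score(emb, [icd_chapters[i] for i in idx], metric="cosine")
sil_category = silhouette_score(emb, [note_categories[i] for i in idx], metric="cosine")
print(sil_category - sil_chapter)  # positive: clustered by style; negative: by clinical content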

Intended use

  • Patient-record search and retrieval over clinical notes.
  • Patient-similarity / cohort discovery.
  • Clinical-trajectory analysis.
  • A drop-in replacement for general-purpose embedding APIs in research RAG pipelines on EHR-like text.

Out-of-scope use

  • Any clinical decision support, diagnostic, or therapeutic application.
  • Identifying or re-identifying patients.
  • Use on data outside the MIMIC-III DUA without independent ethics approval.

Training details

Base model. google/embeddinggemma-300m (300M parameters; 768-dimensional output via the SentenceTransformer pooling + dense pipeline).

Corpus. MIMIC-III v1.4 NOTEEVENTS, restricted to the 10,000-patient Kaggle subset, then further sub-sampled to 500 patients (23,657 temporal note pairs) for training compute. The 500-patient subset reflects the team's realized GPU budget and is a known limitation of the released model.
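
The note pairs are consecutive notes from the same patient, ordered by chart time. Below is a minimal pandas sketch of how such pairs can be built from NOTEEVENTS; column names follow the MIMIC-III v1.4 schema, but the exact filtering, time field, and 500-patient sub-sampling used for this release may differ:

import pandas as pd

# NOTEEVENTS columns used here: SUBJECT_ID, CHARTTIME, ISERROR, TEXT.
notes = pd.read_csv("NOTEEVENTS.csv", usecols=["SUBJECT_ID", "CHARTTIME", "ISERROR", "TEXT"])
notes = notes[notes["ISERROR"].isna()]        # drop notes flagged as entered in error
notes = notes.dropna(subset=["CHARTTIME"])    # keep notes with a usable timestamp
notes["CHARTTIME"] = pd.to_datetime(notes["CHARTTIME"])
notes = notes.sort_values(["SUBJECT_ID", "CHARTTIME"])

pairs = []
for _, group in notes.groupby("SUBJECT_ID"):
    texts = group["TEXT"].tolist()
    # Anchor = the note at time t, positive = the same patient's next note at time t+1.
    pairs.extend(zip(texts[:-1], texts[1:]))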

Loss. Standard InfoNCE temporal contrastive loss with in-batch negatives. For a batch of B (anchor, positive) note pairs:

logits[i, j] = (anchor_i · positive_j) / temperature
loss         = cross_entropy(logits, labels=arange(B))

The anchor is a patient note at time t, the positive is the same patient's note at time t+1, and negatives are the other B-1 positives in the batch (notes from different patients). This is the same temporal contrastive setup used by Radical Health AI in their MIMIC-III work.
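
A minimal PyTorch sketch of this objective, assuming anchor and positive are batches of already-encoded embeddings of shape (B, d); this illustrates the loss itself, not the project's training code:

import torch
import torch.nn.functional as F

def infonce_loss(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    # Cosine similarity between every anchor and every positive in the batch.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # shape (B, B)
    # Row i should score highest in column i; the other B-1 columns act as in-batch negatives.
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)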

Hyperparameters. A sketch of how these settings can be wired together with sentence-transformers follows the list.

  • temperature = 0.07
  • batch size 32, AdamW with lr = 2e-5, cosine LR schedule, gradient clipping at L2 norm 1.0
  • max sequence length 256 (CPU-fallback constraint; see Compute notes)
  • 5 training epochs
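
One possible wiring of these hyperparameters with the sentence-transformers fit API is sketched below. MultipleNegativesRankingLoss is the library's in-batch-negatives InfoNCE, with its scale parameter acting as the inverse temperature; the warmup length is an assumption (the card does not specify it), and the project's actual training script may differ:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("google/embeddinggemma-300m")
model.max_seq_length = 256  # CPU-fallback constraint described in the Compute notes

# `pairs` is a list of (anchor, positive) note strings, e.g. from the corpus sketch above.
train_examples = [InputExample(texts=[anchor, positive]) for anchor, positive in pairs]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=32)

# In-batch-negatives InfoNCE; scale = 1 / temperature = 1 / 0.07.
loss = losses.MultipleNegativesRankingLoss(model, scale=1.0 / 0.07)

model.fit(
    train_objectives=[(train_loader, loss)],
    epochs=5,
    optimizer_params={"lr": 2e-5},  # AdamW is the default optimizer
    scheduler="warmupcosine",
    warmup_steps=100,               # assumption; not stated on the card
    max_grad_norm=1.0,
)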

Compute notes. Training was performed on Apple MPS / CPU and Google Colab T4. Apple MPS could not fit the full 512-token training graph for EmbeddingGemma-300m, so the team fell back to CPU at max_length=256. This constraint is reflected in the absolute recall numbers and is a clear target for further work.

Usage

from sentence_transformers import SentenceTransformer

m = SentenceTransformer("gaspard-loeillot/embeddinggemma-mimic-infonce")
embeddings = m.encode(["clinical note text..."], normalize_embeddings=True)
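
Because training used a 256-token cap, it may be worth matching that truncation at encode time if the saved configuration does not already enforce it:

m.max_seq_length = 256  # match the training-time truncation (see Compute notes)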

For retrieval at scale, use FAISS:

import faiss

# `corpus_notes` is a list of note strings and `query` is a query string; `m` is the model loaded above.
corpus_emb = m.encode(corpus_notes, normalize_embeddings=True, convert_to_numpy=True).astype("float32")

# Inner product over L2-normalized embeddings is cosine similarity.
index = faiss.IndexFlatIP(corpus_emb.shape[1])
index.add(corpus_emb)

query_emb = m.encode([query], normalize_embeddings=True, convert_to_numpy=True).astype("float32")
sims, idx = index.search(query_emb, k=10)

Evaluation methodology

See the hierarchical companion model card for shared methodology details. All five evaluated models share identical inputs, splits, and evaluation code.

Limitations

  1. Trained on 500 patients, not 10,000+. This is the realized compute budget, not the final design intent.
  2. 256-token context cap. Long notes are truncated.
  3. Within-patient retrieval gap to hierarchical variant. This model achieves 47% top-5 recall vs. 67% for the hierarchical extension. If retrieval is the primary downstream use, the hierarchical model is preferred.
  4. Demographic and institutional skew. MIMIC-III is a single ICU at a single tertiary care center over 2001-2012. Generalization outside this distribution is not validated.
  5. Not certified for any clinical use.

Citation

@misc{shvartsman_lin_loeillot_2026_embeddinggemma_mimic_infonce,
  author       = {Shvartsman, Benjamin and Lin, Timothy and Loeillot, Gaspard},
  title        = {EmbeddingGemma fine-tuned on MIMIC-III with InfoNCE temporal
                  contrastive learning},
  year         = {2026},
  howpublished = {Cornell CS 4701 Practicum in AI project},
  note         = {\url{https://huggingface.co/gaspard-loeillot/embeddinggemma-mimic-infonce}}
}

If you use the base model, please also cite EmbeddingGemma:

@article{embedding_gemma_2025,
  title={EmbeddingGemma: Powerful and Lightweight Text Representations},
  author={Schechter Vera, Henrique and others},
  publisher={Google DeepMind},
  year={2025},
  url={https://arxiv.org/abs/2509.20354}
}

And, if you fine-tune on similar data, the underlying clinical resource:

Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Scientific Data, 2016. doi:10.1038/sdata.2016.35

Acknowledgements

This project replicates and extends the contrastive fine-tuning approach described by Radical Health AI in "Training a model that understands your notes 7x better than OpenAI" (2025). All errors are our own.
