Gemma-3-27B-MeditronFO

Gemma-3-27B-MeditronFO is a 27B-parameter medical specialist LLM, produced by supervised fine-tuning of Gemma-3-27B-IT on the Fully Open Meditron Corpus.

This model is part of the Fully Open Meditron family — the first end-to-end auditable pipeline for clinical LLMs, with open weights, open data, open training recipe, and clinician-vetted corpus construction.

Gemma-3-27B-MeditronFO is preferred over MedGemma-27B in 58.6% of Auto-MOOVE comparisons and scores higher on HealthBench Hard (47.15 vs. 41.95), despite being trained with a fully open pipeline.

Performance

Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.

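| Benchmark        | Gemma-3-27B-IT | Gemma-3-27B-MeditronFO | Δ     |
|------------------|----------------|------------------------|-------|
| MedMCQA          | 62.75          | 63.71                  | +0.96 |
| MedQA            | 76.20          | 77.61                  | +1.41 |
| PubMedQA         | 74.60          | 75.80                  | +1.20 |
| MedXpertQA       | 16.69          | 18.00                  | +1.31 |
| HealthBench Hard | 45.78          | 47.15                  | +1.37 |
| Average          | 55.20          | 56.45                  | +1.25 |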
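
The Δ column and the Average row can be recomputed from the per-benchmark scores. The snippet below is a minimal sketch that copies the five scores from the table; the `scores` dictionary and the `summarize` helper are illustrative names, not part of the released code.

```python
# Scores copied from the table above: (Gemma-3-27B-IT, Gemma-3-27B-MeditronFO).
scores = {
    "MedMCQA": (62.75, 63.71),
    "MedQA": (76.20, 77.61),
    "PubMedQA": (74.60, 75.80),
    "MedXpertQA": (16.69, 18.00),
    "HealthBench Hard": (45.78, 47.15),
}


def summarize(scores):
    """Return per-benchmark deltas and the unweighted mean of each column."""
    deltas = {name: round(ft - base, 2) for name, (base, ft) in scores.items()}
    n = len(scores)
    base_avg = round(sum(base for base, _ in scores.values()) / n, 2)
    ft_avg = round(sum(ft for _, ft in scores.values()) / n, 2)
    return deltas, base_avg, ft_avg


deltas, base_avg, ft_avg = summarize(scores)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"Average: {base_avg:.2f} -> {ft_avg:.2f} ({ft_avg - base_avg:+.2f})")
```

The averages are unweighted means over the five benchmarks, so each benchmark contributes equally regardless of its size.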

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and the bf16 weights, placing layers on the available devices.
model_id = "EPFLiGHT/Gemma-3-27B-MeditronFO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format the conversation with the model's chat template and tokenize it.
messages = [
    {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Decode greedily, then print only the newly generated tokens (the prompt is sliced off).
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
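
When running several prompts, it can help to build the `messages` list with a small helper instead of writing the dictionaries by hand. The sketch below is pure Python and does not load the model; `build_messages` is a hypothetical helper name, and the optional system turn is an assumption: check that the tokenizer's chat template accepts a `system` role before relying on it.

```python
from typing import Optional


def build_messages(question: str, system_prompt: Optional[str] = None) -> list:
    """Build a chat-format message list suitable for tokenizer.apply_chat_template."""
    question = question.strip()
    if not question:
        raise ValueError("question must be non-empty")
    messages = []
    # Assumption: the chat template handles a leading system turn.
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt.strip()})
    messages.append({"role": "user", "content": question})
    return messages


example = build_messages(
    "What are common causes of dyspnea on exertion?",
    system_prompt="Answer for a research audience.",
)
print(example)
```

The returned list can be passed as `messages` to `tokenizer.apply_chat_template` in the snippet above.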

Training

  • Base model: Gemma-3-27B-IT
  • Corpus: Fully Open Meditron (601k examples, 150M tokens), aggregating eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes
  • Hardware: NVIDIA GH200 nodes
  • Framework: Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
  • Decontamination: System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks
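
To make the n-gram stage of decontamination concrete, the sketch below drops any training example that shares a word-level n-gram with a benchmark item. It is an illustration only, not the pipeline's implementation: the function names, the simple lowercase tokenization, and the window size `n=8` are all assumptions, and the token-alignment stage is not covered.

```python
import re


def ngrams(text: str, n: int) -> set:
    """Return the set of word-level n-grams in text (lowercased, alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_texts, n: int = 8) -> set:
    """Collect every n-gram that appears in any benchmark item."""
    index = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index


def decontaminate(train_examples, benchmark_texts, n: int = 8) -> list:
    """Keep only training examples that share no n-gram with the benchmarks."""
    index = build_benchmark_index(benchmark_texts, n)
    return [ex for ex in train_examples if ngrams(ex, n).isdisjoint(index)]


# Hypothetical toy data, for illustration only.
benchmark = [
    "A 45-year-old man presents with chest pain radiating to the left arm and diaphoresis."
]
train = [
    "Question: a 45-year-old man presents with chest pain radiating to the left arm. What next?",
    "Metformin is a first-line therapy for type 2 diabetes mellitus in most adults.",
]
kept = decontaminate(train, benchmark)
print(len(kept), "of", len(train), "examples kept")
```

Smaller values of `n` remove more near-duplicates at the cost of more false positives; the right value depends on the corpus and benchmarks.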

Full hyperparameters are in Appendix I of the paper.

Intended Use

Research only. This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.

It is not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use. Conduct independent domain-specific safety evaluation before any such use.

Citation

If you use this model, please cite:

@misc{theimerlienhard2026fullyopenmeditronauditable,
  title         = {Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
  author        = {Xavier Theimer-Lienhard and Mushtaha El-Amin and Fay Elhassan and Sahaj Vaidya and Victor Cartier-Negadi and David Sasu and Lars Klein and Mary-Anne Hartley},
  year          = {2026},
  eprint        = {2605.16215},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2605.16215}
}

License

Released under the Gemma license. This model is derived from Google's Gemma-3-27B-IT and is therefore subject to the Gemma Terms of Use. Note that while the Fully Open Meditron training pipeline (data, recipe, code) is fully open, the underlying Gemma-3 base is open-weight rather than fully open.
