Commit d265332 by Xkrilandar · verified · 1 Parent(s): 242ed90

Add model card

Files changed (1)
  1. README.md +106 -0
README.md ADDED
@@ -0,0 +1,106 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - medical
+ - clinical
+ - healthcare
+ - meditron
+ - fully-open
+ - medical-llm
+ base_model: allenai/OLMo-2-0325-32B-SFT
+ datasets:
+ - EPFLiGHT/fully-open-meditron
+ ---
+
+ # OLMo-2-32B-MeditronFO
+
+ **OLMo-2-32B-MeditronFO** is a 32B-parameter medical specialist LLM, produced by supervised fine-tuning of [OLMo-2-32B-SFT](https://huggingface.co/allenai/OLMo-2-0325-32B-SFT) on the [Fully Open Meditron Corpus](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron).
+
+ This model is part of the **Fully Open Meditron** family, the first end-to-end auditable pipeline for clinical LLMs, with open weights, open data, an open training recipe, and clinician-vetted corpus construction.
+
+ > OLMo-2-32B-MeditronFO improves by 5.26 points over its base model on aggregate medical benchmarks while largely preserving general-purpose capability.
+
+ - 📄 **Paper:** *Fully Open Meditron: An Auditable Pipeline for Clinical LLMs*
+ - 💻 **Code:** [github.com/EPFLiGHT/FullyOpenMeditron](https://github.com/EPFLiGHT/FullyOpenMeditron)
+ - 📚 **Collection:** [Fully Open Meditron on the Hub](https://huggingface.co/EPFLiGHT)
+ - 🗂️ **Training corpus:** [EPFLiGHT/fully-open-meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron)
+
+ ## Performance
+
+ Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.
+
+ | Benchmark | OLMo-2-32B-SFT | **OLMo-2-32B-MeditronFO** | Δ |
+ |---|---:|---:|---:|
+ | MedMCQA | 59.10 | **57.83** | -1.27 |
+ | MedQA | 66.22 | **69.44** | +3.22 |
+ | PubMedQA | 72.00 | **76.60** | +4.60 |
+ | MedXpertQA | 13.02 | **17.96** | +4.94 |
+ | HealthBench Hard | 19.75 | **33.82** | +14.07 |
+ | **Average** | 45.88 | **51.13** | +5.25 |
+
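The per-benchmark deltas and an aggregate can be recomputed from the table rows. The sketch below is only a sanity-check aid: it assumes the aggregate is an unweighted mean over the five benchmarks, which the card does not state. Because it works from the rounded table values, its result need not match the reported Average row exactly; the paper may average unrounded scores or weight benchmarks differently.

```python
# Scores copied from the table above: (base, fine-tuned) accuracy in percent.
scores = {
    "MedMCQA": (59.10, 57.83),
    "MedQA": (66.22, 69.44),
    "PubMedQA": (72.00, 76.60),
    "MedXpertQA": (13.02, 17.96),
    "HealthBench Hard": (19.75, 33.82),
}


def macro_average(values):
    """Unweighted mean over benchmarks (an assumed definition of 'Average')."""
    values = list(values)
    return sum(values) / len(values)


base_avg = macro_average(base for base, _ in scores.values())
tuned_avg = macro_average(tuned for _, tuned in scores.values())

# Per-benchmark difference, rounded to the table's two decimal places.
deltas = {name: round(tuned - base, 2) for name, (base, tuned) in scores.items()}

print(f"base average:  {base_avg:.2f}")
print(f"tuned average: {tuned_avg:.2f}")
print(f"average delta: {tuned_avg - base_avg:.2f}")
for name, delta in deltas.items():
    print(f"{name:>18}: {delta:+.2f}")
```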
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "EPFLiGHT/OLMo-2-32B-MeditronFO"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Load the weights in bfloat16 and let Accelerate place them across available devices.
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
+ ]
+ # Render the conversation with the model's chat template, ending with the assistant turn marker.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ # Greedy decoding; print only the newly generated tokens, not the echoed prompt.
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+ ```
+
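As a rough sizing aid for the `torch_dtype=torch.bfloat16` and `device_map="auto"` settings above, the sketch below estimates how much memory the weights alone occupy. It assumes exactly 32e9 parameters, taken from the nominal "32B" in the model name, so the true count will differ somewhat. It also ignores activations, the KV cache, and framework overhead, so the figures are best read as a lower bound.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights only, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the "32B" in the model name.
NOMINAL_PARAMS = 32e9

# Bytes per parameter for two common dtypes.
DTYPE_BYTES = {"float32": 4, "bfloat16": 2}

for dtype_name, nbytes in DTYPE_BYTES.items():
    gib = weight_memory_gib(NOMINAL_PARAMS, nbytes)
    print(f"{dtype_name:>8}: ~{gib:.1f} GiB for weights alone")
```

Under these assumptions the bfloat16 weights need roughly 60 GiB, which is why `device_map="auto"` is useful on machines where a single accelerator cannot hold the whole model.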
+ ## Training
+
+ - **Base model:** [OLMo-2-32B-SFT](https://huggingface.co/allenai/OLMo-2-0325-32B-SFT)
+ - **Corpus:** [Fully Open Meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron): ~601k examples (~150M tokens), combining eight public medical QA datasets with three clinician-vetted synthetic components (exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes)
+ - **Hardware:** NVIDIA GH200 nodes
+ - **Framework:** Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
+ - **Decontamination:** System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks
+
+ Full hyperparameters are in Appendix I of the paper.
+
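To make the n-gram stage of the decontamination step concrete, here is a minimal sketch of an n-gram overlap check. It is not the pipeline's implementation: the whitespace tokenization, the window size `n=8`, and the helper names `ngrams` and `is_contaminated` are all illustrative assumptions, and the token-alignment stage is not shown. The example texts are made up for the demonstration.

```python
def ngrams(text: str, n: int) -> set:
    """Set of lowercase, whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(train_text: str, benchmark_texts: list, n: int = 8) -> bool:
    """True if the training text shares at least one n-gram with any benchmark item."""
    train_grams = ngrams(train_text, n)
    return any(bool(train_grams & ngrams(item, n)) for item in benchmark_texts)


# Hypothetical benchmark item and two hypothetical training examples.
benchmark = [
    "A 45-year-old man presents with crushing chest pain radiating to the left arm"
]
verbatim = (
    "Question: a 45-year-old man presents with crushing chest pain "
    "radiating to the left arm. What is the next step?"
)
paraphrase = (
    "A man aged 45 reports severe pain in his chest that spreads to his left arm"
)

print(is_contaminated(verbatim, benchmark))    # shares an 8-gram with the benchmark item
print(is_contaminated(paraphrase, benchmark))  # same meaning, but no shared 8-gram
```

The paraphrased example passes the check even though it describes the same case, which illustrates why a purely syntactic filter can miss reworded benchmark items.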
+ ## Intended Use
+
+ **Research only.** This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.
+
+ It is **not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use.** Conduct independent domain-specific safety evaluation before any such use.
+
+ ## Limitations
+
+ - Inherits limitations of the training corpus: ~64% of items are synthetic, generated by a single teacher model (gpt-oss-120b), which introduces model-specific stylistic and reasoning biases.
+ - Decontamination is syntactic (n-gram and token alignment) rather than semantic, so paraphrased benchmark items may still leak into training.
+ - Some general-purpose instruction-following capability is degraded relative to the base model, a tradeoff common to medical-specialist fine-tuning.
+ - Predominantly English; non-English clinical content is underrepresented.
+ - Inherits geographic and demographic biases of the source datasets, partially mitigated by the inclusion of AfriMed-QA.
+ - May produce confidently incorrect outputs (hallucinations).
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{theimerlienhard2026meditron,
+   title={Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
+   author={Theimer-Lienhard, Xavier and El-Amin, Mushtaha and Elhassan, Fay and Vaidya, Sahaj and Cartier-Negadi, Victor and Sasu, David and Klein, Lars and Hartley, Mary-Anne},
+   year={2026}
+ }
+ ```
+
+ ## License
+
+ Released under the **Apache-2.0** license, which permits use, including commercial use, subject to attribution.