Xkrilandar committed on
Commit 4ea41b7 · verified · 1 Parent(s): 63b9488

Add model card

  1. README.md +106 -0
README.md ADDED
@@ -0,0 +1,106 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - medical
+ - clinical
+ - healthcare
+ - meditron
+ - fully-open
+ - medical-llm
+ base_model: utter-project/EuroLLM-22B-Instruct
+ datasets:
+ - EPFLiGHT/fully-open-meditron
+ ---
+
+ # EuroLLM-22B-MeditronFO
+
+ **EuroLLM-22B-MeditronFO** is a 22B-parameter medical specialist LLM, produced by supervised fine-tuning of [EuroLLM-22B-Instruct](https://huggingface.co/utter-project/EuroLLM-22B-Instruct) on the [Fully Open Meditron Corpus](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron).
+
+ This model is part of the **Fully Open Meditron** family, the first end-to-end auditable pipeline for clinical LLMs, with open weights, open data, open training recipe, and clinician-vetted corpus construction.
+
+ > EuroLLM-22B-MeditronFO is preferred over its base in 67.2% of Auto-MOOVE pairwise comparisons (adjusted win rate).
+
+ - 📄 **Paper:** *Fully Open Meditron: An Auditable Pipeline for Clinical LLMs*
+ - 💻 **Code:** [github.com/EPFLiGHT/FullyOpenMeditron](https://github.com/EPFLiGHT/FullyOpenMeditron)
+ - 📚 **Collection:** [Fully Open Meditron on the Hub](https://huggingface.co/EPFLiGHT)
+ - 🗂️ **Training corpus:** [EPFLiGHT/fully-open-meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron)
31
+
32
+ ## Performance
33
+
34
+ Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.
35
+
36
+ | Benchmark | EuroLLM-22B-Instruct | **EuroLLM-22B-MeditronFO** | Ξ” |
37
+ |---|---:|---:|---:|
38
+ | MedMCQA | 54.94 | **54.79** | -0.15 |
39
+ | MedQA | 66.61 | **63.16** | -3.45 |
40
+ | PubMedQA | 73.60 | **78.00** | +4.40 |
41
+ | MedXpertQA | 14.61 | **14.61** | +0.00 |
42
+ | HealthBench Hard | 34.79 | **37.38** | +2.59 |
43
+ | **Average** | 48.91 | **49.59** | +0.68 |
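
The Average row is consistent with an unweighted mean over the five benchmarks. The following sketch recomputes it from the scores in the table; the names `scores` and `macro_average` are illustrative and not part of the released code, and unweighted averaging is an assumption about how the row was produced.

```python
# Scores copied from the table above: (base, fine-tuned) accuracy in percent.
scores = {
    "MedMCQA": (54.94, 54.79),
    "MedQA": (66.61, 63.16),
    "PubMedQA": (73.60, 78.00),
    "MedXpertQA": (14.61, 14.61),
    "HealthBench Hard": (34.79, 37.38),
}


def macro_average(values):
    """Unweighted mean over benchmarks, rounded to two decimals."""
    values = list(values)
    return round(sum(values) / len(values), 2)


base_avg = macro_average(base for base, _ in scores.values())
tuned_avg = macro_average(tuned for _, tuned in scores.values())

# Per-benchmark change in percentage points (fine-tuned minus base).
deltas = {name: round(tuned - base, 2) for name, (base, tuned) in scores.items()}

print(base_avg, tuned_avg, round(tuned_avg - base_avg, 2))
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

Read this way, the small gain in the average comes from PubMedQA and HealthBench Hard outweighing the drop on MedQA.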
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "EPFLiGHT/EuroLLM-22B-MeditronFO"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Load the weights in bfloat16 and let device_map="auto" place them on the available devices.
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
+ ]
+ # Render the conversation with the model's chat template and open the assistant turn.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ # Greedy decoding; decode only the newly generated tokens, not the prompt.
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+ ```
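
Before loading the model, it can help to estimate how much memory the weights alone need. The sketch below assumes the nominal 22B parameter count from the model name (the exact count may differ) and uses a hypothetical helper, `weight_memory_gb`. It ignores the KV cache, activations, and framework overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: nominal size taken from the model name, not an exact parameter count.
NUM_PARAMS = 22e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # bfloat16 stores 2 bytes per parameter
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # float32, shown for comparison

print(f"bf16 weights: ~{bf16_gb:.0f} GB")
print(f"fp32 weights: ~{fp32_gb:.0f} GB")
```

At roughly 44 GB in bf16, the weights may not fit on a single smaller GPU, which is one reason the usage snippet passes `device_map="auto"`.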
+
+ ## Training
+
+ - **Base model:** [EuroLLM-22B-Instruct](https://huggingface.co/utter-project/EuroLLM-22B-Instruct)
+ - **Corpus:** [Fully Open Meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron), ~601k examples (~150M tokens) that aggregate eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes
+ - **Hardware:** NVIDIA GH200 nodes
+ - **Framework:** Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
+ - **Decontamination:** System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks
+
+ Full hyperparameters are in Appendix I of the paper.
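
To illustrate what the n-gram stage of decontamination can look like, here is a minimal sketch that drops any training example sharing a word-level n-gram with a benchmark item. It is not the pipeline's actual implementation: the window size `n=8`, the whitespace tokenization, and all function names are assumptions made for illustration, and the token-alignment stage is not covered.

```python
def ngrams(text: str, n: int) -> set:
    """Return the set of lowercase word-level n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_texts, n: int = 8) -> set:
    """Collect every n-gram that appears in any benchmark item."""
    index = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index


def is_contaminated(example: str, index: set, n: int = 8) -> bool:
    """True if the example shares at least one n-gram with the benchmarks."""
    return not ngrams(example, n).isdisjoint(index)


def decontaminate(train_examples, benchmark_texts, n: int = 8) -> list:
    """Keep only training examples with no n-gram overlap with the benchmarks."""
    index = build_benchmark_index(benchmark_texts, n)
    return [ex for ex in train_examples if not is_contaminated(ex, index, n)]


benchmark = ["A 45-year-old man presents with chest pain radiating to the left arm"]
train = [
    "Case: a 45-year-old man presents with chest pain radiating to the left arm today",
    "Metformin is a first-line therapy for type 2 diabetes in most adults",
]
clean = decontaminate(train, benchmark)
print(len(clean))
```

Exact matching of this kind only catches verbatim overlap; a reworded benchmark question would pass through unchanged.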
+
+ ## Intended Use
+
+ **Research only.** This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.
+
+ It is **not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use.** Conduct independent domain-specific safety evaluation before any such use.
+
+ ## Limitations
+
+ - Inherits limitations of the training corpus: ~64% of items are synthetic, generated by a single teacher model (gpt-oss-120b), introducing model-specific stylistic and reasoning biases.
+ - Decontamination is syntactic (n-gram and token alignment) rather than semantic, leaving open the possibility of paraphrased leakage.
+ - Some general-purpose instruction-following capability is degraded relative to the base model, a tradeoff common to medical-specialist fine-tuning.
+ - Predominantly English; non-English clinical content is underrepresented.
+ - Inherits geographic and demographic biases of the source datasets, partially mitigated by AfriMed-QA inclusion.
+ - May produce confidently incorrect outputs (hallucinations).
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{theimerlienhard2026meditron,
+   title={Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
+   author={Theimer-Lienhard, Xavier and El-Amin, Mushtaha and Elhassan, Fay and Vaidya, Sahaj and Cartier-Negadi, Victor and Sasu, David and Klein, Lars and Hartley, Mary-Anne},
+   year={2026}
+ }
+ ```
+
+ ## License
+
+ Released under the **Apache-2.0** license, which permits use, including commercial use, subject to attribution.