Xkrilandar committed 2ea6940 (verified; 1 parent: eb8ca44): Add model card. Files changed (1): README.md, added (+106 -0).

---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- medical
- clinical
- healthcare
- meditron
- fully-open
- medical-llm
base_model: swiss-ai/Apertus-70B-Instruct-2509
datasets:
- EPFLiGHT/fully-open-meditron
---

# Apertus-70B-MeditronFO

**Apertus-70B-MeditronFO** is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of [Apertus-70B-Instruct](https://huggingface.co/swiss-ai/Apertus-70B-Instruct-2509) on the [Fully Open Meditron Corpus](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron).

This model is part of the **Fully Open Meditron** family, built with the first end-to-end auditable pipeline for clinical LLMs: open weights, open data, an open training recipe, and clinician-vetted corpus construction.

> Apertus-70B-MeditronFO establishes a new state of the art among fully open medical LLMs and is preferred over Llama-3.1-70B-Meditron in 96.6% of pairwise Auto-MOOVE comparisons.

- 📄 **Paper:** *Fully Open Meditron: An Auditable Pipeline for Clinical LLMs*
- 💻 **Code:** [github.com/EPFLiGHT/FullyOpenMeditron](https://github.com/EPFLiGHT/FullyOpenMeditron)
- 📚 **Collection:** [Fully Open Meditron on the Hub](https://huggingface.co/EPFLiGHT)
- 🗂️ **Training corpus:** [EPFLiGHT/fully-open-meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron)

## Performance

Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.

| Benchmark | Apertus-70B-Instruct | **Apertus-70B-MeditronFO** | Δ (pp) |
|---|---:|---:|---:|
| MedMCQA | 52.43 | **56.32** | +3.89 |
| MedQA | 60.64 | **68.58** | +7.94 |
| PubMedQA | 66.80 | **75.20** | +8.40 |
| MedXpertQA | 12.33 | **16.90** | +4.57 |
| HealthBench Hard | 32.28 | **40.14** | +7.86 |
| **Average** | 44.90 | **51.43** | +6.53 |
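
The Average row and the Δ column follow directly from the per-benchmark scores. The short Python check below assumes the average is an unweighted mean over the five benchmarks and that Δ is the difference in percentage points; under those assumptions it reproduces the reported values.

```python
# Scores copied from the table: (Apertus-70B-Instruct, Apertus-70B-MeditronFO), in %.
scores = {
    "MedMCQA": (52.43, 56.32),
    "MedQA": (60.64, 68.58),
    "PubMedQA": (66.80, 75.20),
    "MedXpertQA": (12.33, 16.90),
    "HealthBench Hard": (32.28, 40.14),
}

# Per-benchmark improvement in percentage points, rounded like the table.
deltas = {name: round(ft - base, 2) for name, (base, ft) in scores.items()}

# Assumption: the Average row is an unweighted mean over the five benchmarks.
base_avg = round(sum(base for base, _ in scores.values()) / len(scores), 2)
ft_avg = round(sum(ft for _, ft in scores.values()) / len(scores), 2)
avg_delta = round(ft_avg - base_avg, 2)

print(deltas)
print(base_avg, ft_avg, avg_delta)
```

Rounding to two decimals mirrors the table's precision, so the recomputed Average (44.90 vs. 51.43) and its Δ (+6.53) can be compared directly with the reported row.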

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "EPFLiGHT/Apertus-70B-MeditronFO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bf16 and let Accelerate place them across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
]
# Render the conversation with the model's chat template and open the assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; slice off the prompt tokens so only the model's reply is printed.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Training

- **Base model:** [Apertus-70B-Instruct](https://huggingface.co/swiss-ai/Apertus-70B-Instruct-2509)
- **Corpus:** [Fully Open Meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron), ~601k examples (~150M tokens) that combine eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes
- **Hardware:** NVIDIA GH200 nodes
- **Framework:** Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
- **Decontamination:** System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks

Full hyperparameters are in Appendix I of the paper.
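
The rounded corpus figures above imply a rough average example length and, combined with the ~64% synthetic share noted under Limitations, an approximate split between synthetic and public-dataset items. The back-of-the-envelope Python sketch below uses the rounded values as given, so its outputs are approximations derived here, not numbers reported in the paper.

```python
# Rounded corpus statistics quoted in this card (approximate values, not exact counts).
n_examples = 601_000
n_tokens = 150_000_000
synthetic_share = 0.64

# Average number of tokens per training example.
tokens_per_example = n_tokens / n_examples

# Approximate split between synthetic items and items from public datasets.
n_synthetic = round(n_examples * synthetic_share)
n_public = n_examples - n_synthetic

print(f"~{tokens_per_example:.0f} tokens/example")
print(f"~{n_synthetic:,} synthetic, ~{n_public:,} from public datasets")
```

Because every input is itself rounded, treat the results (roughly 250 tokens per example and about 385k synthetic items) as orders of magnitude only.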

## Intended Use

**Research only.** This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.

It is **not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use.** Conduct independent domain-specific safety evaluation before any such use.

## Limitations

- Inherits the limitations of the training corpus: ~64% of items are synthetic, generated by a single teacher model (gpt-oss-120b), which introduces model-specific stylistic and reasoning biases.
- Decontamination is syntactic (n-gram and token alignment) rather than semantic, so paraphrased leakage remains possible.
- Some general-purpose instruction-following capability is degraded relative to the base model, a tradeoff common to medical-specialist fine-tuning.
- Predominantly English; non-English clinical content is underrepresented.
- Inherits geographic and demographic biases of the source datasets, partially mitigated by the inclusion of AfriMed-QA.
- May produce confidently incorrect outputs (hallucinations).
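
To illustrate why syntactic decontamination can miss paraphrases, here is a minimal Python sketch of a word-level n-gram overlap check. It is not the paper's two-stage procedure: the n-gram size (8), the regex tokenization, the helper names `ngrams` and `is_contaminated`, and the made-up benchmark-style question are all assumptions chosen for illustration.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of `text` (hypothetical helper for illustration)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(train_item: str, benchmark_item: str, n: int = 8) -> bool:
    """Flag a training item that shares at least one word n-gram with a benchmark item.

    Assumption: n=8 and regex word tokenization; the paper's settings may differ.
    """
    return bool(ngrams(train_item, n) & ngrams(benchmark_item, n))


# Made-up benchmark-style question, not taken from any real benchmark.
benchmark = ("A 45-year-old man presents with crushing substernal chest pain "
             "radiating to the left arm. What is the most likely diagnosis?")
# Near-verbatim copy: shares long word sequences with the benchmark item.
verbatim = ("Question: a 45-year-old man presents with crushing substernal chest pain "
            "radiating to the left arm. What is the most likely diagnosis?")
# Paraphrase: same clinical content, but no 8 consecutive words in common.
paraphrase = ("A man in his mid-forties reports severe pressure-like pain behind the "
              "sternum that spreads into his left arm. Which diagnosis is most likely?")

print(is_contaminated(verbatim, benchmark))    # True
print(is_contaminated(paraphrase, benchmark))  # False
```

The verbatim copy is caught while the paraphrase passes, which is the kind of semantic leakage that a purely syntactic filter cannot rule out.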

## Citation

```bibtex
@inproceedings{theimerlienhard2026meditron,
  title={Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
  author={Theimer-Lienhard, Xavier and El-Amin, Mushtaha and Elhassan, Fay and Vaidya, Sahaj and Cartier-Negadi, Victor and Sasu, David and Klein, Lars and Hartley, Mary-Anne},
  year={2026}
}
```

## License

Released under the **Apache 2.0** license, which permits use, including commercial use, subject to attribution.