Instructions to use EPFLiGHT/OLMo-2-32B-MeditronFO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use EPFLiGHT/OLMo-2-32B-MeditronFO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EPFLiGHT/OLMo-2-32B-MeditronFO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EPFLiGHT/OLMo-2-32B-MeditronFO")
model = AutoModelForCausalLM.from_pretrained("EPFLiGHT/OLMo-2-32B-MeditronFO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use EPFLiGHT/OLMo-2-32B-MeditronFO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EPFLiGHT/OLMo-2-32B-MeditronFO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EPFLiGHT/OLMo-2-32B-MeditronFO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
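
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000` and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "EPFLiGHT/OLMo-2-32B-MeditronFO"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the reply is under choices[0].message.content,
    # as in the OpenAI chat completions response format.
    return body["choices"][0]["message"]["content"]


# Usage, once the server is running:
#   request = build_chat_request("What is the capital of France?")
#   print(send_chat_request(request))
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.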

Use Docker

docker model run hf.co/EPFLiGHT/OLMo-2-32B-MeditronFO

SGLang

How to use EPFLiGHT/OLMo-2-32B-MeditronFO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EPFLiGHT/OLMo-2-32B-MeditronFO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EPFLiGHT/OLMo-2-32B-MeditronFO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EPFLiGHT/OLMo-2-32B-MeditronFO" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EPFLiGHT/OLMo-2-32B-MeditronFO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use EPFLiGHT/OLMo-2-32B-MeditronFO with Docker Model Runner:
```
docker model run hf.co/EPFLiGHT/OLMo-2-32B-MeditronFO
```



	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- medical
	- clinical
	- healthcare
	- meditron
	- fully-open
	- medical-llm
	base_model: allenai/OLMo-2-0325-32B-SFT
	base_model_relation: finetune
	datasets:
	- EPFLiGHT/fully-open-meditron
	---

	# OLMo-2-32B-MeditronFO

	OLMo-2-32B-MeditronFO is a 32B-parameter medical specialist LLM, produced by supervised fine-tuning of [OLMo-2-32B-SFT](https://huggingface.co/allenai/OLMo-2-0325-32B-SFT) on the [Fully Open Meditron Corpus](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron).

	This model is part of the Fully Open Meditron family — the first end-to-end auditable pipeline for clinical LLMs, with open weights, open data, open training recipe, and clinician-vetted corpus construction.

	> OLMo-2-32B-MeditronFO improves by 5.26 points over its base model on aggregate medical benchmarks while preserving general-purpose capability.

	- 📄 Paper: [Fully Open Meditron: An Auditable Pipeline for Clinical LLMs](https://arxiv.org/abs/2605.16215)
	- 💻 Code: [github.com/EPFLiGHT/FullyOpenMeditron](https://github.com/EPFLiGHT/FullyOpenMeditron)
	- 📚 Collection: [MeditronFO](https://huggingface.co/collections/EPFLiGHT/meditronfo)
	- 🗂️ Training corpus: [EPFLiGHT/fully-open-meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron)

	## Performance

	Accuracy (%) on standard medical benchmarks. See the paper for full evaluation details, confidence intervals, and open-ended Auto-MOOVE results.

| Benchmark | OLMo-2-32B-SFT | OLMo-2-32B-MeditronFO | Δ |
|---|---:|---:|---:|
| MedMCQA | 59.10 | 57.83 | -1.27 |
| MedQA | 66.22 | 69.44 | +3.22 |
| PubMedQA | 72.00 | 76.60 | +4.60 |
| MedXpertQA | 13.02 | 17.96 | +4.94 |
| HealthBench Hard | 19.75 | 33.82 | +14.07 |
| Average | 45.88 | 51.13 | +5.25 |
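
Each per-benchmark Δ is the fine-tuned score minus the base score, in percentage points. As a quick sanity check, the sketch below recomputes that column from the five benchmark rows copied from the table; the `deltas` helper is illustrative only, and the Average row is left out because the card does not state how it is computed.

```python
# Rows copied from the table: benchmark -> (base score, fine-tuned score), in %.
scores = {
    "MedMCQA": (59.10, 57.83),
    "MedQA": (66.22, 69.44),
    "PubMedQA": (72.00, 76.60),
    "MedXpertQA": (13.02, 17.96),
    "HealthBench Hard": (19.75, 33.82),
}


def deltas(rows):
    """Return fine-tuned minus base for each benchmark, rounded to 2 decimals."""
    return {name: round(tuned - base, 2) for name, (base, tuned) in rows.items()}


for name, delta in deltas(scores).items():
    print(f"{name:<18}{delta:+.2f}")
```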

	## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "EPFLiGHT/OLMo-2-32B-MeditronFO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16 and spread them across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A 62-year-old woman presents with a three-day history of dyspnea on exertion and a productive cough. What is the differential diagnosis?"},
]
# Apply the chat template and tokenize in a single step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding; decode only the newly generated tokens.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
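
The example above loads the weights in bfloat16 and lets `device_map="auto"` distribute them. A rough back-of-the-envelope estimate shows why that matters for a model of this size. The sketch assumes a nominal 32 billion parameters (the exact count may differ) and covers weights only, not activations or the KV cache; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 32e9  # assumption: nominal "32B"; the exact parameter count may differ

for dtype_name, nbytes in {"float32": 4, "bfloat16": 2}.items():
    print(f"{dtype_name}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB")
```

Under these assumptions the weights alone need about 64 GB in bfloat16, so the model typically has to be sharded across several GPUs or partly offloaded.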

	## Training

	- Base model: [OLMo-2-32B-SFT](https://huggingface.co/allenai/OLMo-2-0325-32B-SFT)
	- Corpus: [Fully Open Meditron](https://huggingface.co/datasets/EPFLiGHT/fully-open-meditron) — ~601k examples (~150M tokens), aggregating eight public medical QA datasets with three clinician-vetted synthetic components: exam-style QA, guideline-grounded QA from 46,469 clinical practice guidelines, and open-ended clinical vignettes
	- Hardware: NVIDIA GH200 nodes
	- Framework: Axolotl with FSDP v2 / DeepSpeed ZeRO-3, Flash Attention 2, bf16 mixed precision
	- Decontamination: System-wide two-stage n-gram and token-alignment decontamination against all evaluation benchmarks

	Full hyperparameters are in Appendix I of the paper.
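
To give a feel for what n-gram decontamination involves, here is a minimal, illustrative sketch of an n-gram overlap filter. It is not the pipeline's implementation: the whitespace tokenization, the window size `n=8`, and all function names are assumptions made for this example, and the token-alignment stage is not shown. See the paper and code repository for the actual procedure.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in `text` (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_texts, n):
    """Collect every n-gram that appears in any benchmark item."""
    index = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index


def is_contaminated(example, benchmark_index, n):
    """True if the example shares at least one n-gram with the benchmark index."""
    return not ngrams(example, n).isdisjoint(benchmark_index)


def decontaminate(train_examples, benchmark_texts, n=8):
    """Keep only training examples with no n-gram overlap with the benchmarks."""
    index = build_benchmark_index(benchmark_texts, n)
    return [ex for ex in train_examples if not is_contaminated(ex, index, n)]


# Made-up strings for illustration only.
benchmark = ["A 45-year-old man presents with chest pain radiating to the left arm"]
train = [
    "Case: a 45-year-old man presents with chest pain radiating to the left arm today",
    "Describe the first-line treatment for hypertension in adults",
]
print(decontaminate(train, benchmark))
```

The first training string shares an 8-word window with the benchmark item and is dropped; the second has no overlap and is kept.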

	## Intended Use

	Research only. This model is intended to support research on medical LLMs, auditing of clinical AI systems, and reproducibility of the Fully Open Meditron pipeline.

	It is not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use. Conduct independent domain-specific safety evaluation before any such use.

	## Citation

	If you use this model, please cite:

```bibtex
@misc{theimerlienhard2026fullyopenmeditronauditable,
  title = {Fully Open Meditron: An Auditable Pipeline for Clinical LLMs},
  author = {Xavier Theimer-Lienhard and Mushtaha El-Amin and Fay Elhassan and Sahaj Vaidya and Victor Cartier-Negadi and David Sasu and Lars Klein and Mary-Anne Hartley},
  year = {2026},
  eprint = {2605.16215},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2605.16215}
}
```

	## License

	Released under the Apache 2.0 license, which permits use, including commercial use, subject to attribution.