# Clinical Reasoning Model (Test 1)
A fine-tuned version of Llama 3.2 3B Instruct trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.
## Purpose
This model was created for educational purposes only. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.
**This model is NOT intended for clinical use, patient care, or medical decision-making.**
## What It Does
Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:
- A final diagnosis
- A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis
### Example

**Input:**
A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.
**Output:**
FINAL DIAGNOSIS: Tuberculosis
Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection. Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity
Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur. Supporting evidence: TB treated years ago
Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis. Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion
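A minimal sketch of running the model with `transformers` and `peft`. Only the two repo IDs come from this page; the system-prompt wording, generation settings, and the `build_messages` helper are illustrative assumptions, not the exact format used in training.

```python
# Hypothetical usage sketch; prompt wording is an assumption, not the
# verified training format.
def build_messages(case: str) -> list[dict]:
    """Wrap a free-text patient case in a chat prompt asking for a final
    diagnosis plus a numbered reasoning chain."""
    return [
        {"role": "system",
         "content": ("You are a clinical reasoning assistant. Given a patient "
                     "case, state a FINAL DIAGNOSIS and a numbered reasoning "
                     "chain citing the supporting evidence for each step.")},
        {"role": "user", "content": case},
    ]

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = "meta-llama/Llama-3.2-3B-Instruct"
    tok = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
    # Attach the LoRA adapter from this repo on top of the base model.
    model = PeftModel.from_pretrained(model, "ploppy2/Clinical-Reasoning-Test1")

    case = ("Productive cough, fatigue, and chest congestion. History of prior "
            "TB treatment. Chest CT shows a thin-walled right lower lobe cavity "
            "with adjacent calcified granulomas and bronchiectasis.")
    inputs = tok.apply_chat_template(build_messages(case),
                                     add_generation_prompt=True,
                                     return_tensors="pt").to(model.device)
    out = model.generate(inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens.
    print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```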
## Training Details
### Dataset
Trained on the DiReCT (Diagnostic Reasoning for Clinical Notes) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.
The dataset covers 25 disease categories and 73 unique diagnoses, including:
- Acute Coronary Syndrome (NSTEMI, Unstable Angina)
- Heart Failure (HFrEF, HFpEF)
- Stroke (Hemorrhagic, Ischemic)
- Pulmonary Embolism
- Pneumonia
- COPD
- Multiple Sclerosis
- Tuberculosis
- Hypertension
- And many more
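One way to turn the physician-annotated reasoning into supervision is to flatten each (observation, rationale) pair into the step-by-step target format shown in the Example above. The helper below is a sketch under that assumption; it does not reflect the DiReCT dataset's actual schema.

```python
# Illustrative only: the (rationale, evidence) tuple layout is an assumed
# simplification of DiReCT's annotated reasoning trees.
def to_target(diagnosis: str, steps: list[tuple[str, str]]) -> str:
    """Render a diagnosis and ordered (rationale, supporting_evidence)
    pairs in the model's output format."""
    lines = [f"FINAL DIAGNOSIS: {diagnosis}"]
    for i, (rationale, evidence) in enumerate(steps, start=1):
        lines.append(f"Step {i}: {rationale} Supporting evidence: {evidence}")
    return "\n".join(lines)
```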
### Training Configuration
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | SFT with LoRA (PEFT) |
| Quantization | 4-bit (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 3e-5 |
| Epochs | 3 |
| Batch size | 1 (effective 8 with gradient accumulation) |
| Precision | FP16 |
| Hardware | NVIDIA T4 (Google Colab) |
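The table above maps onto a standard QLoRA-style SFT setup. The configuration sketch below mirrors those values using current `peft`/`trl`/`bitsandbytes` APIs; the dataset file name and output directory are placeholders, not the actual training script.

```python
# Configuration sketch reconstructing the hyperparameters in the table;
# "direct_sft.jsonl" and "clinical-reasoning-test1" are placeholder names.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization with FP16 compute.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct", quantization_config=bnb)

# LoRA rank 16, alpha 32, dropout 0.05, as in the table.
peft_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      task_type="CAUSAL_LM")

# lr 3e-5, 3 epochs, batch 1 with 8-step accumulation (effective batch 8).
args = SFTConfig(learning_rate=3e-5,
                 num_train_epochs=3,
                 per_device_train_batch_size=1,
                 gradient_accumulation_steps=8,
                 fp16=True,
                 output_dir="clinical-reasoning-test1")

train_dataset = load_dataset("json", data_files="direct_sft.jsonl",
                             split="train")
trainer = SFTTrainer(model=model, args=args,
                     train_dataset=train_dataset, peft_config=peft_cfg)
trainer.train()
```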
### Training Results

Over 3 epochs, training loss fell from 22.38 at step 10 to 13.71 at step 180, with a plateau around steps 90-110:
| Step | Training Loss |
|---|---|
| 10 | 22.38 |
| 30 | 19.23 |
| 50 | 17.03 |
| 70 | 15.23 |
| 90 | 15.08 |
| 110 | 15.07 |
| 130 | 14.57 |
| 150 | 13.90 |
| 170 | 14.35 |
| 180 | 13.71 |
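A quick check of the trend reported above, using the logged values verbatim: roughly a 39% drop in training loss from the first to the last recorded step.

```python
# Logged (step, loss) values copied from the table above.
losses = {10: 22.38, 30: 19.23, 50: 17.03, 70: 15.23, 90: 15.08,
          110: 15.07, 130: 14.57, 150: 13.90, 170: 14.35, 180: 13.71}
first, last = losses[10], losses[180]
pct_drop = round(100 * (first - last) / first, 1)
print(pct_drop)  # relative decrease in percent
```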
## Limitations
- **Not for clinical use.** This model is an educational experiment and should never be used for actual patient care or medical decision-making.
- **Small training set.** 511 cases is a modest dataset for fine-tuning. The model may not generalize well to diseases or presentations not represented in the training data.
- **Small base model.** Llama 3.2 3B is a relatively small model. Larger models would likely produce better reasoning.
- **Biases.** The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
- **Hallucination risk.** Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.
## Citation
If you use this model, please cite the DiReCT dataset:
```bibtex
@article{wang2024direct,
  title   = {DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  author  = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
  journal = {arXiv preprint arXiv:2408.01933},
  year    = {2024}
}

@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
  author  = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
  title   = {{MIMIC-IV-Ext-DiReCT}},
  journal = {{PhysioNet}},
  year    = {2025},
  doi     = {10.13026/yf96-kc87}
}
```
## Contact

This model was created as a learning exercise in fine-tuning language models for medical education applications. Created by Arman Yalcin (www.linkedin.com/in/arman8514581).