Clinical Reasoning Model (Test 1)

A fine-tuned version of Llama 3.2 3B Instruct trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.

Purpose

This model was created for educational purposes only. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.

This model is NOT intended for clinical use, patient care, or medical decision-making.

What It Does

Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:

  1. A final diagnosis
  2. A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis

Example

Input:

A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.

Output:

FINAL DIAGNOSIS: Tuberculosis

Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection. Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity

Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur. Supporting evidence: TB treated years ago

Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis. Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion
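
The output format above (a FINAL DIAGNOSIS line followed by numbered steps) is straightforward to parse programmatically. Below is a minimal sketch in Python; the function name `parse_reasoning` and the exact regular expressions are illustrative assumptions, not tooling that ships with the model.

```python
import re

def parse_reasoning(text: str) -> dict:
    """Split a model response into a diagnosis and a list of reasoning steps.

    Assumes the format shown above: a 'FINAL DIAGNOSIS:' line followed by
    'Step N: ...' paragraphs. (Hypothetical helper, not part of the model.)
    """
    diagnosis_match = re.search(r"FINAL DIAGNOSIS:\s*(.+)", text)
    diagnosis = diagnosis_match.group(1).strip() if diagnosis_match else None

    # Each step runs from 'Step N:' to the next step header or end of text.
    steps = [s.strip() for s in re.findall(
        r"Step \d+:\s*(.*?)(?=\nStep \d+:|\Z)", text, flags=re.S)]
    return {"diagnosis": diagnosis, "steps": steps}

example = """FINAL DIAGNOSIS: Tuberculosis

Step 1: Cavities in the lungs are common in active tuberculosis. Supporting evidence: thin-walled cavity

Step 2: The patient had been treated for tuberculosis years earlier. Supporting evidence: TB treated years ago"""

parsed = parse_reasoning(example)
```

A structured result like this makes it easy to compare the model's cited evidence against the findings actually present in the input case.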

Training Details

Dataset

Trained on the DiReCT (Diagnostic Reasoning for Clinical Notes) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.

The dataset covers 25 disease categories and 73 unique diagnoses, including:

  • Acute Coronary Syndrome (NSTEMI, Unstable Angina)
  • Heart Failure (HFrEF, HFpEF)
  • Stroke (Hemorrhagic, Ischemic)
  • Pulmonary Embolism
  • Pneumonia
  • COPD
  • Multiple Sclerosis
  • Tuberculosis
  • Hypertension
  • And many more

Training Configuration

  • Base model: meta-llama/Llama-3.2-3B-Instruct
  • Method: SFT with LoRA (PEFT)
  • Quantization: 4-bit (NF4)
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Learning rate: 3e-5
  • Epochs: 3
  • Batch size: 1 (effective 8 with gradient accumulation)
  • Precision: FP16
  • Hardware: NVIDIA T4 (Google Colab)
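
The configuration above maps directly onto the Hugging Face `transformers` and `peft` APIs. The sketch below shows how such a run could be set up; it is a minimal illustration, not the exact training script. Dataset loading and the SFT training loop (e.g. `trl`'s `SFTTrainer`) are omitted, and the LoRA target modules are left to `peft`'s defaults for Llama since the card does not list them.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with FP16 compute, matching the configuration above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter with the hyperparameters listed in the configuration
# (target modules fall back to peft's defaults for Llama-style models)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

Training then proceeds at learning rate 3e-5 for 3 epochs with per-device batch size 1 and gradient accumulation of 8, as listed above.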

Training Results

The model trained for 3 epochs with an overall decrease in loss, with some fluctuation in the later steps:

  Step    Training loss
  10      22.38
  30      19.23
  50      17.03
  70      15.23
  90      15.08
  110     15.07
  130     14.57
  150     13.90
  170     14.35
  180     13.71
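
As a quick sanity check on the numbers above, the overall relative reduction in logged loss from the first to the last recorded step can be computed directly (the values below are copied from the table):

```python
# Logged training losses (step -> loss), copied from the table above
losses = {10: 22.38, 30: 19.23, 50: 17.03, 70: 15.23, 90: 15.08,
          110: 15.07, 130: 14.57, 150: 13.90, 170: 14.35, 180: 13.71}

first, last = losses[10], losses[180]
reduction = (first - last) / first  # fraction of the initial loss shed

print(f"Loss fell from {first} to {last} ({reduction:.1%} reduction)")
# -> roughly a 39% reduction over 180 logged steps
```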

Limitations

  • Not for clinical use. This model is an educational experiment and should never be used for actual patient care or medical decision-making.
  • Small training set. 511 cases is a modest dataset for fine-tuning. The model may not generalize well to diseases or presentations not represented in the training data.
  • Small base model. Llama 3.2 3B is a relatively small model. Larger models would likely produce better reasoning.
  • Biases. The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
  • Hallucination risk. Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.

Citation

If you use this model, please cite the DiReCT dataset:

@article{wang2024direct,
  title={DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
  journal={arXiv preprint arXiv:2408.01933},
  year={2024}
}
@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
  author = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
  title = {{MIMIC-IV-Ext-DiReCT}},
  journal = {{PhysioNet}},
  year = {2025},
  doi = {10.13026/yf96-kc87}
}

Contact

This model was created as a learning exercise in fine-tuning language models for medical education applications. Created by Arman Yalcin (www.linkedin.com/in/arman8514581).
