Model Card for smollm3-discharge-sentences-sft

This model is a fine-tuned version of HuggingFaceTB/SmolLM3-3B-Base for clinical sentence classification. It has been trained using TRL.

Model Description

This model classifies individual sentences from hospital discharge summaries into categories of follow-up actions. Given a sentence, it outputs a JSON object indicating which action categories apply.

Categories

  • instructions: Case-specific instructions for the patient
  • appointment: Appointment-related follow-up
  • medication: Medication-related follow-up
  • lab: Lab-related follow-up
  • procedure: Procedure-related follow-up
  • imaging: Imaging-related follow-up
  • other: Other helpful contextual information
  • none: Not an action item
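
Because the model emits these labels as a JSON array drawn from a closed set, a downstream consumer can validate generations against the category list above. A minimal sketch using only the standard library (the `ALLOWED` set and `validate_labels` helper are illustrative, not part of the released model):

```python
import json

# Closed label set, mirroring the category list above.
ALLOWED = {"instructions", "appointment", "medication", "lab",
           "procedure", "imaging", "other", "none"}

def validate_labels(raw: str) -> list[str]:
    """Parse the model's JSON output and keep only known categories."""
    obj = json.loads(raw)
    return [c for c in obj.get("categories", []) if c in ALLOWED]

print(validate_labels('{"categories": ["appointment", "imaging"]}'))
# ['appointment', 'imaging']
```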

Quick start

from transformers import pipeline

sentence = "The patient was to follow up with Dr. Greene in three to four weeks."
generator = pipeline("text-generation", model="chrisvoncsefalvay/smollm3-discharge-sentences-sft", device="cuda")

# The model takes a system prompt plus the sentence to classify, and
# responds with a JSON object listing the applicable categories.
output = generator([
    {"role": "system", "content": "You are a clinical action item classifier..."},
    {"role": "user", "content": f"Classify this sentence:\n\n{sentence}"}
], max_new_tokens=64, return_full_text=False)[0]
print(output["generated_text"])
# Output: {"categories": ["appointment"]}
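
Since JSON validity is 97.5% rather than 100%, a production pipeline should tolerate malformed generations. A hedged sketch of defensive parsing (the fallback to ["none"] and the brace-trimming heuristic are design choices for illustration, not part of the model card):

```python
import json

def parse_categories(generated: str) -> list[str]:
    """Extract the category list from a generation, with a safe fallback."""
    try:
        # Keep only the outermost {...} span in case the model adds extra text.
        start, end = generated.index("{"), generated.rindex("}") + 1
        obj = json.loads(generated[start:end])
        cats = obj.get("categories", [])
        if isinstance(cats, list) and cats:
            return cats
    except ValueError:
        pass  # no braces found, or the span was not valid JSON
    return ["none"]  # conservative default for unparseable output

print(parse_categories('{"categories": ["appointment"]}'))  # ['appointment']
print(parse_categories('not json at all'))                  # ['none']
```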

Evaluation Results

Evaluated on 5,313 test samples from chrisvoncsefalvay/smol-discharge-sentences-sft.

Overall Metrics

Metric                Score
JSON Validity         97.5%
Exact Match Accuracy  76.3%
Micro F1              0.631
Macro F1              0.568
Micro Precision       0.734
Micro Recall          0.553

Per-Category Performance

Category      Precision  Recall  F1     Support
instructions  0.931      0.460   0.616  1153
appointment   0.905      0.682   0.778   660
medication    0.354      0.707   0.471   239
lab           0.740      0.689   0.714   132
procedure     0.857      0.343   0.490    35
imaging       0.667      0.811   0.732    37
other         0.294      0.127   0.177    79
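
As a quick consistency check, the aggregate scores follow from the tables: micro F1 is the harmonic mean of micro precision and micro recall, and macro F1 is the unweighted mean of the seven per-category F1 scores listed above. Verifying in Python:

```python
# Aggregates reconstructed from the reported tables.
micro_p, micro_r = 0.734, 0.553
per_category_f1 = [0.616, 0.778, 0.471, 0.714, 0.490, 0.732, 0.177]

micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
macro_f1 = sum(per_category_f1) / len(per_category_f1)

print(round(micro_f1, 3))  # 0.631
print(round(macro_f1, 3))  # 0.568
```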

Key Findings

Strengths:

  • High JSON validity (97.5%), giving reliable structured output
  • Strong precision on instructions (93.1%) and appointment (90.5%)
  • Best F1 on appointment (0.778) and imaging (0.732)

Limitations:

  • Lower recall on instructions (46.0%) and procedure (34.3%)
  • Weak performance on other category (F1=0.177)

Training procedure

This model was trained with SFT on the chrisvoncsefalvay/smol-discharge-sentences-sft dataset.

  • Training samples: 25,782
  • Validation samples: 5,142
  • Epochs: 3
  • Effective batch size: 16
  • Learning rate: 5e-5
  • LoRA rank: 64
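
The hyperparameters above map onto TRL's SFTTrainer roughly as follows. This is a reconstruction from the listed settings, not the actual training script: the LoRA alpha, the batch size / gradient accumulation split, and the dataset split names are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Split names assumed; the card reports 25,782 train / 5,142 validation samples.
dataset = load_dataset("chrisvoncsefalvay/smol-discharge-sentences-sft")

peft_config = LoraConfig(
    r=64,             # LoRA rank from the card
    lora_alpha=128,   # assumption: 2x rank is a common default
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="smollm3-discharge-sentences-sft",
    num_train_epochs=3,
    per_device_train_batch_size=4,    # assumption: 4 x 4 accumulation
    gradient_accumulation_steps=4,    # = effective batch size 16
    learning_rate=5e-5,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    peft_config=peft_config,
)
trainer.train()
```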

Framework versions

  • TRL: 0.25.1
  • Transformers: 4.57.3
  • PyTorch: 2.9.1
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Intended Use

This model is intended for research purposes in clinical NLP, specifically for:

  • Identifying follow-up action items in discharge summaries
  • Structured extraction of patient instructions
  • Clinical document analysis pipelines

Limitations

  • Trained on MIMIC-III data (US hospital system, English only)
  • May not generalize to other clinical contexts or languages
  • Should not be used for clinical decision-making without human review

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}