Model Card for smollm3-discharge-sentences-sft

This model is a fine-tuned version of HuggingFaceTB/SmolLM3-3B-Base for clinical sentence classification. It has been trained using TRL.

Model Description

This model classifies individual sentences from hospital discharge summaries into categories of follow-up actions. Given a sentence, it outputs a JSON object indicating which action categories apply.

Categories

  • instructions: Case-specific instructions for the patient
  • appointment: Appointment-related follow-up
  • medication: Medication-related follow-up
  • lab: Lab-related follow-up
  • procedure: Procedure-related follow-up
  • imaging: Imaging-related follow-up
  • other: Other helpful contextual information
  • none: Not an action item
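
Because the model emits these labels as a JSON array drawn from a closed set, a downstream consumer can validate generations against the category list above. A minimal sketch using only the standard library (the `ALLOWED` set and `validate_labels` helper are illustrative, not part of the released model):

```python
import json

# Closed label set, mirroring the category list above.
ALLOWED = {"instructions", "appointment", "medication", "lab",
           "procedure", "imaging", "other", "none"}

def validate_labels(raw: str) -> list[str]:
    """Parse the model's JSON output and keep only known categories."""
    obj = json.loads(raw)
    return [c for c in obj.get("categories", []) if c in ALLOWED]

print(validate_labels('{"categories": ["appointment", "imaging"]}'))
# ['appointment', 'imaging']
```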

Quick start

from transformers import pipeline

sentence = "The patient was to follow up with Dr. Greene in three to four weeks."
generator = pipeline("text-generation", model="chrisvoncsefalvay/smollm3-discharge-sentences-sft", device="cuda")

# The model takes a system prompt plus the sentence to classify, and
# responds with a JSON object listing the applicable categories.
output = generator([
    {"role": "system", "content": "You are a clinical action item classifier..."},
    {"role": "user", "content": f"Classify this sentence:\n\n{sentence}"}
], max_new_tokens=64, return_full_text=False)[0]
print(output["generated_text"])
# Output: {"categories": ["appointment"]}
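
Since JSON validity is 97.5% rather than 100%, a production pipeline should tolerate malformed generations. A hedged sketch of defensive parsing (the fallback to ["none"] and the brace-trimming heuristic are design choices for illustration, not part of the model card):

```python
import json

def parse_categories(generated: str) -> list[str]:
    """Extract the category list from a generation, with a safe fallback."""
    try:
        # Keep only the outermost {...} span in case the model adds extra text.
        start, end = generated.index("{"), generated.rindex("}") + 1
        obj = json.loads(generated[start:end])
        cats = obj.get("categories", [])
        if isinstance(cats, list) and cats:
            return cats
    except ValueError:
        pass  # no braces found, or the span was not valid JSON
    return ["none"]  # conservative default for unparseable output

print(parse_categories('{"categories": ["appointment"]}'))  # ['appointment']
print(parse_categories('not json at all'))                  # ['none']
```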

Evaluation Results

Evaluated on 5,313 test samples from chrisvoncsefalvay/smol-discharge-sentences-sft.

Overall Metrics

Metric                Score
JSON Validity         97.5%
Exact Match Accuracy  76.3%
Micro F1              0.631
Macro F1              0.568
Micro Precision       0.734
Micro Recall          0.553

Per-Category Performance

Category      Precision  Recall  F1     Support
instructions  0.931      0.460   0.616  1153
appointment   0.905      0.682   0.778   660
medication    0.354      0.707   0.471   239
lab           0.740      0.689   0.714   132
procedure     0.857      0.343   0.490    35
imaging       0.667      0.811   0.732    37
other         0.294      0.127   0.177    79
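
As a quick consistency check, the aggregate scores follow from the tables: micro F1 is the harmonic mean of micro precision and micro recall, and macro F1 is the unweighted mean of the seven per-category F1 scores listed above. Verifying in Python:

```python
# Aggregates reconstructed from the reported tables.
micro_p, micro_r = 0.734, 0.553
per_category_f1 = [0.616, 0.778, 0.471, 0.714, 0.490, 0.732, 0.177]

micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
macro_f1 = sum(per_category_f1) / len(per_category_f1)

print(round(micro_f1, 3))  # 0.631
print(round(macro_f1, 3))  # 0.568
```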

Key Findings

Strengths:

  • High JSON validity (97.5%), giving reliable structured output
  • Strong precision on instructions (93.1%) and appointment (90.5%)
  • Best F1 on appointment (0.778) and imaging (0.732)

Limitations:

  • Lower recall on instructions (46.0%) and procedure (34.3%)
  • Weak performance on other category (F1=0.177)

Training procedure

This model was trained with SFT on the chrisvoncsefalvay/smol-discharge-sentences-sft dataset.

  • Training samples: 25,782
  • Validation samples: 5,142
  • Epochs: 3
  • Effective batch size: 16
  • Learning rate: 5e-5
  • LoRA rank: 64
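
The hyperparameters above map onto TRL's SFTTrainer roughly as follows. This is a reconstruction from the listed settings, not the actual training script: the LoRA alpha, the batch size / gradient accumulation split, and the dataset split names are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Split names assumed; the card reports 25,782 train / 5,142 validation samples.
dataset = load_dataset("chrisvoncsefalvay/smol-discharge-sentences-sft")

peft_config = LoraConfig(
    r=64,             # LoRA rank from the card
    lora_alpha=128,   # assumption: 2x rank is a common default
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="smollm3-discharge-sentences-sft",
    num_train_epochs=3,
    per_device_train_batch_size=4,    # assumption: 4 x 4 accumulation
    gradient_accumulation_steps=4,    # = effective batch size 16
    learning_rate=5e-5,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    peft_config=peft_config,
)
trainer.train()
```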

Framework versions

  • TRL: 0.25.1
  • Transformers: 4.57.3
  • PyTorch: 2.9.1
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Intended Use

This model is intended for research purposes in clinical NLP, specifically for:

  • Identifying follow-up action items in discharge summaries
  • Structured extraction of patient instructions
  • Clinical document analysis pipelines

Limitations

  • Trained on MIMIC-III data (US hospital system, English only)
  • May not generalize to other clinical contexts or languages
  • Should not be used for clinical decision-making without human review

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}