---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- medical
- icd-10
- clinical-coding
- healthcare
- lora
- peft
- sft
- transformers
- trl
- base_model:adapter:Qwen/Qwen2.5-0.5B-Instruct
datasets:
- FiscaAI/synth-ehr-icd10cm-prompt
- Rakshithch/icd10cm-clinical-coding-sft
---
# Qwen2.5-0.5B ICD-10-CM Clinical Coder (LoRA Adapter)
A fine-tuned LoRA adapter for automatic **ICD-10-CM diagnosis code classification** from clinical text descriptions. Trained on synthetic EHR records for healthcare claims processing.
## πŸ₯ Use Case
Designed for healthcare analytics pipelines:
- **Claims Processing**: Suggest ICD-10-CM codes for X12 EDI 837 claims
- **Denial Rate Reduction**: Auto-coding review to catch miscoded claims
- **Diagnosis Trend Analysis**: Automated coding for population health analytics
## 📊 Current Results (Proof-of-Concept)
This adapter was trained with limited compute (50 steps, CPU, 500 examples from top-20 codes):
| Metric | Score |
|--------|-------|
| **Exact Match** | 28.6% |
| **Category Match (3-char)** | 28.6% |
| **Chapter Match (1st letter)** | 48.6% |
| Training Loss | 0.649 (from 1.97) |
| Token Accuracy | 94% (from 62%) |
**Top-performing codes** (100% accuracy, on 3-4 evaluation examples each): G47.31 (primary central sleep apnea), G57.10 (meralgia paresthetica), J32.9 (chronic sinusitis)
> ⚠️ **These results are from a minimal CPU training run.** Full GPU training on 366K examples with Qwen2.5-1.5B should achieve **41-58% exact match** based on [Lenz et al. (2025)](https://arxiv.org/abs/2510.13624).
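
The three match levels in the table can be computed from (gold, predicted) code pairs. The snippet below is a minimal sketch, not the evaluation script used for this card: the function names and the toy pairs are illustrative assumptions, and "chapter" follows the table's definition (first letter of the code).

```python
def match_levels(gold: str, pred: str) -> dict:
    """Compare two ICD-10-CM codes at three granularities."""
    g = gold.strip().upper()
    p = pred.strip().upper()
    return {
        "exact": bool(p) and g == p,              # full code, e.g. J32.9
        "category": bool(p) and g[:3] == p[:3],   # first 3 characters, e.g. J32
        "chapter": bool(p) and g[:1] == p[:1],    # first letter, e.g. J
    }


def score(pairs):
    """Return the fraction of pairs that match at each level."""
    totals = {"exact": 0, "category": 0, "chapter": 0}
    for gold, pred in pairs:
        for name, hit in match_levels(gold, pred).items():
            totals[name] += int(hit)
    return {name: count / len(pairs) for name, count in totals.items()}


# Toy pairs (hypothetical, for illustration only)
pairs = [
    ("J32.9", "J32.9"),   # exact match
    ("J32.9", "J32.0"),   # same category, different code
    ("J02.9", "J32.9"),   # same first letter only
    ("R30.0", "M79.10"),  # no match at any level
]
metrics = score(pairs)
print(metrics)
```

Each level is at least as permissive as the one before it, so exact match can never exceed category match, and category match can never exceed chapter match.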
## 🚀 Quick Start
```python
from transformers import pipeline
pipe = pipeline("text-generation", model="Rakshithch/qwen2.5-0.5b-icd10cm-coder", device_map="auto")
messages = [
{"role": "system", "content": "You are an expert medical coder specializing in ICD-10-CM coding for healthcare claims processing."},
{"role": "user", "content": "Patient presents with chronic sinusitis, nasal congestion and facial pressure for 3 months."},
]
result = pipe(messages, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
# Expected: J32.9 - Chronic sinusitis, unspecified
```
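
In a pipeline, the generated text usually needs to be reduced to a bare code. The helper below is a hedged sketch: `extract_code` is a hypothetical name, and the regular expression checks only the general shape of an ICD-10-CM code (letter, digit, alphanumeric, then an optional dot and up to four more characters). It does not confirm that the code exists in the current code set.

```python
import re

# Shape of an ICD-10-CM code: letter, digit, alphanumeric, optional ".xxxx"
ICD10CM_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_code(text):
    """Return the first code-shaped token in the model output, or None."""
    match = ICD10CM_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_code("J32.9 - Chronic sinusitis, unspecified"))
print(extract_code("No code could be assigned."))
```

Because this is a shape check, tokens such as lab abbreviations can also match, so looking the result up in an official ICD-10-CM code table is a sensible follow-up step.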
## πŸ‹οΈ Full GPU Training (Recommended)
For production-quality results, run the included GPU training script:
```bash
pip install torch transformers trl peft datasets trackio accelerate flash-attn
python train_icd10_gpu.py
```
**Hardware**: A10G (24GB VRAM) or better | **Time**: ~2-3 hours | **Expected**: 41-58% exact match
The script fine-tunes **Qwen2.5-1.5B-Instruct** with LoRA (r=16) on the full 366K dataset.
## 📋 Training Details
### Dataset
- **Source**: [FiscaAI/synth-ehr-icd10cm-prompt](https://hf.co/datasets/FiscaAI/synth-ehr-icd10cm-prompt) → [Rakshithch/icd10cm-clinical-coding-sft](https://hf.co/datasets/Rakshithch/icd10cm-clinical-coding-sft)
- **Size**: 366,118 examples (329K train / 18K val / 18K test)
- **Codes**: 5,071 unique ICD-10-CM codes across all major chapters
- **Format**: Clinical notes → ICD-10-CM code + explanation
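
To make the format concrete, here is a sketch of how one note could be turned into a conversational prompt/completion record of the kind TRL's SFT trainer accepts. The `prompt` and `completion` field names, the `to_sft_record` helper, and the `code - explanation` layout (inferred from the Quick Start output) are assumptions; the actual column names in the dataset may differ.

```python
import json

# System prompt copied from the Quick Start example
SYSTEM_PROMPT = (
    "You are an expert medical coder specializing in ICD-10-CM coding "
    "for healthcare claims processing."
)


def to_sft_record(note, code, explanation):
    """Build one prompt/completion training record (hypothetical schema)."""
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
        "completion": [
            {"role": "assistant", "content": f"{code} - {explanation}"},
        ],
    }


record = to_sft_record(
    "Patient presents with chronic sinusitis, nasal congestion and facial pressure for 3 months.",
    "J32.9",
    "Chronic sinusitis, unspecified",
)
print(json.dumps(record, indent=2))
```

Keeping the assistant turn in a separate `completion` field is what allows the loss to be restricted to the target text rather than the clinical note.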
### Literature Basis
| Paper | Key Finding |
|-------|-------------|
| [Lenz et al. 2025](https://arxiv.org/abs/2510.13624) | Instruction-tuning LLMs on ICD catalog QA → 41-58% exact accuracy |
| [MERA (2025)](https://arxiv.org/abs/2501.17326) | Code memorization pre-phase improves ICD coding by 15%+ |
| [PLM-CA (2025)](https://arxiv.org/abs/2603.00221) | BERT + label-wise attention → 71.8% micro-F1 on 1.8M patient cohort |
### Hyperparameters
```
Base Model: Qwen/Qwen2.5-0.5B-Instruct (proof-of-concept)
→ For production: Qwen/Qwen2.5-1.5B-Instruct
LoRA: r=8, alpha=16, target=q/k/v/o_proj
Learning Rate: 2e-4 (AdamW, cosine schedule)
Effective Batch Size: 2 (batch=1, grad_accum=2)
Max Sequence Length: 384 tokens
Training: 50 steps (~8 minutes on CPU)
Loss: prompt/completion format (loss on ICD codes only)
```
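
A few derived quantities follow directly from these settings. The arithmetic below is a sketch that assumes a single device, no sequence packing, and standard LoRA scaling (`alpha / r`); the variable names are illustrative and do not come from the training script.

```python
# Values taken from the hyperparameter block above
per_device_batch = 1
grad_accum_steps = 2
max_steps = 50
subset_size = 500      # examples in the proof-of-concept subset
lora_r = 8
lora_alpha = 16

# Sequences contributing to each optimizer update
effective_batch = per_device_batch * grad_accum_steps

# Total sequences processed over the whole run
sequences_seen = effective_batch * max_steps

# Share of the 500-example subset the run can cover at most
subset_fraction = sequences_seen / subset_size

# Standard LoRA multiplies the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_r

print(effective_batch, sequences_seen, subset_fraction, lora_scaling)
```

Under these assumptions the run processes 100 sequences, which is less than one pass over the 500-example subset.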
### Per-Code Accuracy (Top-20 Codes)
| Code | Description | Accuracy |
|------|-------------|----------|
| G47.31 | Primary central sleep apnea | 100% (3/3) |
| G57.10 | Meralgia paresthetica, unspecified | 100% (4/4) |
| J32.9 | Chronic sinusitis, unspecified | 100% (4/4) |
| R30.0 | Dysuria | 67% (4/6) |
| J02.9 | Acute pharyngitis, unspecified | 50% (1/2) |
| M79.10 | Myalgia, unspecified site | 40% (2/5) |
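
Per-code figures like these come from grouping predictions by gold code. The following is an illustrative sketch with toy data, not the script that produced the table; `per_code_accuracy` is a hypothetical helper name.

```python
from collections import defaultdict


def per_code_accuracy(pairs):
    """Group (gold, pred) pairs by gold code and count exact matches."""
    counts = defaultdict(lambda: [0, 0])  # code -> [correct, total]
    for gold, pred in pairs:
        counts[gold][1] += 1
        counts[gold][0] += int(gold == pred)
    return {
        code: (correct, total, correct / total)
        for code, (correct, total) in counts.items()
    }


# Toy predictions (hypothetical, for illustration only)
pairs = [
    ("J32.9", "J32.9"),
    ("J32.9", "J32.9"),
    ("J02.9", "J02.9"),
    ("J02.9", "J32.9"),
]
report = per_code_accuracy(pairs)
for code, (correct, total, acc) in report.items():
    print(f"{code}: {acc:.0%} ({correct}/{total})")
```

With only a handful of examples per code, a single prediction shifts the percentage substantially, so the counts in parentheses matter as much as the rate.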
## ⚠️ Limitations
- **Proof-of-concept**: Only 50 training steps on 500 examples; full training is needed for production use
- **Synthetic data**: Trained on synthetic clinical notes, not real patient records
- **Single-label**: One ICD-10 code per clinical note (real claims often have multiple codes)
- **Not for clinical use**: Should not replace human medical coders; output requires expert review
- **US ICD-10-CM only**: Not validated for ICD-10-GM, ICD-10-AM, or other national modifications
## 📈 Expected Improvement with Full Training
Based on literature, scaling from our proof-of-concept to full training should yield:
| Configuration | Expected Exact Match |
|---------------|---------------------|
| Current (50 steps, 500 examples, 0.5B) | 28.6% |
| Full data (366K examples, 0.5B, 3 epochs) | ~35-45% |
| Qwen2.5-1.5B + full data + LoRA r=16 | ~45-58% |
| + Code memorization pre-phase (MERA) | ~55-65% |
| PLM-CA encoder approach (110M params) | ~70%+ micro-F1 |
## Framework Versions
- PEFT 0.19.1
- TRL (latest)
- Transformers (latest)