---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- medical
- icd-10
- clinical-coding
- healthcare
- lora
- peft
- sft
- transformers
- trl
- base_model:adapter:Qwen/Qwen2.5-0.5B-Instruct
datasets:
- FiscaAI/synth-ehr-icd10cm-prompt
- Rakshithch/icd10cm-clinical-coding-sft
---
# Qwen2.5-0.5B ICD-10-CM Clinical Coder (LoRA Adapter)

A fine-tuned LoRA adapter for automatic **ICD-10-CM diagnosis code classification** from clinical text descriptions. Trained on synthetic EHR records for healthcare claims processing.

## 🏥 Use Case

Designed for healthcare analytics pipelines:

- **Claims Processing**: Suggest ICD-10-CM codes for X12 EDI 837 claims
- **Denial Rate Reduction**: Auto-coding review to catch miscoded claims
- **Diagnosis Trend Analysis**: Automated coding for population health analytics

## Current Results (Proof-of-Concept)

This adapter was trained with limited compute: 50 steps on CPU, using 500 examples drawn from the top-20 codes.

| Metric | Score |
|--------|-------|
| **Exact Match** | 28.6% |
| **Category Match (first 3 characters)** | 28.6% |
| **Chapter Match (first letter)** | 48.6% |
| Training Loss | 0.649 (down from 1.97) |
| Token Accuracy | 94% (up from 62%) |

**Top-performing codes** (100% accuracy): G47.31 (primary central sleep apnea), G57.10 (meralgia paresthetica), J32.9 (chronic sinusitis)

> ⚠️ **These results are from a minimal CPU training run.** Full GPU training on 366K examples with Qwen2.5-1.5B is expected to reach roughly **41-58% exact match**, extrapolating from [Lenz et al. (2025)](https://arxiv.org/abs/2510.13624).

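
The three match levels in the table can be sketched as follows. This is a minimal illustration, not the evaluation script used for this card: the helper names (`match_levels`, `score`) and the normalization step are assumptions, and the sample pairs are made up to show each case.

```python
def normalize(code: str) -> str:
    """Upper-case and strip whitespace so 'j32.9 ' compares equal to 'J32.9'."""
    return code.strip().upper()


def match_levels(predicted: str, gold: str) -> dict:
    """Compare two ICD-10-CM codes at three levels of granularity."""
    p, g = normalize(predicted), normalize(gold)
    return {
        "exact": p == g,                   # full code, e.g. J32.9
        "category": p[:3] == g[:3],        # first 3 characters, e.g. J32
        "chapter_letter": p[:1] == g[:1],  # first letter, e.g. J
    }


def score(pairs: list) -> dict:
    """Return the fraction of (predicted, gold) pairs that match at each level."""
    totals = {"exact": 0, "category": 0, "chapter_letter": 0}
    for predicted, gold in pairs:
        for level, hit in match_levels(predicted, gold).items():
            totals[level] += int(hit)
    n = len(pairs)
    return {level: (count / n if n else 0.0) for level, count in totals.items()}


# Illustrative (predicted, gold) pairs, not real model outputs.
pairs = [
    ("J32.9", "J32.9"),   # exact match
    ("J32.0", "J32.9"),   # same category, different code
    ("J02.9", "J32.9"),   # same first letter only
    ("R30.0", "M79.10"),  # no match at any level
]
metrics = score(pairs)
print(metrics)
```

Each level is at least as permissive as the one before it, so exact match can never exceed category match, and category match can never exceed first-letter match.
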
## Quick Start

```python
from transformers import pipeline

# Loading a LoRA adapter repo through `pipeline` requires `peft` to be installed.
pipe = pipeline("text-generation", model="Rakshithch/qwen2.5-0.5b-icd10cm-coder", device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert medical coder specializing in ICD-10-CM coding for healthcare claims processing."},
    {"role": "user", "content": "Patient presents with chronic sinusitis, nasal congestion and facial pressure for 3 months."},
]

# Greedy decoding (do_sample=False) keeps the suggested code deterministic.
result = pipe(messages, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
# Expected (output may vary): J32.9 - Chronic sinusitis, unspecified
```

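
Downstream pipelines usually need only the code, not the full generated sentence. The sketch below pulls the first ICD-10-CM-shaped token out of the model's reply. The helper name `extract_icd10cm_code` and the regular expression are assumptions for illustration. The pattern checks the shape of a code only (letter, digit, alphanumeric, optional dot plus up to four characters), not whether the code exists in the ICD-10-CM catalog.

```python
import re
from typing import Optional

# Shape of an ICD-10-CM code: letter, digit, alphanumeric, then an optional
# dot followed by 1-4 alphanumerics (e.g. J32.9, G47.31, M79.10).
ICD10CM_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_icd10cm_code(generated: str) -> Optional[str]:
    """Return the first code-shaped token in the text, or None if there is none."""
    match = ICD10CM_PATTERN.search(generated)
    return match.group(0) if match else None


print(extract_icd10cm_code("J32.9 - Chronic sinusitis, unspecified"))
print(extract_icd10cm_code("The most likely code is G47.31."))
print(extract_icd10cm_code("No code could be determined"))
```

Because the pattern is purely syntactic, tokens such as `B12` in "vitamin B12" would also match. A production pipeline would validate the extracted value against the official code list.
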
## Full GPU Training (Recommended)

For production-quality results, run the included GPU training script:

```bash
pip install torch transformers trl peft datasets trackio accelerate flash-attn
python train_icd10_gpu.py
```

**Hardware**: A10G (24 GB VRAM) or better | **Time**: ~2-3 hours | **Expected**: 41-58% exact match

The script fine-tunes **Qwen2.5-1.5B-Instruct** with LoRA (r=16) on the full 366K dataset.

## Training Details

### Dataset

- **Source**: [FiscaAI/synth-ehr-icd10cm-prompt](https://hf.co/datasets/FiscaAI/synth-ehr-icd10cm-prompt) → [Rakshithch/icd10cm-clinical-coding-sft](https://hf.co/datasets/Rakshithch/icd10cm-clinical-coding-sft)
- **Size**: 366,118 examples (329K train / 18K val / 18K test)
- **Codes**: 5,071 unique ICD-10-CM codes across all major chapters
- **Format**: Clinical notes → ICD-10-CM code + explanation

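
To make the "clinical note → code + explanation" format concrete, here is a sketch of one training record in a conversational prompt/completion layout. The field names follow the layout TRL accepts for prompt/completion data, but the actual column names in the dataset are not stated here, so treat `build_example` and the exact record shape as assumptions. The system prompt and sample note are the ones used in the Quick Start.

```python
import json

SYSTEM_PROMPT = (
    "You are an expert medical coder specializing in ICD-10-CM coding "
    "for healthcare claims processing."
)


def build_example(note: str, code: str, explanation: str) -> dict:
    """Build one prompt/completion record (hypothetical layout)."""
    return {
        # The prompt holds everything the model conditions on.
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
        # The completion holds the target text: the code and its description.
        "completion": [
            {"role": "assistant", "content": f"{code} - {explanation}"},
        ],
    }


example = build_example(
    "Patient presents with chronic sinusitis, nasal congestion and facial pressure for 3 months.",
    "J32.9",
    "Chronic sinusitis, unspecified",
)
print(json.dumps(example, indent=2))
```

Keeping the prompt and completion in separate fields is what lets a trainer restrict the loss to the completion tokens rather than the clinical note.
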
### Literature Basis

| Paper | Key Finding |
|-------|-------------|
| [Lenz et al. 2025](https://arxiv.org/abs/2510.13624) | Instruction-tuning LLMs on ICD catalog QA → 41-58% exact accuracy |
| [MERA (2025)](https://arxiv.org/abs/2501.17326) | Code memorization pre-phase improves ICD coding by 15%+ |
| [PLM-CA (2025)](https://arxiv.org/abs/2603.00221) | BERT + label-wise attention → 71.8% micro-F1 on 1.8M patient cohort |

### Hyperparameters

```
Base Model: Qwen/Qwen2.5-0.5B-Instruct (proof-of-concept)
  → For production: Qwen/Qwen2.5-1.5B-Instruct
LoRA: r=8, alpha=16, target=q/k/v/o_proj
Learning Rate: 2e-4 (AdamW, cosine schedule)
Effective Batch Size: 2 (batch=1, grad_accum=2)
Max Sequence Length: 384 tokens
Training: 50 steps (~8 minutes on CPU)
Loss: prompt/completion format (loss on ICD codes only)
```

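
A quick back-of-the-envelope check shows what these settings imply. The sketch below estimates the number of trainable LoRA parameters and the number of sequences seen during the run. The architecture sizes (hidden size 896, 24 layers, 128-dimensional key/value projections) are assumptions about Qwen2.5-0.5B-Instruct and should be verified against the model's `config.json`. The LoRA and batch values come from the listing above.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A LoRA pair adds matrices A (r x in) and B (out x r) to one linear layer."""
    return r * (in_features + out_features)


# ASSUMED architecture values for Qwen2.5-0.5B-Instruct; verify in config.json.
HIDDEN = 896   # hidden size
LAYERS = 24    # transformer layers
KV_DIM = 128   # key/value projection width (2 KV heads x head_dim 64)

# Values taken from the hyperparameter listing.
R, ALPHA = 8, 16
STEPS, BATCH, GRAD_ACCUM = 50, 1, 2

per_layer = (
    lora_params(HIDDEN, HIDDEN, R)    # q_proj
    + lora_params(HIDDEN, KV_DIM, R)  # k_proj
    + lora_params(HIDDEN, KV_DIM, R)  # v_proj
    + lora_params(HIDDEN, HIDDEN, R)  # o_proj
)
trainable = per_layer * LAYERS

scaling = ALPHA / R                         # factor applied to the LoRA update
sequences_seen = STEPS * BATCH * GRAD_ACCUM  # optimizer steps x effective batch

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"sequences seen in training: {sequences_seen}")
```

Under these assumptions the adapter has roughly 1.08M trainable parameters. With an effective batch size of 2, the 50 steps cover 100 sequences, so the run visits at most a fifth of the 500-example subset.
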
### Per-Code Accuracy (Top-20 Codes)

| Code | Description | Accuracy |
|------|-------------|----------|
| G47.31 | Primary central sleep apnea | 100% (3/3) |
| G57.10 | Meralgia paresthetica, unspecified lower limb | 100% (4/4) |
| J32.9 | Chronic sinusitis, unspecified | 100% (4/4) |
| R30.0 | Dysuria | 67% (4/6) |
| J02.9 | Acute pharyngitis, unspecified | 50% (1/2) |
| M79.10 | Myalgia, unspecified site | 40% (2/5) |

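
A per-code breakdown like the one above can be produced by grouping exact-match results by gold code. The following is a minimal sketch with a hypothetical helper (`per_code_accuracy`) and made-up prediction pairs. It is not the script that generated the table.

```python
from collections import defaultdict


def per_code_accuracy(pairs: list) -> dict:
    """Return {gold_code: (correct, total)} using exact-match comparison."""
    counts = defaultdict(lambda: [0, 0])  # gold code -> [correct, total]
    for predicted, gold in pairs:
        counts[gold][1] += 1
        if predicted.strip().upper() == gold.strip().upper():
            counts[gold][0] += 1
    return {code: (correct, total) for code, (correct, total) in counts.items()}


# Illustrative (predicted, gold) pairs, not real model outputs.
pairs = [
    ("R30.0", "R30.0"),
    ("R30.0", "R30.0"),
    ("N39.0", "R30.0"),
    ("J02.9", "J02.9"),
    ("J02.0", "J02.9"),
]
report = per_code_accuracy(pairs)

for code, (correct, total) in sorted(report.items()):
    print(f"{code}: {correct / total:.0%} ({correct}/{total})")
```

Reporting the raw counts next to each percentage matters here: with only a handful of test examples per code, a single prediction can move the accuracy by 20 points or more.
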
## ⚠️ Limitations

- **Proof-of-concept**: Only 50 training steps on 500 examples; full training is needed for production use
- **Synthetic data**: Trained on synthetic clinical notes, not real patient records
- **Single-label**: Predicts one ICD-10-CM code per clinical note (real claims often carry multiple codes)
- **Not for clinical use**: Should not replace human medical coders; output requires expert review
- **US ICD-10-CM only**: Not validated for ICD-10-GM, ICD-10-AM, or other national modifications

## Expected Improvement with Full Training

Based on the literature, scaling from the proof-of-concept to full training is projected to yield the results below. Only the first row is a measured result; the remaining rows are estimates.

| Configuration | Exact Match (unless noted) |
|---------------|----------------------------|
| Current (50 steps, 500 examples, 0.5B) | 28.6% (measured) |
| Full data (366K examples, 0.5B, 3 epochs) | ~35-45% |
| Qwen2.5-1.5B + full data + LoRA r=16 | ~45-58% |
| + Code memorization pre-phase (MERA) | ~55-65% |
| PLM-CA encoder approach (110M params) | ~70%+ micro-F1 |

## Framework Versions

- PEFT 0.19.1
- TRL (latest)
- Transformers (latest)