Instructions for using Rakshithch/qwen2.5-0.5b-icd10cm-coder with libraries and local apps.

Libraries

How to use Rakshithch/qwen2.5-0.5b-icd10cm-coder with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "Rakshithch/qwen2.5-0.5b-icd10cm-coder")
```

Transformers

How to use Rakshithch/qwen2.5-0.5b-icd10cm-coder with Transformers:

```python
# Use a pipeline as a high-level helper
# (loading this adapter repo directly requires `peft` to be installed)
from transformers import pipeline

pipe = pipeline("text-generation", model="Rakshithch/qwen2.5-0.5b-icd10cm-coder")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly; AutoModelForCausalLM keeps the LM head needed for generation
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Rakshithch/qwen2.5-0.5b-icd10cm-coder", dtype="auto")
```

Local Apps

vLLM

How to use Rakshithch/qwen2.5-0.5b-icd10cm-coder with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Rakshithch/qwen2.5-0.5b-icd10cm-coder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Rakshithch/qwen2.5-0.5b-icd10cm-coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
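The same request can be issued from Python with only the standard library. This is a minimal sketch, not an official client: it assumes the vLLM server above is listening on `http://localhost:8000` (pass `base_url="http://localhost:30000"` for the SGLang server), reuses the system prompt from the model card's Quick Start, and relies on the OpenAI chat-completions response shape (`choices[0].message.content`). The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "Rakshithch/qwen2.5-0.5b-icd10cm-coder"
# System prompt copied from the Quick Start example in the model card.
SYSTEM_PROMPT = (
    "You are an expert medical coder specializing in ICD-10-CM coding "
    "for healthcare claims processing."
)


def build_chat_request(note: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
        "max_tokens": 128,
        "temperature": 0,  # greedy decoding, mirroring do_sample=False in the Quick Start
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant message text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(send(build_chat_request("Patient presents with chronic sinusitis, nasal congestion and facial pressure for 3 months.")))` prints the model's reply. Building the request separately from sending it keeps the payload easy to inspect and log.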

Use Docker

```bash
docker model run hf.co/Rakshithch/qwen2.5-0.5b-icd10cm-coder
```

SGLang

How to use Rakshithch/qwen2.5-0.5b-icd10cm-coder with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Rakshithch/qwen2.5-0.5b-icd10cm-coder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Rakshithch/qwen2.5-0.5b-icd10cm-coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Rakshithch/qwen2.5-0.5b-icd10cm-coder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Rakshithch/qwen2.5-0.5b-icd10cm-coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


qwen2.5-0.5b-icd10cm-coder / README.md (last commit 80447b7 by Rakshithch, verified 16 days ago: "Update README with comprehensive model card, evaluation results, and training guide")

---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- medical
- icd-10
- clinical-coding
- healthcare
- lora
- peft
- sft
- transformers
- trl
- base_model:adapter:Qwen/Qwen2.5-0.5B-Instruct
datasets:
- FiscaAI/synth-ehr-icd10cm-prompt
- Rakshithch/icd10cm-clinical-coding-sft
---

# Qwen2.5-0.5B ICD-10-CM Clinical Coder (LoRA Adapter)

A fine-tuned LoRA adapter that assigns ICD-10-CM diagnosis codes to clinical text descriptions. It was trained on synthetic EHR records and is intended for healthcare claims-processing workflows.

## 🏥 Use Case

Designed for healthcare analytics pipelines:
- Claims Processing: Suggest ICD-10-CM codes for X12 EDI 837 claims
- Denial Rate Reduction: Auto-coding review to catch miscoded claims
- Diagnosis Trend Analysis: Automated coding for population health analytics

## 📊 Current Results (Proof-of-Concept)

This adapter was trained with limited compute (50 steps on CPU, using 500 examples drawn from the top-20 codes):

| Metric | Score |
|--------|-------|
| Exact match | 28.6% |
| Category match (first 3 characters) | 28.6% |
| Chapter match (first letter, approximate) | 48.6% |
| Training loss | 0.649 (from 1.97) |
| Training token accuracy | 94% (from 62%) |

Top-performing codes (100% accuracy): G47.31 (primary central sleep apnea), G57.10 (meralgia paresthetica), J32.9 (chronic sinusitis)

> ⚠️ These results come from a minimal CPU training run. Based on [Lenz et al. (2025)](https://arxiv.org/abs/2510.13624), full GPU training on 366K examples with Qwen2.5-1.5B is projected to reach 41-58% exact match.
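The three accuracy levels in the results table can be computed from paired gold and predicted codes. This is a minimal sketch, not the author's evaluation script: it assumes codes are compared case-insensitively after trimming whitespace, that "category" means the first three characters, and that "chapter" is approximated by the first letter as the table states. The first letter is only a proxy, because some ICD-10-CM chapters share a letter (for example, neoplasms C00-D49 and blood disorders D50-D89 both include D codes). `score_codes` and `CodingScores` are hypothetical names.

```python
from dataclasses import dataclass


def _normalize(code: str) -> str:
    """Uppercase and trim a code so 'j32.9 ' and 'J32.9' compare equal."""
    return code.strip().upper()


@dataclass
class CodingScores:
    exact: float     # full code matches, e.g. J32.9 vs J32.9
    category: float  # first 3 characters match, e.g. J32.9 vs J32.0
    chapter: float   # first letter matches (a proxy for the ICD-10-CM chapter)


def score_codes(gold: list[str], predicted: list[str]) -> CodingScores:
    """Return the fraction of examples matching at each level of granularity."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and the same length")
    exact = category = chapter = 0
    for g, p in zip(gold, predicted):
        g, p = _normalize(g), _normalize(p)
        exact += g == p          # bool counts as 0 or 1
        category += g[:3] == p[:3]
        chapter += g[:1] == p[:1]
    n = len(gold)
    return CodingScores(exact / n, category / n, chapter / n)
```

For example, `score_codes(["J32.9", "R30.0", "G47.31"], ["J32.9", "R30.9", "M79.10"])` gives exact 1/3, category 2/3 and chapter 2/3: the R30.0 prediction is wrong at full-code level but right at category level. Each level is at least as lenient as the one above it, so exact match can never exceed category match.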

## 🚀 Quick Start

```python
from transformers import pipeline

# Loading the adapter repo directly requires `peft`; device_map="auto" requires `accelerate`.
pipe = pipeline("text-generation", model="Rakshithch/qwen2.5-0.5b-icd10cm-coder", device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert medical coder specializing in ICD-10-CM coding for healthcare claims processing."},
    {"role": "user", "content": "Patient presents with chronic sinusitis, nasal congestion and facial pressure for 3 months."},
]

# Greedy decoding; the pipeline returns the whole chat, so take the last (assistant) message.
result = pipe(messages, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
# Expected: J32.9 - Chronic sinusitis, unspecified
```
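Because the model returns a code followed by a description, as in the expected Quick Start output, it helps to pull out the first well-formed ICD-10-CM code before scoring or writing it to a claim. This is a minimal sketch under the assumption that the code appears somewhere in the generated text; the pattern encodes only the general ICD-10-CM shape (a letter, a digit, an alphanumeric character, then optionally a dot and up to four more alphanumerics) and does not confirm that the code exists in the official code set. `extract_icd10cm` is a hypothetical helper.

```python
import re
from typing import Optional

# General ICD-10-CM shape: letter, digit, alphanumeric, then optionally "." + 1-4 alphanumerics.
# This is a format check only; it does not look the code up in the official code set.
ICD10CM_PATTERN = re.compile(r"\b([A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?)\b")


def extract_icd10cm(generated: str) -> Optional[str]:
    """Return the first ICD-10-CM-shaped code in the model output (uppercased), or None."""
    match = ICD10CM_PATTERN.search(generated.upper())
    return match.group(1) if match else None
```

Applied to the Quick Start, `extract_icd10cm(result[0]["generated_text"][-1]["content"])` would return `"J32.9"` for the expected output. Returning `None` rather than raising makes it easy to route malformed generations to human review.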

## 🏋️ Full GPU Training (Recommended)

For production-quality results, run the included GPU training script:

```bash
pip install torch transformers trl peft datasets trackio accelerate
# flash-attn builds against the already-installed torch
pip install flash-attn --no-build-isolation
python train_icd10_gpu.py
```

Hardware: A10G (24 GB VRAM) or better; Time: ~2-3 hours; Expected: 41-58% exact match (projected from the literature)

The script fine-tunes Qwen2.5-1.5B-Instruct with LoRA (r=16) on the full 366K-example dataset.

## 📋 Training Details

### Dataset
- Source: [FiscaAI/synth-ehr-icd10cm-prompt](https://hf.co/datasets/FiscaAI/synth-ehr-icd10cm-prompt) → [Rakshithch/icd10cm-clinical-coding-sft](https://hf.co/datasets/Rakshithch/icd10cm-clinical-coding-sft)
- Size: 366,118 examples (329K train / 18K validation / 18K test)
- Codes: 5,071 unique ICD-10-CM codes across all major chapters
- Format: clinical note → ICD-10-CM code + explanation
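The card does not show the dataset schema, so the following is a sketch under stated assumptions. It turns one clinical note and its code into TRL's conversational prompt/completion format, and shows how completion-only loss (the "loss on ICD codes only" setting in the hyperparameters) is typically implemented: prompt-token labels are set to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The argument names `note`, `code` and `description`, the `"CODE - Description"` completion layout (copied from the Quick Start's expected output), and the toy token IDs are all hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# System prompt copied from the Quick Start example.
SYSTEM_PROMPT = (
    "You are an expert medical coder specializing in ICD-10-CM coding "
    "for healthcare claims processing."
)


def to_prompt_completion(note: str, code: str, description: str) -> dict:
    """Build one conversational prompt/completion record (hypothetical field names)."""
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
        "completion": [
            {"role": "assistant", "content": f"{code} - {description}"},
        ],
    }


def mask_prompt_labels(prompt_ids: list[int], completion_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate token IDs and hide the prompt tokens from the loss."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels
```

With this masking, `mask_prompt_labels([1, 2, 3], [7, 8])` yields labels `[-100, -100, -100, 7, 8]`, so gradients come only from the assistant's answer rather than from reproducing the clinical note.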

### Literature Basis
| Paper | Key Finding |
|-------|-------------|
| [Lenz et al. 2025](https://arxiv.org/abs/2510.13624) | Instruction-tuning LLMs on ICD catalog QA → 41-58% exact accuracy |
| [MERA (2025)](https://arxiv.org/abs/2501.17326) | Code memorization pre-phase improves ICD coding by 15%+ |
| [PLM-CA (2025)](https://arxiv.org/abs/2603.00221) | BERT + label-wise attention → 71.8% micro-F1 on a 1.8M-patient cohort |

### Hyperparameters
```
Base Model: Qwen/Qwen2.5-0.5B-Instruct (proof-of-concept)
→ For production: Qwen/Qwen2.5-1.5B-Instruct
LoRA: r=8, alpha=16, target=q/k/v/o_proj
Learning Rate: 2e-4 (AdamW, cosine schedule)
Effective Batch Size: 2 (batch=1, grad_accum=2)
Max Sequence Length: 384 tokens
Training: 50 steps (~8 minutes on CPU)
Loss: prompt/completion format (loss on ICD codes only)
```
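Two quantities implied by these hyperparameters can be checked with a little arithmetic: the adapter's trainable-parameter count and how much of the 500-example subset the run actually saw. This is a sketch under assumptions: "steps" are taken to be optimizer steps on a single device, and the attention dimensions are those in the public Qwen2.5 `config.json` files as I understand them (0.5B: hidden size 896, 24 layers, 128-wide key/value projections; 1.5B: hidden size 1536, 28 layers, 256-wide key/value projections, both from grouped-query attention with 2 KV heads). The helper names are hypothetical.

```python
def lora_params(r: int, hidden: int, kv_dim: int, layers: int) -> int:
    """Trainable LoRA parameters when targeting q/k/v/o projections.

    Each adapted linear layer W (d_in -> d_out) gains A (r x d_in) and B (d_out x r),
    i.e. r * (d_in + d_out) parameters. Biases are not adapted.
    """
    q = r * (hidden + hidden)  # q_proj: hidden -> hidden
    k = r * (hidden + kv_dim)  # k_proj: hidden -> kv_dim (grouped-query attention)
    v = r * (hidden + kv_dim)  # v_proj: hidden -> kv_dim
    o = r * (hidden + hidden)  # o_proj: hidden -> hidden
    return layers * (q + k + v + o)


def examples_seen(steps: int, per_device_batch: int, grad_accum: int) -> int:
    """Examples consumed, assuming one optimizer step per effective batch on one device."""
    return steps * per_device_batch * grad_accum


# Proof-of-concept run: Qwen2.5-0.5B, r=8 (dimensions assumed from the public config).
poc_params = lora_params(r=8, hidden=896, kv_dim=128, layers=24)
# Planned GPU run: Qwen2.5-1.5B, r=16 (dimensions assumed from the public config).
gpu_params = lora_params(r=16, hidden=1536, kv_dim=256, layers=28)
# Proof-of-concept scaling: alpha / r multiplies the adapter update B @ A.
lora_scale = 16 / 8
seen = examples_seen(steps=50, per_device_batch=1, grad_accum=2)

print(f"PoC LoRA params: {poc_params:,}")            # 1,081,344
print(f"GPU LoRA params: {gpu_params:,}")            # 4,358,144
print(f"LoRA scaling alpha/r: {lora_scale}")         # 2.0
print(f"Examples seen in 50 steps: {seen} of 500")   # 100 of 500
```

Under these assumptions, the proof-of-concept run consumed about 100 of the 500 examples, roughly one fifth of an epoch, which is consistent with the card's note that full training is still needed.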

### Per-Code Accuracy (Selected Top-20 Codes)
| Code | Description | Accuracy |
|------|-------------|----------|
| G47.31 | Primary central sleep apnea | 100% (3/3) |
| G57.10 | Meralgia paresthetica, unspecified | 100% (4/4) |
| J32.9 | Chronic sinusitis, unspecified | 100% (4/4) |
| R30.0 | Dysuria | 67% (4/6) |
| J02.9 | Acute pharyngitis, unspecified | 50% (1/2) |
| M79.10 | Myalgia, unspecified site | 40% (2/5) |
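A per-code breakdown like this one groups evaluation examples by their gold code and reports correct/total for each. This is a minimal sketch using exact match, with hypothetical function names; it orders codes by accuracy and then by support, which is one reasonable ordering and not necessarily how the table was produced.

```python
from collections import defaultdict


def per_code_accuracy(gold: list[str], predicted: list[str]) -> list[tuple[str, int, int, float]]:
    """Return (code, correct, total, accuracy) per gold code, best-performing first."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for g, p in zip(gold, predicted):
        g, p = g.strip().upper(), p.strip().upper()
        total[g] += 1
        correct[g] += g == p  # bool counts as 0 or 1
    rows = [(code, correct[code], total[code], correct[code] / total[code]) for code in total]
    # Highest accuracy first; ties broken by larger support, then alphabetically by code.
    rows.sort(key=lambda row: (-row[3], -row[2], row[0]))
    return rows


def format_row(code: str, correct: int, total: int, accuracy: float) -> str:
    """Render a row in the table's style, e.g. 'R30.0: 67% (4/6)'."""
    return f"{code}: {accuracy:.0%} ({correct}/{total})"
```

Reporting the raw counts next to the percentage matters here: with only 2 to 6 examples per code, a single extra correct prediction moves a code's accuracy by 17 to 50 percentage points.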

## ⚠️ Limitations

- Proof-of-concept: only 50 training steps on 500 examples; full training is needed for production use
- Synthetic data: trained on synthetic clinical notes, not real patient records
- Single-label: one ICD-10-CM code per clinical note (real claims often carry multiple codes)
- Not for clinical use: should not replace human medical coders; outputs require expert review
- US ICD-10-CM only: not validated for ICD-10-GM, ICD-10-AM, or other national modifications

## 📈 Expected Improvement with Full Training

Based on the literature, scaling from this proof-of-concept to full training is projected to yield the following (only the first row is a measured result):

| Configuration | Expected Score (exact match unless noted) |
|---------------|-------------------------------------------|
| Current (50 steps, 500 examples, 0.5B) | 28.6% |
| Full data (366K examples, 0.5B, 3 epochs) | ~35-45% |
| Qwen2.5-1.5B + full data + LoRA r=16 | ~45-58% |
| + Code memorization pre-phase (MERA) | ~55-65% |
| PLM-CA encoder approach (110M params) | ~70%+ micro-F1 |

## Framework Versions
- PEFT 0.19.1
- TRL (latest)
- Transformers (latest)