Rakshithch committed 18 days ago

Commit 39600f7 (verified) · 1 Parent(s): 124157b

Add GPU training script

Files changed (1)

train_icd10_gpu.py +375 -0

train_icd10_gpu.py ADDED

	@@ -0,0 +1,375 @@

+"""
+ICD-10-CM Clinical Coding Fine-tuning Script
+=============================================
+Fine-tunes Qwen2.5-1.5B-Instruct with LoRA on synthetic EHR data
+for ICD-10-CM code classification from clinical text.
+Based on:
+- Recipe 3 from literature review (Lenz et al., arxiv:2510.13624)
+- FiscaAI/synth-ehr-icd10cm-prompt dataset (366K rows, 5071 codes)
+- TRL SFTTrainer with prompt/completion format (loss on codes only)
+"""
+import os
+import re
+import json
+import random
+import numpy as np
+from collections import Counter
+import torch
+import trackio
+from datasets import load_dataset, Dataset
+from peft import LoraConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from trl import SFTConfig, SFTTrainer
+# ============================================================================
+# Configuration
+# ============================================================================
+MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
+HUB_MODEL_ID = "Rakshithch/qwen2.5-1.5b-icd10cm-coder"
+DATASET_NAME = "FiscaAI/synth-ehr-icd10cm-prompt"
+OUTPUT_DIR = "./qwen2.5-1.5b-icd10cm-lora"
+# Training hyperparameters (from literature: LoRA SFT recipe)
+LEARNING_RATE = 2e-4        # LoRA ~10x base LR
+NUM_EPOCHS = 3
+BATCH_SIZE = 4
+GRAD_ACCUM = 8              # effective batch = 32
+MAX_SEQ_LENGTH = 1024       # P95 of user+assistant text fits in ~512 tokens
+LORA_R = 16
+LORA_ALPHA = 32
+# Data splits
+TRAIN_SIZE = 0.90
+VAL_SIZE = 0.05
+TEST_SIZE = 0.05
+SEED = 42
+# ============================================================================
+# Initialize trackio
+# ============================================================================
+trackio.init(
+    project="icd10-clinical-coding",
+    name="qwen2.5-1.5b-lora-r16-full",
+    config={
+        "model": MODEL_NAME,
+        "dataset": DATASET_NAME,
+        "lora_r": LORA_R,
+        "lora_alpha": LORA_ALPHA,
+        "lr": LEARNING_RATE,
+        "epochs": NUM_EPOCHS,
+        "batch_size": BATCH_SIZE,
+        "grad_accum": GRAD_ACCUM,
+        "max_seq_length": MAX_SEQ_LENGTH,
+    },
+)
+# ============================================================================
+# 1. Load and prepare dataset
+# ============================================================================
+print("=" * 70)
+print("Loading dataset...")
+print("=" * 70)
+raw_ds = load_dataset(DATASET_NAME, split="train")
+print(f"Total rows: {len(raw_ds)}")
+# Remove empty/null user fields
+raw_ds = raw_ds.filter(lambda x: x["user"] and x["user"].strip() != "")
+print(f"After filtering empties: {len(raw_ds)}")
+# Improved system prompt for ICD-10-CM coding in healthcare claims context
+SYSTEM_PROMPT = (
+    "You are an expert medical coder specializing in ICD-10-CM coding for "
+    "healthcare claims processing (X12 EDI 837 format). Given a clinical "
+    "note or symptom description, identify the correct ICD-10-CM diagnosis "
+    "code. Provide the code followed by a brief explanation."
+)
+def format_to_prompt_completion(example):
+    """Convert to prompt/completion format for loss on completion only."""
+    prompt = [
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": example["user"]},
+    ]
+    # Use the assistant response (ICD code plus explanation) verbatim as the completion
+    completion = [
+        {"role": "assistant", "content": example["assistant"]},
+    ]
+    return {"prompt": prompt, "completion": completion}
+print("Formatting dataset to prompt/completion...")
+formatted_ds = raw_ds.map(
+    format_to_prompt_completion,
+    remove_columns=raw_ds.column_names,
+    num_proc=4,
+    desc="Formatting",
+)
+# Split into train/val/test
+print("Splitting dataset...")
+ds_split = formatted_ds.train_test_split(test_size=(VAL_SIZE + TEST_SIZE), seed=SEED)
+val_test = ds_split["test"].train_test_split(test_size=TEST_SIZE / (VAL_SIZE + TEST_SIZE), seed=SEED)
+train_ds = ds_split["train"]
+val_ds = val_test["train"]
+test_ds = val_test["test"]
+print(f"Train: {len(train_ds)}, Val: {len(val_ds)}, Test: {len(test_ds)}")
+# ============================================================================
+# 2. Model & LoRA setup
+# ============================================================================
+print("\n" + "=" * 70)
+print("Loading model...")
+print("=" * 70)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_NAME,
+    dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",
+    device_map="auto",
+)
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token
+print(f"Model loaded: {MODEL_NAME}")
+print(f"Model dtype: {model.dtype}")
+print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B")
+peft_config = LoraConfig(
+    r=LORA_R,
+    lora_alpha=LORA_ALPHA,
+    lora_dropout=0.05,
+    bias="none",
+    task_type="CAUSAL_LM",
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],  # all attention + MLP
+)
+# ============================================================================
+# 3. Training
+# ============================================================================
+print("\n" + "=" * 70)
+print("Setting up training...")
+print("=" * 70)
+training_args = SFTConfig(
+    output_dir=OUTPUT_DIR,
+    num_train_epochs=NUM_EPOCHS,
+    per_device_train_batch_size=BATCH_SIZE,
+    per_device_eval_batch_size=BATCH_SIZE,
+    gradient_accumulation_steps=GRAD_ACCUM,
+    learning_rate=LEARNING_RATE,
+    lr_scheduler_type="cosine",
+    warmup_ratio=0.05,
+    optim="adamw_torch_fused",
+    bf16=True,
+    max_length=MAX_SEQ_LENGTH,
+    gradient_checkpointing=True,
+    gradient_checkpointing_kwargs={"use_reentrant": False},
+    # Logging
+    logging_steps=25,
+    logging_first_step=True,
+    disable_tqdm=True,
+    report_to="trackio",
+    run_name="qwen2.5-1.5b-icd10cm-lora-r16",
+    # Evaluation
+    eval_strategy="steps",
+    eval_steps=500,
+    save_strategy="steps",
+    save_steps=500,
+    save_total_limit=3,
+    load_best_model_at_end=True,
+    metric_for_best_model="eval_loss",
+    # Push to Hub
+    push_to_hub=True,
+    hub_model_id=HUB_MODEL_ID,
+    hub_strategy="every_save",
+)
+trainer = SFTTrainer(
+    model=model,
+    args=training_args,
+    train_dataset=train_ds,
+    eval_dataset=val_ds,
+    peft_config=peft_config,
+    processing_class=tokenizer,
+)
+trainer.model.print_trainable_parameters()  # prints the summary itself and returns None
+print(f"\nStarting training for {NUM_EPOCHS} epochs...")
+train_result = trainer.train()
+print("\n" + "=" * 70)
+print("Training complete!")
+print(f"Train loss: {train_result.training_loss:.4f}")
+print("=" * 70)
+# Save final model
+trainer.save_model(OUTPUT_DIR)
+trainer.push_to_hub()
+# ============================================================================
+# 4. Evaluation on test set
+# ============================================================================
+print("\n" + "=" * 70)
+print("Evaluating on test set...")
+print("=" * 70)
+from transformers import pipeline
+# Load fine-tuned model for inference
+pipe = pipeline(
+    "text-generation",
+    model=OUTPUT_DIR,
+    tokenizer=tokenizer,
+    device_map="auto",
+    max_new_tokens=128,
+)
+# Evaluation metrics
+correct_exact = 0
+correct_partial = 0
+correct_chapter = 0
+correct_category = 0  # first 3 chars (e.g., J18)
+total = 0
+results = []
+# Sample test set for evaluation (max 2000 for speed)
+eval_size = min(2000, len(test_ds))
+eval_indices = random.Random(SEED).sample(range(len(test_ds)), eval_size)  # seeded for a reproducible sample
+print(f"Evaluating on {eval_size} test examples...")
+for idx, i in enumerate(eval_indices):
+    example = test_ds[i]
+    # Build the prompt messages
+    messages = example["prompt"]
+    # Generate
+    output = pipe(messages, max_new_tokens=128, do_sample=False, temperature=None)
+    generated = output[0]["generated_text"][-1]["content"]
+    # Extract predicted ICD code from generated text
+    # Pattern: letter, digit, alphanumeric, then optional dot + up to 4 alphanumerics (e.g., J18.9, S72.001A, T36.0X1A)
+    pred_codes = re.findall(r'\b([A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?)\b', generated)
+    # Extract ground truth code from completion
+    gt_text = example["completion"][0]["content"]
+    gt_codes = re.findall(r'\b([A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?)\b', gt_text)
+    if gt_codes and pred_codes:
+        gt_code = gt_codes[0]
+        pred_code = pred_codes[0]
+        # Exact match
+        if pred_code == gt_code:
+            correct_exact += 1
+        # Partial match (category plus the first two characters after the dot; ignores finer detail such as laterality)
+        gt_base = gt_code.split('.')[0] + ('.' + gt_code.split('.')[1][:2] if '.' in gt_code else '')
+        pred_base = pred_code.split('.')[0] + ('.' + pred_code.split('.')[1][:2] if '.' in pred_code else '')
+        if pred_base == gt_base:
+            correct_partial += 1
+        # Category match (first 3 chars, e.g., J18, M24)
+        if pred_code[:3] == gt_code[:3]:
+            correct_category += 1
+        # Chapter match (first letter)
+        if pred_code[0] == gt_code[0]:
+            correct_chapter += 1
+        results.append({
+            "gt_code": gt_code,
+            "pred_code": pred_code,
+            "exact_match": pred_code == gt_code,
+            "category_match": pred_code[:3] == gt_code[:3],
+        })
+    else:
+        results.append({
+            "gt_code": gt_codes[0] if gt_codes else "NONE",
+            "pred_code": pred_codes[0] if pred_codes else "NONE",
+            "exact_match": False,
+            "category_match": False,
+        })
+    total += 1
+    if (idx + 1) % 200 == 0:
+        print(f"  Evaluated {idx+1}/{eval_size} | "
+              f"Exact: {correct_exact/total*100:.1f}% | "
+              f"Category: {correct_category/total*100:.1f}%")
+# Final metrics
+print("\n" + "=" * 70)
+print("EVALUATION RESULTS")
+print("=" * 70)
+exact_acc = correct_exact / total * 100
+partial_acc = correct_partial / total * 100
+category_acc = correct_category / total * 100
+chapter_acc = correct_chapter / total * 100
+print(f"  Exact Match Accuracy:    {exact_acc:.2f}% ({correct_exact}/{total})")
+print(f"  Partial Match Accuracy:  {partial_acc:.2f}% ({correct_partial}/{total})")
+print(f"  Category (3-char) Acc:   {category_acc:.2f}% ({correct_category}/{total})")
+print(f"  Chapter (1st letter):    {chapter_acc:.2f}% ({correct_chapter}/{total})")
+# Log to trackio
+trackio.log({
+    "eval/exact_match_accuracy": exact_acc,
+    "eval/partial_match_accuracy": partial_acc,
+    "eval/category_accuracy": category_acc,
+    "eval/chapter_accuracy": chapter_acc,
+    "eval/total_samples": total,
+})
+# Error analysis: which chapters have lowest accuracy
+print("\n--- Per-Chapter Accuracy ---")
+chapter_stats = {}
+for r in results:
+    ch = r["gt_code"][0] if r["gt_code"] != "NONE" else "?"
+    if ch not in chapter_stats:
+        chapter_stats[ch] = {"total": 0, "correct": 0}
+    chapter_stats[ch]["total"] += 1
+    if r["exact_match"]:
+        chapter_stats[ch]["correct"] += 1
+for ch in sorted(chapter_stats.keys()):
+    s = chapter_stats[ch]
+    acc = s["correct"] / s["total"] * 100 if s["total"] > 0 else 0
+    print(f"  Chapter {ch}: {acc:.1f}% ({s['correct']}/{s['total']})")
+# Save results
+with open(os.path.join(OUTPUT_DIR, "eval_results.json"), "w") as f:
+    json.dump({
+        "exact_match_accuracy": exact_acc,
+        "partial_match_accuracy": partial_acc,
+        "category_accuracy": category_acc,
+        "chapter_accuracy": chapter_acc,
+        "total_evaluated": total,
+        "per_chapter": chapter_stats,
+    }, f, indent=2)
+# Sample predictions
+print("\n--- Sample Predictions ---")
+for r in results[:10]:
+    status = "✅" if r["exact_match"] else ("🟡" if r["category_match"] else "❌")
+    print(f"  {status} GT: {r['gt_code']:<12} Pred: {r['pred_code']}")
+trackio.finish()
+print("\n" + "=" * 70)
+print(f"Model saved to Hub: https://hf.co/{HUB_MODEL_ID}")
+print("Training dashboard: trackio")
+print("=" * 70)
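
The evaluation section of the script scores each prediction at four granularities: exact code, partial code, 3-character category, and first letter. A minimal standalone sketch of that scoring logic follows, which may be useful for unit-testing the metrics without loading a model. The names `ICD10CM_PATTERN`, `extract_code`, `base_code`, and `match_levels` are hypothetical and not part of the committed file, and the regex is an illustrative assumption about the code format that may differ from the one in the script.

```python
import re

# Assumed pattern (illustrative): letter, digit, alphanumeric, then an optional
# dot followed by up to four alphanumeric characters, e.g. J18.9 or S72.001A.
ICD10CM_PATTERN = re.compile(r"\b[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_code(text):
    """Return the first ICD-10-CM-looking code found in text, or None."""
    match = ICD10CM_PATTERN.search(text)
    return match.group(0) if match else None


def base_code(code):
    """Keep the category plus at most two characters after the dot."""
    category, _, extension = code.partition(".")
    return f"{category}.{extension[:2]}" if extension else category


def match_levels(gt_code, pred_code):
    """Compare two codes at the four granularities used in the evaluation."""
    return {
        "exact": pred_code == gt_code,
        "partial": base_code(pred_code) == base_code(gt_code),
        "category": pred_code[:3] == gt_code[:3],
        "chapter": pred_code[0] == gt_code[0],
    }


# Example: the prediction differs from the ground truth only in the sixth character.
gt = extract_code("S72.001A - Fracture of unspecified part of neck of right femur")
pred = extract_code("The code is S72.002A.")
print(gt, pred, match_levels(gt, pred))
```

Keeping the extraction and comparison in small pure functions makes it possible to check edge cases (no code in the output, codes without a dot) separately from generation.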