Qwen2.5-VL-7B-Medical-LoRA
A QLoRA fine-tuned adapter for Qwen2.5-VL-7B-Instruct optimized for medical document understanding and clinical information extraction from document images.
This adapter was trained on 1,000 medical document samples (PathVQA + MTSamples) using 4-bit NF4 quantization with LoRA rank 64, targeting all attention and MLP layers. It extracts diagnoses, medications with dosages, lab values with reference ranges, vital signs, procedures, and clinical abbreviations from scanned/photographed clinical documents.
GitHub Repository: sarathi-aiml/quantization-pipe
Key Results at a Glance
| Configuration | VRAM | Avg Latency | Min Latency | Max Latency | Medical Accuracy | vs FP16 VRAM | vs FP16 Speed |
|---|---|---|---|---|---|---|---|
| FP16 Baseline | 15,820 MB | 31.10s | 22.58s | 48.74s | N/A | -- | -- |
| NF4 Quantized (recommended) | 5,664 MB | 15.24s | 11.76s | 23.25s | 83.5% | -64.2% | 2.04x faster |
| NF4 + LoRA (this adapter) | 11,336 MB | 14.70s | 12.01s | 23.46s | 73.7% | -28.3% | 2.12x faster |
Recommendation: For production use, deploy the base model with NF4 quantization (no LoRA). It achieves 83.5% accuracy, requires only 5.7 GB VRAM, and runs 2x faster than FP16. See the analysis section below for details.
Quick Start
Load with LoRA Adapter (NF4 Quantized)
```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from PIL import Image

# NF4 quantization config (same as used during training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model with NF4 quantization
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load and merge LoRA adapter
model = PeftModel.from_pretrained(model, "sarathi-balakrishnan/Qwen2.5-VL-7B-Medical-LoRA")
model = model.merge_and_unload()
model.eval()

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Inference
image = Image.open("medical_document.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all medical diagnoses, medications with dosages, and vital signs from this document."},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

generated_ids = output_ids[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated_ids, skip_special_tokens=True)
print(response)
```
Use Base Model Only (Higher Accuracy, Recommended for Production)
```python
# Same as above but skip the PeftModel step:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()
# ... rest of inference code identical
```
Sample Use Cases with Example Prompts
Use Case 1: Emergency Department Triage
Upload a referral letter or patient assessment and extract critical information in seconds.
```python
prompt = """Extract all medical diagnoses, medications with dosages, and vital signs
from this patient assessment document."""
```

Sample output (from actual model inference on a test image):

```
**Medical Diagnoses:**
1. Acute coronary syndrome, NSTEMI - Troponin I 2.4 ng/mL
2. Hypertension, uncontrolled
3. Type 2 Diabetes Mellitus (HbA1c 8.2%)
4. Chronic Kidney Disease, Stage 3 (eGFR 42 mL/min)

**Medications Ordered:**
- Aspirin 325 mg PO STAT, then 81 mg PO QD
- Heparin 5000 units IV bolus, then 1000 units/hr
- Metoprolol 25 mg PO BID
- Lisinopril 10 mg PO QD
- Metformin 1000 mg PO BID
- Atorvastatin 80 mg PO QHS

**Vital Signs:**
- BP: 158/94 mmHg
- HR: 102 bpm
- RR: 22/min
- Temp: 98.6 F
- SpO2: 93% on RA
- Weight: 198 lbs
```
Use Case 2: Radiology Report Parsing
Extract findings, measurements, and impressions from radiology reports.
```python
prompt = """List all findings from this radiology report including measurements,
locations, and the radiologist's impression."""
```

Sample output:

```
### Findings:
**Lungs:**
- 3.2 x 2.8 cm spiculated mass in the right upper lobe (series 4, image 87)
- Ipsilateral hilar lymphadenopathy measuring 1.8 cm
- Left lung clear. No pleural effusion.

**Mediastinum:**
- Subcarinal lymph node enlarged at 2.1 cm
- No pericardial effusion. Heart size normal.

**Bones:**
- Lytic lesion in T8 vertebral body, 1.2 cm, suspicious for metastatic disease

### Radiologist's Impression:
1. Right upper lobe mass highly suspicious for primary bronchogenic carcinoma. LUNG-RADS 4B.
2. Ipsilateral hilar and subcarinal lymphadenopathy suggesting nodal metastasis (N2 disease).
3. T8 lytic lesion concerning for osseous metastasis.

### Recommendation:
PET/CT and tissue biopsy recommended.
```
Use Case 3: Prescription Extraction
Extract structured medication information from prescription images.
```python
prompt = """Extract all prescribed medications with their dosages, directions,
quantities, and refill counts from this prescription."""
```

Sample output:

```
1. **Lisinopril 20 mg tablets**
   - Dosage: 1 tablet by mouth once daily in the morning
   - Quantity: #30
   - Refills: 5
2. **Amlodipine 5 mg tablets**
   - Dosage: 1 tablet by mouth once daily
   - Quantity: #30
   - Refills: 5
3. **Metformin 500 mg tablets**
   - Dosage: 1 tablet by mouth twice daily with meals
   - Quantity: #60
   - Refills: 3
4. **Atorvastatin 40 mg tablets**
   - Dosage: 1 tablet by mouth at bedtime
   - Quantity: #30
   - Refills: 5
5. **Pantoprazole 40 mg delayed-release tablets**
   - Dosage: 1 tablet by mouth once daily 30 minutes before breakfast
   - Quantity: #30
   - Refills: 2
```
Use Case 4: Laboratory Results Parsing
Extract lab values with units, reference ranges, and abnormal flags.
```python
prompt = """Extract all lab values with their results, units, reference ranges,
and flags from this laboratory report."""
```

Sample output (excerpt):

```
1. **Glucose, Fasting**: 142 mg/dL (Ref: 70-100) - Flag: H (High)
2. **BUN**: 28 mg/dL (Ref: 7-20) - Flag: H (High)
3. **Creatinine**: 1.8 mg/dL (Ref: 0.6-1.2) - Flag: H (High)
4. **eGFR**: 38 mL/min (Ref: >60) - Flag: L (Low)
5. **Sodium**: 139 mEq/L (Ref: 136-145)
6. **Potassium**: 5.4 mEq/L (Ref: 3.5-5.1) - Flag: H (High)
7. **Chloride**: 101 mEq/L (Ref: 98-106)
8. **CO2**: 20 mEq/L (Ref: 23-29) - Flag: L (Low)
9. **Calcium**: 9.2 mg/dL (Ref: 8.5-10.5)
10. **Total Protein**: 6.8 g/dL (Ref: 6.0-8.3)
```
Use Case 5: Discharge Summary Processing
Extract diagnoses, medications, and follow-up instructions from discharge documents.
```python
prompt = """Extract all diagnoses, medications with dosages, and follow-up
instructions from this discharge summary."""
```

Sample output:

```
### Diagnoses:
1. Community-acquired pneumonia (CAP), right lower lobe
2. Acute on chronic systolic heart failure (EF 30%)
3. Atrial fibrillation with RVR
4. COPD exacerbation

### Discharge Medications:
- Metoprolol Succinate: 50 mg PO QD
- Furosemide: 40 mg PO BID
- Potassium Chloride: 20 mEq PO BID
- Apixaban: 5 mg PO BID
- Lisinopril: 5 mg PO QD
- Albuterol MDI: 2 puffs Q4-6H PRN
- Prednisone taper: 40 mg x 5d, 20 mg x 5d, 10 mg x 5d
- Levofloxacin: 750 mg PO QD x 3 more days

### Follow-Up Instructions:
- PCP in 1 week
- Cardiology in 2 weeks
- Repeat CXR in 6 weeks
- Echo in 3 months
```
Use Case 6: ICU Flowsheet Extraction
Parse complex ICU documentation with ventilator settings, drips, and hemodynamics.
```python
prompt = """Extract all ventilator settings, hemodynamic parameters, drip rates,
lab values, intake/output totals, and diagnoses from this ICU flowsheet."""
```

Sample output (from actual evaluation -- the base model achieved 100% accuracy on this case):

```
### Ventilator Settings:
- Mode: SIMV/PS | FiO2: 0.45 | PEEP: 8 cmH2O
- Vt: 450 mL | RR set: 14 | PS: 10 cmH2O

### Hemodynamics (06:00):
- HR: 112 bpm (sinus tach) | MAP: 62 mmHg
- CVP: 14 mmHg | ScvO2: 68%
- Norepinephrine: 0.15 mcg/kg/min | Vasopressin: 0.04 units/min

### ABG (05:30):
- pH 7.32, pCO2 48, pO2 72, HCO3 24, BE -2
- P/F Ratio: 160 (moderate ARDS)

### I&O (last 12h):
- IN: 2850 mL | OUT: 1200 mL | NET: +1650 mL
- UOP: 25 mL/hr (oliguric)

### Diagnoses:
- Septic shock 2/2 pneumonia, moderate ARDS
- AKI Stage 2 (Cr 2.8, baseline 0.9)
- DIC (plt 62K, fibrinogen 128, D-dimer >20)
```
Use Case 7: Medication Reconciliation
Extract complete medication lists with indications and comorbidity mapping.
```python
prompt = """Extract every medication with its exact dosage, frequency, and indication.
Also list all comorbidities and relevant lab values."""
```
Use Case 8: Surgical Operative Note Parsing
```python
prompt = """Extract the diagnoses, procedure details, operative findings with measurements,
estimated blood loss, medications with dosages, and postoperative orders."""
```
Use Case 9: Custom Clinical Analysis
```python
prompt = """Identify all drug interaction risks from the medications listed in this document.
Flag any medications that may need dose adjustment based on the lab values shown."""
```
Use Case 10: Batch Processing for Clinical Research
```python
import requests
from pathlib import Path

API_URL = "http://localhost:8000"

for doc_path in Path("clinical_documents/").glob("*.png"):
    with open(doc_path, "rb") as f:
        response = requests.post(
            f"{API_URL}/extract",
            files={"file": (doc_path.name, f, "image/png")},
        )
    result = response.json()
    print(f"{doc_path.name}: {result['inference_latency_seconds']:.1f}s")
    print(result["extracted_content"][:500])
```
Model Details
Architecture
| Property | Value |
|---|---|
| Base model | Qwen2.5-VL-7B-Instruct |
| Base model parameters | 4,882,615,296 |
| Adapter type | LoRA (Low-Rank Adaptation) |
| Quantization | BitsAndBytes NF4 with double quantization |
| Compute dtype | bfloat16 |
| Adapter file size | 727 MB (safetensors) |
| Model type | Vision-Language (image-text-to-text) |
| Architecture | Transformer decoder with ViT vision encoder |
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha | 128 |
| Alpha/Rank ratio | 2.0 |
| Dropout | 0.05 |
| Bias | none |
| Task type | CAUSAL_LM |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Number of target modules | 7 (all attention + MLP linear layers) |
| Trainable parameters | 190,357,504 |
| Total parameters | 4,882,615,296 |
| Trainable percentage | 3.9% |
| Frozen parameters | 4,692,257,792 (96.1%) |
| Vision encoder | Frozen (390 parameters) |
| use_dora | False |
| use_rslora | False |
| PEFT type | LORA |
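As a sanity check, the parameter counts in the table above are internally consistent; the arithmetic below uses only the numbers already reported there:

```python
# Parameter accounting from the LoRA configuration table.
total_params = 4_882_615_296
trainable_params = 190_357_504

frozen_params = total_params - trainable_params
trainable_pct = 100 * trainable_params / total_params
alpha_over_rank = 128 / 64  # LoRA scaling ratio

print(f"frozen: {frozen_params:,}")        # matches the "Frozen parameters" row
print(f"trainable: {trainable_pct:.1f}%")  # 3.9%, as reported
print(f"alpha/rank: {alpha_over_rank}")    # 2.0
```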
Quantization Configuration
| Parameter | Value |
|---|---|
| Method | bitsandbytes NF4 (NormalFloat 4-bit) |
| load_in_4bit | True |
| bnb_4bit_quant_type | nf4 |
| bnb_4bit_compute_dtype | bfloat16 |
| bnb_4bit_use_double_quant | True |
| Quantized model parameters | 4,692,257,792 |
| AWQ attempted | Yes (failed: PytorchGELUTanh removed in transformers 4.57) |
Training
Training Data
| Dataset | Source | Raw Samples | Formatted | Used in Training |
|---|---|---|---|---|
| PathVQA | flaviagiammarino/path-vqa | 3,000 | 431 | ~88 (subsampled) |
| MTSamples | rungalileo/medical_transcription_40 | 4,499 | 4,465 | ~912 (subsampled) |
| PubMedVision | FreedomIntelligence/PubMedVision | 2,000 | 0 (filtered) | 0 |
| Total | -- | 9,499 | 4,896 | 1,000 |
| Split | Samples | Percentage |
|---|---|---|
| Train | 3,916 (1,000 used) | 80% |
| Validation | 490 (100 used) | 10% |
| Test | 490 | 10% |
- Split seed: 42
- Subsample seed: 42
- MTSamples: medical transcriptions rendered as text-on-image (PIL) to simulate scanned documents
- PathVQA: pathology visual question-answering pairs
- PubMedVision: excluded (answers < 50 chars after filtering)
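The MTSamples text-on-image rendering step can be sketched roughly as below. This is an illustrative approximation, not the pipeline's actual code: the real canvas size, font, and wrapping logic are not documented here, and `render_text_document` is a name of my choosing.

```python
from PIL import Image, ImageDraw, ImageFont

def render_text_document(text: str, width: int = 256, height: int = 256) -> Image.Image:
    """Render a transcription onto a white canvas to simulate a scanned page."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    margin, y, line_h = 8, 8, 12
    line = ""
    # Naive greedy word wrap: flush the line when the next word overflows.
    for word in text.split():
        trial = (line + " " + word).strip()
        if draw.textlength(trial, font=font) > width - 2 * margin:
            draw.text((margin, y), line, fill="black", font=font)
            y += line_h
            line = word
        else:
            line = trial
    draw.text((margin, y), line, fill="black", font=font)
    return img

doc = render_text_document("SUBJECTIVE: Patient presents with chest pain radiating to the left arm.")
```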
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 4 |
| Effective batch size | 16 |
| Optimizer steps per epoch | 62 |
| Total optimizer steps | 124 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.05 (6 warmup steps) |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Max sequence length | 512 tokens |
| Max image dimension | 256px (resized with LANCZOS) |
| Training precision | bfloat16 mixed precision |
| Gradient checkpointing | Enabled (use_reentrant=False) |
| Dataloader workers | 4 |
| Dataloader prefetch factor | 4 |
| Save strategy | End of training only |
| Evaluation strategy | Disabled (for speed) |
| Report to | None |
| Response truncation | 600 characters max |
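The step counts in the table follow directly from the batch configuration; a quick check (assuming the dataloader drops the final partial batch, which matches the reported 62 steps per epoch):

```python
# Optimizer-step arithmetic implied by the hyperparameter table.
samples = 1_000
effective_batch = 4 * 4                        # per-device batch x grad accumulation = 16

steps_per_epoch = samples // effective_batch   # 62 (final partial batch dropped)
total_steps = steps_per_epoch * 2              # 124 over 2 epochs
warmup_steps = round(0.05 * total_steps)       # 6 warmup steps
```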
Training Results
| Metric | Value |
|---|---|
| Total training time | 7,704.32 seconds (128.4 minutes) |
| Samples per second | 0.26 |
| Steps per second | 0.016 |
| Total FLOPs | 4.877 x 10^16 |
| Initial loss (step 5) | 1.754 |
| Final loss (average) | 0.996 |
| Minimum step loss | 0.730 (step 120) |
| Loss reduction | 43.2% |
| Peak training VRAM | ~18,300 MB |
Complete Loss Trajectory
| Step | Loss | Learning Rate | % Progress |
|---|---|---|---|
| 5 | 1.754 | 1.87e-4 | 4.0% |
| 10 | 1.454 | 1.97e-4 | 8.1% |
| 15 | 1.315 | 2.00e-4 | 12.1% |
| 20 | 1.271 | 1.98e-4 | 16.1% |
| 25 | 1.196 | 1.94e-4 | 20.2% |
| 30 | 1.135 | 1.88e-4 | 24.2% |
| 35 | 1.133 | 1.81e-4 | 28.2% |
| 40 | 1.119 | 1.73e-4 | 32.3% |
| 45 | 1.058 | 1.63e-4 | 36.3% |
| 50 | 1.039 | 1.53e-4 | 40.3% |
| 55 | 1.048 | 1.43e-4 | 44.4% |
| 60 | 1.037 | 1.32e-4 | 48.4% |
| 65 | 0.951 | 1.21e-4 | 52.4% |
| 70 | 0.933 | 1.10e-4 | 56.5% |
| 75 | 0.847 | 9.92e-5 | 60.5% |
| 80 | 0.767 | 8.82e-5 | 64.5% |
| 85 | 0.860 | 7.77e-5 | 68.5% |
| 90 | 0.852 | 6.76e-5 | 72.6% |
| 95 | 0.844 | 5.82e-5 | 76.6% |
| 100 | 0.809 | 4.96e-5 | 80.6% |
| 105 | 0.813 | 4.18e-5 | 84.7% |
| 110 | 0.846 | 3.50e-5 | 88.7% |
| 115 | 0.789 | 2.93e-5 | 92.7% |
| 120 | 0.730 | 2.47e-5 | 96.8% |
| Final | 0.996 (avg) | -- | 100% |
Loss Curve Visualization
```
Loss
1.8 |*
1.7 |
1.6 | .
1.5 | *
1.4 | .
1.3 | *
1.2 | * .
1.1 | * * .
1.0 | * * * . .
0.9 | * . * . .
0.8 | * . * * . *
0.7 | *
    +---+---+---+---+---+---+---+---+---+---+---+----> Steps
    0  10  20  30  40  50  60  70  80  90 100 110 120
    |<-------- Epoch 1 --------->|<-------- Epoch 2 -------->|
```
Evaluation -- Complete Metrics
Evaluation Methodology
- 5 synthetic clinical document images rendered with monospace fonts on white background
- Each document has ground-truth expected terms (medical terms, drug names, diagnoses) and expected values (dosages, measurements, lab numbers)
- Term Accuracy = (terms found in response) / (total expected terms) x 100
- Value Accuracy = (values found in response) / (total expected values) x 100
- Combined Accuracy = (Term Accuracy + Value Accuracy) / 2
- Matching is case-insensitive substring search
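Under this methodology, the scorer reduces to a case-insensitive substring check. A minimal sketch (function names are illustrative, not taken from the pipeline):

```python
def accuracy(response: str, expected: list[str]) -> float:
    """Percentage of expected items found via case-insensitive substring match."""
    found = sum(1 for item in expected if item.lower() in response.lower())
    return 100.0 * found / len(expected)

def combined_accuracy(response: str, terms: list[str], values: list[str]) -> float:
    """Mean of term accuracy and value accuracy, as defined above."""
    return (accuracy(response, terms) + accuracy(response, values)) / 2
```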
Per-Case Detailed Results
Test Case 1: Medication Reconciliation Form
11 active medications, 8 comorbidities, 3 drug allergies, 6 lab values
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 80.0% (16/20) | 80.0% (16/20) | 0.0% |
| Value Accuracy | 68.0% (17/25) | 68.0% (17/25) | 0.0% |
| Combined Accuracy | 74.0% | 74.0% | 0.0% |
| Latency | 24.57s | 23.66s | -0.91s |
- Terms missed (both models): CKD, COPD, Tiotropium, anaphylaxis
- Values missed (both models): 18 mcg, BNP 580, BUN 32, Cr 1.6, EF 35%, INR 2.4, PRN, eGFR 38
Test Case 2: ICU Flowsheet
Ventilator settings, hemodynamics, 6 active drips, ABG values, I&O totals
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 100.0% (19/19) | 94.7% (18/19) | -5.3% |
| Value Accuracy | 100.0% (30/30) | 50.0% (15/30) | -50.0% |
| Combined Accuracy | 100.0% | 72.4% | -27.6% |
| Latency | 22.62s | 23.75s | +1.13s |
- Fine-tuned missed terms: coagulopathy
- Fine-tuned missed values: CPOT: 1, CVP: 14, D-dimer >20, FiO2: 0.45, HCO3 24, HR: 112, MAP: 62, PEEP: 8, RASS: -3, ScvO2: 68%, fibrinogen 128, pCO2 48, pH 7.32, pO2 72, plt 62K
Test Case 3: Cardiology Consultation Note
Echo measurements, cardiac catheterization, ECG interpretation, GDMT medication plan
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 71.4% (20/28) | 71.4% (20/28) | 0.0% |
| Value Accuracy | 69.2% (18/26) | 69.2% (18/26) | 0.0% |
| Combined Accuracy | 70.3% | 70.3% | 0.0% |
| Latency | 23.78s | 23.88s | +0.10s |
- Terms missed (both): CHF, CRT-D, DAPT, Dapagliflozin, Eplerenone, NSR, Rosuvastatin, Ticagrelor
- Values missed (both): 0.25 cm2, 1.8 L/min/m2, 10 mg, 20 mg, 81 mg, 90 mg, QD, QHS
Test Case 4: Complex Lab Panel
CBC (7 tests), coagulation (5 tests), hepatic function (8 tests) with critical values
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 81.8% (18/22) | 59.1% (13/22) | -22.7% |
| Value Accuracy | 84.0% (21/25) | 64.0% (16/25) | -20.0% |
| Combined Accuracy | 82.9% | 61.5% | -21.4% |
| Latency | 23.89s | 23.81s | -0.08s |
- Base missed terms: Albumin, Bilirubin, LDH, neutropenic
- Fine-tuned missed terms: ALP, ALT, AST, Albumin, Bilirubin, GGT, LDH, SGOT, SGPT (9 terms)
- Fine-tuned missed values: 142, 198, 2.4, 245, 3.6, 312, 4.8, 890, U/L (9 values)
Test Case 5: Surgical Operative Note
Laparoscopic cholecystectomy converted to open, 7 post-op medication orders
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 85.7% (18/21) | 85.7% (18/21) | 0.0% |
| Value Accuracy | 94.7% (18/19) | 94.7% (18/19) | 0.0% |
| Combined Accuracy | 90.2% | 90.2% | 0.0% |
| Latency | 17.37s | 16.84s | -0.53s |
- Terms missed (both): ETT, FACS, cholelithiasis
- Values missed (both): Q1H
Aggregate Accuracy Summary
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Avg Term Accuracy | 83.79% | 78.19% | -5.60 pp |
| Avg Value Accuracy | 83.19% | 69.19% | -14.00 pp |
| Avg Combined Accuracy | 83.49% | 73.70% | -9.79 pp |
| Avg Eval Latency | 22.45s | 22.39s | -0.06s |
| Min Eval Latency | 17.37s | 16.84s | -0.53s |
| Max Eval Latency | 24.57s | 23.88s | -0.69s |
| Total Terms Evaluated | 110 | 110 | -- |
| Total Values Evaluated | 125 | 125 | -- |
| Terms Found (total) | 91 | 85 | -6 |
| Values Found (total) | 104 | 84 | -20 |
VRAM Metrics (All Configurations)
| Configuration | Allocated (MB) | Reserved (MB) | Peak Allocated (MB) | vs FP16 |
|---|---|---|---|---|
| FP16 Baseline (post-load) | 15,820.09 | 17,120.00 | 16,860.07 | -- |
| NF4 Base (post-load) | 5,664.12 | 7,018.00 | 6,704.10 | -64.2% |
| NF4 Base (eval peak) | 5,671.03 | 7,508.00 | 7,388.28 | -64.1% |
| NF4+LoRA merged (post-load) | 11,335.83 | 14,806.00 | 12,894.35 | -28.3% |
| NF4+LoRA merged (eval peak) | 11,335.84 | 14,806.00 | 12,894.35 | -28.3% |
| Training peak | ~18,300 | -- | -- | +15.7% |
Latency Metrics (Standard 5-Image Benchmark)
All latencies measured with torch.cuda.synchronize() for accuracy.
| Image | FP16 | NF4 Base | NF4+LoRA | NF4 Speedup | LoRA Speedup |
|---|---|---|---|---|---|
| Patient Diagnosis | 28.41s | 15.40s | 13.24s | 1.84x | 2.15x |
| Radiology Report | 22.58s | 11.76s | 12.01s | 1.92x | 1.88x |
| Prescription | 23.83s | 13.67s | 12.75s | 1.74x | 1.87x |
| Lab Results | 48.74s | 23.25s | 23.46s | 2.10x | 2.08x |
| Discharge Summary | 31.96s | 12.13s | 12.05s | 2.63x | 2.65x |
| Statistic | FP16 | NF4 Base | NF4+LoRA |
|---|---|---|---|
| Average | 31.10s | 15.24s | 14.70s |
| Minimum | 22.58s | 11.76s | 12.01s |
| Maximum | 48.74s | 23.25s | 23.46s |
| Median (P50) | 28.41s | 13.67s | 12.75s |
| Throughput | 0.032 img/s | 0.066 img/s | 0.068 img/s |
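The summary statistics can be reproduced directly from the per-image benchmark rows:

```python
import statistics

# Per-image latencies (seconds) from the benchmark table above.
fp16 = [28.41, 22.58, 23.83, 48.74, 31.96]
nf4_base = [15.40, 11.76, 13.67, 23.25, 12.13]

avg_fp16 = sum(fp16) / len(fp16)          # ~31.10 s
avg_nf4 = sum(nf4_base) / len(nf4_base)   # ~15.24 s
speedup = avg_fp16 / avg_nf4              # ~2.04x
throughput = 1 / avg_nf4                  # ~0.066 img/s
p50_fp16 = statistics.median(fp16)        # 28.41 s
```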
Model Load Time
| Configuration | Load Time |
|---|---|
| FP16 | 88.74s |
| NF4 Base | 92.65s |
| NF4+LoRA (merge_and_unload) | 92.73s |
Improvement Summary vs FP16 Baseline
| Metric | NF4 Base | NF4+LoRA |
|---|---|---|
| VRAM reduction | -64.2% | -28.3% |
| Latency reduction | -51.0% | -52.7% |
| Throughput increase | 2.06x | 2.12x |
| Memory: GB needed | 5.7 GB | 11.3 GB |
| Min GPU requirement | 8 GB | 16 GB |
Why Base Model Outperforms Fine-Tuned
The fine-tuned model scores 9.8 percentage points below the base model (73.7% vs 83.5%). Three factors explain this:
1. Training Data Distribution Mismatch
| Training Data | Evaluation Data |
|---|---|
| Narrative medical transcriptions | Structured clinical forms |
| Rendered text paragraphs | Dense tables with abbreviations |
| "SUBJECTIVE: Patient presents with..." | "HR: 112 \| ..." |
| Long-form medical reports | ICU flowsheets, lab panels |
| Free text descriptions | Medication reconciliation grids |
2. Strong Zero-Shot Baseline
Qwen2.5-VL-7B-Instruct scores 83.5% on medical extraction without any fine-tuning. Notable:
- 100% accuracy on ICU flowsheet (all 19 terms + all 30 values extracted correctly)
- 90.2% accuracy on surgical operative notes
- Only struggles with abbreviations not explicitly written out (CRT-D, DAPT, NSR)
3. Partial Catastrophic Forgetting
Fine-tuning on narrative text degraded performance on structured/tabular formats:
- ICU Flowsheet: 100% -> 72.4% (-27.6%)
- Complex Lab Panel: 82.9% -> 61.5% (-21.4%)
- The model lost the ability to reliably extract dense numerical values from tabular layouts
Path to Improvement
To surpass the base model, fine-tune with:
- Actual scanned clinical forms (ICU flowsheets, MAR sheets)
- Structured lab reports with tabular layouts
- Medication reconciliation forms from EHR systems
- Cardiology/radiology reports with dense measurements
- At least 5,000+ domain-matched samples
API Deployment
A production-ready REST API is included in the GitHub repository.
Endpoints
| Endpoint | Method | Description | Avg Latency |
|---|---|---|---|
| /health | GET | GPU status, VRAM usage, model readiness | <10ms |
| /extract | POST | Extract medical info with default prompt | 27.69s |
| /analyze | POST | Custom prompt analysis | 11.09s |
| /benchmark | GET | Return all benchmark JSON data | <10ms |
API Test Results
| Test | Status | Details |
|---|---|---|
| GET /health | PASS | GPU: NVIDIA GB10, VRAM: 5,673 MB |
| POST /extract | PASS | 7/7 medical terms found, 27.69s latency, 1,352 char response |
| POST /analyze | PASS | 5/5 medication terms found, 11.09s latency, custom prompt reflected |
| GET /benchmark | PASS | 8 benchmark files returned |
| POST /extract (bad input) | PASS | Correctly returned HTTP 400 |
| Total | 5/5 PASS | 0 failures |
Docker Deployment
```shell
docker build -t medical-vision-pipeline .
docker run --gpus all -p 8000:8000 -p 7860:7860 medical-vision-pipeline
```
Quick API Test
```shell
# Start server
python -m uvicorn api.server:app --host 0.0.0.0 --port 8000

# Test extraction
curl -X POST http://localhost:8000/extract \
  -F "file=@medical_document.png" | python -m json.tool
```
Intended Use
Direct Use Cases
| Use Case | Description | Example Prompt |
|---|---|---|
| Medical Records Digitization | Convert paper/scanned clinical docs to structured data | "Extract all medical information from this clinical document" |
| ED Triage Support | Rapid extraction from referral letters | "Extract diagnoses, medications, and allergies" |
| Prescription Processing | Parse medication orders | "List all medications with dosages, quantities, and refills" |
| Lab Report Parsing | Extract values with flags | "Extract all lab values with results, units, and reference ranges" |
| Discharge Planning | Process discharge summaries | "Extract diagnoses, discharge medications, and follow-up instructions" |
| ICU Documentation | Parse complex flowsheets | "Extract ventilator settings, drip rates, and hemodynamic parameters" |
| Surgical Note Processing | Parse operative reports | "Extract procedure details, findings, EBL, and post-op orders" |
| Insurance Pre-Authorization | Extract diagnosis and treatment details | "List all diagnoses and procedures with their codes" |
| Clinical Research Screening | Screen records for study eligibility | "Identify all cardiovascular diagnoses and medications in this record" |
| Medical Education | Teaching aid for clinical document reading | "Explain the key findings in this radiology report" |
Out-of-Scope Use
- Not for clinical decision-making: Outputs require human review by qualified clinicians
- Not a diagnostic tool: Cannot replace physician judgment
- Not for patient-facing applications without appropriate clinical oversight
- Not validated for handwritten documents: Trained on rendered/printed text only
- Not validated for non-English documents: Training data is English only
- Not for real-time critical care monitoring: Latency is 15-25 seconds per image
Bias, Risks, and Limitations
- Trained on English-only medical documents; other languages are unsupported
- Training data biased toward US medical terminology and abbreviations (BID, QHS, PRN, etc.)
- May hallucinate medical terms or values not present in the source document
- Accuracy varies significantly by document type (100% on ICU flowsheets vs 70.3% on cardiology notes)
- Validated only on synthetic test images (text rendered on white background), not real clinical scans
- Not validated against real-world clinical documents with noise, handwriting, stamps, or poor scan quality
- Should never replace human review in any clinical workflow
- Model may generate medically plausible but incorrect information
- No validation for medication interaction detection or clinical decision support
Compute Infrastructure
Hardware
| Component | Details |
|---|---|
| Platform | NVIDIA DGX Spark |
| GPU | NVIDIA GB10 (Blackwell architecture, sm_121) |
| GPU Count | 1 |
| Compute Capability | 12.1 |
| Total VRAM | 119.7 GB (unified memory) |
| CPU Architecture | aarch64 (ARM) |
| OS | Linux 6.14.0-1015-nvidia |
Training Compute
| Metric | Value |
|---|---|
| Training time | 128.4 minutes (2h 8m) |
| GPU utilization | ~96% during training |
| Peak training VRAM | ~18.3 GB |
| Total FLOPs | 4.877 x 10^16 |
| Samples per second | 0.26 |
| Steps per second | 0.016 |
| Seconds per optimizer step | ~62s |
Inference Compute
| Metric | NF4 Base | NF4+LoRA |
|---|---|---|
| VRAM required | 5,664 MB | 11,336 MB |
| Avg latency | 15.24s | 14.70s |
| Throughput | 0.066 img/s | 0.068 img/s |
| Min GPU VRAM | 8 GB | 16 GB |
Model Download
| Metric | Value |
|---|---|
| Base model size | 15.46 GB |
| Base model download time | 390.48s |
| Adapter size | 727 MB |
| Total model files | 5 safetensors shards + adapter |
Framework Versions
| Package | Version |
|---|---|
| PEFT | 0.18.0 |
| Transformers | 4.57.6 |
| PyTorch | 2.11.0.dev20260206+cu128 |
| BitsAndBytes | 0.49.1 |
| Accelerate | 1.12.0 |
| Datasets | 4.4.2 |
| Pillow | 12.1.0 |
| qwen-vl-utils | latest |
| HuggingFace Hub | 0.36.2 |
| Python | 3.10.19 |
| CUDA | 12.8 |
Citation
If you use this model or pipeline, please cite:
```bibtex
@misc{qwen25vl-medical-lora-2026,
  title={Qwen2.5-VL-7B-Medical-LoRA: QLoRA Fine-tuning for Medical Document Extraction},
  author={Sarathi Balakrishnan},
  year={2026},
  url={https://huggingface.co/sarathi-balakrishnan/Qwen2.5-VL-7B-Medical-LoRA},
  note={QLoRA adapter (r=64) for Qwen2.5-VL-7B-Instruct, NF4 quantized, trained on PathVQA + MTSamples}
}
```
Base model citation:
```bibtex
@article{Qwen2.5-VL,
  title={Qwen2.5-VL},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct}
}
```