Qwen2.5-VL-7B-Medical-LoRA
A QLoRA fine-tuned adapter for Qwen2.5-VL-7B-Instruct optimized for medical document understanding and clinical information extraction from document images.
This adapter was trained on 1,000 medical document samples (PathVQA + MTSamples) using 4-bit NF4 quantization with LoRA rank 64, targeting all attention and MLP layers. It extracts diagnoses, medications with dosages, lab values with reference ranges, vital signs, procedures, and clinical abbreviations from scanned/photographed clinical documents.
GitHub Repository: sarathi-aiml/quantization-pipe
Key Results at a Glance
| Configuration | VRAM | Avg Latency | Min Latency | Max Latency | Medical Accuracy | vs FP16 VRAM | vs FP16 Speed |
|---|---|---|---|---|---|---|---|
| FP16 Baseline | 15,820 MB | 31.10s | 22.58s | 48.74s | N/A | -- | -- |
| NF4 Quantized (recommended) | 5,664 MB | 15.24s | 11.76s | 23.25s | 83.5% | -64.2% | 2.04x faster |
| NF4 + LoRA (this adapter) | 11,336 MB | 14.70s | 12.01s | 23.46s | 73.7% | -28.3% | 2.12x faster |
Recommendation: For production use, deploy the base model with NF4 quantization (no LoRA). It achieves 83.5% accuracy, requires only 5.7 GB VRAM, and runs 2x faster than FP16. See the analysis section below for details.
Quick Start
Load with LoRA Adapter (NF4 Quantized)
```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from PIL import Image

# NF4 quantization config (same as used during training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model with NF4 quantization
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load and merge LoRA adapter
model = PeftModel.from_pretrained(model, "sarathi-balakrishnan/Qwen2.5-VL-7B-Medical-LoRA")
model = model.merge_and_unload()
model.eval()

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Inference
image = Image.open("medical_document.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all medical diagnoses, medications with dosages, and vital signs from this document."},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

generated_ids = output_ids[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated_ids, skip_special_tokens=True)
print(response)
```
Use Base Model Only (Higher Accuracy, Recommended for Production)
```python
# Same as above but skip the PeftModel step:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()
# ... rest of inference code identical
```
Sample Use Cases with Example Prompts
Use Case 1: Emergency Department Triage
Upload a referral letter or patient assessment and extract critical information in seconds.
```python
prompt = """Extract all medical diagnoses, medications with dosages, and vital signs
from this patient assessment document."""
```

Sample output (from actual model inference on a test image):

```
**Medical Diagnoses:**
1. Acute coronary syndrome, NSTEMI - Troponin I 2.4 ng/mL
2. Hypertension, uncontrolled
3. Type 2 Diabetes Mellitus (HbA1c 8.2%)
4. Chronic Kidney Disease, Stage 3 (eGFR 42 mL/min)

**Medications Ordered:**
- Aspirin 325 mg PO STAT, then 81 mg PO QD
- Heparin 5000 units IV bolus, then 1000 units/hr
- Metoprolol 25 mg PO BID
- Lisinopril 10 mg PO QD
- Metformin 1000 mg PO BID
- Atorvastatin 80 mg PO QHS

**Vital Signs:**
- BP: 158/94 mmHg
- HR: 102 bpm
- RR: 22/min
- Temp: 98.6 F
- SpO2: 93% on RA
- Weight: 198 lbs
```
Use Case 2: Radiology Report Parsing
Extract findings, measurements, and impressions from radiology reports.
```python
prompt = """List all findings from this radiology report including measurements,
locations, and the radiologist's impression."""
```

Sample output:

```
### Findings:
**Lungs:**
- 3.2 x 2.8 cm spiculated mass in the right upper lobe (series 4, image 87)
- Ipsilateral hilar lymphadenopathy measuring 1.8 cm
- Left lung clear. No pleural effusion.

**Mediastinum:**
- Subcarinal lymph node enlarged at 2.1 cm
- No pericardial effusion. Heart size normal.

**Bones:**
- Lytic lesion in T8 vertebral body, 1.2 cm, suspicious for metastatic disease

### Radiologist's Impression:
1. Right upper lobe mass highly suspicious for primary bronchogenic carcinoma. LUNG-RADS 4B.
2. Ipsilateral hilar and subcarinal lymphadenopathy suggesting nodal metastasis (N2 disease).
3. T8 lytic lesion concerning for osseous metastasis.

### Recommendation:
PET/CT and tissue biopsy recommended.
```
Use Case 3: Prescription Extraction
Extract structured medication information from prescription images.
```python
prompt = """Extract all prescribed medications with their dosages, directions,
quantities, and refill counts from this prescription."""
```

Sample output:

```
1. **Lisinopril 20 mg tablets**
   - Dosage: 1 tablet by mouth once daily in the morning
   - Quantity: #30
   - Refills: 5
2. **Amlodipine 5 mg tablets**
   - Dosage: 1 tablet by mouth once daily
   - Quantity: #30
   - Refills: 5
3. **Metformin 500 mg tablets**
   - Dosage: 1 tablet by mouth twice daily with meals
   - Quantity: #60
   - Refills: 3
4. **Atorvastatin 40 mg tablets**
   - Dosage: 1 tablet by mouth at bedtime
   - Quantity: #30
   - Refills: 5
5. **Pantoprazole 40 mg delayed-release tablets**
   - Dosage: 1 tablet by mouth once daily 30 minutes before breakfast
   - Quantity: #30
   - Refills: 2
```
Use Case 4: Laboratory Results Parsing
Extract lab values with units, reference ranges, and abnormal flags.
```python
prompt = """Extract all lab values with their results, units, reference ranges,
and flags from this laboratory report."""
```

Sample output (excerpt):

```
1. **Glucose, Fasting**: 142 mg/dL (Ref: 70-100) - Flag: H (High)
2. **BUN**: 28 mg/dL (Ref: 7-20) - Flag: H (High)
3. **Creatinine**: 1.8 mg/dL (Ref: 0.6-1.2) - Flag: H (High)
4. **eGFR**: 38 mL/min (Ref: >60) - Flag: L (Low)
5. **Sodium**: 139 mEq/L (Ref: 136-145)
6. **Potassium**: 5.4 mEq/L (Ref: 3.5-5.1) - Flag: H (High)
7. **Chloride**: 101 mEq/L (Ref: 98-106)
8. **CO2**: 20 mEq/L (Ref: 23-29) - Flag: L (Low)
9. **Calcium**: 9.2 mg/dL (Ref: 8.5-10.5)
10. **Total Protein**: 6.8 g/dL (Ref: 6.0-8.3)
```
Use Case 5: Discharge Summary Processing
Extract diagnoses, medications, and follow-up instructions from discharge documents.
```python
prompt = """Extract all diagnoses, medications with dosages, and follow-up
instructions from this discharge summary."""
```

Sample output:

```
### Diagnoses:
1. Community-acquired pneumonia (CAP), right lower lobe
2. Acute on chronic systolic heart failure (EF 30%)
3. Atrial fibrillation with RVR
4. COPD exacerbation

### Discharge Medications:
- Metoprolol Succinate: 50 mg PO QD
- Furosemide: 40 mg PO BID
- Potassium Chloride: 20 mEq PO BID
- Apixaban: 5 mg PO BID
- Lisinopril: 5 mg PO QD
- Albuterol MDI: 2 puffs Q4-6H PRN
- Prednisone taper: 40 mg x 5d, 20 mg x 5d, 10 mg x 5d
- Levofloxacin: 750 mg PO QD x 3 more days

### Follow-Up Instructions:
- PCP in 1 week
- Cardiology in 2 weeks
- Repeat CXR in 6 weeks
- Echo in 3 months
```
Use Case 6: ICU Flowsheet Extraction
Parse complex ICU documentation with ventilator settings, drips, and hemodynamics.
```python
prompt = """Extract all ventilator settings, hemodynamic parameters, drip rates,
lab values, intake/output totals, and diagnoses from this ICU flowsheet."""
```

Sample output (from actual evaluation -- the base model achieved 100% accuracy on this case):

```
### Ventilator Settings:
- Mode: SIMV/PS | FiO2: 0.45 | PEEP: 8 cmH2O
- Vt: 450 mL | RR set: 14 | PS: 10 cmH2O

### Hemodynamics (06:00):
- HR: 112 bpm (sinus tach) | MAP: 62 mmHg
- CVP: 14 mmHg | ScvO2: 68%
- Norepinephrine: 0.15 mcg/kg/min | Vasopressin: 0.04 units/min

### ABG (05:30):
- pH 7.32, pCO2 48, pO2 72, HCO3 24, BE -2
- P/F Ratio: 160 (moderate ARDS)

### I&O (last 12h):
- IN: 2850 mL | OUT: 1200 mL | NET: +1650 mL
- UOP: 25 mL/hr (oliguric)

### Diagnoses:
- Septic shock 2/2 pneumonia, moderate ARDS
- AKI Stage 2 (Cr 2.8, baseline 0.9)
- DIC (plt 62K, fibrinogen 128, D-dimer >20)
```
Use Case 7: Medication Reconciliation
Extract complete medication lists with indications and comorbidity mapping.
```python
prompt = """Extract every medication with its exact dosage, frequency, and indication.
Also list all comorbidities and relevant lab values."""
```
Use Case 8: Surgical Operative Note Parsing
```python
prompt = """Extract the diagnoses, procedure details, operative findings with measurements,
estimated blood loss, medications with dosages, and postoperative orders."""
```
Use Case 9: Custom Clinical Analysis
```python
prompt = """Identify all drug interaction risks from the medications listed in this document.
Flag any medications that may need dose adjustment based on the lab values shown."""
```
Use Case 10: Batch Processing for Clinical Research
```python
import requests
from pathlib import Path

API_URL = "http://localhost:8000"

for doc_path in Path("clinical_documents/").glob("*.png"):
    with open(doc_path, "rb") as f:
        response = requests.post(
            f"{API_URL}/extract",
            files={"file": (doc_path.name, f, "image/png")},
        )
    result = response.json()
    print(f"{doc_path.name}: {result['inference_latency_seconds']:.1f}s")
    print(result["extracted_content"][:500])
```
Model Details
Architecture
| Property | Value |
|---|---|
| Base model | Qwen2.5-VL-7B-Instruct |
| Base model parameters | 4,882,615,296 |
| Adapter type | LoRA (Low-Rank Adaptation) |
| Quantization | BitsAndBytes NF4 with double quantization |
| Compute dtype | bfloat16 |
| Adapter file size | 727 MB (safetensors) |
| Model type | Vision-Language (image-text-to-text) |
| Architecture | Transformer decoder with ViT vision encoder |
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha | 128 |
| Alpha/Rank ratio | 2.0 |
| Dropout | 0.05 |
| Bias | none |
| Task type | CAUSAL_LM |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Number of target modules | 7 (all attention + MLP linear layers) |
| Trainable parameters | 190,357,504 |
| Total parameters | 4,882,615,296 |
| Trainable percentage | 3.9% |
| Frozen parameters | 4,692,257,792 (96.1%) |
| Vision encoder | Frozen (390 parameters) |
| use_dora | False |
| use_rslora | False |
| PEFT type | LORA |
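As a sanity check, the parameter counts in the table above are internally consistent; the arithmetic below uses only the numbers already reported there:

```python
# Parameter accounting from the LoRA configuration table.
total_params = 4_882_615_296
trainable_params = 190_357_504

frozen_params = total_params - trainable_params
trainable_pct = 100 * trainable_params / total_params
alpha_over_rank = 128 / 64  # LoRA scaling ratio

print(f"frozen: {frozen_params:,}")        # matches the "Frozen parameters" row
print(f"trainable: {trainable_pct:.1f}%")  # 3.9%, as reported
print(f"alpha/rank: {alpha_over_rank}")    # 2.0
```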
Quantization Configuration
| Parameter | Value |
|---|---|
| Method | bitsandbytes NF4 (NormalFloat 4-bit) |
| load_in_4bit | True |
| bnb_4bit_quant_type | nf4 |
| bnb_4bit_compute_dtype | bfloat16 |
| bnb_4bit_use_double_quant | True |
| Quantized model parameters | 4,692,257,792 |
| AWQ attempted | Yes (failed: PytorchGELUTanh removed in transformers 4.57) |
Training
Training Data
| Dataset | Source | Raw Samples | Formatted | Used in Training |
|---|---|---|---|---|
| PathVQA | flaviagiammarino/path-vqa | 3,000 | 431 | ~88 (subsampled) |
| MTSamples | rungalileo/medical_transcription_40 | 4,499 | 4,465 | ~912 (subsampled) |
| PubMedVision | FreedomIntelligence/PubMedVision | 2,000 | 0 (filtered) | 0 |
| Total | -- | 9,499 | 4,896 | 1,000 |
| Split | Samples | Percentage |
|---|---|---|
| Train | 3,916 (1,000 used) | 80% |
| Validation | 490 (100 used) | 10% |
| Test | 490 | 10% |
- Split seed: 42
- Subsample seed: 42
- MTSamples: medical transcriptions rendered as text-on-image (PIL) to simulate scanned documents
- PathVQA: pathology visual question-answering pairs
- PubMedVision: excluded (answers < 50 chars after filtering)
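The MTSamples text-on-image rendering step can be sketched roughly as below. This is an illustrative approximation, not the pipeline's actual code: the real canvas size, font, and wrapping logic are not documented here, and `render_text_document` is a name of my choosing.

```python
from PIL import Image, ImageDraw, ImageFont

def render_text_document(text: str, width: int = 256, height: int = 256) -> Image.Image:
    """Render a transcription onto a white canvas to simulate a scanned page."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    margin, y, line_h = 8, 8, 12
    line = ""
    # Naive greedy word wrap: flush the line when the next word overflows.
    for word in text.split():
        trial = (line + " " + word).strip()
        if draw.textlength(trial, font=font) > width - 2 * margin:
            draw.text((margin, y), line, fill="black", font=font)
            y += line_h
            line = word
        else:
            line = trial
    draw.text((margin, y), line, fill="black", font=font)
    return img

doc = render_text_document("SUBJECTIVE: Patient presents with chest pain radiating to the left arm.")
```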
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 4 |
| Effective batch size | 16 |
| Optimizer steps per epoch | 62 |
| Total optimizer steps | 124 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.05 (6 warmup steps) |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Max sequence length | 512 tokens |
| Max image dimension | 256px (resized with LANCZOS) |
| Training precision | bfloat16 mixed precision |
| Gradient checkpointing | Enabled (use_reentrant=False) |
| Dataloader workers | 4 |
| Dataloader prefetch factor | 4 |
| Save strategy | End of training only |
| Evaluation strategy | Disabled (for speed) |
| Report to | None |
| Response truncation | 600 characters max |
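The step counts in the table follow directly from the batch configuration; a quick check (assuming the dataloader drops the final partial batch, which matches the reported 62 steps per epoch):

```python
# Optimizer-step arithmetic implied by the hyperparameter table.
samples = 1_000
effective_batch = 4 * 4                        # per-device batch x grad accumulation = 16

steps_per_epoch = samples // effective_batch   # 62 (final partial batch dropped)
total_steps = steps_per_epoch * 2              # 124 over 2 epochs
warmup_steps = round(0.05 * total_steps)       # 6 warmup steps
```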
Training Results
| Metric | Value |
|---|---|
| Total training time | 7,704.32 seconds (128.4 minutes) |
| Samples per second | 0.26 |
| Steps per second | 0.016 |
| Total FLOPs | 4.877 x 10^16 |
| Initial loss (step 5) | 1.754 |
| Final loss (average) | 0.996 |
| Minimum step loss | 0.730 (step 120) |
| Loss reduction | 43.2% |
| Peak training VRAM | ~18,300 MB |
Complete Loss Trajectory
| Step | Loss | Learning Rate | % Progress |
|---|---|---|---|
| 5 | 1.754 | 1.87e-4 | 4.0% |
| 10 | 1.454 | 1.97e-4 | 8.1% |
| 15 | 1.315 | 2.00e-4 | 12.1% |
| 20 | 1.271 | 1.98e-4 | 16.1% |
| 25 | 1.196 | 1.94e-4 | 20.2% |
| 30 | 1.135 | 1.88e-4 | 24.2% |
| 35 | 1.133 | 1.81e-4 | 28.2% |
| 40 | 1.119 | 1.73e-4 | 32.3% |
| 45 | 1.058 | 1.63e-4 | 36.3% |
| 50 | 1.039 | 1.53e-4 | 40.3% |
| 55 | 1.048 | 1.43e-4 | 44.4% |
| 60 | 1.037 | 1.32e-4 | 48.4% |
| 65 | 0.951 | 1.21e-4 | 52.4% |
| 70 | 0.933 | 1.10e-4 | 56.5% |
| 75 | 0.847 | 9.92e-5 | 60.5% |
| 80 | 0.767 | 8.82e-5 | 64.5% |
| 85 | 0.860 | 7.77e-5 | 68.5% |
| 90 | 0.852 | 6.76e-5 | 72.6% |
| 95 | 0.844 | 5.82e-5 | 76.6% |
| 100 | 0.809 | 4.96e-5 | 80.6% |
| 105 | 0.813 | 4.18e-5 | 84.7% |
| 110 | 0.846 | 3.50e-5 | 88.7% |
| 115 | 0.789 | 2.93e-5 | 92.7% |
| 120 | 0.730 | 2.47e-5 | 96.8% |
| Final | 0.996 (avg) | -- | 100% |
Loss Curve Visualization
```
Loss
1.8 |*
1.7 |
1.6 | .
1.5 | *
1.4 | .
1.3 | *
1.2 | * .
1.1 | * * .
1.0 | * * * . .
0.9 | * . * . .
0.8 | * . * * . *
0.7 | *
    +---+---+---+---+---+---+---+---+---+---+---+----> Steps
    0  10  20  30  40  50  60  70  80  90 100 110 120
    |<-------- Epoch 1 --------->|<-------- Epoch 2 -------->|
```
Evaluation -- Complete Metrics
Evaluation Methodology
- 5 synthetic clinical document images rendered with monospace fonts on white background
- Each document has ground-truth expected terms (medical terms, drug names, diagnoses) and expected values (dosages, measurements, lab numbers)
- Term Accuracy = (terms found in response) / (total expected terms) x 100
- Value Accuracy = (values found in response) / (total expected values) x 100
- Combined Accuracy = (Term Accuracy + Value Accuracy) / 2
- Matching is case-insensitive substring search
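Under this methodology, the scorer reduces to a case-insensitive substring check. A minimal sketch (function names are illustrative, not taken from the pipeline):

```python
def accuracy(response: str, expected: list[str]) -> float:
    """Percentage of expected items found via case-insensitive substring match."""
    found = sum(1 for item in expected if item.lower() in response.lower())
    return 100.0 * found / len(expected)

def combined_accuracy(response: str, terms: list[str], values: list[str]) -> float:
    """Mean of term accuracy and value accuracy, as defined above."""
    return (accuracy(response, terms) + accuracy(response, values)) / 2
```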
Per-Case Detailed Results
Test Case 1: Medication Reconciliation Form
11 active medications, 8 comorbidities, 3 drug allergies, 6 lab values
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 80.0% (16/20) | 80.0% (16/20) | 0.0% |
| Value Accuracy | 68.0% (17/25) | 68.0% (17/25) | 0.0% |
| Combined Accuracy | 74.0% | 74.0% | 0.0% |
| Latency | 24.57s | 23.66s | -0.91s |
- Terms missed (both models): CKD, COPD, Tiotropium, anaphylaxis
- Values missed (both models): 18 mcg, BNP 580, BUN 32, Cr 1.6, EF 35%, INR 2.4, PRN, eGFR 38
Test Case 2: ICU Flowsheet
Ventilator settings, hemodynamics, 6 active drips, ABG values, I&O totals
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 100.0% (19/19) | 94.7% (18/19) | -5.3% |
| Value Accuracy | 100.0% (30/30) | 50.0% (15/30) | -50.0% |
| Combined Accuracy | 100.0% | 72.4% | -27.6% |
| Latency | 22.62s | 23.75s | +1.13s |
- Fine-tuned missed terms: coagulopathy
- Fine-tuned missed values: CPOT: 1, CVP: 14, D-dimer >20, FiO2: 0.45, HCO3 24, HR: 112, MAP: 62, PEEP: 8, RASS: -3, ScvO2: 68%, fibrinogen 128, pCO2 48, pH 7.32, pO2 72, plt 62K
Test Case 3: Cardiology Consultation Note
Echo measurements, cardiac catheterization, ECG interpretation, GDMT medication plan
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 71.4% (20/28) | 71.4% (20/28) | 0.0% |
| Value Accuracy | 69.2% (18/26) | 69.2% (18/26) | 0.0% |
| Combined Accuracy | 70.3% | 70.3% | 0.0% |
| Latency | 23.78s | 23.88s | +0.10s |
- Terms missed (both): CHF, CRT-D, DAPT, Dapagliflozin, Eplerenone, NSR, Rosuvastatin, Ticagrelor
- Values missed (both): 0.25 cm2, 1.8 L/min/m2, 10 mg, 20 mg, 81 mg, 90 mg, QD, QHS
Test Case 4: Complex Lab Panel
CBC (7 tests), coagulation (5 tests), hepatic function (8 tests) with critical values
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 81.8% (18/22) | 59.1% (13/22) | -22.7% |
| Value Accuracy | 84.0% (21/25) | 64.0% (16/25) | -20.0% |
| Combined Accuracy | 82.9% | 61.5% | -21.4% |
| Latency | 23.89s | 23.81s | -0.08s |
- Base missed terms: Albumin, Bilirubin, LDH, neutropenic
- Fine-tuned missed terms: ALP, ALT, AST, Albumin, Bilirubin, GGT, LDH, SGOT, SGPT (9 terms)
- Fine-tuned missed values: 142, 198, 2.4, 245, 3.6, 312, 4.8, 890, U/L (9 values)
Test Case 5: Surgical Operative Note
Laparoscopic cholecystectomy converted to open, 7 post-op medication orders
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 85.7% (18/21) | 85.7% (18/21) | 0.0% |
| Value Accuracy | 94.7% (18/19) | 94.7% (18/19) | 0.0% |
| Combined Accuracy | 90.2% | 90.2% | 0.0% |
| Latency | 17.37s | 16.84s | -0.53s |
- Terms missed (both): ETT, FACS, cholelithiasis
- Values missed (both): Q1H
Aggregate Accuracy Summary
| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Avg Term Accuracy | 83.79% | 78.19% | -5.60 pp |
| Avg Value Accuracy | 83.19% | 69.19% | -14.00 pp |
| Avg Combined Accuracy | 83.49% | 73.70% | -9.79 pp |
| Avg Eval Latency | 22.45s | 22.39s | -0.06s |
| Min Eval Latency | 17.37s | 16.84s | -0.53s |
| Max Eval Latency | 24.57s | 23.88s | -0.69s |
| Total Terms Evaluated | 110 | 110 | -- |
| Total Values Evaluated | 125 | 125 | -- |
| Terms Found (total) | 91 | 85 | -6 |
| Values Found (total) | 104 | 84 | -20 |
VRAM Metrics (All Configurations)
| Configuration | Allocated (MB) | Reserved (MB) | Peak Allocated (MB) | vs FP16 |
|---|---|---|---|---|
| FP16 Baseline (post-load) | 15,820.09 | 17,120.00 | 16,860.07 | -- |
| NF4 Base (post-load) | 5,664.12 | 7,018.00 | 6,704.10 | -64.2% |
| NF4 Base (eval peak) | 5,671.03 | 7,508.00 | 7,388.28 | -64.1% |
| NF4+LoRA merged (post-load) | 11,335.83 | 14,806.00 | 12,894.35 | -28.3% |
| NF4+LoRA merged (eval peak) | 11,335.84 | 14,806.00 | 12,894.35 | -28.3% |
| Training peak | ~18,300 | -- | -- | +15.7% |
Latency Metrics (Standard 5-Image Benchmark)
All latencies measured with torch.cuda.synchronize() for accuracy.
| Image | FP16 | NF4 Base | NF4+LoRA | NF4 Speedup | LoRA Speedup |
|---|---|---|---|---|---|
| Patient Diagnosis | 28.41s | 15.40s | 13.24s | 1.84x | 2.15x |
| Radiology Report | 22.58s | 11.76s | 12.01s | 1.92x | 1.88x |
| Prescription | 23.83s | 13.67s | 12.75s | 1.74x | 1.87x |
| Lab Results | 48.74s | 23.25s | 23.46s | 2.10x | 2.08x |
| Discharge Summary | 31.96s | 12.13s | 12.05s | 2.63x | 2.65x |
| Statistic | FP16 | NF4 Base | NF4+LoRA |
|---|---|---|---|
| Average | 31.10s | 15.24s | 14.70s |
| Minimum | 22.58s | 11.76s | 12.01s |
| Maximum | 48.74s | 23.25s | 23.46s |
| Median (P50) | 28.41s | 13.67s | 12.75s |
| Throughput | 0.032 img/s | 0.066 img/s | 0.068 img/s |
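The summary statistics can be reproduced directly from the per-image benchmark rows:

```python
import statistics

# Per-image latencies (seconds) from the benchmark table above.
fp16 = [28.41, 22.58, 23.83, 48.74, 31.96]
nf4_base = [15.40, 11.76, 13.67, 23.25, 12.13]

avg_fp16 = sum(fp16) / len(fp16)          # ~31.10 s
avg_nf4 = sum(nf4_base) / len(nf4_base)   # ~15.24 s
speedup = avg_fp16 / avg_nf4              # ~2.04x
throughput = 1 / avg_nf4                  # ~0.066 img/s
p50_fp16 = statistics.median(fp16)        # 28.41 s
```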
Model Load Time
| Configuration | Load Time |
|---|---|
| FP16 | 88.74s |
| NF4 Base | 92.65s |
| NF4+LoRA (merge_and_unload) | 92.73s |
Improvement Summary vs FP16 Baseline
| Metric | NF4 Base | NF4+LoRA |
|---|---|---|
| VRAM reduction | -64.2% | -28.3% |
| Latency reduction | -51.0% | -52.7% |
| Throughput increase | 2.06x | 2.12x |
| Memory: GB needed | 5.7 GB | 11.3 GB |
| Min GPU requirement | 8 GB | 16 GB |
Why Base Model Outperforms Fine-Tuned
The fine-tuned model scores 9.8 percentage points below the base model (73.7% vs 83.5%). Three factors explain this:
1. Training Data Distribution Mismatch
| Training Data | Evaluation Data |
|---|---|
| Narrative medical transcriptions | Structured clinical forms |
| Rendered text paragraphs | Dense tables with abbreviations |
| "SUBJECTIVE: Patient presents with..." | "HR: 112 \| ..." |
| Long-form medical reports | ICU flowsheets, lab panels |
| Free text descriptions | Medication reconciliation grids |
2. Strong Zero-Shot Baseline
Qwen2.5-VL-7B-Instruct scores 83.5% on medical extraction without any fine-tuning. Notable:
- 100% accuracy on ICU flowsheet (all 19 terms + all 30 values extracted correctly)
- 90.2% accuracy on surgical operative notes
- Only struggles with abbreviations not explicitly written out (CRT-D, DAPT, NSR)
3. Partial Catastrophic Forgetting
Fine-tuning on narrative text degraded performance on structured/tabular formats:
- ICU Flowsheet: 100% -> 72.4% (-27.6%)
- Complex Lab Panel: 82.9% -> 61.5% (-21.4%)
- The model lost the ability to reliably extract dense numerical values from tabular layouts
Path to Improvement
To surpass the base model, fine-tune with:
- Actual scanned clinical forms (ICU flowsheets, MAR sheets)
- Structured lab reports with tabular layouts
- Medication reconciliation forms from EHR systems
- Cardiology/radiology reports with dense measurements
- At least 5,000+ domain-matched samples
API Deployment
A production-ready REST API is included in the GitHub repository.
Endpoints
| Endpoint | Method | Description | Avg Latency |
|---|---|---|---|
| /health | GET | GPU status, VRAM usage, model readiness | <10ms |
| /extract | POST | Extract medical info with default prompt | 27.69s |
| /analyze | POST | Custom prompt analysis | 11.09s |
| /benchmark | GET | Return all benchmark JSON data | <10ms |
API Test Results
| Test | Status | Details |
|---|---|---|
| GET /health | PASS | GPU: NVIDIA GB10, VRAM: 5,673 MB |
| POST /extract | PASS | 7/7 medical terms found, 27.69s latency, 1,352 char response |
| POST /analyze | PASS | 5/5 medication terms found, 11.09s latency, custom prompt reflected |
| GET /benchmark | PASS | 8 benchmark files returned |
| POST /extract (bad input) | PASS | Correctly returned HTTP 400 |
| Total | 5/5 PASS | 0 failures |
Docker Deployment
```shell
docker build -t medical-vision-pipeline .
docker run --gpus all -p 8000:8000 -p 7860:7860 medical-vision-pipeline
```
Quick API Test
```shell
# Start server
python -m uvicorn api.server:app --host 0.0.0.0 --port 8000

# Test extraction
curl -X POST http://localhost:8000/extract \
  -F "file=@medical_document.png" | python -m json.tool
```
Intended Use
Direct Use Cases
| Use Case | Description | Example Prompt |
|---|---|---|
| Medical Records Digitization | Convert paper/scanned clinical docs to structured data | "Extract all medical information from this clinical document" |
| ED Triage Support | Rapid extraction from referral letters | "Extract diagnoses, medications, and allergies" |
| Prescription Processing | Parse medication orders | "List all medications with dosages, quantities, and refills" |
| Lab Report Parsing | Extract values with flags | "Extract all lab values with results, units, and reference ranges" |
| Discharge Planning | Process discharge summaries | "Extract diagnoses, discharge medications, and follow-up instructions" |
| ICU Documentation | Parse complex flowsheets | "Extract ventilator settings, drip rates, and hemodynamic parameters" |
| Surgical Note Processing | Parse operative reports | "Extract procedure details, findings, EBL, and post-op orders" |
| Insurance Pre-Authorization | Extract diagnosis and treatment details | "List all diagnoses and procedures with their codes" |
| Clinical Research Screening | Screen records for study eligibility | "Identify all cardiovascular diagnoses and medications in this record" |
| Medical Education | Teaching aid for clinical document reading | "Explain the key findings in this radiology report" |
Out-of-Scope Use
- Not for clinical decision-making: Outputs require human review by qualified clinicians
- Not a diagnostic tool: Cannot replace physician judgment
- Not for patient-facing applications without appropriate clinical oversight
- Not validated for handwritten documents: Trained on rendered/printed text only
- Not validated for non-English documents: Training data is English only
- Not for real-time critical care monitoring: Latency is 15-25 seconds per image
Bias, Risks, and Limitations
- Trained on English-only medical documents; other languages are unsupported
- Training data biased toward US medical terminology and abbreviations (BID, QHS, PRN, etc.)
- May hallucinate medical terms or values not present in the source document
- Accuracy varies significantly by document type (100% on ICU flowsheets vs 70.3% on cardiology notes)
- Validated only on synthetic test images (text rendered on white background), not real clinical scans
- Not validated against real-world clinical documents with noise, handwriting, stamps, or poor scan quality
- Should never replace human review in any clinical workflow
- Model may generate medically plausible but incorrect information
- No validation for medication interaction detection or clinical decision support
Compute Infrastructure
Hardware
| Component | Details |
|---|---|
| Platform | NVIDIA DGX Spark |
| GPU | NVIDIA GB10 (Blackwell architecture, sm_121) |
| GPU Count | 1 |
| Compute Capability | 12.1 |
| Total VRAM | 119.7 GB (unified memory) |
| CPU Architecture | aarch64 (ARM) |
| OS | Linux 6.14.0-1015-nvidia |
Training Compute
| Metric | Value |
|---|---|
| Training time | 128.4 minutes (2h 8m) |
| GPU utilization | ~96% during training |
| Peak training VRAM | ~18.3 GB |
| Total FLOPs | 4.877 x 10^16 |
| Samples per second | 0.26 |
| Steps per second | 0.016 |
| Seconds per optimizer step | ~62s |
Inference Compute
| Metric | NF4 Base | NF4+LoRA |
|---|---|---|
| VRAM required | 5,664 MB | 11,336 MB |
| Avg latency | 15.24s | 14.70s |
| Throughput | 0.066 img/s | 0.068 img/s |
| Min GPU VRAM | 8 GB | 16 GB |
Model Download
| Metric | Value |
|---|---|
| Base model size | 15.46 GB |
| Base model download time | 390.48s |
| Adapter size | 727 MB |
| Total model files | 5 safetensors shards + adapter |
Framework Versions
| Package | Version |
|---|---|
| PEFT | 0.18.0 |
| Transformers | 4.57.6 |
| PyTorch | 2.11.0.dev20260206+cu128 |
| BitsAndBytes | 0.49.1 |
| Accelerate | 1.12.0 |
| Datasets | 4.4.2 |
| Pillow | 12.1.0 |
| qwen-vl-utils | latest |
| HuggingFace Hub | 0.36.2 |
| Python | 3.10.19 |
| CUDA | 12.8 |
Citation
If you use this model or pipeline, please cite:
```bibtex
@misc{qwen25vl-medical-lora-2026,
  title={Qwen2.5-VL-7B-Medical-LoRA: QLoRA Fine-tuning for Medical Document Extraction},
  author={Sarathi Balakrishnan},
  year={2026},
  url={https://huggingface.co/sarathi-balakrishnan/Qwen2.5-VL-7B-Medical-LoRA},
  note={QLoRA adapter (r=64) for Qwen2.5-VL-7B-Instruct, NF4 quantized, trained on PathVQA + MTSamples}
}
```
Base model citation:
```bibtex
@article{Qwen2.5-VL,
  title={Qwen2.5-VL},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct}
}
```