Qwen2.5-VL-7B-Medical-LoRA

A QLoRA fine-tuned adapter for Qwen2.5-VL-7B-Instruct optimized for medical document understanding and clinical information extraction from document images.

This adapter was trained on 1,000 medical document samples (PathVQA + MTSamples) using 4-bit NF4 quantization with LoRA rank 64, targeting all attention and MLP layers. It extracts diagnoses, medications with dosages, lab values with reference ranges, vital signs, procedures, and clinical abbreviations from scanned/photographed clinical documents.

GitHub Repository: sarathi-aiml/quantization-pipe


Key Results at a Glance

| Configuration | VRAM | Avg Latency | Min Latency | Max Latency | Medical Accuracy | VRAM vs FP16 | Speed vs FP16 |
|---|---|---|---|---|---|---|---|
| FP16 Baseline | 15,820 MB | 31.10s | 22.58s | 48.74s | N/A | -- | -- |
| NF4 Quantized (recommended) | 5,664 MB | 15.24s | 11.76s | 23.25s | 83.5% | -64.2% | 2.04x faster |
| NF4 + LoRA (this adapter) | 11,336 MB | 14.70s | 12.01s | 23.46s | 73.7% | -28.3% | 2.12x faster |

Recommendation: For production use, deploy the base model with NF4 quantization (no LoRA). It achieves 83.5% accuracy, requires only 5.7 GB VRAM, and runs 2x faster than FP16. See the analysis below for details.
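The headline deltas can be re-derived from the raw measurements in the table; a quick sanity check:

```python
# Sanity-check the headline deltas from the measurements above.
fp16_vram, nf4_vram, lora_vram = 15820, 5664, 11336   # MB (post-load)
fp16_lat, nf4_lat, lora_lat = 31.10, 15.24, 14.70     # seconds (avg)

vram_saving_nf4 = (fp16_vram - nf4_vram) / fp16_vram * 100
vram_saving_lora = (fp16_vram - lora_vram) / fp16_vram * 100
speedup_nf4 = fp16_lat / nf4_lat
speedup_lora = fp16_lat / lora_lat

print(f"NF4:      -{vram_saving_nf4:.1f}% VRAM, {speedup_nf4:.2f}x faster")
print(f"NF4+LoRA: -{vram_saving_lora:.1f}% VRAM, {speedup_lora:.2f}x faster")
# NF4:      -64.2% VRAM, 2.04x faster
# NF4+LoRA: -28.3% VRAM, 2.12x faster
```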


Quick Start

Load with LoRA Adapter (NF4 Quantized)

import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from PIL import Image

# NF4 quantization config (same as used during training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model with NF4 quantization
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load and merge LoRA adapter
model = PeftModel.from_pretrained(model, "sarathi-balakrishnan/Qwen2.5-VL-7B-Medical-LoRA")
model = model.merge_and_unload()
model.eval()

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Inference
image = Image.open("medical_document.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all medical diagnoses, medications with dosages, and vital signs from this document."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

generated_ids = output_ids[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated_ids, skip_special_tokens=True)
print(response)

Use Base Model Only (Higher Accuracy, Recommended for Production)

# Same as above but skip the PeftModel step:
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()
# ... rest of inference code identical

Sample Use Cases with Example Prompts

Use Case 1: Emergency Department Triage

Upload a referral letter or patient assessment and extract critical information in seconds.

prompt = """Extract all medical diagnoses, medications with dosages, and vital signs
from this patient assessment document."""

Sample output (from actual model inference on a test image):

**Medical Diagnoses:**
1. Acute coronary syndrome, NSTEMI - Troponin I 2.4 ng/mL
2. Hypertension, uncontrolled
3. Type 2 Diabetes Mellitus (HbA1c 8.2%)
4. Chronic Kidney Disease, Stage 3 (eGFR 42 mL/min)

**Medications Ordered:**
- Aspirin 325 mg PO STAT, then 81 mg PO QD
- Heparin 5000 units IV bolus, then 1000 units/hr
- Metoprolol 25 mg PO BID
- Lisinopril 10 mg PO QD
- Metformin 1000 mg PO BID
- Atorvastatin 80 mg PO QHS

**Vital Signs:**
- BP: 158/94 mmHg
- HR: 102 bpm
- RR: 22/min
- Temp: 98.6 F
- SpO2: 93% on RA
- Weight: 198 lbs

Use Case 2: Radiology Report Parsing

Extract findings, measurements, and impressions from radiology reports.

prompt = """List all findings from this radiology report including measurements,
locations, and the radiologist's impression."""

Sample output:

### Findings:
**Lungs:**
- 3.2 x 2.8 cm spiculated mass in the right upper lobe (series 4, image 87)
- Ipsilateral hilar lymphadenopathy measuring 1.8 cm
- Left lung clear. No pleural effusion.

**Mediastinum:**
- Subcarinal lymph node enlarged at 2.1 cm
- No pericardial effusion. Heart size normal.

**Bones:**
- Lytic lesion in T8 vertebral body, 1.2 cm, suspicious for metastatic disease

### Radiologist's Impression:
1. Right upper lobe mass highly suspicious for primary bronchogenic carcinoma. LUNG-RADS 4B.
2. Ipsilateral hilar and subcarinal lymphadenopathy suggesting nodal metastasis (N2 disease).
3. T8 lytic lesion concerning for osseous metastasis.

### Recommendation:
PET/CT and tissue biopsy recommended.

Use Case 3: Prescription Extraction

Extract structured medication information from prescription images.

prompt = """Extract all prescribed medications with their dosages, directions,
quantities, and refill counts from this prescription."""

Sample output:

1. **Lisinopril 20 mg tablets**
   - Dosage: 1 tablet by mouth once daily in the morning
   - Quantity: #30
   - Refills: 5

2. **Amlodipine 5 mg tablets**
   - Dosage: 1 tablet by mouth once daily
   - Quantity: #30
   - Refills: 5

3. **Metformin 500 mg tablets**
   - Dosage: 1 tablet by mouth twice daily with meals
   - Quantity: #60
   - Refills: 3

4. **Atorvastatin 40 mg tablets**
   - Dosage: 1 tablet by mouth at bedtime
   - Quantity: #30
   - Refills: 5

5. **Pantoprazole 40 mg delayed-release tablets**
   - Dosage: 1 tablet by mouth once daily 30 minutes before breakfast
   - Quantity: #30
   - Refills: 2

Use Case 4: Laboratory Results Parsing

Extract lab values with units, reference ranges, and abnormal flags.

prompt = """Extract all lab values with their results, units, reference ranges,
and flags from this laboratory report."""

Sample output (excerpt):

1. **Glucose, Fasting**: 142 mg/dL (Ref: 70-100) - Flag: H (High)
2. **BUN**: 28 mg/dL (Ref: 7-20) - Flag: H (High)
3. **Creatinine**: 1.8 mg/dL (Ref: 0.6-1.2) - Flag: H (High)
4. **eGFR**: 38 mL/min (Ref: >60) - Flag: L (Low)
5. **Sodium**: 139 mEq/L (Ref: 136-145)
6. **Potassium**: 5.4 mEq/L (Ref: 3.5-5.1) - Flag: H (High)
7. **Chloride**: 101 mEq/L (Ref: 98-106)
8. **CO2**: 20 mEq/L (Ref: 23-29) - Flag: L (Low)
9. **Calcium**: 9.2 mg/dL (Ref: 8.5-10.5)
10. **Total Protein**: 6.8 g/dL (Ref: 6.0-8.3)

Use Case 5: Discharge Summary Processing

Extract diagnoses, medications, and follow-up instructions from discharge documents.

prompt = """Extract all diagnoses, medications with dosages, and follow-up
instructions from this discharge summary."""

Sample output:

### Diagnoses:
1. Community-acquired pneumonia (CAP), right lower lobe
2. Acute on chronic systolic heart failure (EF 30%)
3. Atrial fibrillation with RVR
4. COPD exacerbation

### Discharge Medications:
- Metoprolol Succinate: 50 mg PO QD
- Furosemide: 40 mg PO BID
- Potassium Chloride: 20 mEq PO BID
- Apixaban: 5 mg PO BID
- Lisinopril: 5 mg PO QD
- Albuterol MDI: 2 puffs Q4-6H PRN
- Prednisone taper: 40 mg x 5d, 20 mg x 5d, 10 mg x 5d
- Levofloxacin: 750 mg PO QD x 3 more days

### Follow-Up Instructions:
- PCP in 1 week
- Cardiology in 2 weeks
- Repeat CXR in 6 weeks
- Echo in 3 months

Use Case 6: ICU Flowsheet Extraction

Parse complex ICU documentation with ventilator settings, drips, and hemodynamics.

prompt = """Extract all ventilator settings, hemodynamic parameters, drip rates,
lab values, intake/output totals, and diagnoses from this ICU flowsheet."""

Sample output (from actual evaluation -- base model achieved 100% accuracy on this case):

### Ventilator Settings:
- Mode: SIMV/PS | FiO2: 0.45 | PEEP: 8 cmH2O
- Vt: 450 mL | RR set: 14 | PS: 10 cmH2O

### Hemodynamics (06:00):
- HR: 112 bpm (sinus tach) | MAP: 62 mmHg
- CVP: 14 mmHg | ScvO2: 68%
- Norepinephrine: 0.15 mcg/kg/min | Vasopressin: 0.04 units/min

### ABG (05:30):
- pH 7.32, pCO2 48, pO2 72, HCO3 24, BE -2
- P/F Ratio: 160 (moderate ARDS)

### I&O (last 12h):
- IN: 2850 mL | OUT: 1200 mL | NET: +1650 mL
- UOP: 25 mL/hr (oliguric)

### Diagnoses:
- Septic shock 2/2 pneumonia, moderate ARDS
- AKI Stage 2 (Cr 2.8, baseline 0.9)
- DIC (plt 62K, fibrinogen 128, D-dimer >20)

Use Case 7: Medication Reconciliation

Extract complete medication lists with indications and comorbidity mapping.

prompt = """Extract every medication with its exact dosage, frequency, and indication.
Also list all comorbidities and relevant lab values."""

Use Case 8: Surgical Operative Note Parsing

prompt = """Extract the diagnoses, procedure details, operative findings with measurements,
estimated blood loss, medications with dosages, and postoperative orders."""

Use Case 9: Custom Clinical Analysis

prompt = """Identify all drug interactions risks from the medications listed in this document.
Flag any medications that may need dose adjustment based on the lab values shown."""

Use Case 10: Batch Processing for Clinical Research

import requests
from pathlib import Path

API_URL = "http://localhost:8000"

for doc_path in Path("clinical_documents/").glob("*.png"):
    with open(doc_path, "rb") as f:
        response = requests.post(
            f"{API_URL}/extract",
            files={"file": (doc_path.name, f, "image/png")}
        )
    result = response.json()
    print(f"{doc_path.name}: {result['inference_latency_seconds']:.1f}s")
    print(result["extracted_content"][:500])

Model Details

Architecture

| Property | Value |
|---|---|
| Base model | Qwen2.5-VL-7B-Instruct |
| Base model parameters | 4,882,615,296 |
| Adapter type | LoRA (Low-Rank Adaptation) |
| Quantization | BitsAndBytes NF4 with double quantization |
| Compute dtype | bfloat16 |
| Adapter file size | 727 MB (safetensors) |
| Model type | Vision-Language (image-text-to-text) |
| Architecture | Transformer decoder with ViT vision encoder |

LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha | 128 |
| Alpha/Rank ratio | 2.0 |
| Dropout | 0.05 |
| Bias | none |
| Task type | CAUSAL_LM |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Number of target modules | 7 (all attention + MLP linear layers) |
| Trainable parameters | 190,357,504 |
| Total parameters | 4,882,615,296 |
| Trainable percentage | 3.9% |
| Frozen parameters | 4,692,257,792 (96.1%) |
| Vision encoder | Frozen (390 parameters) |
| use_dora | False |
| use_rslora | False |
| PEFT type | LORA |
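The table above maps directly onto a `peft` `LoraConfig`. A minimal sketch of the configuration as described (illustrative, not the exact training script):

```python
from peft import LoraConfig

# LoRA configuration as reported above: r=64, alpha=128 (alpha/r = 2.0),
# dropout 0.05, all attention + MLP projection layers targeted.
# Reported trainable fraction: 190,357,504 / 4,882,615,296 ≈ 3.9%.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",    # attention
        "gate_proj", "up_proj", "down_proj",       # MLP
    ],
)
```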

Quantization Configuration

| Parameter | Value |
|---|---|
| Method | bitsandbytes NF4 (NormalFloat 4-bit) |
| load_in_4bit | True |
| bnb_4bit_quant_type | nf4 |
| bnb_4bit_compute_dtype | bfloat16 |
| bnb_4bit_use_double_quant | True |
| Quantized model parameters | 4,692,257,792 |
| AWQ attempted | Yes (failed: PytorchGELUTanh removed in transformers 4.57) |

Training

Training Data

| Dataset | Source | Raw Samples | Formatted | Used in Training |
|---|---|---|---|---|
| PathVQA | flaviagiammarino/path-vqa | 3,000 | 431 | ~88 (subsampled) |
| MTSamples | rungalileo/medical_transcription_40 | 4,499 | 4,465 | ~912 (subsampled) |
| PubMedVision | FreedomIntelligence/PubMedVision | 2,000 | 0 (filtered) | 0 |
| Total | | 9,499 | 4,896 | 1,000 |

| Split | Samples | Percentage |
|---|---|---|
| Train | 3,916 (1,000 used) | 80% |
| Validation | 490 (100 used) | 10% |
| Test | 490 | 10% |
  • Split seed: 42
  • Subsample seed: 42
  • MTSamples: medical transcriptions rendered as text-on-image (PIL) to simulate scanned documents
  • PathVQA: pathology visual question-answering pairs
  • PubMedVision: excluded (answers < 50 chars after filtering)
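The MTSamples rendering step noted above (transcription text drawn onto a white canvas to simulate a scanned page) can be sketched roughly as follows; the font, wrap width, and margins here are illustrative assumptions, not the training script's exact values:

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_text_as_document(text: str, width: int = 800, margin: int = 20) -> Image.Image:
    """Render a transcription as black text on a white canvas, mimicking a scan."""
    font = ImageFont.load_default()      # assumption: any monospace-ish font works
    lines = textwrap.wrap(text, width=90)
    line_height = 14
    height = margin * 2 + line_height * len(lines)
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((margin, margin + i * line_height), line, fill="black", font=font)
    return img

doc = render_text_as_document("SUBJECTIVE: Patient presents with chest pain ...")
```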

Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 4 |
| Effective batch size | 16 |
| Optimizer steps per epoch | 62 |
| Total optimizer steps | 124 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.05 (6 warmup steps) |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Max sequence length | 512 tokens |
| Max image dimension | 256px (resized with LANCZOS) |
| Training precision | bfloat16 mixed precision |
| Gradient checkpointing | Enabled (use_reentrant=False) |
| Dataloader workers | 4 |
| Dataloader prefetch factor | 4 |
| Save strategy | End of training only |
| Evaluation strategy | Disabled (for speed) |
| Report to | None |
| Response truncation | 600 characters max |
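The step counts in the table follow directly from the batch settings; a quick check:

```python
# Derive the reported step counts from the batch configuration.
train_samples = 1000
per_device_batch = 4
grad_accum = 4
epochs = 2

effective_batch = per_device_batch * grad_accum       # 4 x 4 = 16
steps_per_epoch = train_samples // effective_batch    # 62 (partial final batch dropped)
total_steps = steps_per_epoch * epochs                # 124
warmup_steps = round(0.05 * total_steps)              # ~6 at warmup ratio 0.05

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# 16 62 124 6
```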

Training Results

| Metric | Value |
|---|---|
| Total training time | 7,704.32 seconds (128.4 minutes) |
| Samples per second | 0.26 |
| Steps per second | 0.016 |
| Total FLOPs | 4.877 x 10^16 |
| Initial loss (step 5) | 1.754 |
| Final loss (average) | 0.996 |
| Minimum step loss | 0.730 (step 120) |
| Loss reduction | 43.2% |
| Peak training VRAM | ~18,300 MB |

Complete Loss Trajectory

| Step | Loss | Learning Rate | Progress |
|---|---|---|---|
| 5 | 1.754 | 1.87e-4 | 4.0% |
| 10 | 1.454 | 1.97e-4 | 8.1% |
| 15 | 1.315 | 2.00e-4 | 12.1% |
| 20 | 1.271 | 1.98e-4 | 16.1% |
| 25 | 1.196 | 1.94e-4 | 20.2% |
| 30 | 1.135 | 1.88e-4 | 24.2% |
| 35 | 1.133 | 1.81e-4 | 28.2% |
| 40 | 1.119 | 1.73e-4 | 32.3% |
| 45 | 1.058 | 1.63e-4 | 36.3% |
| 50 | 1.039 | 1.53e-4 | 40.3% |
| 55 | 1.048 | 1.43e-4 | 44.4% |
| 60 | 1.037 | 1.32e-4 | 48.4% |
| 65 | 0.951 | 1.21e-4 | 52.4% |
| 70 | 0.933 | 1.10e-4 | 56.5% |
| 75 | 0.847 | 9.92e-5 | 60.5% |
| 80 | 0.767 | 8.82e-5 | 64.5% |
| 85 | 0.860 | 7.77e-5 | 68.5% |
| 90 | 0.852 | 6.76e-5 | 72.6% |
| 95 | 0.844 | 5.82e-5 | 76.6% |
| 100 | 0.809 | 4.96e-5 | 80.6% |
| 105 | 0.813 | 4.18e-5 | 84.7% |
| 110 | 0.846 | 3.50e-5 | 88.7% |
| 115 | 0.789 | 2.93e-5 | 92.7% |
| 120 | 0.730 | 2.47e-5 | 96.8% |
| Final | 0.996 (avg) | -- | 100% |

Loss Curve Visualization

Loss
1.8 |*
1.7 |
1.6 | .
1.5 | *
1.4 |   .
1.3 |   *
1.2 |     * .
1.1 |       * * .
1.0 |           * * * .   .
0.9 |                 * .   * .   .
0.8 |                   * .   * * . *
0.7 |                             *
    +---+---+---+---+---+---+---+---+---+---+---+----> Steps
    0   10  20  30  40  50  60  70  80  90 100 110 120
    |<-------- Epoch 1 --------->|<-------- Epoch 2 -------->|

Evaluation -- Complete Metrics

Evaluation Methodology

  • 5 synthetic clinical document images rendered with monospace fonts on white background
  • Each document has ground-truth expected terms (medical terms, drug names, diagnoses) and expected values (dosages, measurements, lab numbers)
  • Term Accuracy = (terms found in response) / (total expected terms) x 100
  • Value Accuracy = (values found in response) / (total expected values) x 100
  • Combined Accuracy = (Term Accuracy + Value Accuracy) / 2
  • Matching is case-insensitive substring search
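Under these definitions the scorer is a simple case-insensitive substring check; a minimal sketch (illustrative, not the exact evaluation script):

```python
def term_accuracy(response: str, expected: list[str]) -> float:
    """Percentage of expected strings found via case-insensitive substring match."""
    hay = response.lower()
    found = sum(1 for t in expected if t.lower() in hay)
    return found / len(expected) * 100

def combined_accuracy(response, expected_terms, expected_values):
    # Values use the same substring rule as terms.
    term = term_accuracy(response, expected_terms)
    value = term_accuracy(response, expected_values)
    return (term + value) / 2

resp = "Diagnoses: NSTEMI, hypertension. Meds: Aspirin 81 mg PO QD."
print(term_accuracy(resp, ["NSTEMI", "hypertension", "CKD"]))  # ≈ 66.7 (2 of 3 found)
```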

Per-Case Detailed Results

Test Case 1: Medication Reconciliation Form

11 active medications, 8 comorbidities, 3 drug allergies, 6 lab values

| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 80.0% (16/20) | 80.0% (16/20) | 0.0% |
| Value Accuracy | 68.0% (17/25) | 68.0% (17/25) | 0.0% |
| Combined Accuracy | 74.0% | 74.0% | 0.0% |
| Latency | 24.57s | 23.66s | -0.91s |

Terms missed (both models): CKD, COPD, Tiotropium, anaphylaxis
Values missed (both models): 18 mcg, BNP 580, BUN 32, Cr 1.6, EF 35%, INR 2.4, PRN, eGFR 38

Test Case 2: ICU Flowsheet

Ventilator settings, hemodynamics, 6 active drips, ABG values, I&O totals

| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 100.0% (19/19) | 94.7% (18/19) | -5.3% |
| Value Accuracy | 100.0% (30/30) | 50.0% (15/30) | -50.0% |
| Combined Accuracy | 100.0% | 72.4% | -27.6% |
| Latency | 22.62s | 23.75s | +1.13s |

Fine-tuned missed terms: coagulopathy
Fine-tuned missed values: CPOT: 1, CVP: 14, D-dimer >20, FiO2: 0.45, HCO3 24, HR: 112, MAP: 62, PEEP: 8, RASS: -3, ScvO2: 68%, fibrinogen 128, pCO2 48, pH 7.32, pO2 72, plt 62K

Test Case 3: Cardiology Consultation Note

Echo measurements, cardiac catheterization, ECG interpretation, GDMT medication plan

| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 71.4% (20/28) | 71.4% (20/28) | 0.0% |
| Value Accuracy | 69.2% (18/26) | 69.2% (18/26) | 0.0% |
| Combined Accuracy | 70.3% | 70.3% | 0.0% |
| Latency | 23.78s | 23.88s | +0.10s |

Terms missed (both): CHF, CRT-D, DAPT, Dapagliflozin, Eplerenone, NSR, Rosuvastatin, Ticagrelor
Values missed (both): 0.25 cm2, 1.8 L/min/m2, 10 mg, 20 mg, 81 mg, 90 mg, QD, QHS

Test Case 4: Complex Lab Panel

CBC (7 tests), coagulation (5 tests), hepatic function (8 tests) with critical values

| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 81.8% (18/22) | 59.1% (13/22) | -22.7% |
| Value Accuracy | 84.0% (21/25) | 64.0% (16/25) | -20.0% |
| Combined Accuracy | 82.9% | 61.5% | -21.4% |
| Latency | 23.89s | 23.81s | -0.08s |

Base missed terms: Albumin, Bilirubin, LDH, neutropenic
Fine-tuned missed terms: ALP, ALT, AST, Albumin, Bilirubin, GGT, LDH, SGOT, SGPT (9 terms)
Fine-tuned missed values: 142, 198, 2.4, 245, 3.6, 312, 4.8, 890, U/L (9 values)

Test Case 5: Surgical Operative Note

Laparoscopic cholecystectomy converted to open, 7 post-op medication orders

| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Term Accuracy | 85.7% (18/21) | 85.7% (18/21) | 0.0% |
| Value Accuracy | 94.7% (18/19) | 94.7% (18/19) | 0.0% |
| Combined Accuracy | 90.2% | 90.2% | 0.0% |
| Latency | 17.37s | 16.84s | -0.53s |

Terms missed (both): ETT, FACS, cholelithiasis
Values missed (both): Q1H

Aggregate Accuracy Summary

| Metric | Base NF4 | Fine-Tuned | Delta |
|---|---|---|---|
| Avg Term Accuracy | 83.79% | 78.19% | -5.60 pp |
| Avg Value Accuracy | 83.19% | 69.19% | -14.00 pp |
| Avg Combined Accuracy | 83.49% | 73.70% | -9.79 pp |
| Avg Eval Latency | 22.45s | 22.39s | -0.06s |
| Min Eval Latency | 17.37s | 16.84s | -0.53s |
| Max Eval Latency | 24.57s | 23.88s | -0.69s |
| Total Terms Evaluated | 110 | 110 | -- |
| Total Values Evaluated | 125 | 125 | -- |
| Terms Found (total) | 91 | 85 | -6 |
| Values Found (total) | 104 | 84 | -20 |
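The aggregate rows are per-case averages of the raw hit fractions (not of the pre-rounded percentages); recomputing from the per-case counts:

```python
# (found, total) per test case, taken from the per-case tables above.
base_terms = [(16, 20), (19, 19), (20, 28), (18, 22), (18, 21)]
ft_terms   = [(16, 20), (18, 19), (20, 28), (13, 22), (18, 21)]
base_vals  = [(17, 25), (30, 30), (18, 26), (21, 25), (18, 19)]
ft_vals    = [(17, 25), (15, 30), (18, 26), (16, 25), (18, 19)]

def avg_pct(pairs):
    """Mean of per-case fractions, as a percentage."""
    return sum(found / total for found, total in pairs) / len(pairs) * 100

print(f"Base term:  {avg_pct(base_terms):.2f}%")   # 83.79%
print(f"FT term:    {avg_pct(ft_terms):.2f}%")     # 78.19%
print(f"Base value: {avg_pct(base_vals):.2f}%")    # 83.19%
print(f"FT value:   {avg_pct(ft_vals):.2f}%")      # 69.19%
```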

VRAM Metrics (All Configurations)

| Configuration | Allocated (MB) | Reserved (MB) | Peak Allocated (MB) | vs FP16 |
|---|---|---|---|---|
| FP16 Baseline (post-load) | 15,820.09 | 17,120.00 | 16,860.07 | -- |
| NF4 Base (post-load) | 5,664.12 | 7,018.00 | 6,704.10 | -64.2% |
| NF4 Base (eval peak) | 5,671.03 | 7,508.00 | 7,388.28 | -64.1% |
| NF4+LoRA merged (post-load) | 11,335.83 | 14,806.00 | 12,894.35 | -28.3% |
| NF4+LoRA merged (eval peak) | 11,335.84 | 14,806.00 | 12,894.35 | -28.3% |
| Training peak | ~18,300 | -- | -- | +15.7% |

Latency Metrics (Standard 5-Image Benchmark)

All latencies measured with torch.cuda.synchronize() for accuracy.

| Image | FP16 | NF4 Base | NF4+LoRA | NF4 Speedup | LoRA Speedup |
|---|---|---|---|---|---|
| Patient Diagnosis | 28.41s | 15.40s | 13.24s | 1.84x | 2.15x |
| Radiology Report | 22.58s | 11.76s | 12.01s | 1.92x | 1.88x |
| Prescription | 23.83s | 13.67s | 12.75s | 1.74x | 1.87x |
| Lab Results | 48.74s | 23.25s | 23.46s | 2.10x | 2.08x |
| Discharge Summary | 31.96s | 12.13s | 12.05s | 2.63x | 2.65x |

| Statistic | FP16 | NF4 Base | NF4+LoRA |
|---|---|---|---|
| Average | 31.10s | 15.24s | 14.70s |
| Minimum | 22.58s | 11.76s | 12.01s |
| Maximum | 48.74s | 23.25s | 23.46s |
| Median (P50) | 28.41s | 13.67s | 12.75s |
| Throughput | 0.032 img/s | 0.066 img/s | 0.068 img/s |

Model Load Time

| Configuration | Load Time |
|---|---|
| FP16 | 88.74s |
| NF4 Base | 92.65s |
| NF4+LoRA (merge_and_unload) | 92.73s |

Improvement Summary vs FP16 Baseline

| Metric | NF4 Base | NF4+LoRA |
|---|---|---|
| VRAM reduction | -64.2% | -28.3% |
| Latency reduction | -51.0% | -52.7% |
| Throughput increase | 2.06x | 2.12x |
| Memory: GB needed | 5.7 GB | 11.3 GB |
| Min GPU requirement | 8 GB | 16 GB |

Why Base Model Outperforms Fine-Tuned

The fine-tuned model scores 9.8 percentage points below the base model (73.7% vs 83.5%). Three factors explain this:

1. Training Data Distribution Mismatch

| Training Data | Evaluation Data |
|---|---|
| Narrative medical transcriptions | Structured clinical forms |
| Rendered text paragraphs | Dense tables with abbreviations |
| "SUBJECTIVE: Patient presents with..." | "HR: 112 ..." |
| Long-form medical reports | ICU flowsheets, lab panels |
| Free text descriptions | Medication reconciliation grids |

2. Strong Zero-Shot Baseline

Qwen2.5-VL-7B-Instruct scores 83.5% on medical extraction without any fine-tuning. Notable:

  • 100% accuracy on ICU flowsheet (all 19 terms + all 30 values extracted correctly)
  • 90.2% accuracy on surgical operative notes
  • Struggles only with abbreviations that are not written out in full (CRT-D, DAPT, NSR)

3. Partial Catastrophic Forgetting

Fine-tuning on narrative text degraded performance on structured/tabular formats:

  • ICU Flowsheet: 100% -> 72.4% (-27.6 pp)
  • Complex Lab Panel: 82.9% -> 61.5% (-21.4 pp)
  • The model lost the ability to reliably extract dense numerical values from tabular layouts

Path to Improvement

To surpass the base model, fine-tune with:

  • Actual scanned clinical forms (ICU flowsheets, MAR sheets)
  • Structured lab reports with tabular layouts
  • Medication reconciliation forms from EHR systems
  • Cardiology/radiology reports with dense measurements
  • At least 5,000 domain-matched samples

API Deployment

A production-ready REST API is included in the GitHub repository.

Endpoints

| Endpoint | Method | Description | Avg Latency |
|---|---|---|---|
| /health | GET | GPU status, VRAM usage, model readiness | <10ms |
| /extract | POST | Extract medical info with default prompt | 27.69s |
| /analyze | POST | Custom prompt analysis | 11.09s |
| /benchmark | GET | Return all benchmark JSON data | <10ms |

API Test Results

| Test | Status | Details |
|---|---|---|
| GET /health | PASS | GPU: NVIDIA GB10, VRAM: 5,673 MB |
| POST /extract | PASS | 7/7 medical terms found, 27.69s latency, 1,352 char response |
| POST /analyze | PASS | 5/5 medication terms found, 11.09s latency, custom prompt reflected |
| GET /benchmark | PASS | 8 benchmark files returned |
| POST /extract (bad input) | PASS | Correctly returned HTTP 400 |
| Total | 5/5 PASS | 0 failures |

Docker Deployment

docker build -t medical-vision-pipeline .
docker run --gpus all -p 8000:8000 -p 7860:7860 medical-vision-pipeline

Quick API Test

# Start server
python -m uvicorn api.server:app --host 0.0.0.0 --port 8000

# Test extraction
curl -X POST http://localhost:8000/extract \
  -F "file=@medical_document.png" | python -m json.tool

Intended Use

Direct Use Cases

| Use Case | Description | Example Prompt |
|---|---|---|
| Medical Records Digitization | Convert paper/scanned clinical docs to structured data | "Extract all medical information from this clinical document" |
| ED Triage Support | Rapid extraction from referral letters | "Extract diagnoses, medications, and allergies" |
| Prescription Processing | Parse medication orders | "List all medications with dosages, quantities, and refills" |
| Lab Report Parsing | Extract values with flags | "Extract all lab values with results, units, and reference ranges" |
| Discharge Planning | Process discharge summaries | "Extract diagnoses, discharge medications, and follow-up instructions" |
| ICU Documentation | Parse complex flowsheets | "Extract ventilator settings, drip rates, and hemodynamic parameters" |
| Surgical Note Processing | Parse operative reports | "Extract procedure details, findings, EBL, and post-op orders" |
| Insurance Pre-Authorization | Extract diagnosis and treatment details | "List all diagnoses and procedures with their codes" |
| Clinical Research Screening | Screen records for study eligibility | "Identify all cardiovascular diagnoses and medications in this record" |
| Medical Education | Teaching aid for clinical document reading | "Explain the key findings in this radiology report" |

Out-of-Scope Use

  • Not for clinical decision-making: Outputs require human review by qualified clinicians
  • Not a diagnostic tool: Cannot replace physician judgment
  • Not for patient-facing applications without appropriate clinical oversight
  • Not validated for handwritten documents: Trained on rendered/printed text only
  • Not validated for non-English documents: Training data is English only
  • Not for real-time critical care monitoring: Latency is 15-25 seconds per image

Bias, Risks, and Limitations

  • Trained on English-only medical documents; other languages are unsupported
  • Training data biased toward US medical terminology and abbreviations (BID, QHS, PRN, etc.)
  • May hallucinate medical terms or values not present in the source document
  • Accuracy varies significantly by document type (100% on ICU flowsheets vs 70.3% on cardiology notes)
  • Validated only on synthetic test images (text rendered on white background), not real clinical scans
  • Not validated against real-world clinical documents with noise, handwriting, stamps, or poor scan quality
  • Should never replace human review in any clinical workflow
  • Model may generate medically plausible but incorrect information
  • No validation for medication interaction detection or clinical decision support

Compute Infrastructure

Hardware

| Component | Details |
|---|---|
| Platform | NVIDIA DGX Spark |
| GPU | NVIDIA GB10 (Blackwell architecture, sm_121) |
| GPU Count | 1 |
| Compute Capability | 12.1 |
| Total VRAM | 119.7 GB (unified memory) |
| CPU Architecture | aarch64 (ARM) |
| OS | Linux 6.14.0-1015-nvidia |

Training Compute

| Metric | Value |
|---|---|
| Training time | 128.4 minutes (2h 8m) |
| GPU utilization | ~96% during training |
| Peak training VRAM | ~18.3 GB |
| Total FLOPs | 4.877 x 10^16 |
| Samples per second | 0.26 |
| Steps per second | 0.016 |
| Seconds per optimizer step | ~62s |

Inference Compute

| Metric | NF4 Base | NF4+LoRA |
|---|---|---|
| VRAM required | 5,664 MB | 11,336 MB |
| Avg latency | 15.24s | 14.70s |
| Throughput | 0.066 img/s | 0.068 img/s |
| Min GPU VRAM | 8 GB | 16 GB |

Model Download

| Metric | Value |
|---|---|
| Base model size | 15.46 GB |
| Base model download time | 390.48s |
| Adapter size | 727 MB |
| Total model files | 5 safetensors shards + adapter |

Framework Versions

| Package | Version |
|---|---|
| PEFT | 0.18.0 |
| Transformers | 4.57.6 |
| PyTorch | 2.11.0.dev20260206+cu128 |
| BitsAndBytes | 0.49.1 |
| Accelerate | 1.12.0 |
| Datasets | 4.4.2 |
| Pillow | 12.1.0 |
| qwen-vl-utils | latest |
| HuggingFace Hub | 0.36.2 |
| Python | 3.10.19 |
| CUDA | 12.8 |

Citation

If you use this model or pipeline, please cite:

@misc{qwen25vl-medical-lora-2026,
  title={Qwen2.5-VL-7B-Medical-LoRA: QLoRA Fine-tuning for Medical Document Extraction},
  author={Sarathi Balakrishnan},
  year={2026},
  url={https://huggingface.co/sarathi-balakrishnan/Qwen2.5-VL-7B-Medical-LoRA},
  note={QLoRA adapter (r=64) for Qwen2.5-VL-7B-Instruct, NF4 quantized, trained on PathVQA + MTSamples}
}

Base model citation:

@article{Qwen2.5-VL,
  title={Qwen2.5-VL},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct}
}