Qwen2.5-VL-7B-Instruct MCAT LoRA Adapter

A LoRA fine-tuned adapter for unsloth/Qwen2.5-VL-7B-Instruct on an MCAT examination dataset of 1,610 questions across 7 official MCAT practice test sets, covering the four MCAT sections: Biological and Biochemical Foundations (BB), Critical Analysis and Reasoning Skills (CARS), Chemical and Physical Foundations (CP), and Psychological, Social, and Biological Foundations (PS).

Model Details

  • Developed by: James Oon (@jamezoon), SUTD MSTR-DAIE Deep Learning Project
  • Model type: Vision-Language Causal LM with LoRA adapter (PEFT)
  • Base model: unsloth/Qwen2.5-VL-7B-Instruct (dense, 7B parameters, BF16)
  • Language: English
  • License: Follows base model license (Qwen2.5)
  • Adapter size: ~182 MB (adapter_model.safetensors)

Intended Use

MCAT examination multiple-choice question answering. Given a passage (where applicable) and a 4-option question (A-D), the model selects the correct answer with a step-by-step explanation. Sections covered:

  • BB: Biological and Biochemical Foundations of Living Systems
  • CARS: Critical Analysis and Reasoning Skills
  • CP: Chemical and Physical Foundations of Biological Systems
  • PS: Psychological, Social, and Biological Foundations of Behavior

Not intended for real clinical or academic decision-making. This is a research/educational model.

How to Get Started

from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
adapter_id = "jamezoon/qwen2.5-vl-7b-instruct-mcat-lora"

# Load the base model in BF16, then attach the LoRA adapter on top.
# Note: Qwen2.5-VL needs Qwen2_5_VLForConditionalGeneration;
# AutoModelForCausalLM does not support this config class.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful tutor for students preparing for the MCAT. "
                   "Answer the following multiple choice question by thinking step by step, then give the answer."
    },
    {
        "role": "user",
        "content": (
            "Passage: During a study of enzyme kinetics, researchers measured the rate of reaction "
            "at varying substrate concentrations in the presence and absence of an inhibitor.\n\n"
            "Question: Which of the following best describes competitive inhibition?\n"
            "Options: A. Vmax decreases, Km unchanged  "
            "B. Vmax unchanged, Km increases  "
            "C. Both Vmax and Km decrease  "
            "D. Both Vmax and Km increase\n"
            "Think step by step. Then respond in the format:\n"
            "Explanation: ...\nAnswer: <one of A, B, C, D>"
        ),
    },
]

# Build the prompt, generate greedily, and decode only the newly generated tokens.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training Details

Dataset

  • MCAT Practice Exams: 1,449 training samples, 161 validation samples (90/10 split)
  • 7 official MCAT practice test sets, ~230 questions each
  • 4 MCAT sections: BB, CARS, CP, PS
  • Each sample: optional passage + question + 4 options + correct answer (A-D)
  • 557 questions include associated images (text-only training; images not used in this adapter)
  • Formatted as chat messages with system/user/assistant roles
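The chat formatting above can be sketched as follows. The field names (`passage`, `question`, `options`, `answer`) are assumptions for illustration, since the card does not publish the raw dataset schema; the system prompt is taken from the usage example:

```python
SYSTEM_PROMPT = (
    "You are a helpful tutor for students preparing for the MCAT. "
    "Answer the following multiple choice question by thinking step by step, "
    "then give the answer."
)

def to_chat_messages(sample: dict) -> list[dict]:
    """Convert one MCAT sample into system/user/assistant chat messages."""
    parts = []
    if sample.get("passage"):  # passage is optional
        parts.append(f"Passage: {sample['passage']}")
    parts.append(f"Question: {sample['question']}")
    options = "  ".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", sample["options"])
    )
    parts.append(f"Options: {options}")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "\n".join(parts)},
        {"role": "assistant", "content": f"Answer: {sample['answer']}"},
    ]
```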

Training Procedure

Hyperparameter          Value
Training steps          546
Epochs                  3
Per-device batch size   2
Gradient accumulation   4 (effective batch = 8)
Learning rate           2e-5
LR scheduler            Cosine
Warmup ratio            0.03
Max sequence length     2048 tokens
Precision               BF16
Optimizer               AdamW
Train loss (final)      0.2804
Eval loss (final)       0.4477
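A hedged sketch of trainer arguments matching the table above. The actual training script is not published; argument names follow Hugging Face `TrainingArguments`, and the output path is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-vl-7b-mcat-lora",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,          # effective batch size 8
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    optim="adamw_torch",
)
# The 2048-token max sequence length is typically passed to the SFT trainer
# or tokenizer rather than to TrainingArguments.
```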

LoRA Configuration

Parameter              Value
Rank (r)               16
Alpha (α)              16
Dropout                0.0
Target modules         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable parameters   ~47.6M (0.57% of base model parameters)
Bias                   none
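The table above maps directly onto a PEFT `LoraConfig`; a sketch (the `task_type` is an assumption consistent with causal-LM fine-tuning):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",  # assumption: standard causal-LM adapter setup
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```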

Hardware & Training Time

  • Hardware: NVIDIA GB10 Grace Blackwell (NVIDIA DGX Spark, Node 1), 121 GB unified CPU+GPU memory
  • Training duration: ~1.1 hours (3,839 seconds, 546 steps)
  • Throughput: ~1.13 samples/sec during training; ~6-7 tok/s at inference
  • Framework: PyTorch 2.10 (nv25.11), HuggingFace Transformers 5.5.0, PEFT 0.18.1, TRL 0.26.1, Unsloth 2026.4.2

Technical Notes

Training was performed in text-only mode (vision inputs disabled) due to format incompatibility between UnslothVisionDataCollator and the MCAT dataset structure. The base model Qwen2.5-VL requires Qwen2_5_VLForConditionalGeneration for inference; AutoModelForCausalLM does not support this config class.

GB10 Blackwell constraints applied during training:

  • PYTORCH_JIT=0, TORCHDYNAMO_DISABLE=1 (nvrtc JIT unsupported on sm_121)
  • BF16 only; no 4-bit quantization (bitsandbytes NF4 causes silent OOM on GB10)
  • attn_implementation="eager" (Flash Attention 3 incompatible with Blackwell)
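The workarounds above can be reproduced as follows; this is a sketch, assuming the environment variables are set before `torch` is imported:

```python
import os

# Disable JIT paths unsupported on sm_121 (must be set before importing torch).
os.environ["PYTORCH_JIT"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,    # BF16 only: no 4-bit NF4 on GB10
    attn_implementation="eager",   # Flash Attention unavailable on this stack
    device_map="auto",
)
```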

Evaluation

Evaluated on 7 official MCAT practice test sets (~230 questions each), covering all 4 MCAT sections.

Overall accuracy: 56.3% (907/1610); macro F1: 0.566

Per-Test-Set Results

Test Set     Questions  Correct  Accuracy  Macro F1  Avg Latency  Avg tok/s
test_set_01  230        129      56.1%     0.555     2.81s        9.86
test_set_02  230        135      58.7%     0.588     2.69s        9.77
test_set_03  230        126      54.8%     0.552     2.28s        9.59
test_set_04  230        130      56.5%     0.566     2.73s        9.78
test_set_05  230        129      56.1%     0.566     2.43s        9.74
test_set_06  230        129      56.1%     0.571     2.71s        9.53
test_set_07  230        129      56.1%     0.566     2.18s        9.65

Per-Section Results

Section  Description                                        Questions  Correct  Accuracy
PS       Psychological, Social, and Biological Foundations  413        259      62.7%
BB       Biological and Biochemical Foundations             413        252      61.0%
CARS     Critical Analysis and Reasoning Skills             371        190      51.2%
CP       Chemical and Physical Foundations                  413        206      49.9%

Key findings:

  • Strongest on PS (62.7%) and BB (61.0%): content-heavy science sections benefit most from fine-tuning
  • CARS (51.2%) and CP (49.9%) are harder: CARS requires reading comprehension of humanities passages, and CP requires quantitative reasoning
  • Consistent performance across all 7 test sets (54.8%-58.7%), suggesting stable generalization
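The accuracy and macro F1 figures above can be reproduced from per-question predictions; a self-contained sketch (the evaluation harness itself is not published):

```python
def accuracy_and_macro_f1(preds, golds, labels="ABCD"):
    """Accuracy plus macro-averaged F1 over the four answer choices."""
    correct = sum(p == g for p, g in zip(preds, golds))
    accuracy = correct / len(golds)
    f1s = []
    for label in labels:
        tp = sum(p == label and g == label for p, g in zip(preds, golds))
        fp = sum(p == label and g != label for p, g in zip(preds, golds))
        fn = sum(p != label and g == label for p, g in zip(preds, golds))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    # Macro F1 averages over all labels, including ones never predicted.
    return accuracy, sum(f1s) / len(f1s)
```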

Citation

If you use this adapter, please cite the MCAT practice materials and the SUTD project:

@misc{oon2026mcat,
  title     = {Qwen2.5-VL-7B-Instruct MCAT LoRA Adapter},
  author    = {Oon, James},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jamezoon/qwen2.5-vl-7b-instruct-mcat-lora}
}

Framework Versions

  • Unsloth 2026.4.2
  • PEFT 0.18.1
  • Transformers 5.5.0
  • TRL 0.26.1
  • PyTorch 2.10.0 (nv25.11) + CUDA 13.0