# Qwen2.5-VL-7B-Instruct MCAT LoRA Adapter
A LoRA fine-tuned adapter for `unsloth/Qwen2.5-VL-7B-Instruct` on an MCAT examination dataset of 1,610 questions across 7 official MCAT practice test sets, covering the four MCAT sections: Biological and Biochemical Foundations (BB), Critical Analysis and Reasoning Skills (CARS), Chemical and Physical Foundations (CP), and Psychological, Social, and Biological Foundations (PS).
## Model Details
- Developed by: James Oon (@jamezoon), SUTD MSTR-DAIE Deep Learning Project
- Model type: Vision-Language Causal LM with LoRA adapter (PEFT)
- Base model: `unsloth/Qwen2.5-VL-7B-Instruct` (dense, 7B parameters, BF16)
- Language: English
- License: Follows base model license (Qwen2.5)
- Adapter size: ~182 MB (`adapter_model.safetensors`)
## Intended Use
MCAT examination multiple-choice question answering. Given a passage (where applicable) and a four-option question (A–D), the model selects the correct answer and gives a step-by-step explanation. Sections covered:
- BB – Biological and Biochemical Foundations of Living Systems
- CARS – Critical Analysis and Reasoning Skills
- CP – Chemical and Physical Foundations of Biological Systems
- PS – Psychological, Social, and Biological Foundations of Behavior
Not intended for real clinical or academic decision-making. This is a research/educational model.
## How to Get Started
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
adapter_id = "jamezoon/qwen2.5-vl-7b-instruct-mcat-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful tutor for students preparing for the MCAT. "
                   "Answer the following multiple choice question by thinking step by step, then give the answer.",
    },
    {
        "role": "user",
        "content": (
            "Passage: During a study of enzyme kinetics, researchers measured the rate of reaction "
            "at varying substrate concentrations in the presence and absence of an inhibitor.\n\n"
            "Question: Which of the following best describes competitive inhibition?\n"
            "Options: A. Vmax decreases, Km unchanged "
            "B. Vmax unchanged, Km increases "
            "C. Both Vmax and Km decrease "
            "D. Both Vmax and Km increase\n"
            "Think step by step. Then respond in the format:\n"
            "Explanation: ...\nAnswer: <one of A, B, C, D>"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)

print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details

### Dataset
- MCAT Practice Exams – 1,449 training samples, 161 validation samples (90/10 split)
- 7 official MCAT practice test sets, ~230 questions each
- 4 MCAT sections: BB, CARS, CP, PS
- Each sample: optional passage + question + 4 options + correct answer (A–D)
- 557 questions include associated images (text-only training – images not used in this adapter)
- Formatted as chat messages with system/user/assistant roles
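The per-sample structure described above can be rendered into chat messages along these lines. This is a hedged sketch: `format_sample` and the exact system prompt are illustrative, not the card's actual preprocessing code.

```python
def format_sample(passage, question, options, answer):
    # Illustrative formatter mirroring the dataset description: optional
    # passage + question + four options (A-D) + gold answer, rendered as
    # system/user/assistant chat messages for supervised fine-tuning.
    option_text = " ".join(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    user = ""
    if passage:
        user += f"Passage: {passage}\n\n"
    user += f"Question: {question}\nOptions: {option_text}"
    return [
        {"role": "system", "content": "You are a helpful tutor for students preparing for the MCAT."},
        {"role": "user", "content": user},
        {"role": "assistant", "content": f"Answer: {answer}"},
    ]
```

A chat template (e.g. `tokenizer.apply_chat_template`) would then turn these messages into the training text.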
### Training Procedure
| Hyperparameter | Value |
|---|---|
| Training steps | 546 |
| Epochs | 3 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Max sequence length | 2048 tokens |
| Precision | BF16 |
| Optimizer | AdamW |
| Train loss (final) | 0.2804 |
| Eval loss (final) | 0.4477 |
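The reported step count is internally consistent with the dataset size and batch settings; a quick arithmetic check:

```python
import math

# 1,449 training samples with an effective batch of 8 (2 per device x 4
# gradient-accumulation steps) gives 182 optimizer steps per epoch;
# 3 epochs reproduces the reported 546 total steps.
samples, effective_batch, epochs = 1449, 2 * 4, 3
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # -> 182 546
```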
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (Ξ±) | 16 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | ~47.6M (0.57% of 7B) |
| Bias | none |
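The table maps directly onto a PEFT `LoraConfig`. A hedged sketch of the equivalent arguments, shown as a plain dict so it runs without `peft` installed (in practice you would pass it as `peft.LoraConfig(**lora_kwargs)`):

```python
# Keyword arguments mirroring the LoRA table above; `task_type` is an
# assumption (causal-LM fine-tuning), not stated explicitly in the card.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}
print(len(lora_kwargs["target_modules"]))  # -> 7
```

With alpha equal to rank, the LoRA scaling factor alpha/r is 1, a common default.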
### Hardware & Training Time
- Hardware: NVIDIA GB10 Grace Blackwell (NVIDIA DGX Spark, Node 1), 121 GB unified CPU+GPU memory
- Training duration: ~1.1 hours (3,839 seconds, 546 steps)
- Throughput: ~1.13 samples/sec, ~6–7 tok/s inference
- Framework: PyTorch 2.10 (nv25.11), HuggingFace Transformers 5.5.0, PEFT 0.18.1, TRL 0.26.1, Unsloth 2026.4.2
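The throughput figure follows from the dataset size and wall-clock time:

```python
# 1,449 samples seen 3 times (3 epochs) over 3,839 seconds of training
# works out to roughly the reported 1.13 samples/sec.
samples_seen = 1449 * 3
throughput = samples_seen / 3839
print(round(throughput, 2))  # -> 1.13
```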
## Technical Notes
Training was performed in text-only mode (vision inputs disabled) due to a format incompatibility between `UnslothVisionDataCollator` and the MCAT dataset structure. The base model, Qwen2.5-VL, requires `Qwen2_5_VLForConditionalGeneration` for inference; `AutoModelForCausalLM` does not support this config class.
GB10 Blackwell constraints applied during training:
- `PYTORCH_JIT=0`, `TORCHDYNAMO_DISABLE=1` (nvrtc JIT unsupported on sm_121)
- BF16 only – no 4-bit quantization (bitsandbytes NF4 causes silent OOM on GB10)
- `attn_implementation="eager"` (Flash Attention 3 incompatible with Blackwell)
## Evaluation

Evaluated on 7 official MCAT practice test sets (~230 questions each), covering all four MCAT sections.

Overall: 56.3% accuracy (907/1,610), 0.566 macro F1.
### Per-Test-Set Results
| Test Set | Questions | Correct | Accuracy | Macro F1 | Avg Latency | Avg tok/s |
|---|---|---|---|---|---|---|
| test_set_01 | 230 | 129 | 56.1% | 0.555 | 2.81s | 9.86 |
| test_set_02 | 230 | 135 | 58.7% | 0.588 | 2.69s | 9.77 |
| test_set_03 | 230 | 126 | 54.8% | 0.552 | 2.28s | 9.59 |
| test_set_04 | 230 | 130 | 56.5% | 0.566 | 2.73s | 9.78 |
| test_set_05 | 230 | 129 | 56.1% | 0.566 | 2.43s | 9.74 |
| test_set_06 | 230 | 129 | 56.1% | 0.571 | 2.71s | 9.53 |
| test_set_07 | 230 | 129 | 56.1% | 0.566 | 2.18s | 9.65 |
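The per-test-set rows aggregate to the headline figure; a quick sanity check:

```python
# Sum the "Correct" column above: 907 correct out of 7 x 230 = 1,610
# questions, i.e. 56.3% overall accuracy.
correct_per_set = [129, 135, 126, 130, 129, 129, 129]
total_correct = sum(correct_per_set)
total_questions = 7 * 230
accuracy = total_correct / total_questions
print(total_correct, round(100 * accuracy, 1))  # -> 907 56.3
```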
### Per-Section Results
| Section | Description | Questions | Correct | Accuracy |
|---|---|---|---|---|
| PS | Psychological, Social, and Biological Foundations | 413 | 259 | 62.7% |
| BB | Biological and Biochemical Foundations | 413 | 252 | 61.0% |
| CARS | Critical Analysis and Reasoning Skills | 371 | 190 | 51.2% |
| CP | Chemical and Physical Foundations | 413 | 206 | 49.9% |
Key findings:
- Strongest on PS (62.7%) and BB (61.0%): content-heavy science sections benefit most from fine-tuning.
- CARS (51.2%) and CP (49.9%) are harder: CARS requires reading comprehension of humanities passages, and CP requires quantitative reasoning.
- Performance is consistent across all 7 test sets (54.8%–58.7%), suggesting stable generalization.
## Citation
If you use this adapter, please cite the MCAT practice materials and the SUTD project:
```bibtex
@misc{oon2026mcat,
  title     = {Qwen2.5-VL-7B-Instruct MCAT LoRA Adapter},
  author    = {Oon, James},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jamezoon/qwen2.5-vl-7b-instruct-mcat-lora}
}
```
## Framework Versions
- Unsloth 2026.4.2
- PEFT 0.18.1
- Transformers 5.5.0
- TRL 0.26.1
- PyTorch 2.10.0 (nv25.11) + CUDA 13.0