# Qwen2.5-VL-7B-Instruct MCAT LoRA Adapter
A LoRA fine-tuned adapter for `unsloth/Qwen2.5-VL-7B-Instruct` on an MCAT examination dataset of 1,610 questions across 7 official MCAT practice test sets, covering the four MCAT sections: Biological and Biochemical Foundations (BB), Critical Analysis and Reasoning Skills (CARS), Chemical and Physical Foundations (CP), and Psychological, Social, and Biological Foundations (PS).
## Model Details
- Developed by: James Oon (@jamezoon), SUTD MSTR-DAIE Deep Learning Project
- Model type: Vision-Language Causal LM with LoRA adapter (PEFT)
- Base model: `unsloth/Qwen2.5-VL-7B-Instruct` (dense, 7B parameters, BF16)
- Language: English
- License: Follows base model license (Qwen2.5)
- Adapter size: ~182 MB (`adapter_model.safetensors`)
## Intended Use
MCAT examination multiple-choice question answering. Given a passage (where applicable) and a four-option question (A–D), the model selects the correct answer and gives a step-by-step explanation. Sections covered:
- BB – Biological and Biochemical Foundations of Living Systems
- CARS – Critical Analysis and Reasoning Skills
- CP – Chemical and Physical Foundations of Biological Systems
- PS – Psychological, Social, and Biological Foundations of Behavior
Not intended for real clinical or academic decision-making. This is a research/educational model.
## How to Get Started
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
adapter_id = "jamezoon/qwen2.5-vl-7b-instruct-mcat-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful tutor for students preparing for the MCAT. "
                   "Answer the following multiple choice question by thinking step by step, then give the answer.",
    },
    {
        "role": "user",
        "content": (
            "Passage: During a study of enzyme kinetics, researchers measured the rate of reaction "
            "at varying substrate concentrations in the presence and absence of an inhibitor.\n\n"
            "Question: Which of the following best describes competitive inhibition?\n"
            "Options: A. Vmax decreases, Km unchanged "
            "B. Vmax unchanged, Km increases "
            "C. Both Vmax and Km decrease "
            "D. Both Vmax and Km increase\n"
            "Think step by step. Then respond in the format:\n"
            "Explanation: ...\nAnswer: <one of A, B, C, D>"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)

print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details

### Dataset
- MCAT Practice Exams – 1,449 training samples, 161 validation samples (90/10 split)
- 7 official MCAT practice test sets, ~230 questions each
- 4 MCAT sections: BB, CARS, CP, PS
- Each sample: optional passage + question + 4 options + correct answer (A–D)
- 557 questions include associated images (text-only training – images not used in this adapter)
- Formatted as chat messages with system/user/assistant roles
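The per-sample structure described above can be rendered into chat messages along these lines. This is a hedged sketch: `format_sample` and the exact system prompt are illustrative, not the card's actual preprocessing code.

```python
def format_sample(passage, question, options, answer):
    # Illustrative formatter mirroring the dataset description: optional
    # passage + question + four options (A-D) + gold answer, rendered as
    # system/user/assistant chat messages for supervised fine-tuning.
    option_text = " ".join(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    user = ""
    if passage:
        user += f"Passage: {passage}\n\n"
    user += f"Question: {question}\nOptions: {option_text}"
    return [
        {"role": "system", "content": "You are a helpful tutor for students preparing for the MCAT."},
        {"role": "user", "content": user},
        {"role": "assistant", "content": f"Answer: {answer}"},
    ]
```

A chat template (e.g. `tokenizer.apply_chat_template`) would then turn these messages into the training text.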
### Training Procedure
| Hyperparameter | Value |
|---|---|
| Training steps | 546 |
| Epochs | 3 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Max sequence length | 2048 tokens |
| Precision | BF16 |
| Optimizer | AdamW |
| Train loss (final) | 0.2804 |
| Eval loss (final) | 0.4477 |
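The reported step count is internally consistent with the dataset size and batch settings; a quick arithmetic check:

```python
import math

# 1,449 training samples with an effective batch of 8 (2 per device x 4
# gradient-accumulation steps) gives 182 optimizer steps per epoch;
# 3 epochs reproduces the reported 546 total steps.
samples, effective_batch, epochs = 1449, 2 * 4, 3
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # -> 182 546
```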
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (Ξ±) | 16 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | ~47.6M (0.57% of 7B) |
| Bias | none |
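The table maps directly onto a PEFT `LoraConfig`. A hedged sketch of the equivalent arguments, shown as a plain dict so it runs without `peft` installed (in practice you would pass it as `peft.LoraConfig(**lora_kwargs)`):

```python
# Keyword arguments mirroring the LoRA table above; `task_type` is an
# assumption (causal-LM fine-tuning), not stated explicitly in the card.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}
print(len(lora_kwargs["target_modules"]))  # -> 7
```

With alpha equal to rank, the LoRA scaling factor alpha/r is 1, a common default.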
### Hardware & Training Time
- Hardware: NVIDIA GB10 Grace Blackwell (NVIDIA DGX Spark, Node 1), 121 GB unified CPU+GPU memory
- Training duration: ~1.1 hours (3,839 seconds, 546 steps)
- Throughput: ~1.13 samples/sec, ~6–7 tok/s inference
- Framework: PyTorch 2.10 (nv25.11), HuggingFace Transformers 5.5.0, PEFT 0.18.1, TRL 0.26.1, Unsloth 2026.4.2
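The throughput figure follows from the dataset size and wall-clock time:

```python
# 1,449 samples seen 3 times (3 epochs) over 3,839 seconds of training
# works out to roughly the reported 1.13 samples/sec.
samples_seen = 1449 * 3
throughput = samples_seen / 3839
print(round(throughput, 2))  # -> 1.13
```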
## Technical Notes
Training was performed in text-only mode (vision inputs disabled) due to a format incompatibility between `UnslothVisionDataCollator` and the MCAT dataset structure. The base model, Qwen2.5-VL, requires `Qwen2_5_VLForConditionalGeneration` for inference; `AutoModelForCausalLM` does not support this config class.
GB10 Blackwell constraints applied during training:
- `PYTORCH_JIT=0`, `TORCHDYNAMO_DISABLE=1` (nvrtc JIT unsupported on sm_121)
- BF16 only – no 4-bit quantization (bitsandbytes NF4 causes silent OOM on GB10)
- `attn_implementation="eager"` (Flash Attention 3 incompatible with Blackwell)
## Evaluation

Evaluated on 7 official MCAT practice test sets (~230 questions each), covering all four MCAT sections.

Overall: 56.3% accuracy (907/1,610), 0.566 macro F1.
### Per-Test-Set Results
| Test Set | Questions | Correct | Accuracy | Macro F1 | Avg Latency | Avg tok/s |
|---|---|---|---|---|---|---|
| test_set_01 | 230 | 129 | 56.1% | 0.555 | 2.81s | 9.86 |
| test_set_02 | 230 | 135 | 58.7% | 0.588 | 2.69s | 9.77 |
| test_set_03 | 230 | 126 | 54.8% | 0.552 | 2.28s | 9.59 |
| test_set_04 | 230 | 130 | 56.5% | 0.566 | 2.73s | 9.78 |
| test_set_05 | 230 | 129 | 56.1% | 0.566 | 2.43s | 9.74 |
| test_set_06 | 230 | 129 | 56.1% | 0.571 | 2.71s | 9.53 |
| test_set_07 | 230 | 129 | 56.1% | 0.566 | 2.18s | 9.65 |
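The per-test-set rows aggregate to the headline figure; a quick sanity check:

```python
# Sum the "Correct" column above: 907 correct out of 7 x 230 = 1,610
# questions, i.e. 56.3% overall accuracy.
correct_per_set = [129, 135, 126, 130, 129, 129, 129]
total_correct = sum(correct_per_set)
total_questions = 7 * 230
accuracy = total_correct / total_questions
print(total_correct, round(100 * accuracy, 1))  # -> 907 56.3
```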
### Per-Section Results
| Section | Description | Questions | Correct | Accuracy |
|---|---|---|---|---|
| PS | Psychological, Social, and Biological Foundations | 413 | 259 | 62.7% |
| BB | Biological and Biochemical Foundations | 413 | 252 | 61.0% |
| CARS | Critical Analysis and Reasoning Skills | 371 | 190 | 51.2% |
| CP | Chemical and Physical Foundations | 413 | 206 | 49.9% |
Key findings:
- Strongest on PS (62.7%) and BB (61.0%): content-heavy science sections benefit most from fine-tuning.
- CARS (51.2%) and CP (49.9%) are harder: CARS requires reading comprehension of humanities passages, and CP requires quantitative reasoning.
- Performance is consistent across all 7 test sets (54.8%–58.7%), suggesting stable generalization.
## Citation
If you use this adapter, please cite the MCAT practice materials and the SUTD project:
```bibtex
@misc{oon2026mcat,
  title     = {Qwen2.5-VL-7B-Instruct MCAT LoRA Adapter},
  author    = {Oon, James},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jamezoon/qwen2.5-vl-7b-instruct-mcat-lora}
}
```
## Framework Versions
- Unsloth 2026.4.2
- PEFT 0.18.1
- Transformers 5.5.0
- TRL 0.26.1
- PyTorch 2.10.0 (nv25.11) + CUDA 13.0