# Qwen3.5-4B MCAT LoRA Adapter
A LoRA fine-tuned adapter for unsloth/Qwen3.5-4B on the MCAT examination dataset — 1,610 questions across 7 official MCAT practice test sets covering the four MCAT sections: Biological and Biochemical Foundations (BB), Critical Analysis and Reasoning Skills (CARS), Chemical and Physical Foundations (CP), and Psychological, Social, and Biological Foundations (PS).
## Model Details
- Developed by: James Oon (@jamezoon), SUTD MSTR-DAIE Deep Learning Project
- Model type: Causal LM with LoRA adapter (PEFT)
- Base model: `Qwen/Qwen3.5-4B` (dense, 4B parameters, BF16)
- Language: English
- License: Follows base model license (Qwen3.5)
- Adapter size: ~82 MB (`adapter_model.safetensors`)
## Intended Use
MCAT examination multiple-choice question answering. Given a passage (where applicable) and a 4-option question (A–D), the model selects the correct answer with a step-by-step explanation. Sections covered:
- BB — Biological and Biochemical Foundations of Living Systems
- CARS — Critical Analysis and Reasoning Skills
- CP — Chemical and Physical Foundations of Biological Systems
- PS — Psychological, Social, and Biological Foundations of Behavior
Not intended for real clinical or academic decision-making. This is a research/educational model.
## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen3.5-4B"
adapter_id = "jamezoon/qwen3.5-4b-mcat-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful tutor for students preparing for the MCAT. "
                   "Answer the following multiple choice question by thinking step by step, "
                   "then give the answer.",
    },
    {
        "role": "user",
        "content": (
            "Passage: During a study of enzyme kinetics, researchers measured the rate of reaction "
            "at varying substrate concentrations in the presence and absence of an inhibitor.\n\n"
            "Question: Which of the following best describes competitive inhibition?\n"
            "Options: A. Vmax decreases, Km unchanged "
            "B. Vmax unchanged, Km increases "
            "C. Both Vmax and Km decrease "
            "D. Both Vmax and Km increase\n"
            "Think step by step. Then respond in the format:\n"
            "Explanation: ...\nAnswer: <one of A, B, C, D>"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
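Since the prompt requests a fixed `Explanation: ... / Answer: <letter>` format, the predicted letter can be pulled out with a small parser. A minimal sketch (the helper name and regex are illustrative, not part of the released code):

```python
import re

def extract_answer(generation: str):
    """Pull the final A-D letter from a response ending in 'Answer: <letter>'."""
    match = re.search(r"Answer:\s*([ABCD])\b", generation)
    return match.group(1) if match else None

sample = (
    "Explanation: A competitive inhibitor raises the apparent Km "
    "while Vmax is unchanged.\nAnswer: B"
)
print(extract_answer(sample))  # → B
```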
## Training Details

### Dataset
- MCAT Practice Exams — 1,449 training samples, 161 validation samples (90/10 split)
- 7 official MCAT practice test sets, ~230 questions each
- 4 MCAT sections: BB, CARS, CP, PS
- Each sample: optional passage + question + 4 options + correct answer (A–D)
- Formatted as chat messages with system/user/assistant roles
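The chat formatting above can be sketched as follows (a minimal example; the field names are assumptions, since the dataset schema is not published):

```python
def to_chat(sample: dict) -> list:
    """Format one MCAT sample as system/user/assistant chat messages."""
    passage = f"Passage: {sample['passage']}\n\n" if sample.get("passage") else ""
    options = " ".join(f"{k}. {v}" for k, v in sample["options"].items())
    return [
        {"role": "system", "content": "You are a helpful tutor for students preparing for the MCAT."},
        {"role": "user", "content": f"{passage}Question: {sample['question']}\nOptions: {options}"},
        {"role": "assistant", "content": f"Explanation: {sample['explanation']}\nAnswer: {sample['answer']}"},
    ]

msgs = to_chat({
    "passage": None,  # passage is optional, e.g. for discrete questions
    "question": "Which amino acid is achiral?",
    "options": {"A": "Glycine", "B": "Alanine", "C": "Serine", "D": "Leucine"},
    "explanation": "Glycine's side chain is a hydrogen, so the alpha carbon has two identical substituents.",
    "answer": "A",
})
print(msgs[2]["content"])
```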
### Training Procedure
| Hyperparameter | Value |
|---|---|
| Training steps | 546 |
| Epochs | 3 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Max sequence length | 2048 tokens |
| Precision | BF16 |
| Optimizer | AdamW |
| Train loss (final) | 0.8092 |
| Eval loss (final) | 0.3952 |
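The step count in the table is consistent with the split and the effective batch size. A quick arithmetic check (not training code):

```python
import math

train_samples = 1449
effective_batch = 2 * 4            # per-device batch × gradient accumulation
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * 3  # 3 epochs
print(steps_per_epoch, total_steps)  # → 182 546
```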
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 16 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | ~21.2M (0.47% of 4B) |
| Bias | none |
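For each targeted linear layer of shape `d_in × d_out`, LoRA at rank r adds `r · (d_in + d_out)` trainable weights (the low-rank A and B matrices); summed over the seven target modules in every transformer layer, this yields the ~21.2M figure above. A generic sketch of the per-layer count (the example dimensions are illustrative, not Qwen3.5-4B's actual sizes):

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable weights LoRA adds to one linear layer: A (d_in×r) plus B (r×d_out)."""
    return r * (d_in + d_out)

# e.g. a hypothetical 2560×2560 projection at r=16
print(lora_params(2560, 2560))  # → 81920
```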
### Hardware & Training Time
- Hardware: NVIDIA GB10 Grace Blackwell (NVIDIA DGX Spark, Node 2), 121 GB unified CPU+GPU memory
- Training duration: ~2.5 hours (9,148 seconds, 546 steps at ~16.8s/step)
- Throughput: ~0.48 samples/sec, ~15.7 tok/s inference
- Framework: PyTorch 2.10 (nv25.11), HuggingFace Transformers 5.5.0, PEFT 0.18.1, TRL 0.26.1, Unsloth 2026.4.2
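The reported duration and throughput figures are mutually consistent (a quick arithmetic check):

```python
steps = 546
step_time = 16.8                  # seconds per step
effective_batch = 8

duration_s = steps * step_time    # ≈ 9,173 s, matching the logged ~9,148 s
samples_per_sec = effective_batch / step_time
print(round(duration_s), round(samples_per_sec, 2))  # → 9173 0.48
```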
## Technical Notes
Qwen3.5-4B uses a hybrid GatedDeltaNet architecture (linear attention layers interleaved with standard softmax attention). Without the native CUDA kernels (causal-conv1d, flash-linear-attention), GatedDeltaNet falls back to a PyTorch CPU implementation, resulting in significantly slower training (~16.8s/step vs ~6s/step for Qwen2.5-VL-7B).
GB10 Blackwell constraints applied during training:
- `PYTORCH_JIT=0`, `TORCHDYNAMO_DISABLE=1` (nvrtc JIT unsupported on sm_121)
- BF16 only; no 4-bit quantization (bitsandbytes NF4 causes silent OOM on GB10)
- `attn_implementation="eager"` (Flash Attention 3 incompatible with Blackwell)
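In script form, those constraints amount to two environment variables set before importing torch, plus two loading kwargs. A minimal sketch (the `load_kwargs` dict is illustrative; pass its contents to `from_pretrained`):

```python
import os

# nvrtc JIT is unsupported on sm_121 (GB10), so disable JIT paths up front
os.environ["PYTORCH_JIT"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"

# Loading kwargs reflecting the remaining constraints
load_kwargs = {
    "torch_dtype": "bfloat16",        # BF16 only: bitsandbytes NF4 silently OOMs on GB10
    "attn_implementation": "eager",   # Flash Attention 3 incompatible with Blackwell
}
print(load_kwargs["attn_implementation"])
```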
## Evaluation
Evaluated on 7 official MCAT practice test sets (~230 questions each), covering all 4 MCAT sections.
Evaluation in progress — results will be updated upon completion.
## Companion Model
- jamezoon/qwen2.5-vl-7b-instruct-mcat-lora — Qwen2.5-VL-7B-Instruct fine-tuned on the same MCAT dataset. Overall accuracy: 56.3% across all 7 test sets.
## Citation
If you use this adapter, please cite the MCAT practice materials and the SUTD project:
```bibtex
@misc{oon2026mcat_qwen35,
  title     = {Qwen3.5-4B MCAT LoRA Adapter},
  author    = {Oon, James},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jamezoon/qwen3.5-4b-mcat-lora}
}
```
## Framework Versions
- Unsloth 2026.4.2
- PEFT 0.18.1
- Transformers 5.5.0
- TRL 0.26.1
- PyTorch 2.10.0 (nv25.11) + CUDA 13.0