# Qwen3.5-4B MCAT LoRA Adapter
A LoRA fine-tuned adapter for unsloth/Qwen3.5-4B on the MCAT examination dataset — 1,610 questions across 7 official MCAT practice test sets covering the four MCAT sections: Biological and Biochemical Foundations (BB), Critical Analysis and Reasoning Skills (CARS), Chemical and Physical Foundations (CP), and Psychological, Social, and Biological Foundations (PS).
## Model Details
- Developed by: James Oon (@jamezoon), SUTD MSTR-DAIE Deep Learning Project
- Model type: Causal LM with LoRA adapter (PEFT)
- Base model: `Qwen/Qwen3.5-4B` (dense, 4B parameters, BF16)
- Language: English
- License: Follows base model license (Qwen3.5)
- Adapter size: ~82 MB (`adapter_model.safetensors`)
## Intended Use
MCAT examination multiple-choice question answering. Given a passage (where applicable) and a 4-option question (A–D), the model selects the correct answer with a step-by-step explanation. Sections covered:
- BB — Biological and Biochemical Foundations of Living Systems
- CARS — Critical Analysis and Reasoning Skills
- CP — Chemical and Physical Foundations of Biological Systems
- PS — Psychological, Social, and Biological Foundations of Behavior
Not intended for real clinical or academic decision-making. This is a research/educational model.
## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen3.5-4B"
adapter_id = "jamezoon/qwen3.5-4b-mcat-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful tutor for students preparing for the MCAT. "
                   "Answer the following multiple choice question by thinking step by step, "
                   "then give the answer.",
    },
    {
        "role": "user",
        "content": (
            "Passage: During a study of enzyme kinetics, researchers measured the rate of reaction "
            "at varying substrate concentrations in the presence and absence of an inhibitor.\n\n"
            "Question: Which of the following best describes competitive inhibition?\n"
            "Options: A. Vmax decreases, Km unchanged "
            "B. Vmax unchanged, Km increases "
            "C. Both Vmax and Km decrease "
            "D. Both Vmax and Km increase\n"
            "Think step by step. Then respond in the format:\n"
            "Explanation: ...\nAnswer: <one of A, B, C, D>"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
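Since the prompt requests a fixed `Explanation: ... / Answer: <letter>` format, the predicted letter can be pulled out with a small parser. A minimal sketch (the helper name and regex are illustrative, not part of the released code):

```python
import re

def extract_answer(generation: str):
    """Pull the final A-D letter from a response ending in 'Answer: <letter>'."""
    match = re.search(r"Answer:\s*([ABCD])\b", generation)
    return match.group(1) if match else None

sample = (
    "Explanation: A competitive inhibitor raises the apparent Km "
    "while Vmax is unchanged.\nAnswer: B"
)
print(extract_answer(sample))  # → B
```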
## Training Details

### Dataset
- MCAT Practice Exams — 1,449 training samples, 161 validation samples (90/10 split)
- 7 official MCAT practice test sets, ~230 questions each
- 4 MCAT sections: BB, CARS, CP, PS
- Each sample: optional passage + question + 4 options + correct answer (A–D)
- Formatted as chat messages with system/user/assistant roles
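The chat formatting above can be sketched as follows (a minimal example; the field names are assumptions, since the dataset schema is not published):

```python
def to_chat(sample: dict) -> list:
    """Format one MCAT sample as system/user/assistant chat messages."""
    passage = f"Passage: {sample['passage']}\n\n" if sample.get("passage") else ""
    options = " ".join(f"{k}. {v}" for k, v in sample["options"].items())
    return [
        {"role": "system", "content": "You are a helpful tutor for students preparing for the MCAT."},
        {"role": "user", "content": f"{passage}Question: {sample['question']}\nOptions: {options}"},
        {"role": "assistant", "content": f"Explanation: {sample['explanation']}\nAnswer: {sample['answer']}"},
    ]

msgs = to_chat({
    "passage": None,  # passage is optional, e.g. for discrete questions
    "question": "Which amino acid is achiral?",
    "options": {"A": "Glycine", "B": "Alanine", "C": "Serine", "D": "Leucine"},
    "explanation": "Glycine's side chain is a hydrogen, so the alpha carbon has two identical substituents.",
    "answer": "A",
})
print(msgs[2]["content"])
```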
### Training Procedure
| Hyperparameter | Value |
|---|---|
| Training steps | 546 |
| Epochs | 3 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Max sequence length | 2048 tokens |
| Precision | BF16 |
| Optimizer | AdamW |
| Train loss (final) | 0.8092 |
| Eval loss (final) | 0.3952 |
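The step count in the table is consistent with the split and the effective batch size. A quick arithmetic check (not training code):

```python
import math

train_samples = 1449
effective_batch = 2 * 4            # per-device batch × gradient accumulation
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * 3  # 3 epochs
print(steps_per_epoch, total_steps)  # → 182 546
```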
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 16 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | ~21.2M (0.47% of 4B) |
| Bias | none |
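For each targeted linear layer of shape `d_in × d_out`, LoRA at rank r adds `r · (d_in + d_out)` trainable weights (the low-rank A and B matrices); summed over the seven target modules in every transformer layer, this yields the ~21.2M figure above. A generic sketch of the per-layer count (the example dimensions are illustrative, not Qwen3.5-4B's actual sizes):

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable weights LoRA adds to one linear layer: A (d_in×r) plus B (r×d_out)."""
    return r * (d_in + d_out)

# e.g. a hypothetical 2560×2560 projection at r=16
print(lora_params(2560, 2560))  # → 81920
```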
### Hardware & Training Time
- Hardware: NVIDIA GB10 Grace Blackwell (NVIDIA DGX Spark, Node 2), 121 GB unified CPU+GPU memory
- Training duration: ~2.5 hours (9,148 seconds, 546 steps at ~16.8s/step)
- Throughput: ~0.48 samples/sec, ~15.7 tok/s inference
- Framework: PyTorch 2.10 (nv25.11), HuggingFace Transformers 5.5.0, PEFT 0.18.1, TRL 0.26.1, Unsloth 2026.4.2
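The reported duration and throughput figures are mutually consistent (a quick arithmetic check):

```python
steps = 546
step_time = 16.8                  # seconds per step
effective_batch = 8

duration_s = steps * step_time    # ≈ 9,173 s, matching the logged ~9,148 s
samples_per_sec = effective_batch / step_time
print(round(duration_s), round(samples_per_sec, 2))  # → 9173 0.48
```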
## Technical Notes
Qwen3.5-4B uses a hybrid GatedDeltaNet architecture (linear attention layers interleaved with standard softmax attention). Without the native CUDA kernels (causal-conv1d, flash-linear-attention), GatedDeltaNet falls back to a PyTorch CPU implementation, resulting in significantly slower training (~16.8s/step vs ~6s/step for Qwen2.5-VL-7B).
GB10 Blackwell constraints applied during training:
- `PYTORCH_JIT=0`, `TORCHDYNAMO_DISABLE=1` (nvrtc JIT unsupported on sm_121)
- BF16 only; no 4-bit quantization (bitsandbytes NF4 causes silent OOM on GB10)
- `attn_implementation="eager"` (Flash Attention 3 incompatible with Blackwell)
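In script form, those constraints amount to two environment variables set before importing torch, plus two loading kwargs. A minimal sketch (the `load_kwargs` dict is illustrative; pass its contents to `from_pretrained`):

```python
import os

# nvrtc JIT is unsupported on sm_121 (GB10), so disable JIT paths up front
os.environ["PYTORCH_JIT"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"

# Loading kwargs reflecting the remaining constraints
load_kwargs = {
    "torch_dtype": "bfloat16",        # BF16 only: bitsandbytes NF4 silently OOMs on GB10
    "attn_implementation": "eager",   # Flash Attention 3 incompatible with Blackwell
}
print(load_kwargs["attn_implementation"])
```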
## Evaluation
Evaluated on 7 official MCAT practice test sets (~230 questions each), covering all 4 MCAT sections.
Evaluation in progress — results will be updated upon completion.
## Companion Model
- jamezoon/qwen2.5-vl-7b-instruct-mcat-lora — Qwen2.5-VL-7B-Instruct fine-tuned on the same MCAT dataset. Overall accuracy: 56.3% across all 7 test sets.
## Citation
If you use this adapter, please cite the MCAT practice materials and the SUTD project:
```bibtex
@misc{oon2026mcat_qwen35,
  title     = {Qwen3.5-4B MCAT LoRA Adapter},
  author    = {Oon, James},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jamezoon/qwen3.5-4b-mcat-lora}
}
```
## Framework Versions
- Unsloth 2026.4.2
- PEFT 0.18.1
- Transformers 5.5.0
- TRL 0.26.1
- PyTorch 2.10.0 (nv25.11) + CUDA 13.0