Qwen3.5-4B MCAT LoRA Adapter

A LoRA fine-tuned adapter for unsloth/Qwen3.5-4B, trained on an MCAT examination dataset of 1,610 questions drawn from 7 official MCAT practice test sets and covering the four MCAT sections: Biological and Biochemical Foundations (BB), Critical Analysis and Reasoning Skills (CARS), Chemical and Physical Foundations (CP), and Psychological, Social, and Biological Foundations (PS).

Model Details

  • Developed by: James Oon (@jamezoon), SUTD MSTR-DAIE Deep Learning Project
  • Model type: Causal LM with LoRA adapter (PEFT)
  • Base model: Qwen/Qwen3.5-4B (dense, 4B parameters, BF16)
  • Language: English
  • License: Follows base model license (Qwen3.5)
  • Adapter size: ~82 MB (adapter_model.safetensors)

Intended Use

MCAT examination multiple-choice question answering. Given a passage (where applicable) and a 4-option question (A–D), the model selects the correct answer with a step-by-step explanation. Sections covered:

  • BB — Biological and Biochemical Foundations of Living Systems
  • CARS — Critical Analysis and Reasoning Skills
  • CP — Chemical and Physical Foundations of Biological Systems
  • PS — Psychological, Social, and Biological Foundations of Behavior

Not intended for real clinical or academic decision-making. This is a research/educational model.

How to Get Started

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen3.5-4B"
adapter_id = "jamezoon/qwen3.5-4b-mcat-lora"

# Load the tokenizer and BF16 base model, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful tutor for students preparing for the MCAT. "
                   "Answer the following multiple choice question by thinking step by step, then give the answer."
    },
    {
        "role": "user",
        "content": (
            "Passage: During a study of enzyme kinetics, researchers measured the rate of reaction "
            "at varying substrate concentrations in the presence and absence of an inhibitor.\n\n"
            "Question: Which of the following best describes competitive inhibition?\n"
            "Options: A. Vmax decreases, Km unchanged  "
            "B. Vmax unchanged, Km increases  "
            "C. Both Vmax and Km decrease  "
            "D. Both Vmax and Km increase\n"
            "Think step by step. Then respond in the format:\n"
            "Explanation: ...\nAnswer: <one of A, B, C, D>"
        ),
    },
]

# Render the chat template, generate greedily, and decode only the new tokens
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training Details

Dataset

  • MCAT Practice Exams — 1,449 training samples, 161 validation samples (90/10 split)
  • 7 official MCAT practice test sets, ~230 questions each
  • 4 MCAT sections: BB, CARS, CP, PS
  • Each sample: optional passage + question + 4 options + correct answer (A–D)
  • Formatted as chat messages with system/user/assistant roles
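One plausible formatter for turning a raw sample into the system/user/assistant layout described above (the field names `passage`, `question`, `options`, `explanation`, and `answer` are assumptions, not the project's published schema):

```python
SYSTEM_PROMPT = (
    "You are a helpful tutor for students preparing for the MCAT. "
    "Answer the following multiple choice question by thinking step by step, "
    "then give the answer."
)

def to_chat(sample: dict) -> list[dict]:
    """Serialize one MCAT sample into system/user/assistant messages for SFT.

    Field names here are hypothetical; adapt them to the actual dataset schema.
    """
    passage = f"Passage: {sample['passage']}\n\n" if sample.get("passage") else ""
    options = "  ".join(f"{k}. {v}" for k, v in sample["options"].items())
    user = (
        f"{passage}Question: {sample['question']}\n"
        f"Options: {options}\n"
        "Think step by step. Then respond in the format:\n"
        "Explanation: ...\nAnswer: <one of A, B, C, D>"
    )
    assistant = f"Explanation: {sample['explanation']}\nAnswer: {sample['answer']}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]
```

The user turn mirrors the inference prompt shown in the quick-start example, so train and test formatting stay consistent.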

Training Procedure

  • Training steps: 546
  • Epochs: 3
  • Per-device batch size: 2
  • Gradient accumulation: 4 (effective batch = 8)
  • Learning rate: 2e-5
  • LR scheduler: cosine
  • Warmup ratio: 0.03
  • Max sequence length: 2048 tokens
  • Precision: BF16
  • Optimizer: AdamW
  • Train loss (final): 0.8092
  • Eval loss (final): 0.3952
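The step count is consistent with the split and batch settings: 1,449 training samples at an effective batch of 8 gives 182 optimizer steps per epoch, and 3 epochs gives 546 steps.

```python
import math

train_samples = 1449
effective_batch = 2 * 4  # per-device batch size x gradient accumulation
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * 3  # 3 epochs
print(steps_per_epoch, total_steps)  # -> 182 546
```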

LoRA Configuration

  • Rank (r): 16
  • Alpha (α): 16
  • Dropout: 0.0
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Trainable parameters: ~21.2M (0.47% of 4B)
  • Bias: none
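In PEFT terms, the settings above correspond to a `LoraConfig` along these lines (a sketch reconstructed from the table; `task_type` is an assumption, since the card does not state it):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Targeting all seven attention and MLP projections, rather than only q/v, is what pushes the trainable parameter count to ~21.2M.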

Hardware & Training Time

  • Hardware: NVIDIA GB10 Grace Blackwell (NVIDIA DGX Spark, Node 2), 121 GB unified CPU+GPU memory
  • Training duration: ~2.5 hours (9,148 seconds, 546 steps at ~16.8s/step)
  • Throughput: ~0.48 samples/s during training; ~15.7 tokens/s at inference
  • Framework: PyTorch 2.10 (nv25.11), HuggingFace Transformers 5.5.0, PEFT 0.18.1, TRL 0.26.1, Unsloth 2026.4.2

Technical Notes

Qwen3.5-4B uses a hybrid GatedDeltaNet architecture (linear attention layers interleaved with standard softmax attention). Without the native CUDA kernels (causal-conv1d, flash-linear-attention), GatedDeltaNet falls back to a PyTorch CPU implementation, resulting in significantly slower training (~16.8s/step vs ~6s/step for Qwen2.5-VL-7B).

GB10 Blackwell constraints applied during training:

  • PYTORCH_JIT=0, TORCHDYNAMO_DISABLE=1 (nvrtc JIT unsupported on sm_121)
  • BF16 only — no 4-bit quantization (bitsandbytes NF4 causes silent OOM on GB10)
  • attn_implementation="eager" (Flash Attention 3 incompatible with Blackwell)
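The first two constraints can be applied as environment variables before PyTorch is imported; a minimal sketch:

```python
import os

# Disable TorchScript JIT and TorchDynamo before importing torch:
# nvrtc-based JIT compilation is unsupported on the GB10's sm_121 target.
os.environ["PYTORCH_JIT"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"

# The attention constraint is passed at model load time instead, e.g.:
# AutoModelForCausalLM.from_pretrained(..., attn_implementation="eager")
```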

Evaluation

Evaluated on 7 official MCAT practice test sets (~230 questions each), covering all 4 MCAT sections.

Evaluation in progress — results will be updated upon completion.
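Once generation completes, accuracy can be tallied per section; a self-contained sketch with dummy predictions standing in for real model outputs (the actual evaluation harness is not published with this card):

```python
from collections import defaultdict

def section_accuracy(records: list[dict]) -> dict[str, float]:
    """Compute per-section accuracy from (section, predicted, gold) records."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["section"]] += 1
        if r["predicted"] == r["gold"]:
            correct[r["section"]] += 1
    return {s: correct[s] / total[s] for s in total}

# Dummy records for illustration only
records = [
    {"section": "BB", "predicted": "B", "gold": "B"},
    {"section": "BB", "predicted": "A", "gold": "C"},
    {"section": "CARS", "predicted": "D", "gold": "D"},
]
print(section_accuracy(records))  # -> {'BB': 0.5, 'CARS': 1.0}
```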

Citation

If you use this adapter, please cite the MCAT practice materials and the SUTD project:

@misc{oon2026mcat_qwen35,
  title     = {Qwen3.5-4B MCAT LoRA Adapter},
  author    = {Oon, James},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jamezoon/qwen3.5-4b-mcat-lora}
}

Framework Versions

  • Unsloth 2026.4.2
  • PEFT 0.18.1
  • Transformers 5.5.0
  • TRL 0.26.1
  • PyTorch 2.10.0 (nv25.11) + CUDA 13.0