# Qwen2.5-7B-Instruct SFT — LOO Domain Knowledge
Fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit NF4 quantization + LoRA adapters, merged before upload).
This is the SFT stage of a leave-one-out (LOO) experiment: the model is trained on Game24 and AIME trajectories, deliberately excluding domain knowledge (GPQA) data. The held-out domain is later used to measure cross-domain transfer.
## Training Configuration
- Learning rate: 2e-5 (cosine schedule, 5% warmup)
- Batch size: 1 per device, gradient accumulation 16 (effective batch size 32 with 2 workers)
- Epochs: 1
- Max sequence length: 16384
- Precision: bf16
- Weight decay: 0.01
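The schedule above (linear warmup over the first 5% of steps, then cosine decay) and the effective batch size arithmetic can be sketched as a standalone function; `total_steps` below is illustrative, not a value from this run:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 2e-5,
          warmup_frac: float = 0.05) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 1 per device x 16 accumulation steps x 2 workers = 32
effective_batch = 1 * 16 * 2
```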
## QLoRA
- Quantization: 4-bit NF4 with double quantization
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
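As a hedged sketch, the settings above map onto `transformers`/`peft` configuration objects roughly as follows (the base model ID is the one stated on this card; everything else mirrors the list above, and the exact training script is not published here):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # 4-bit NF4
    bnb_4bit_use_double_quant=True,       # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", quantization_config=bnb_config
)
model = get_peft_model(model, lora_config)
```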
## Loss
- completion_only_loss: prompt tokens are masked; loss is computed only on assistant completion tokens
- The dataset is converted from `messages` to `prompt`/`completion` format before training
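A minimal, framework-free sketch of both steps: splitting a `messages` example into `prompt`/`completion` fields, and masking prompt positions so loss is computed only on completion tokens (the `-100` value follows PyTorch's cross-entropy `ignore_index` convention; token IDs here are toy values):

```python
def to_prompt_completion(example: dict) -> dict:
    """Everything before the final assistant turn becomes the prompt;
    the final assistant turn becomes the completion."""
    messages = example["messages"]
    assert messages[-1]["role"] == "assistant"
    return {"prompt": messages[:-1], "completion": messages[-1:]}

def completion_only_labels(prompt_ids, completion_ids, ignore_index=-100):
    """Mask prompt tokens; loss is computed only on completion tokens."""
    return [ignore_index] * len(prompt_ids) + list(completion_ids)

example = {"messages": [
    {"role": "user", "content": "Make 24 from 4, 7, 8, 8."},
    {"role": "assistant", "content": "(7 - 8/8) * 4 = 24"},
]}
pc = to_prompt_completion(example)
labels = completion_only_labels([101, 102, 103], [201, 202])
```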
## Datasets
Trained on two datasets (domain knowledge held out):
| Dataset | Domain |
|---|---|
| tinyllms/game24-trajectories | Game of 24 — arithmetic reasoning |
| tinyllms/aime-1983-2023-trajectories | AIME — competition math |
Examples exceeding max_seq_len are filtered out. A 10% holdout is used for evaluation (eval runs every 10 steps).
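The preprocessing described above (length filtering, then a 10% evaluation holdout) can be sketched as follows; `length_fn` stands in for the real tokenizer, and the toy word count is an assumption for illustration:

```python
import random

def filter_and_split(examples, length_fn, max_seq_len=16384,
                     eval_size=0.1, seed=0):
    """Drop examples longer than max_seq_len, then hold out
    a fraction eval_size for evaluation."""
    kept = [ex for ex in examples if length_fn(ex) <= max_seq_len]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_eval = int(len(kept) * eval_size)
    return kept[n_eval:], kept[:n_eval]

data = [f"example {i}" for i in range(100)]
train, eval_set = filter_and_split(data, length_fn=lambda s: len(s.split()))
```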
## Leave-One-Out Design
| Domain | Role |
|---|---|
| Game24 | Train |
| AIME | Train |
| Domain Knowledge (GPQA) | Held out |
The subsequent GRPO stage (`tinyllms/qwen2.5-7b-instruct-grpo-loo-domain-knowledge`) is trained on the same two datasets. Transfer is measured by evaluating on GPQA Diamond.
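The leave-one-out selection itself is simple to express; the domain names below are taken from the table above (the held-out GPQA trajectory dataset is not named on this card, so only domain labels are used):

```python
ALL_DOMAINS = ["game24", "aime", "domain_knowledge"]

def loo_train_domains(held_out: str) -> list[str]:
    """Train on every domain except the held-out one."""
    assert held_out in ALL_DOMAINS
    return [d for d in ALL_DOMAINS if d != held_out]

# This run holds out domain knowledge (GPQA):
train_domains = loo_train_domains("domain_knowledge")
```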
## Infrastructure
- GPU: 2x NVIDIA H100 80GB (DDP)
- Framework: TRL 0.29 + Ray Train
- Tracking: Weights & Biases (project: `pocket-sheet-sft`)
- Ray Job ID: `raysubmit_A55M5NnZckrXmfWN`