tinyllms/qwen2.5-7b-instruct-sft-aime
Qwen2.5-7B-Instruct SFT → AIME (16384)
Fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit NF4 quantization + LoRA adapters, merged before upload).
Training Configuration
- Learning rate: 2e-5 (cosine schedule, 5% warmup)
- Batch size: 1 per device, gradient accumulation 16 (effective batch size 16)
- Epochs: 3
- Max sequence length: 16384
- Precision: bf16
- Weight decay: 0.01
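The hyperparameters above can be collected in one place; a minimal sketch in plain Python (the dict and `effective_batch_size` helper are illustrative, not the actual training code):

```python
# Hyperparameters from the list above, gathered into one config dict.
# Illustrative only; the real run passes these to a TRL SFTConfig.
config = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "num_train_epochs": 3,
    "max_seq_length": 16384,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "weight_decay": 0.01,
    "bf16": True,
}

def effective_batch_size(cfg, num_devices=1):
    # Optimizer-step batch = per-device batch * accumulation steps * devices.
    return (cfg["per_device_train_batch_size"]
            * cfg["gradient_accumulation_steps"]
            * num_devices)

print(effective_batch_size(config))  # 16 on a single GPU
```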
QLoRA
- Quantization: 4-bit NF4 with double quantization
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
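The quantization and adapter settings above can be sketched with `transformers` and `peft`; a minimal example, not the exact training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 with double quantization, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all attention and MLP projection layers.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```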
Loss
- completion_only_loss: prompt tokens are masked; loss is computed only on assistant completion tokens
- Dataset is converted from the `messages` chat format to `prompt`/`completion` format before training
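The conversion from `messages` to `prompt`/`completion` can be sketched as follows (a minimal stand-in for TRL's internal handling, assuming the final message is the assistant completion; the real pipeline also applies the chat template):

```python
def to_prompt_completion(example):
    """Split a chat-format example into prompt and completion fields.

    Illustrative only: with completion_only_loss, the loss is then
    computed on the completion tokens and the prompt tokens are masked.
    """
    messages = example["messages"]
    assert messages[-1]["role"] == "assistant"
    return {
        "prompt": messages[:-1],      # everything up to the answer
        "completion": messages[-1:],  # the assistant answer only
    }

example = {"messages": [
    {"role": "user", "content": "What is 1 + 1?"},
    {"role": "assistant", "content": "2"},
]}
converted = to_prompt_completion(example)
print(converted["completion"][0]["content"])  # 2
```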
Dataset
Trained on tinyllms/aime-1983-2023-trajectories. Examples exceeding max_seq_length (16384 tokens) are filtered out. A 10% holdout is used for evaluation (eval runs every 10 steps).
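The length filter and 10% holdout can be sketched in plain Python (illustrative: the real pipeline works on a `datasets` dataset, and token counts come from the tokenizer rather than a precomputed field):

```python
import random

MAX_SEQ_LENGTH = 16384
EVAL_SIZE = 0.1

def filter_and_split(examples, n_tokens, seed=0):
    # Drop examples that would exceed the context window.
    kept = [ex for ex in examples if n_tokens(ex) <= MAX_SEQ_LENGTH]
    # Shuffle deterministically, then carve off the eval holdout.
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_eval = max(1, int(len(kept) * EVAL_SIZE))
    return kept[n_eval:], kept[:n_eval]  # (train, eval)

# Toy usage with fake token counts:
data = [{"id": i, "n_tokens": 100 * i} for i in range(1, 201)]
train_set, eval_set = filter_and_split(data, n_tokens=lambda ex: ex["n_tokens"])
```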
Infrastructure
- GPU: NVIDIA H100 80GB
- Framework: TRL 0.29 + Ray Train
- Tracking: Weights & Biases (project: pocket-sheet-sft)
- Run name: sft-v4.5
- Ray Job ID: raysubmit_iVkMK5RLqcXEJPeA
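For reproducibility, a run like this is typically launched via the Ray Jobs CLI; a sketch, assuming a hypothetical entrypoint `train_sft.py` (the actual script name is not part of this card):

```shell
# Submit the training script to the Ray cluster; Ray assigns the
# raysubmit_... job ID shown above on submission.
ray job submit \
  --working-dir . \
  -- python train_sft.py
```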