Qwen2.5-7B-Instruct SFT – AIME (16384)

Fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit NF4 quantization with LoRA adapters). The adapters were merged into the base weights before upload, so this repository loads as a standard full-precision checkpoint.

Training Configuration

  • Learning rate: 2e-5 (cosine schedule, 5% warmup)
  • Batch size: 1 per device, gradient accumulation 16 (effective batch size 16)
  • Epochs: 3
  • Max sequence length: 16384
  • Precision: bf16
  • Weight decay: 0.01
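As a reference for the stated schedule, here is a minimal sketch of what a cosine learning-rate curve with 5% linear warmup and a 2e-5 peak looks like. The function name and step-based formulation are illustrative, not taken from the training code.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.05):
    """Cosine decay with linear warmup, matching the 2e-5 / 5% settings above."""
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        # Linear ramp from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, it reaches the 2e-5 peak at the end of warmup, and decays to 0 by the final step.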

QLoRA

  • Quantization: 4-bit NF4 with double quantization
  • LoRA rank: 64
  • LoRA alpha: 128
  • LoRA dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
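The hyperparameters above map onto the standard `transformers` / `peft` config objects roughly as follows. This is a sketch of the configuration only; the exact trainer wiring used for this run is not shown in the card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Quantization: 4-bit NF4 with double quantization, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA: rank 64, alpha 128, dropout 0.05 on all attention and MLP projections.
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```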

Loss

  • completion_only_loss: prompt tokens are masked; loss is computed only on assistant completion tokens
  • Dataset is converted from messages to prompt/completion format before training
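The messages-to-prompt/completion conversion described above can be sketched as follows, assuming each example ends with a single assistant turn (the helper name is hypothetical, not from the training repo). TRL's completion-only loss then masks everything in `prompt` and trains only on `completion`.

```python
def to_prompt_completion(example):
    """Split a chat-format example into prompt/completion fields.

    Assumes the final message is the assistant completion and all
    preceding messages form the prompt (hypothetical helper).
    """
    messages = example["messages"]
    assert messages[-1]["role"] == "assistant", "expected assistant last turn"
    return {
        "prompt": messages[:-1],      # everything up to the assistant turn
        "completion": messages[-1:],  # the assistant turn only
    }

example = {"messages": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
]}
pc = to_prompt_completion(example)
```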

Dataset

Trained on tinyllms/aime-1983-2023-trajectories. Examples exceeding max_seq_len are filtered out. A 10% holdout is used for evaluation (eval runs every 10 steps).
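The length filter and 10% holdout can be sketched in plain Python as below. The function and the precomputed `lengths` argument are illustrative assumptions; the actual run presumably uses the `datasets` library's filtering and splitting utilities.

```python
import random

def filter_and_split(examples, lengths, max_seq_len=16384, holdout=0.10, seed=42):
    """Drop over-length examples, then carve out an eval holdout.

    `lengths` is a parallel list of tokenized sequence lengths
    (assumed to be precomputed with the model tokenizer).
    """
    # Filter out examples exceeding the max sequence length.
    kept = [ex for ex, n in zip(examples, lengths) if n <= max_seq_len]
    # Shuffle deterministically, then reserve `holdout` fraction for eval.
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_eval = max(1, int(len(kept) * holdout))
    return kept[n_eval:], kept[:n_eval]  # (train, eval)
```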

Infrastructure

  • GPU: NVIDIA H100 80GB
  • Framework: TRL 0.29 + Ray Train
  • Tracking: Weights & Biases (project: pocket-sheet-sft)
  • Run name: sft-v4.5
  • Ray Job ID: raysubmit_iVkMK5RLqcXEJPeA