# Qwen2.5-7B-Instruct SFT
Fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit NF4 quantization + LoRA adapters, merged before upload).
## Training Configuration
- Learning rate: 2e-5 (cosine schedule, 5% warmup)
- Batch size: 2 per device, gradient accumulation 8 (effective batch size 16)
- Epochs: 3
- Max sequence length: 8192
- Precision: bf16
- Weight decay: 0.01
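The hyperparameters above can be sketched as a TRL `SFTConfig` (a subclass of `transformers.TrainingArguments`); argument names follow recent TRL releases and the `output_dir` value is illustrative, not taken from the actual run:

```python
from trl import SFTConfig

# Hyperparameters from the card; the eval cadence (every 10 steps)
# comes from the Dataset section below.
config = SFTConfig(
    output_dir="qwen2.5-7b-instruct-sft",  # illustrative path
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16
    num_train_epochs=3,
    max_length=8192,
    bf16=True,
    weight_decay=0.01,
    eval_strategy="steps",
    eval_steps=10,
    completion_only_loss=True,
)
```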
## QLoRA
- Quantization: 4-bit NF4 with double quantization
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
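A minimal sketch of the quantization and adapter settings listed above, using `transformers.BitsAndBytesConfig` and `peft.LoraConfig`; the bf16 compute dtype is an assumption inferred from the bf16 training precision:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 weights with double quantization, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: matches bf16 training
)

# LoRA adapters on all attention and MLP projections (rank 64, alpha 128).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```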
## Loss
- `completion_only_loss`: prompt tokens are masked; loss is computed only on assistant completion tokens
- The dataset is converted from `messages` to `prompt`/`completion` format before training
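The `messages`-to-`prompt`/`completion` conversion can be sketched as below; it assumes each example ends with a single assistant turn to be trained on, which may not match the repo's exact preprocessing:

```python
def messages_to_prompt_completion(example):
    """Split a chat-format example into TRL prompt/completion fields.

    Everything before the final assistant turn becomes the prompt;
    the final assistant turn becomes the completion, so loss is
    computed only on its tokens when completion_only_loss is enabled.
    """
    messages = example["messages"]
    assert messages[-1]["role"] == "assistant", "expected a trailing assistant turn"
    return {
        "prompt": messages[:-1],
        "completion": messages[-1:],
    }
```

Applied with `dataset.map(messages_to_prompt_completion, remove_columns=["messages"])`, this yields the conversational prompt/completion format TRL recognizes.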
## Dataset
Trained on `tinyllms/game24-trajectories`. Examples exceeding `max_seq_len` are filtered out. A 10% holdout is used for evaluation (eval runs every 10 steps).
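The length filtering and 10% holdout split can be sketched as follows; the function name, `length_fn` hook, and seed are illustrative, not taken from the repo:

```python
import random

def prepare_splits(examples, length_fn, max_seq_len=8192, holdout_frac=0.10, seed=42):
    """Drop over-length examples, then carve out an eval holdout.

    `length_fn` maps an example to its token count (in practice a
    tokenizer call); here it is a stand-in for whatever the repo uses.
    Returns (train_examples, eval_examples).
    """
    kept = [ex for ex in examples if length_fn(ex) <= max_seq_len]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_eval = max(1, int(len(kept) * holdout_frac))
    return kept[n_eval:], kept[:n_eval]
```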
## Infrastructure
- GPU: NVIDIA H100 80GB
- Framework: TRL 0.29 + Ray Train
- Tracking: Weights & Biases (project: `pocket-sheet-sft`)
Full training logs for this model are available on W&B: https://wandb.ai/psr-labs/pocket-sheet-sft/runs/22t137yk