# Qwen2.5-1.5B-GSM8K-SFT

A fine-tune of Qwen/Qwen2.5-1.5B on GSM8K for step-by-step math reasoning, trained to close every response with a structured final-answer line.

## Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B |
| Method | SFT + LoRA |
| LoRA r / alpha | 32 / 16 |
| LoRA targets | all-linear |
| Epochs | 1 |
| Learning rate | 2e-4 |
| Batch size | 8 × 4 (grad accum) |
| Training examples | 1024 |
| Precision | bf16 |
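
The training script is not published; below is a minimal sketch of an SFT + LoRA setup matching the table above, using `trl` and `peft`. The GSM8K formatting, output directory, and all API choices here are assumptions, not the released recipe.

```python
# Sketch of the SFT + LoRA setup described above (assumed, not the released
# training script). Chat templating with the system prompt below is omitted.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train[:1024]")  # 1024 examples

def to_text(example):
    # GSM8K answers end with "#### <number>"; rewrite that into the
    # "The answer is: {number}." format this model is trained to emit.
    reasoning, _, number = example["answer"].rpartition("####")
    return {"text": f"{example['question']}\n{reasoning.strip()}\nThe answer is: {number.strip()}."}

dataset = dataset.map(to_text)

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="qwen2.5-1.5b-gsm8k-sft",
    num_train_epochs=1,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size 8 × 4 = 32
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B",
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```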

## Answer Format

The model is trained to end responses with:

`The answer is: {number}.`
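
This fixed suffix makes completions easy to score programmatically. For example (an illustrative helper, not part of the release):

```python
import re

def extract_answer(completion: str) -> str | None:
    """Extract the number from a trailing 'The answer is: {number}.' line."""
    match = re.search(r"The answer is:\s*(-?[\d,]*\.?\d+)\.?\s*$", completion.strip())
    return match.group(1).replace(",", "") if match else None

print(extract_answer("48 + 24 = 72 clips.\nThe answer is: 72."))  # -> 72
```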

## System Prompt

> You are a helpful math assistant. Solve the problem step by step, then give your final answer as a single number on the last line in exact format
> The answer is: {number}.
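
A minimal inference sketch with `transformers` and `peft`, loading the LoRA adapter on top of the base model. The generation settings and tokenizer source are assumptions, not published defaults.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "tripathysagar/Qwen2.5-1.5B-GSM8K-SFT"
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Tokenizer taken from the base model in case the adapter repo ships none.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")

system_prompt = (
    "You are a helpful math assistant. Solve the problem step by step, "
    "then give your final answer as a single number on the last line in exact format\n"
    "The answer is: {number}."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, "
                                "and then she sold half as many clips in May. "
                                "How many clips did Natalia sell altogether?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```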