Qwen2.5-7B-Instruct — SDFT on Tool Use (Step 1000, Best)

Best checkpoint from an SDFT (Self-Distillation Fine-Tuning) reproduction of the paper "Self-Distillation Enables Continual Learning".

Results

Metric	Base	This Model	Paper (reported)
Greedy Accuracy	54.4%	64.7%	70.6%
Pass@1	52.6%	56.2%	—
Pass@5	61.5%	70.1%	—
Pass@10	64.4%	74.4%	—
Pass@50	70.6%	79.4%	—
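
The card does not state how the Pass@k rows were computed. A minimal sketch, assuming the widely used unbiased estimator (draw n samples per problem, count the c correct ones, then estimate the chance that at least one of k drawn samples is correct), could look like this. The function name `pass_at_k` and the example counts are illustrative and are not taken from this repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples generated for the problem
    c: number of those samples that were correct
    k: budget being evaluated (k <= n)
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    # Probability that all k drawn samples are incorrect is C(n-c, k) / C(n, k);
    # math.comb returns 0 when k > n - c, which yields pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimates over a test set."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Hypothetical per-problem correct counts out of n = 50 samples each.
    counts = [0, 3, 50, 12]
    for k in (1, 5, 10, 50):
        print(f"pass@{k}: {mean_pass_at_k(counts, n=50, k=k):.3f}")
```

Under this estimator, Pass@1 is the mean fraction of correct samples per problem, which is why it can differ from greedy accuracy when sampling uses a nonzero temperature.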

Parameter	Value
Base model	Qwen/Qwen2.5-7B-Instruct
Method	On-policy Self-Distillation (SDFT)
Dataset	ToolAlpaca (4046 train, 68 test)
Learning rate	1e-5
Batch size	32
Epochs	2
EMA alpha	0.01
Step	1000 (best of 1011)
Hardware	L40S 48GB
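
The table lists an EMA alpha of 0.01 without saying how it is applied. A minimal sketch, assuming the common convention that a teacher copy of the weights tracks the student as an exponential moving average with alpha as the weight on the student's current value, is shown below. The function `ema_update` and the plain dict-of-floats parameters are hypothetical stand-ins for real model tensors, and the actual training code may use a different convention.

```python
def ema_update(teacher: dict[str, float],
               student: dict[str, float],
               alpha: float = 0.01) -> dict[str, float]:
    """Return new teacher parameters after one EMA step.

    Assumed convention: new_teacher = (1 - alpha) * teacher + alpha * student,
    so a small alpha means the teacher changes slowly.
    """
    return {
        name: (1.0 - alpha) * teacher[name] + alpha * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    # Repeated updates pull the teacher toward the student gradually.
    for step in range(3):
        teacher = ema_update(teacher, student, alpha=0.01)
        print(step, teacher["w"])
```

With alpha = 0.01 each step moves the teacher 1% of the way toward the student, so the teacher reflects roughly the last hundred or so updates rather than only the latest one.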
