# Qwen2.5-7B-Instruct SDFT — Tool Use (Step 1011)

This model is a Self-Distillation Fine-Tuned (SDFT) version of Qwen/Qwen2.5-7B-Instruct, trained on the ToolAlpaca tool-use dataset.

SDFT is an on-policy learning method from "Self-Distillation Enables Continual Learning" that acquires new skills while preserving prior capabilities, significantly reducing catastrophic forgetting compared to standard SFT.

## Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | SDFT (on-policy self-distillation) |
| Dataset | ToolAlpaca (4,046 training examples) |
| Training step | 1011 / 1011 |
| Learning rate | 2e-5 (cosine schedule, 10% warmup) |
| Batch size | 32 (via gradient accumulation) |
| Epochs | 1 |
| Precision | bf16 |
| Max prompt length | 1024 |
| Max completion length | 1024 |
| EMA alpha | 0.01 |
| Hardware | 1x NVIDIA L40S 48GB |
| Training time | ~42 hours (full run) |
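The "EMA alpha" row suggests the distillation teacher is maintained as an exponential moving average of the student weights. A minimal sketch of that update, assuming this interpretation (the function and variable names below are illustrative, not from the SDFT codebase):

```python
def ema_update(teacher_params, student_params, alpha=0.01):
    """Blend student weights into the EMA teacher.

    Each teacher parameter moves a small step (alpha) toward the
    current student parameter: t <- (1 - alpha) * t + alpha * s.
    With alpha = 0.01 the teacher tracks the student slowly, giving
    a stable target for on-policy self-distillation.
    """
    return [(1 - alpha) * t + alpha * s
            for t, s in zip(teacher_params, student_params)]
```

In practice this would run over model tensors after each optimizer step; plain floats are used here only to keep the sketch self-contained.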

## Evaluation Results

### Tool-Use Accuracy (ToolAlpaca test set, 68 examples)

| Metric | Base Model | This Model (Step 1011) |
|---|---|---|
| Greedy accuracy | 54.4% | 57.4% |
| pass@1 | 52.6% | 49.9% |
| pass@5 | 61.5% | 64.6% |
| pass@10 | 64.3% | 70.0% |
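pass@k is presumably the standard unbiased estimator (Chen et al., 2021): sample n completions per problem, count the c that are correct, and estimate the probability that at least one of k draws is correct. A sketch, assuming that convention:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator.

    n: total samples generated for a problem
    c: number of correct samples among them
    k: budget being evaluated
    Returns 1 - C(n - c, k) / C(n, k), the probability that a
    size-k subset of the n samples contains at least one correct one.
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k
        # subset must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The per-problem estimates are then averaged over the test set.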

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Ayushnangia/qwen2.5-7b-instruct-sdft-tooluse-step-1011",
    torch_dtype="auto",   # load in the checkpoint's bf16 precision
    device_map="auto",    # place the model on available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained("Ayushnangia/qwen2.5-7b-instruct-sdft-tooluse-step-1011")

messages = [{"role": "user", "content": "Your tool-use prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
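For actual tool use, tool definitions can be passed to the chat template via the `tools=` argument of `apply_chat_template` (supported in recent `transformers` releases; Qwen2.5's template renders them into the system prompt). The `get_weather` schema below is a made-up illustration, not a tool from ToolAlpaca:

```python
# Hypothetical tool definition in the OpenAI-style function schema.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
# With a loaded tokenizer (see above), the template call would be:
# text = tokenizer.apply_chat_template(
#     messages, tools=[get_weather], tokenize=False, add_generation_prompt=True
# )
```

The model is then expected to emit a tool call naming `get_weather` with a `city` argument, which your harness executes and feeds back as a tool-role message.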

## All Checkpoints

## Citation

```bibtex
@article{shenfeld2025selfdistillation,
  title={Self-Distillation Enables Continual Learning},
  author={Shenfeld, Idan and others},
  journal={arXiv preprint arXiv:2601.19897},
  year={2025}
}
```