Self-Distillation Enables Continual Learning
Paper: 2601.19897
SFT baseline for reproducing "Self-Distillation Enables Continual Learning".
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | Supervised Fine-Tuning (SFT) |
| Dataset | ToolAlpaca (4046 train, 68 test) |
| Learning rate | 5e-5 |
| Batch size | 32 (effective, via gradient accumulation) |
| Epochs | 1 |
| Seed | 42 |
| DeepSpeed | ZeRO-2 + CPU offload |
| Hardware | L40S 48GB |
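
The ZeRO-2 + CPU offload setup above is typically expressed as a DeepSpeed JSON config. A minimal sketch consistent with the table (the micro-batch size, accumulation steps, and bf16 setting are assumptions, not values reported for this run):

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  },
  "bf16": { "enabled": true }
}
```

ZeRO-2 partitions optimizer states and gradients across workers, and offloading the optimizer to CPU is what makes full fine-tuning of a 7B model feasible on a single 48 GB L40S.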
Not yet evaluated; greedy accuracy and pass@k results are pending.
For reference, the paper's SFT baseline reports 63.2% greedy accuracy on tool use.
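Once samples are collected, pass@k is usually computed with the standard unbiased estimator rather than by empirical counting. A minimal sketch (function name and interface are illustrative, not from the paper's evaluation code):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate.

    n: total samples drawn per problem
    c: number of those samples that passed
    k: budget being estimated

    Computes 1 - C(n-c, k) / C(n, k), the probability that at least
    one of k samples drawn without replacement is correct.
    """
    if n - c < k:
        # Fewer failures than k draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging `pass_at_k` over all test problems gives the dataset-level pass@k; with `k=1` and greedy decoding it reduces to plain accuracy.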
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("Ayushnangia/qwen2.5-7b-instruct-sft-tooluse-lr5e-5-bs32-ep1")
tokenizer = AutoTokenizer.from_pretrained("Ayushnangia/qwen2.5-7b-instruct-sft-tooluse-lr5e-5-bs32-ep1")
```