Qwen2.5-7B-Instruct - SDFT on Tool Use (Step 1000, Best)
Best checkpoint from a reproduction of SDFT (Self-Distillation Fine-Tuning), the method from "Self-Distillation Enables Continual Learning".
Results
| Metric | Base | This Model | Paper |
|---|---|---|---|
| Greedy Accuracy | 54.4% | 64.7% | 70.6% |
| Pass@1 | 52.6% | 56.2% | - |
| Pass@5 | 61.5% | 70.1% | - |
| Pass@10 | 64.4% | 74.4% | - |
| Pass@50 | 70.6% | 79.4% | - |
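This card does not say how the Pass@k numbers were computed. A common choice is the unbiased estimator, which draws n samples per prompt, counts the c correct ones, and computes 1 - C(n-c, k) / C(n, k). The sketch below assumes that estimator; the function names `pass_at_k` and `mean_pass_at_k` are illustrative and not taken from this repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one prompt: n samples drawn, c of them correct.

    Assumption: this is the commonly used estimator, not necessarily the
    one used to produce the numbers in this card.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over prompts, given the correct-sample count per prompt."""
    if not correct_counts:
        raise ValueError("correct_counts must not be empty")
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

Under this estimator, Pass@1 is the mean per-sample success rate under sampling, which is why it can differ from greedy accuracy on the same test set.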
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | On-policy Self-Distillation (SDFT) |
| Dataset | ToolAlpaca (4046 train, 68 test) |
| Learning rate | 1e-5 |
| Batch size | 32 |
| Epochs | 2 |
| EMA alpha | 0.01 |
| Step | 1000 (best of 1011) |
| Hardware | L40S 48GB |
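The "EMA alpha" setting presumably controls how quickly an exponential-moving-average copy of the weights tracks the trained model. A minimal sketch of one such update follows, assuming the convention `teacher = (1 - alpha) * teacher + alpha * student`; the card does not state which convention is used, and `ema_update` is a hypothetical helper that operates on plain floats rather than real tensors.

```python
def ema_update(
    teacher: dict[str, float],
    student: dict[str, float],
    alpha: float = 0.01,
) -> dict[str, float]:
    """Return EMA-updated teacher parameters.

    Assumed convention: alpha is the weight given to the student, so
    alpha = 0.01 moves the teacher 1% of the way toward the student.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must have the same parameter names")
    return {
        name: (1.0 - alpha) * teacher[name] + alpha * student[name]
        for name in teacher
    }
```

With a small alpha such as 0.01, the averaged weights change slowly, so they lag behind the trained model by many steps.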
All Checkpoints
Related