# SmolLM2-135M-Reasoning-SFT1K-Champion 🏆🏆

**Double Champion: ARC-E = 58.00% and PIQA = 67.36%**
This model was trained with only 1,000 SFT samples on top of SmolLM2-135M-Instruct, and it achieves the best ARC-Easy and PIQA scores among the 162+ models evaluated in our optimization study.
## Key Finding: Less Training = Better Generalization
The most important finding: minimal SFT (1K samples) preserves the base model's pretrained knowledge while still teaching the reasoning format (reasoning tags). This yields better benchmark scores than models trained on 100K+ samples.
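To illustrate consuming that reasoning format, here is a minimal sketch that splits a completion into its reasoning span and final answer. The `<think>…</think>` tag pair is an assumption for illustration; the exact tags this model emits are not shown above.

```python
import re

# Hypothetical tag pair -- the exact reasoning tags used by this model
# are an assumption made for this sketch.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a tagged completion."""
    m = THINK_RE.search(completion)
    if m is None:
        # No reasoning span found: treat the whole text as the answer.
        return "", completion.strip()
    reasoning = m.group(1).strip()
    answer = completion[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is 4.</think> The answer is 4."
)
# reasoning == "2 + 2 is 4.", answer == "The answer is 4."
```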
## Benchmark Results (0-shot)
| Metric | This Model | Baseline | Best DPO (exp133) |
|---|---|---|---|
| ARC-Challenge (acc_norm) | 28.33% | 27.73% | 29.35% |
| ARC-Easy (acc) | 58.00% 🏆 | 54.12% | 57.87% |
| GSM8K (flex_extract) | 0.15% | 0.00% | 0.00% |
| HellaSwag (acc_norm) | 43.04% | 42.99% | 43.29% |
| PIQA (acc_norm) | 67.36% 🏆 | 66.92% | 67.30% |
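As a quick sanity check on the two headline results, the per-task gains can be recomputed from the values in the table above:

```python
# Scores (%) copied from the benchmark table above.
scores = {
    "arc_easy": {"model": 58.00, "baseline": 54.12, "best_dpo": 57.87},
    "piqa":     {"model": 67.36, "baseline": 66.92, "best_dpo": 67.30},
}

# Gain over the baseline and over the best DPO run, in percentage points.
for task, s in scores.items():
    gain_base = round(s["model"] - s["baseline"], 2)
    gain_dpo = round(s["model"] - s["best_dpo"], 2)
    print(f"{task}: +{gain_base} pts vs baseline, +{gain_dpo} pts vs best DPO")
# arc_easy: +3.88 pts vs baseline, +0.13 pts vs best DPO
# piqa: +0.44 pts vs baseline, +0.06 pts vs best DPO
```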
## Training Details
- Base: HuggingFaceTB/SmolLM2-135M-Instruct
- Method: SFT with 1,000 mixed samples (50% reasoning, 50% non-reasoning)
- Epochs: 1
- Learning rate: 3e-4 (cosine schedule)
- Batch: 8 × 16 gradient accumulation
- GPU: MetaX C500 (64GB)
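The recipe above can be summarized as a configuration sketch. Field names here are illustrative and do not correspond to any specific trainer's API:

```python
# Training configuration sketch -- field names are illustrative,
# not the actual training script's arguments.
config = {
    "base_model": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "num_samples": 1_000,        # 50% reasoning, 50% non-reasoning
    "reasoning_fraction": 0.5,
    "epochs": 1,
    "learning_rate": 3e-4,
    "lr_schedule": "cosine",
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 16,
}

# Effective batch size = micro-batch x accumulation steps.
effective_batch = (
    config["per_device_batch_size"] * config["gradient_accumulation_steps"]
)  # 128

# With 1,000 samples and an effective batch of 128, one epoch is
# only ~8 optimizer steps -- which is the point: minimal updates.
steps_per_epoch = -(-config["num_samples"] // effective_batch)  # ceil division
```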
## Implications
This result suggests that, for tiny models (135M parameters), aggressive SFT causes catastrophic forgetting of pretrained knowledge. The optimal strategy is minimal SFT followed by targeted preference optimization (e.g., DPO).