SmolLM2-135M-Reasoning-SFT1K-Champion 🏆🏆

Double Champion: ARC-E=58.00% and PIQA=67.36%

This model was trained with only 1,000 SFT samples on top of SmolLM2-135M-Instruct, and achieves the best ARC-E and PIQA scores among the 162+ models evaluated in our optimization study.

Key Finding: Less Training = Better Generalization

The most important discovery: minimal SFT (1K samples) preserves the base model's pretrained knowledge while still teaching the reasoning format (wrapping chain-of-thought in reasoning tags). This yields better benchmark scores than models trained on 100K+ samples.
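The 50/50 mixing described in the training details can be sketched as follows. The pool contents here are placeholders; the card does not name the source datasets:

```python
import random

random.seed(0)

# Hypothetical sample pools; in the actual run these would be drawn from a
# reasoning-formatted dataset and a plain instruction dataset (not named in the card).
reasoning_pool = [{"text": f"reasoning example {i}", "has_reasoning": True} for i in range(5000)]
plain_pool = [{"text": f"plain example {i}", "has_reasoning": False} for i in range(5000)]

# 1,000 samples total, split 50% reasoning / 50% non-reasoning.
sft_set = random.sample(reasoning_pool, 500) + random.sample(plain_pool, 500)
random.shuffle(sft_set)

print(len(sft_set))                              # 1000
print(sum(s["has_reasoning"] for s in sft_set))  # 500
```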

Benchmark Results (0-shot)

| Metric | This Model | Baseline | Best DPO (exp133) |
|---|---|---|---|
| ARC-Challenge (acc_norm) | 28.33% | 27.73% | 29.35% |
| ARC-Easy (acc) | **58.00%** 🏆 | 54.12% | 57.87% |
| GSM8K (flex_extract) | 0.15% | 0.00% | 0.00% |
| HellaSwag (acc_norm) | 43.04% | 42.99% | 43.29% |
| PIQA (acc_norm) | **67.36%** 🏆 | 66.92% | 67.30% |
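The scores above are 0-shot. A command along these lines with EleutherAI's lm-evaluation-harness should reproduce them (assuming the harness was the evaluation tool; the card does not say):

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=lldois/SmolLM2-135M-Reasoning-SFT1K-Champion,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,gsm8k,hellaswag,piqa \
  --num_fewshot 0 \
  --batch_size 8
```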

Training Details

  • Base: HuggingFaceTB/SmolLM2-135M-Instruct
  • Method: SFT with 1,000 mixed samples (50% reasoning, 50% non-reasoning)
  • Epochs: 1
  • Learning rate: 3e-4 (cosine schedule)
  • Batch: 8 Γ— 16 gradient accumulation
  • GPU: MetaX C500 (64GB)

Implications

This result suggests that for tiny models (135M), aggressive SFT causes catastrophic forgetting of pretrained knowledge. The optimal strategy is minimal SFT followed by targeted preference optimization.
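To try the model, a standard transformers chat-template invocation works (the prompt here is just an illustration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lldois/SmolLM2-135M-Reasoning-SFT1K-Champion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```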
