# SmolLM2-135M-Reasoning-SFT1K-Champion 🏆🏆

**Double Champion: ARC-E = 58.00% and PIQA = 67.36%**
This model was trained with only 1,000 SFT samples on top of SmolLM2-135M-Instruct, and it achieves the best ARC-Easy and PIQA scores among the 162+ models evaluated in our optimization study.
## Key Finding: Less Training = Better Generalization
The most important finding: minimal SFT (1K samples) preserves the base model's pretrained knowledge while still teaching the reasoning format (reasoning tags). This yields better benchmark scores than models trained on 100K+ samples.
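To illustrate consuming that reasoning format, here is a minimal sketch that splits a completion into its reasoning span and final answer. The `<think>…</think>` tag pair is an assumption for illustration; the exact tags this model emits are not shown above.

```python
import re

# Hypothetical tag pair -- the exact reasoning tags used by this model
# are an assumption made for this sketch.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a tagged completion."""
    m = THINK_RE.search(completion)
    if m is None:
        # No reasoning span found: treat the whole text as the answer.
        return "", completion.strip()
    reasoning = m.group(1).strip()
    answer = completion[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is 4.</think> The answer is 4."
)
# reasoning == "2 + 2 is 4.", answer == "The answer is 4."
```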
## Benchmark Results (0-shot)
| Metric | This Model | Baseline | Best DPO (exp133) |
|---|---|---|---|
| ARC-Challenge (acc_norm) | 28.33% | 27.73% | 29.35% |
| ARC-Easy (acc) | 58.00% 🏆 | 54.12% | 57.87% |
| GSM8K (flex_extract) | 0.15% | 0.00% | 0.00% |
| HellaSwag (acc_norm) | 43.04% | 42.99% | 43.29% |
| PIQA (acc_norm) | 67.36% 🏆 | 66.92% | 67.30% |
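As a quick sanity check on the two headline results, the per-task gains can be recomputed from the values in the table above:

```python
# Scores (%) copied from the benchmark table above.
scores = {
    "arc_easy": {"model": 58.00, "baseline": 54.12, "best_dpo": 57.87},
    "piqa":     {"model": 67.36, "baseline": 66.92, "best_dpo": 67.30},
}

# Gain over the baseline and over the best DPO run, in percentage points.
for task, s in scores.items():
    gain_base = round(s["model"] - s["baseline"], 2)
    gain_dpo = round(s["model"] - s["best_dpo"], 2)
    print(f"{task}: +{gain_base} pts vs baseline, +{gain_dpo} pts vs best DPO")
# arc_easy: +3.88 pts vs baseline, +0.13 pts vs best DPO
# piqa: +0.44 pts vs baseline, +0.06 pts vs best DPO
```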
## Training Details
- Base: HuggingFaceTB/SmolLM2-135M-Instruct
- Method: SFT with 1,000 mixed samples (50% reasoning, 50% non-reasoning)
- Epochs: 1
- Learning rate: 3e-4 (cosine schedule)
- Batch: 8 × 16 gradient accumulation
- GPU: MetaX C500 (64GB)
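The recipe above can be summarized as a configuration sketch. Field names here are illustrative and do not correspond to any specific trainer's API:

```python
# Training configuration sketch -- field names are illustrative,
# not the actual training script's arguments.
config = {
    "base_model": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "num_samples": 1_000,        # 50% reasoning, 50% non-reasoning
    "reasoning_fraction": 0.5,
    "epochs": 1,
    "learning_rate": 3e-4,
    "lr_schedule": "cosine",
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 16,
}

# Effective batch size = micro-batch x accumulation steps.
effective_batch = (
    config["per_device_batch_size"] * config["gradient_accumulation_steps"]
)  # 128

# With 1,000 samples and an effective batch of 128, one epoch is
# only ~8 optimizer steps -- which is the point: minimal updates.
steps_per_epoch = -(-config["num_samples"] // effective_batch)  # ceil division
```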
## Implications
This result suggests that, for tiny models (135M parameters), aggressive SFT causes catastrophic forgetting of pretrained knowledge. The optimal strategy is minimal SFT followed by targeted preference optimization (e.g., DPO).