# Algorithmic SFT vs Distillation

Part of a collection of 10 LoRA adapters + 6 datasets: algorithmic-template SFT vs QwQ distillation on Qwen2.5-1.5B-Instruct across 4 reasoning domains.
LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct, fine-tuned on cellular-automata tasks via QwQ-32B distillation.
Part of the Algorithmic SFT vs Distillation experiment studying whether deterministic algorithmic templates teach procedural reasoning more effectively than distillation from large reasoning models.

**Training configuration**
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Method | QwQ-32B Distillation |
| Framework | LLaMA-Factory (SFT stage) |
| LoRA rank | 64 |
| LoRA target | all linear layers |
| Learning rate | 1e-4 |
| Epochs | 3 |
| Batch size | 1 (grad accum 16) |
| Cutoff length | 32,768 tokens |
| Training data | 5,000 QwQ-32B reasoning traces (d5, filtered for correctness). Teacher solve rate: 28.0% |
**Results**

| Split | Accuracy |
|---|---|
| Test (in-distribution) | 40.4% |
| Harder variant | 4.8% |
| Structural OOD | 22.4% |
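The generalization gaps can be read off the results table directly as percentage-point drops relative to in-distribution accuracy:

```python
# Reported accuracies from the results table.
test_acc = 0.404    # in-distribution test
harder_acc = 0.048  # harder variant
ood_acc = 0.224     # structural OOD

# Percentage-point drops relative to the in-distribution test split.
harder_drop = test_acc - harder_acc
ood_drop = test_acc - ood_acc
print(f"harder variant drop: {harder_drop:.1%}")  # 35.6%
print(f"structural OOD drop: {ood_drop:.1%}")     # 18.0%
```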
On this domain, distillation is substantially weaker than algorithmic SFT (40.4% vs. 94.6% on the in-distribution test).
**Usage**

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "reasoning-degeneration-dev/algo-sft-cellular-automata-distill-qwq")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
```