Algorithmic SFT vs Distillation
Collection
10 LoRA adapters + 6 datasets. Algo template SFT vs QwQ distillation on Qwen2.5-1.5B-Instruct across 4 reasoning domains. โข 16 items โข Updated
LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct fine-tuned on conlang morphology via Algorithmic Template SFT.
Part of the Algorithmic SFT vs Distillation experiment studying whether deterministic algorithmic templates teach procedural reasoning more effectively than distillation from large reasoning models.
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Method | Algorithmic Template SFT |
| Framework | LLaMA-Factory (SFT stage) |
| LoRA rank | 64 |
| LoRA target | all linear layers |
| Learning rate | 1e-4 |
| Epochs | 3 |
| Batch size | 4 (grad accum 4) |
| Cutoff length | 32,768 tokens |
| Training data | 5,000 deterministic ordered-affix-attachment traces (d5+d7: 7,289 unique questions, 2-4 features, phonological rules) |
| Split | Accuracy |
|---|---|
| Test (in-distribution) | 98.6% |
| Harder variant | 95.8% |
| Structural OOD | 94.2% (held-out root vocabulary) |
Near-perfect generalization. Diverse training data (7K unique questions) was critical โ same algorithm with only 18 unique questions (d1) scored 16%.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "reasoning-degeneration-dev/algo-sft-conlang-morphology-ordered-rules-d5d7")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")