Formal Logic โ€” QwQ Distillation

LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct fine-tuned on formal logic via QwQ-32B Distillation.

Part of the Algorithmic SFT vs Distillation experiment studying whether deterministic algorithmic templates teach procedural reasoning more effectively than distillation from large reasoning models.

Training

Parameter Value
Base model Qwen/Qwen2.5-1.5B-Instruct
Method QwQ-32B Distillation
Framework LLaMA-Factory (SFT stage)
LoRA rank 64
LoRA target all linear layers
Learning rate 1e-4
Epochs 3
Batch size 1 (grad accum 16)
Cutoff length 32,768 tokens
Training data 5,000 QwQ-32B reasoning traces (d5, filtered). Teacher solve rate: 92.0%

Evaluation (v3, MAX_TOKENS=32768)

Split Accuracy
Test (in-distribution) 87.4%
Harder variant 85.6%
Structural OOD 71.2%

Notes

Strongest distillation result. Competitive in-distribution but 20pp gap on OOD vs algo SFT.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "reasoning-degeneration-dev/algo-sft-formal-logic-distill-qwq")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

Related Datasets

Downloads last month
17
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for reasoning-degeneration-dev/algo-sft-formal-logic-distill-qwq

Adapter
(825)
this model

Collection including reasoning-degeneration-dev/algo-sft-formal-logic-distill-qwq