Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT — GGUF

GGUF quantizations of reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT for local, mobile, and edge deployment via llama.cpp and compatible runtimes.

A 30B Thinking teacher compressed 50x into a model that fits on a smartwatch.

Available Quantizations

File	Quant	Size	Use Case
`qwen3-0.6b-distilled-30b-thinking-sft-f16.gguf`	F16	~1.3 GB	Full precision reference
`qwen3-0.6b-distilled-30b-thinking-sft-Q8_0.gguf`	Q8_0	~700 MB	Near-lossless, desktop/laptop
`qwen3-0.6b-distilled-30b-thinking-sft-Q5_K_M.gguf`	Q5_K_M	~500 MB	Balanced, mobile
`qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf`	Q4_K_M	~400 MB	Smallest, IoT/edge/smartwatch

Recommended: Q5_K_M for mobile, Q4_K_M for maximum compression.
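To pick a file programmatically, here is a minimal sketch. It chooses the highest-precision quant whose approximate size, taken from the table above, fits a memory budget. `pick_quant` is a hypothetical helper. The `headroom` factor is an assumed allowance for KV cache and runtime overhead, not a measured value. The card's Q5_K_M/Q4_K_M recommendations still apply when you want a smaller file than the budget allows.

```python
from typing import Optional

# Approximate file sizes in MB, taken from the table above, largest first.
QUANTS = [
    ("f16", 1300),
    ("Q8_0", 700),
    ("Q5_K_M", 500),
    ("Q4_K_M", 400),
]
FILENAME = "qwen3-0.6b-distilled-30b-thinking-sft-{}.gguf"


def pick_quant(budget_mb: float, headroom: float = 1.25) -> Optional[str]:
    """Return the highest-precision file whose size times `headroom` fits in budget_mb.

    `headroom` is a hypothetical allowance for KV cache and runtime overhead;
    measure on your own device rather than trusting the default.
    """
    for suffix, size_mb in QUANTS:
        if size_mb * headroom <= budget_mb:
            return FILENAME.format(suffix)
    return None  # nothing fits the budget


print(pick_quant(1000))  # 700 MB * 1.25 = 875 MB fits, so the Q8_0 file is chosen
```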

About the Model

Two-stage build:

Stage 1 (Thinking Teacher Distillation): Qwen3-0.6B distilled from Qwen3-30B-A3B-Thinking on 6,122 STEM chain-of-thought samples. The Thinking variant of the teacher produces extended reasoning traces with higher-entropy output distributions, transferring a richer deliberation structure into the student. The loss combines proof-weighted cross-entropy (weight decaying from 2.5x to 1.5x on derivation tokens) with KL divergence at T=2.0.
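The decaying proof weight can be sketched as a per-token weight schedule. The card gives only the endpoints (2.5x to 1.5x). The linear decay shape below is an assumption. How tokens are flagged as derivation tokens (`is_derivation`) is also hypothetical, and the source model's card is the authority on both.

```python
from typing import List


def proof_weight(step: int, total_steps: int,
                 start: float = 2.5, end: float = 1.5) -> float:
    """Proof weight at a given training step, decayed linearly from start to end.

    The card states only the endpoints (2.5x -> 1.5x); the linear shape is an assumption.
    """
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)  # clamp progress to [0, 1]
    return start + (end - start) * frac


def token_weights(is_derivation: List[bool], step: int, total_steps: int) -> List[float]:
    """Per-token cross-entropy weights: derivation tokens get the current proof weight,
    every other token gets 1.0. How tokens are flagged as derivation is hypothetical."""
    w = proof_weight(step, total_steps)
    return [w if flag else 1.0 for flag in is_derivation]


print(token_weights([True, False, True], step=0, total_steps=11))  # [2.5, 1.0, 2.5]
```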

Stage 2 (Legal SFT): Supervised fine-tuning on Alignment-Lab-AI/Lawyer-Instruct at a conservative learning rate (5e-6), layering legal reasoning on top of the STEM backbone without overwriting it.

Attribute	Value
Base model	Qwen/Qwen3-0.6B
Teacher model	Qwen/Qwen3-30B-A3B-Thinking-2507
Compression	50x parameters, ~75x with Q4_K_M
Developer	Reaperdoesntrun / Convergent Intelligence LLC: Research Division

Usage

llama.cpp CLI

```bash
./llama-cli -m qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf \
  -p "### Instruction:\nWhat is promissory estoppel?\n\n### Response:\n" \
  -n 512 --temp 0.0
```

llama.cpp Python

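```python
from llama_cpp import Llama

# Load the quantized model; n_ctx is the context window in tokens.
llm = Llama(model_path="qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf", n_ctx=1024)

# Raw text completion in the Stage 2 instruction format.
output = llm(
    "### Instruction:\nProve that the square root of 2 is irrational.\n\n### Response:\n",
    max_tokens=512,
    temperature=0.0,  # greedy decoding, matching --temp 0.0 in the CLI example
    stop=["### Instruction:"],  # stop if the model starts a new instruction block
)
# The completion text is in the OpenAI-style "choices" list.
print(output["choices"][0]["text"])
```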

Ollama

```bash
echo 'FROM ./qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf' > Modelfile
ollama create stem-legal-tiny -f Modelfile
ollama run stem-legal-tiny "Explain the difference between a felony and a misdemeanor."
```
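The one-line Modelfile does not set a prompt template. To have Ollama wrap each prompt in the Stage 2 `### Instruction:` / `### Response:` format, you can generate a fuller Modelfile with the minimal sketch that follows. `TEMPLATE`, `PARAMETER temperature`, `PARAMETER num_ctx`, and `PARAMETER stop` are standard Ollama Modelfile directives. The helper name `build_modelfile` and the chosen values are assumptions that mirror the llama.cpp examples: greedy decoding, a 1024-token context, and stopping on a new instruction header.

```python
# Hypothetical helper that emits an Ollama Modelfile for this GGUF.
# TEMPLATE and PARAMETER are standard Modelfile directives; the values are assumptions.
PROMPT_TEMPLATE = "### Instruction:\n{{ .Prompt }}\n\n### Response:\n"  # Go template syntax


def build_modelfile(gguf_path: str, temperature: float = 0.0, num_ctx: int = 1024) -> str:
    """Return Modelfile text that wraps each prompt in the Stage 2 instruction format."""
    return (
        f"FROM {gguf_path}\n"
        f'TEMPLATE """{PROMPT_TEMPLATE}"""\n'
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
        'PARAMETER stop "### Instruction:"\n'
    )


if __name__ == "__main__":
    print(build_modelfile("./qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf"))
```

Save the printed text as `Modelfile`, for example by redirecting the script's output. Then run the same `ollama create stem-legal-tiny -f Modelfile` command.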

LM Studio

Download any GGUF file from this repo and load it directly in LM Studio.

Prompt Formats

STEM derivation (Stage 1):

```
Solve the following problem carefully and show a rigorous derivation.

Problem:
[Your problem]

Proof:
```

Instruction-following (Stage 2):

```
### Instruction:
[Your question]

### Response:
```
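The two prompt formats above can be wrapped in small helpers so prompts are built consistently before being passed to the `llm(...)` call in the llama.cpp Python example. This is a minimal sketch. The helper names are hypothetical. The trailing newline after the final header mirrors the `### Response:\n` ending used in the llama.cpp examples, and for the STEM template it is an assumption. `trim_response` assumes the model may continue into a new `### Instruction:` block that should be discarded.

```python
# Templates copied from the two prompt formats above.
STEM_TEMPLATE = (
    "Solve the following problem carefully and show a rigorous derivation.\n\n"
    "Problem:\n{problem}\n\n"
    "Proof:\n"
)
INSTRUCT_TEMPLATE = "### Instruction:\n{question}\n\n### Response:\n"


def stem_prompt(problem: str) -> str:
    """Build a Stage 1 (STEM derivation) prompt."""
    return STEM_TEMPLATE.format(problem=problem.strip())


def instruct_prompt(question: str) -> str:
    """Build a Stage 2 (instruction-following) prompt."""
    return INSTRUCT_TEMPLATE.format(question=question.strip())


def trim_response(text: str, stop: str = "### Instruction:") -> str:
    """Cut a completion at the first stop marker (an assumed convention) and trim whitespace."""
    return text.split(stop, 1)[0].strip()


print(instruct_prompt("What is promissory estoppel?"))
```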

Limitations

0.6B is a hard capacity constraint. The model trades depth for deployability — it will make errors that larger models avoid. Multi-step proofs beyond ~8 steps degrade. Legal reasoning covers general concepts but lacks nuance. Always verify critical outputs. This is not a substitute for formal proof verification, licensed legal counsel, or professional analysis.

Source Model

Full training methodology, hyperparameters, and the two-stage pipeline are documented in:

reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT

Mathematical Foundations

This is a GGUF-quantized variant. The mathematical foundations (Discrepancy Calculus, Topological Knowledge Distillation) are documented in the source model's card. The discrepancy operator $Df(x)$ and the BV decomposition that inform the training pipeline carry through to these files. The structural boundaries detected by DISC during training are baked into the weights, and quantization only approximates those weights at lower precision rather than re-running DISC.

Related Models

Model	Description
Qwen3-0.6B-STEM-Proof-Distilled-Thinking	Stage 1 only — pure STEM backbone
Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT	Full precision source model
Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF	Larger 1.7B variant GGUF

Citation

```bibtex
@misc{colca2026thinking06bgguf,
  title={Qwen3-0.6B Distilled Thinking SFT: 50x Compression GGUF for Edge Deployment},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF},
  note={Convergent Intelligence LLC: Research Division}
}
```

Convergent Intelligence LLC: Research Division "Where classical analysis fails to see, we begin."

Convergent Intelligence Portfolio

Part of the Qwen3 0.6B Distillation Series by Convergent Intelligence LLC: Research Division

Related Models

Model	Downloads	Format
Qwen3-0.6B-Distilled-30B-A3B	36	HF
Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT	33	HF

Top Models from Our Lab

Model	Downloads
Qwen3-1.7B-Thinking-Distil	501
LFM2.5-1.2B-Distilled-SFT	342
Qwen3-1.7B-Coder-Distilled-SFT	302
Qwen3-1.7B-Coder-Distilled-SFT-GGUF	194
Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF	175

Total Portfolio: 41 models | 2,781 total downloads

Last updated: 2026-03-28 12:49 UTC

DistilQwen Collection

This model is part of the DistilQwen proof-weighted distillation series. Collection: 9 models | 2,788 downloads

Teacher Variant Comparison

Teacher	Student Size	Strength	Models
Qwen3-30B-A3B (Instruct)	1.7B	Instruction following, structured output, legal reasoning	3 (833 DL)
Qwen3-30B-A3B (Thinking)	0.6B	Extended deliberation, higher-entropy distributions, proof derivation	3 (779 DL) ← this model
Qwen3-30B-A3B (Coder)	1.7B	Structured decomposition, STEM derivation, logical inference	2 (825 DL)

Methodology

This is the only BF16 collection in the portfolio. The broader Convergent Intelligence catalog (43 models, 12,000+ downloads) was trained on CPU at FP32 for $24 of total compute. The DistilQwen series was instead trained on H100 GPUs at BF16 with a 30B-parameter teacher. The methodology is the same; only the hardware is premium. This is what happens when you give the pipeline real compute.

All models use proof-weighted knowledge distillation: 55% cross-entropy with decaying proof weights (2.5× → 1.5×), 45% KL divergence at T=2.0. The proof weight amplifies loss on reasoning-critical tokens, forcing the student to allocate capacity to structural understanding rather than surface-level pattern matching.
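The 55/45 split can be written as a single objective. The minimal pure-Python sketch below computes it for a few token positions, under several assumptions this card does not confirm:

- The KL term runs from the temperature-softened teacher distribution to the student's (teacher ‖ student).
- The KL term is multiplied by T², as in Hinton-style distillation.
- Cross-entropy uses the student at T=1.
- The weighted cross-entropy is averaged over positions.

Per-token `weights` would be 1.0 for ordinary tokens and the current proof weight for derivation tokens. `distill_loss` is a hypothetical name, and the source model's card documents the actual implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at the given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def distill_loss(student_logits, teacher_logits, targets, weights,
                 alpha=0.55, temperature=2.0):
    """alpha * proof-weighted CE + (1 - alpha) * T^2 * KL(teacher_T || student_T).

    student_logits, teacher_logits: one list of vocabulary logits per position.
    targets: gold token id per position. weights: per-token CE weight per position.
    """
    n = len(targets)
    ce = 0.0
    kl = 0.0
    for s, t, y, w in zip(student_logits, teacher_logits, targets, weights):
        ce += -w * math.log(softmax(s)[y])  # weighted CE on the hard target
        ps = softmax(s, temperature)  # softened student distribution
        pt = softmax(t, temperature)  # softened teacher distribution
        kl += sum(a * (math.log(a) - math.log(b)) for a, b in zip(pt, ps) if a > 0)
    ce /= n
    kl = kl / n * temperature ** 2  # T^2 scaling is an assumption (Hinton-style)
    return alpha * ce + (1 - alpha) * kl
```

When the student already matches the teacher, the KL term vanishes and only the weighted cross-entropy remains. Raising a token's weight from 1.0 to 2.5 scales its cross-entropy contribution by the same factor.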

Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)