Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT: GGUF
GGUF quantizations of reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT for local, mobile, and edge deployment via llama.cpp and compatible runtimes.
A 30B Thinking teacher compressed 50x into a model that fits on a smartwatch.
Available Quantizations
| File | Quant | Size | Use Case |
|---|---|---|---|
| qwen3-0.6b-distilled-30b-thinking-sft-f16.gguf | F16 | ~1.3 GB | Full precision reference |
| qwen3-0.6b-distilled-30b-thinking-sft-Q8_0.gguf | Q8_0 | ~700 MB | Near-lossless, desktop/laptop |
| qwen3-0.6b-distilled-30b-thinking-sft-Q5_K_M.gguf | Q5_K_M | ~500 MB | Balanced, mobile |
| qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf | Q4_K_M | ~400 MB | Smallest, IoT/edge/smartwatch |
Recommended: Q5_K_M for mobile, Q4_K_M for maximum compression.
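To choose a file programmatically, one option is a small selector over the approximate sizes in the table above. This is a minimal sketch: the `pick_quant` helper and the `headroom` factor (extra memory for the KV cache and runtime overhead) are illustrative assumptions, not something shipped with this repo.

```python
# Approximate file sizes in MB, copied from the table above (~1.3 GB taken as 1300 MB).
QUANT_SIZES_MB = {
    "F16": 1300,
    "Q8_0": 700,
    "Q5_K_M": 500,
    "Q4_K_M": 400,
}


def pick_quant(budget_mb, headroom=1.2):
    """Return the largest quant whose size * headroom fits in budget_mb, or None.

    headroom is a hypothetical safety factor; tune it for your runtime and context size.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_MB.items()
        if size * headroom <= budget_mb
    ]
    if not fitting:
        return None
    # max() on (size, name) tuples selects the largest file that still fits.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget in (2048, 1024, 512, 256):
        print(budget, "MB ->", pick_quant(budget))
```

Actual memory use depends on context length and the runtime, so treat the result as a starting point rather than a guarantee.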
About the Model
Two-stage build:
Stage 1 (Thinking Teacher Distillation): Qwen3-0.6B distilled from Qwen3-30B-A3B-Thinking on 6,122 STEM chain-of-thought samples. The Thinking variant teacher produces extended reasoning traces with higher-entropy distributions, transferring richer deliberation structure into the student. The loss combines proof-weighted cross-entropy (weights decaying 2.5x → 1.5x on derivation tokens) with KL divergence at T=2.0.
Stage 2 (Legal SFT): Supervised fine-tuning on Alignment-Lab-AI/Lawyer-Instruct at a conservative learning rate (5e-6) to layer legal reasoning on top of the STEM backbone without overwriting it.
| Attribute | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Teacher model | Qwen/Qwen3-30B-A3B-Thinking-2507 |
| Compression | 50x parameters, ~75x with Q4_K_M |
| Developer | Reaperdoesntrun / Convergent Intelligence LLC: Research Division |
Usage
llama.cpp CLI
./llama-cli -m qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf \
  -p "### Instruction:\nWhat is promissory estoppel?\n\n### Response:\n" \
  -n 512 --temp 0.0
llama.cpp Python
from llama_cpp import Llama

# Load the quantized model with a 1024-token context window.
llm = Llama(model_path="qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf", n_ctx=1024)

# temperature=0.0 selects greedy decoding.
output = llm(
    "### Instruction:\nProve that the square root of 2 is irrational.\n\n### Response:\n",
    max_tokens=512,
    temperature=0.0,
)
print(output["choices"][0]["text"])
Ollama
echo 'FROM ./qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf' > Modelfile
ollama create stem-legal-tiny -f Modelfile
ollama run stem-legal-tiny "Explain the difference between a felony and a misdemeanor."
LM Studio
Download any GGUF file from this repo and load directly in LM Studio.
Prompt Formats
STEM derivation (Stage 1):
Solve the following problem carefully and show a rigorous derivation.
Problem:
[Your problem]
Proof:
Instruction-following (Stage 2):
### Instruction:
[Your question]
### Response:
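The two formats above can be wrapped in small helpers so that prompts stay consistent across calls. This is a sketch: the function names are hypothetical, the instruction format mirrors the strings used in the Usage examples, and the exact blank-line placement in the STEM format is an assumption.

```python
def stem_prompt(problem: str) -> str:
    """Build a Stage 1 (STEM derivation) prompt; blank-line placement is assumed."""
    return (
        "Solve the following problem carefully and show a rigorous derivation.\n\n"
        f"Problem:\n{problem}\n\n"
        "Proof:\n"
    )


def instruct_prompt(question: str) -> str:
    """Build a Stage 2 (instruction-following) prompt, matching the Usage examples."""
    return f"### Instruction:\n{question}\n\n### Response:\n"


if __name__ == "__main__":
    print(stem_prompt("Show that the sum of two even integers is even."))
    print(instruct_prompt("What is promissory estoppel?"))
```

Either string can be passed as the prompt to `llama-cli -p` or to a `Llama` instance from `llama_cpp`.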
Limitations
0.6B is a hard capacity constraint. The model trades depth for deployability: it will make errors that larger models avoid. Multi-step proofs beyond ~8 steps degrade. Legal reasoning covers general concepts but lacks nuance. Always verify critical outputs. This is not a substitute for formal proof verification, licensed legal counsel, or professional analysis.
Source Model
Full training methodology, hyperparameters, and the two-stage pipeline are documented in:
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
Mathematical Foundations
This is a GGUF-quantized variant. The mathematical foundations (Discrepancy Calculus, Topological Knowledge Distillation) are documented in the source model's card. The discrepancy operator $Df(x)$ and BV decomposition that inform the training pipeline are preserved through quantization: the structural boundaries detected by DISC during training are baked into the weights, not dependent on precision.
Related Models
| Model | Description |
|---|---|
| Qwen3-0.6B-STEM-Proof-Distilled-Thinking | Stage 1 only: pure STEM backbone |
| Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT | Full precision source model |
| Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF | Larger 1.7B variant GGUF |
Citation
@misc{colca2026thinking06bgguf,
title={Qwen3-0.6B Distilled Thinking SFT: 50x Compression GGUF for Edge Deployment},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF},
note={Convergent Intelligence LLC: Research Division}
}
Convergent Intelligence LLC: Research Division "Where classical analysis fails to see, we begin."
Convergent Intelligence Portfolio
Part of the Qwen3 0.6B Distillation Series by Convergent Intelligence LLC: Research Division
Related Models
| Model | Downloads | Format |
|---|---|---|
| Qwen3-0.6B-Distilled-30B-A3B | 36 | HF |
| Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT | 33 | HF |
Top Models from Our Lab
| Model | Downloads |
|---|---|
| Qwen3-1.7B-Thinking-Distil | 501 |
| LFM2.5-1.2B-Distilled-SFT | 342 |
| Qwen3-1.7B-Coder-Distilled-SFT | 302 |
| Qwen3-1.7B-Coder-Distilled-SFT-GGUF | 194 |
| Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF | 175 |
Total Portfolio: 41 models | 2,781 total downloads
Last updated: 2026-03-28 12:49 UTC
DistilQwen Collection
This model is part of the DistilQwen proof-weighted distillation series. Collection: 9 models | 2,788 downloads
Teacher Variant Comparison
| Teacher | Student Size | Strength | Models |
|---|---|---|---|
| Qwen3-30B-A3B (Instruct) | 1.7B | Instruction following, structured output, legal reasoning | 3 (833 DL) |
| Qwen3-30B-A3B (Thinking) | 0.6B | Extended deliberation, higher-entropy distributions, proof derivation | 3 (779 DL), this model |
| Qwen3-30B-A3B (Coder) | 1.7B | Structured decomposition, STEM derivation, logical inference | 2 (825 DL) |
Methodology
The only BF16 collection in the portfolio. While the broader Convergent Intelligence catalog (43 models, 12,000+ downloads) was trained on CPU at FP32 for $24 total compute, the DistilQwen series was trained on H100 at BF16 with a 30B-parameter teacher. Same methodology, premium hardware. This is what happens when you give the pipeline real compute.
All models use proof-weighted knowledge distillation: 55% cross-entropy with decaying proof weights (2.5× → 1.5×), 45% KL divergence at T=2.0. The proof weight amplifies loss on reasoning-critical tokens, forcing the student to allocate capacity to structural understanding rather than surface-level pattern matching.
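The objective described above can be sketched for a single token position in plain Python. This is an illustration under stated assumptions, not the training code: the linear decay schedule, the `is_proof_token` flag, the KL direction (teacher to student), and the conventional T² scaling of the KL term are all assumptions. Only the 55/45 split, the 2.5× → 1.5× weight range, and T=2.0 come from the description.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def proof_weight(step, total_steps, start=2.5, end=1.5):
    """Decay the proof weight from start to end; the linear shape is an assumption."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def token_loss(student_logits, teacher_logits, target, is_proof_token,
               step, total_steps, ce_share=0.55, kl_share=0.45, temperature=2.0):
    """Per-token loss: 55% (proof-weighted) cross-entropy + 45% KL at temperature T."""
    # Cross-entropy against the ground-truth token, at temperature 1.
    ce = -math.log(softmax(student_logits)[target])
    if is_proof_token:  # hypothetical mask marking derivation tokens
        ce *= proof_weight(step, total_steps)
    # KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    # T^2 scaling is the usual distillation convention; assumed here.
    return ce_share * ce + kl_share * (temperature ** 2) * kl


if __name__ == "__main__":
    print(token_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], target=0,
                     is_proof_token=True, step=0, total_steps=100))
```

When student and teacher logits coincide, the KL term vanishes and only the weighted cross-entropy remains, which makes for a quick sanity check on any real implementation.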
Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)
Related in this series
- Qwen3-0.6B-Distilled-30B-A3B (236 downloads)
- Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT (227 downloads)
Part of the reaperdoesntknow research portfolio: 49 models, 22,598 total downloads | Last refreshed: 2026-03-30 12:05 UTC