distill2-0.6B — Expert Language Model for CLI Output (GGUF)

distill2-0.6B is the second-generation domain-specific Expert Language Model for CLI output compression and classification — in GGUF format for cross-platform use with llama.cpp.

See distill2-0.6B-4bit-MLX for the MLX (Apple Silicon) version.

What is distill?

distill compresses arbitrary command-line output to structured summaries.

Input:  500 lines of npm install logs
Output: PASS — 24 packages installed, 0 vulnerabilities

distill2-0.6B achieves 98.4% accuracy at 0.6B parameters — outperforming its 1.7B predecessor.

Files

File Format Size Use case
distill2-0.6B-Q4_K_M.gguf Q4_K_M (4-bit) 378 MB Production, low memory
distill2-0.6B-fp16.gguf fp16 1.2 GB Maximum quality

Performance

Metric Value
Overall accuracy 98.4%
Tasks at 100% 5 of 8
Tasks ≥95% 7 of 8
Base model Qwen3-0.6B
Training QLoRA 4-bit + GGUF conversion

8 Specialized Tasks

Task Accuracy Description
pass_fail 100% Command success/failure
safe_review 100% Terraform plan safety
json_extraction 100% JSON from noisy logs
test_result 100% Test suite pass/fail
typescript_check 100% TS compiler errors
terraform_plan 98.4% Resource change counts
security_audit 96.6% Vulnerability counts
generic 93.1% Free-form CLI summaries

Usage (llama.cpp)

# Download
huggingface-cli download samuelfaj/distill2-0.6B-4bit-GGUF distill2-0.6B-Q4_K_M.gguf --local-dir .

# Run with llama-cli
llama-cli -m distill2-0.6B-Q4_K_M.gguf -p "Command output: npm test\n4 passed, 0 failed"

# Or as server
llama-server -m distill2-0.6B-Q4_K_M.gguf --port 8080

Conversion Pipeline

This GGUF was created from the QLoRA-trained model via:

  1. Fuse QLoRA adapter into 4-bit base → fp16 with mlx_lm fuse --dequantize
  2. Strip MLX quantization artifacts (bias tensors)
  3. Convert to GGUF fp16 with llama.cpp/convert_hf_to_gguf.py
  4. Quantize to Q4_K_M with llama-quantize

Project

distill — CLI output compression engine.

Full Distill Collection

Downloads last month
155
GGUF
Model size
0.6B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for samuelfaj/distill2-0.6B-4bit-GGUF

Finetuned
Qwen/Qwen3-0.6B
Quantized
(307)
this model

Collection including samuelfaj/distill2-0.6B-4bit-GGUF