Commit 87f899f by samuelfaj · verified · 1 parent: eba1c37

Upload README.md with huggingface_hub

  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - gguf
+ - distill
+ - distill-mini
+ - cli
+ - code
+ - compression
+ - qwen
+ - qwen3
+ - expert-model
+ - domain-specific
+ - task-specialized
+ - qlora
+ pipeline_tag: text-generation
+ base_model: Qwen/Qwen3-0.6B
+ ---
+
+ # distill2-0.6B — Expert Language Model for CLI Output (GGUF)
+
+ **distill2-0.6B** is the second-generation **domain-specific Expert Language Model** for CLI output compression and classification, packaged in GGUF format for cross-platform use with llama.cpp.
+
+ See [distill2-0.6B-4bit-MLX](https://huggingface.co/samuelfaj/distill2-0.6B-4bit-MLX) for the MLX (Apple Silicon) version.
+
+ ## What is distill?
+
+ [distill](https://github.com/samuelfaj/distill) compresses arbitrary command-line output into structured summaries.
+
+ ```
+ Input: 500 lines of npm install logs
+ Output: PASS — 24 packages installed, 0 vulnerabilities
+ ```
+
+ **distill2-0.6B** reaches **98.4% overall accuracy** across its 8 tasks with 0.6B parameters, outperforming its 1.7B predecessor.
+
+ ## Files
+
+ | File | Format | Size | Use case |
+ |------|--------|------|----------|
+ | `distill2-0.6B-Q4_K_M.gguf` | Q4_K_M (4-bit) | 378 MB | Production, low memory |
+ | `distill2-0.6B-fp16.gguf` | fp16 | 1.2 GB | Maximum quality |
+
+ ## Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | Overall accuracy | **98.4%** |
+ | Tasks at 100% | 5 of 8 |
+ | Tasks ≥95% | 7 of 8 |
+ | Base model | Qwen3-0.6B |
+ | Training | QLoRA 4-bit + GGUF conversion |
+
+ ## 8 Specialized Tasks
+
+ | Task | Accuracy | Description |
+ |------|----------|-------------|
+ | `pass_fail` | 100% | Command success/failure |
+ | `safe_review` | 100% | Terraform plan safety |
+ | `json_extraction` | 100% | JSON from noisy logs |
+ | `test_result` | 100% | Test suite pass/fail |
+ | `typescript_check` | 100% | TS compiler errors |
+ | `terraform_plan` | 98.4% | Resource change counts |
+ | `security_audit` | 96.6% | Vulnerability counts |
+ | `generic` | 93.1% | Free-form CLI summaries |
+
+ ## Usage (llama.cpp)
+
+ ```bash
+ # Download the quantized model into the current directory
+ huggingface-cli download samuelfaj/distill2-0.6B-4bit-GGUF distill2-0.6B-Q4_K_M.gguf --local-dir .
+
+ # Run a one-off prompt with llama-cli
+ llama-cli -m distill2-0.6B-Q4_K_M.gguf -p "Command output: npm test\n4 passed, 0 failed"
+
+ # Or run it as a local HTTP server
+ llama-server -m distill2-0.6B-Q4_K_M.gguf --port 8080
+ ```
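Once `llama-server` is running, the model can be queried over HTTP. The sketch below assumes llama.cpp's native `/completion` endpoint (JSON fields `prompt` and `n_predict`, with the generated text returned in a `content` field) and reuses the prompt wording from the `llama-cli` example. The prompt template the model was actually trained on is not documented here, so treat the prompt as a placeholder. `json_escape`, `build_payload`, and `distill_query` are hypothetical helper names, not part of distill or llama.cpp.

```bash
# json_escape: minimal JSON string escaping (backslash, double quote, newline only).
# Other control characters such as tabs are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# build_payload PROMPT [N_PREDICT]: print the JSON request body.
# N_PREDICT defaults to 64 tokens (an arbitrary choice for short summaries).
build_payload() {
  printf '{"prompt": "%s", "n_predict": %d}\n' "$(json_escape "$1")" "${2:-64}"
}

# distill_query PROMPT: POST the prompt to a llama-server on localhost.
# PORT defaults to 8080, matching the llama-server command shown earlier.
distill_query() {
  curl -s "http://localhost:${PORT:-8080}/completion" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")"
}
```

With the server up, `distill_query "Command output: npm test"` prints the raw JSON response; piping it through a JSON tool such as `jq -r .content` (if installed) leaves only the generated summary.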
+
+ ## Conversion Pipeline
+
+ This GGUF was created from the QLoRA-trained model via:
+
+ 1. Fuse QLoRA adapter into 4-bit base → fp16 with `mlx_lm fuse --dequantize`
+ 2. Strip MLX quantization artifacts (bias tensors)
+ 3. Convert to GGUF fp16 with `llama.cpp/convert_hf_to_gguf.py`
+ 4. Quantize to Q4_K_M with `llama-quantize`
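
The four steps above can be sketched as a dry-run plan that only prints the commands for review. This is an illustration under stated assumptions, not the author's actual script: the directory names (`base-4bit`, `adapters`, `fused-fp16`) are hypothetical, the `mlx_lm fuse` flags other than `--dequantize` are assumed from the mlx-lm CLI and may differ between versions, and step 2 has no named tool in this README, so it stays a manual placeholder.

```bash
# conversion_plan [BASE_DIR] [ADAPTER_DIR] [FUSED_DIR]
# Prints one line per pipeline step; nothing is executed.
conversion_plan() {
  base=${1:-base-4bit}      # hypothetical: 4-bit MLX base model directory
  adapters=${2:-adapters}   # hypothetical: trained QLoRA adapter directory
  fused=${3:-fused-fp16}    # hypothetical: output directory for the fused fp16 model
  name=distill2-0.6B

  # 1. Fuse the adapter into the base and dequantize to fp16.
  #    Flag spelling follows the list above; check `--help` for your mlx-lm version.
  echo "mlx_lm fuse --model $base --adapter-path $adapters --save-path $fused --dequantize"
  # 2. No tool is named for this step, so it is left as a manual placeholder.
  echo "# strip leftover MLX quantization tensors (bias tensors) from $fused"
  # 3. Convert the Hugging Face-format model to an fp16 GGUF file.
  echo "python llama.cpp/convert_hf_to_gguf.py $fused --outfile $name-fp16.gguf --outtype f16"
  # 4. Quantize the fp16 GGUF to Q4_K_M.
  echo "llama-quantize $name-fp16.gguf $name-Q4_K_M.gguf Q4_K_M"
}

conversion_plan
```

Printing the plan first makes it easy to adjust paths before running each command by hand; the output file names match the two files listed in the Files table.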
+
+ ## Project
+
+ [distill](https://github.com/samuelfaj/distill) — CLI output compression engine.
+
+ [Full Distill Collection](https://huggingface.co/collections/samuelfaj/distill-6a0606f9b131c289025659fc)