# ⚠️ Bonsai (PrismML) — Training Limitation Report

## Status: ❌ NOT SUPPORTED by Unsloth for Fine-Tuning

### What is Bonsai?

[Bonsai](https://prismml.com/) by **PrismML** is an extremely lightweight LLM family using **1-bit ternary quantization**. The 8B-parameter model compresses to approximately **1GB**, making it one of the smallest high-parameter-count models available.

- **HF Collection:** https://huggingface.co/collections/prism-ml/bonsai
- **Demo Repo:** https://github.com/PrismML-Eng/Bonsai-demo
- **Architecture:** `Qwen3ForCausalLM` (for the unpacked version)

---

## Why Bonsai Cannot Be Fine-Tuned with Unsloth (or Standard PEFT)

### 1. **1-Bit Ternary Weights Are Incompatible with LoRA**

| Property | Standard Models (Qwen, Gemma, Llama) | Bonsai |
|----------|--------------------------------------|--------|
| Weight precision | FP16/BF16/FP32 | **1-bit ternary** (-1, 0, +1) |
| Quantization | 4-bit (bnb) or 8-bit | **Custom 1-bit kernels** |
| Unsloth support | ✅ Yes | ❌ No |
| LoRA/QLoRA | ✅ Works | ❌ Requires FP16 base weights |
| bitsandbytes | ✅ Compatible | ❌ Incompatible |

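For intuition about why such checkpoints are opaque to standard tooling, here is an illustrative ternary packing scheme in NumPy. This is **not** PrismML's actual format (which is not documented in detail); it only shows how ternary weights stop being ordinary float tensors once packed:

```python
import numpy as np

def pack_ternary(w):
    """Pack ternary weights {-1, 0, +1} into 2 bits each (4 per byte)."""
    codes = (np.asarray(w, dtype=np.int8) + 1).astype(np.uint8)  # -> {0, 1, 2}
    pad = (-len(codes)) % 4                    # pad to a multiple of 4
    codes = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)])
    quads = codes.reshape(-1, 4)
    return (quads[:, 0] | (quads[:, 1] << 2)
            | (quads[:, 2] << 4) | (quads[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed, n):
    """Inverse of pack_ternary; n is the original weight count."""
    quads = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return quads.reshape(-1)[:n].astype(np.int8) - 1
```

Even this naive 2-bit packing is 8x denser than FP16 storage; tighter encodings exist (ternary entropy is log2 3 ≈ 1.585 bits per weight), which is presumably where "1-bit" figures come from. Either way, the packed bytes are integer codes that standard float-based tooling cannot train against.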
**The core issue:** LoRA fine-tuning works by adding small, trainable FP16 matrices (A and B) to frozen base weights. Bonsai's base weights are stored in a custom 1-bit format that:
- Cannot be dequantized to FP16 in a way that supports gradient flow
- Requires PrismML's proprietary CUDA kernels for inference
- Does not have an `AutoModelForCausalLM`-compatible weight format

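The mechanism can be sketched in plain PyTorch (a minimal illustration of the general LoRA technique, not Unsloth's or PEFT's implementation). Note that even though the base weights are frozen, the forward pass still needs them as an ordinary dense float tensor:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Only A and B receive gradients, but the forward pass still
        # requires a dense, differentiable matmul against base.weight --
        # exactly what packed ternary tensors behind custom kernels
        # cannot provide.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

With `B` initialised to zero the wrapped layer starts out identical to the base model, and only the tiny `A`/`B` matrices are updated during training.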
### 2. **No Unsloth 4-bit Conversion Exists**

We searched the Unsloth model catalog thoroughly:
- **Unsloth HF namespace:** https://huggingface.co/unsloth
- **Search terms:** "bonsai", "prism", "ternary"
- **Result:** **zero** Bonsai models in the Unsloth catalog

There are **no** `unsloth-bnb-4bit`-style conversions for Bonsai because the 1-bit format is fundamentally different from the standard INT4/FP4 quantization that Unsloth and bitsandbytes use.

### 3. **Available Bonsai Variants on HF**

| Variant | Size | Fine-Tunable? | Notes |
|---------|------|---------------|-------|
| `prism-ml/Bonsai-1B` | ~1GB | ❌ No | 1-bit weights, custom inference only |
| `prism-ml/Bonsai-8B` | ~1GB packed | ❌ No | Same 1-bit format |
| `prism-ml/Bonsai-8B-unpacked` | ~15GB | ⚠️ Maybe* | Qwen3 architecture, but weights may still be ternary |

*The "unpacked" variant lists `Qwen3ForCausalLM` in its config, but the actual weight tensors are still ternary-encoded. A standard `from_pretrained()` call will fail or produce garbage because the weight files use a custom serialization format.

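One cheap sanity check before attempting to fine-tune any such checkpoint is to inspect the tensor dtypes in its state dict. The helper below is our own sketch, not a library API:

```python
import torch

# Dtypes a standard LoRA/PEFT setup can attach adapters to.
TRAINABLE_DTYPES = {torch.float16, torch.bfloat16, torch.float32}

def non_float_tensors(state_dict):
    """Names of tensors that are not ordinary float weights (e.g. packed
    integer codes) -- a red flag for standard fine-tuning."""
    return [name for name, t in state_dict.items()
            if t.dtype not in TRAINABLE_DTYPES]
```

For a real checkpoint you would build `state_dict` from the downloaded weight files (e.g. via `safetensors`); a non-empty result means standard LoRA cannot train against those tensors directly.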
### 4. **PrismML's Training Stack**

PrismML has not (as of May 2026) released:
- An open-source fine-tuning framework for Bonsai
- A conversion tool from 1-bit → standard FP16
- LoRA adapter support
- Integration with Hugging Face TRL, PEFT, or Unsloth

The [Bonsai-demo](https://github.com/PrismML-Eng/Bonsai-demo) repository only shows **inference** examples, not training.

---

## What ARE the Options for Extremely Lightweight Models?

If your goal is to fine-tune a very small model on a T4 with minimal VRAM, these **are** supported by Unsloth:

| Model | Params | 4-bit Size | T4 Batch Size | Unsloth Support |
|-------|--------|-----------|---------------|-----------------|
| **LFM2.5-1.2B** | 1.2B | ~1GB | **8** | ✅ Excellent |
| **Qwen3.5-0.8B** | 0.8B | ~0.5GB | **8** | ✅ Excellent |
| **Qwen3.5-2B** | 2B | ~1.2GB | **4-8** | ✅ Excellent |
| **Gemma-4 E2B** | ~2B dense | ~7.6GB | **1** | ✅ Tight but works |

These models are **already** extremely small and can be fine-tuned with large batch sizes on a T4. They offer a similar or better size-to-performance trade-off than Bonsai, **with** full training support.

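The "4-bit Size" column can be roughly derived by hand: about 0.5 bytes per parameter, times some overhead for tensors kept in higher precision. A sketch (the 10% overhead figure is an assumption; real footprints vary by architecture and runtime buffers):

```python
def approx_4bit_gb(params_billions, overhead=1.10):
    """Rough 4-bit weight footprint in GB: 0.5 bytes per parameter,
    times an assumed ~10% overhead for higher-precision tensors."""
    return params_billions * 0.5 * overhead
```

For example, `approx_4bit_gb(1.2)` gives about 0.66 GB, the same ballpark as the ~1GB listed in the table once runtime allocations are counted.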
---

## Future Possibility

If PrismML releases:
1. A **standard FP16/FP32 checkpoint** of Bonsai (even if larger)
2. Or a **Bonsai → standard-format converter**
3. Or Bonsai support in the Unsloth model catalog

...then we can create a notebook. Until then, **Bonsai fine-tuning with Unsloth/TRL/PEFT is not possible**.

---

## Sources

- PrismML Bonsai Collection: https://huggingface.co/collections/prism-ml/bonsai
- Bonsai Demo (inference only): https://github.com/PrismML-Eng/Bonsai-demo
- Unsloth Model Catalog: https://unsloth.ai/docs/get-started/unsloth-model-catalog
- PrismML Blog (1-bit ternary): https://byteiota.com/prismml-1-bit-bonsai-llm-14x-smaller-8x-faster/

---

*Last updated: May 2026*