⚠️ Bonsai (PrismML): Training Limitation Report
Status: ❌ NOT SUPPORTED by Unsloth for Fine-Tuning
What is Bonsai?
Bonsai by PrismML is an extremely lightweight LLM family using 1-bit ternary quantization. The 8B parameter model compresses to approximately 1GB, making it one of the smallest high-parameter-count models available.
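The size figure above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: `packed_size_gb` is a hypothetical helper, it ignores scale factors and file metadata, and it assumes 1 GB = 1e9 bytes. Three-valued weights need about log2(3) ≈ 1.58 bits each when packed optimally, so an 8B model at roughly 1GB corresponds to about one bit per weight.

```python
import math


def packed_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores per-group scales, embeddings kept at higher precision,
    and file metadata, so real checkpoints will be somewhat larger.
    """
    return num_params * bits_per_weight / 8 / 1e9


params_8b = 8e9  # nominal parameter count for an "8B" model
formats = [
    ("FP16", 16.0),
    ("4-bit", 4.0),
    ("ternary, ideal packing", math.log2(3)),
    ("1-bit", 1.0),
]
for label, bits in formats:
    print(f"{label:>24}: {packed_size_gb(params_8b, bits):6.2f} GB")
```

Under these assumptions the FP16 row comes out at 16 GB, which is in line with the ~15GB listed for the unpacked variant later in this report.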
- HF Collection: https://huggingface.co/collections/prism-ml/bonsai
- Demo Repo: https://github.com/PrismML-Eng/Bonsai-demo
- Architecture: Qwen3ForCausalLM (for the unpacked version)
Why Bonsai Cannot Be Fine-Tuned with Unsloth (or Standard PEFT)
1. 1-Bit Ternary Weights Are Incompatible with LoRA
| Property | Standard Models (Qwen, Gemma, Llama) | Bonsai |
|---|---|---|
| Weight precision | FP16/BF16/FP32 | 1-bit ternary (-1, 0, +1) |
| Quantization | 4-bit (bnb) or 8-bit | Custom 1-bit kernels |
| Unsloth support | ✅ Yes | ❌ No |
| LoRA/QLoRA | ✅ Works | ❌ Requires FP16 base weights |
| bitsandbytes | ✅ Compatible | ❌ Incompatible |
The core issue: LoRA fine-tuning works by adding small, trainable FP16 matrices (A and B) to frozen base weights. Bonsai's base weights are stored in a custom 1-bit format that:
- Cannot be dequantized to FP16 in a way that supports gradient flow
- Requires PrismML's proprietary CUDA kernels for inference
- Does not have an AutoModelForCausalLM-compatible weight format
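To make the LoRA mechanism described above concrete, here is a minimal pure-Python sketch of a single LoRA-augmented linear layer. It is not Unsloth's or PEFT's implementation. The names `matvec` and `lora_forward`, the `alpha` default, and the toy dimensions are all assumptions chosen for illustration. The point it shows is that the frozen base path `W x` must be computable as an ordinary floating-point matrix product alongside the trainable `B A x` path, which is the step that a packed 1-bit format without a standard loader does not provide.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha=16.0):
    """Compute y = W x + (alpha / r) * B (A x).

    W: frozen base weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = len(A)  # LoRA rank = number of rows in A
    base = matvec(W, x)                # frozen path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r = 4, 3, 2
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero in standard LoRA
x = [1.0, 2.0, 3.0, 4.0]

# With B = 0 the adapter contributes nothing, so the output equals the base output.
print(lora_forward(W, A, B, x) == matvec(W, x))
```

In this toy setting the adapter trains only r * (d_in + d_out) values while W stays fixed. For real layers, where d_in and d_out are in the thousands and r is small, that is a small fraction of the base weight count.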
2. No Unsloth 4-bit Conversion Exists
We searched the Unsloth model catalog thoroughly:
- Unsloth HF namespace: https://huggingface.co/unsloth
- Search terms: "bonsai", "prism", "ternary"
- Result: ZERO Bonsai models in the Unsloth catalog
There are no unsloth-bnb-4bit or unsloth-gemma-4bit style conversions for Bonsai because the 1-bit format is fundamentally different from the standard INT4/FP4 quantization that Unsloth and bitsandbytes use.
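The catalog search above can be repeated programmatically. The following is a sketch under assumptions: it relies on the `huggingface_hub` package being installed and on network access, the helper names `matches_any` and `search_unsloth_namespace` are hypothetical, and results will reflect the Hub at the time it is run rather than the state recorded in this report.

```python
def matches_any(model_id, terms):
    """Return True if any search term appears in the model ID (case-insensitive)."""
    lowered = model_id.lower()
    return any(term.lower() in lowered for term in terms)


def search_unsloth_namespace(terms=("bonsai", "prism", "ternary")):
    """List model IDs under the 'unsloth' namespace whose ID contains a term.

    Requires network access and the huggingface_hub package. The import is
    done lazily so the helper above stays usable without either.
    """
    from huggingface_hub import HfApi

    api = HfApi()
    hits = set()
    for term in terms:
        for info in api.list_models(author="unsloth", search=term):
            # Keep only results where the term is in the ID itself.
            if matches_any(info.id, terms):
                hits.add(info.id)
    return sorted(hits)


# Offline illustration with a made-up ID list (these IDs are examples only).
example_ids = ["unsloth/example-model-bnb-4bit", "prism-ml/Bonsai-8B"]
print([m for m in example_ids if matches_any(m, ("bonsai", "prism", "ternary"))])
```

An empty list from `search_unsloth_namespace()` would be consistent with the result reported above.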
3. Available Bonsai Variants on HF
| Variant | Size | Fine-Tunable? | Notes |
|---|---|---|---|
| prism-ml/Bonsai-1B | ~1GB | ❌ No | 1-bit weights, custom inference only |
| prism-ml/Bonsai-8B | ~1GB packed | ❌ No | Same 1-bit format |
| prism-ml/Bonsai-8B-unpacked | ~15GB | ⚠️ Maybe* | Qwen3 architecture, but weights may still be ternary |
*The "unpacked" variant lists Qwen3ForCausalLM in its config, but the actual weight tensors are still ternary-encoded. Standard from_pretrained() will fail or produce garbage because the weight files use a custom serialization format.
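One way to test the "Maybe" for the unpacked variant, once a weight tensor has been read into memory by whatever means, is to check whether its values fall on only three levels. The sketch below is an assumption-laden illustration: `distinct_levels` and `looks_ternary` are hypothetical helpers, tensor loading is deliberately not shown because the serialization format is not specified here, and the check assumes a single scale per tensor (a per-group scale would require applying it to each group separately).

```python
def distinct_levels(values, tol=1e-6, limit=4):
    """Collect distinct values (within tol), stopping once `limit` are found."""
    levels = []
    for v in values:
        if not any(abs(v - seen) <= tol for seen in levels):
            levels.append(v)
            if len(levels) >= limit:
                break
    return sorted(levels)


def looks_ternary(values, tol=1e-6):
    """Return True if every value is approximately -s, 0, or +s for one scale s."""
    levels = distinct_levels(values, tol)
    if not levels or len(levels) > 3:
        return False
    scale = max(abs(level) for level in levels)
    # Each level must sit at zero or at magnitude `scale`.
    return all(min(abs(level), abs(abs(level) - scale)) <= tol for level in levels)


print(looks_ternary([0.5, -0.5, 0.0, 0.5, 0.0]))  # three levels: -s, 0, +s
print(looks_ternary([0.11, -0.42, 0.07, 0.93]))   # ordinary dense floats
```

If the flattened values of a layer pass this check, the tensor carries ternary information even when stored in a 16-bit dtype. If they fail, the weights are ordinary dense floats, at least under the single-scale assumption.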
4. PrismML's Training Stack
PrismML has not (as of May 2026) released:
- An open-source fine-tuning framework for Bonsai
- A conversion tool from 1-bit → standard FP16
- LoRA adapter support
- Integration with Hugging Face TRL, PEFT, or Unsloth
The Bonsai-demo repository only shows inference examples, not training.
What ARE the Options for Extremely Lightweight Models?
If your goal is to fine-tune a very small model on a T4 GPU with minimal VRAM, the following models are supported by Unsloth:
| Model | Params | 4-bit Size | T4 Batch Size | Unsloth Support |
|---|---|---|---|---|
| LFM2.5-1.2B | 1.2B | ~1GB | 8 | ✅ Excellent |
| Qwen3.5-0.8B | 0.8B | ~0.5GB | 8 | ✅ Excellent |
| Qwen3.5-2B | 2B | ~1.2GB | 4-8 | ✅ Excellent |
| Gemma-4 E2B | ~2B dense | ~7.6GB | 1 | ✅ Tight but works |
These models are already very small and can be fine-tuned on a T4 at the batch sizes listed above. Most have 4-bit footprints comparable to Bonsai's packed size (around 1GB), and all of them come with full training support.
Future Possibility
If PrismML releases:
- A standard FP16/FP32 checkpoint of Bonsai (even if larger)
- Or a Bonsai → standard format converter
- Or adds Bonsai to the Unsloth model catalog
...then we can create a notebook. Until then, Bonsai fine-tuning on Unsloth/TRL/PEFT is not possible.
Sources
- PrismML Bonsai Collection: https://huggingface.co/collections/prism-ml/bonsai
- Bonsai Demo (inference only): https://github.com/PrismML-Eng/Bonsai-demo
- Unsloth Model Catalog: https://unsloth.ai/docs/get-started/unsloth-model-catalog
- Byteiota article on PrismML Bonsai (1-bit ternary): https://byteiota.com/prismml-1-bit-bonsai-llm-14x-smaller-8x-faster/
Last updated: May 2026