⚠️ Bonsai (PrismML) — Training Limitation Report

Status: ❌ NOT SUPPORTED by Unsloth for Fine-Tuning

What is Bonsai?

Bonsai by PrismML is an extremely lightweight LLM family using 1-bit ternary quantization. The 8B parameter model compresses to approximately 1GB, making it one of the smallest high-parameter-count models available.
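The ~1GB figure can be sanity-checked with simple arithmetic. The sketch below counts the weights only and ignores scale factors, embeddings, and file metadata, so it is a rough estimate rather than PrismML's own accounting. A strictly ternary alphabet (-1, 0, +1) needs about log2(3) ≈ 1.58 bits per weight, so ~1GB for 8B parameters is closer to one bit per weight; which packing Bonsai actually uses is not established here.

```python
import math

def packed_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 8e9  # the 8B model mentioned above

# Compare a few storage schemes for the same parameter count.
for label, bits in [
    ("1 bit per weight", 1.0),
    ("ternary, ideal packing", math.log2(3)),
    ("ternary, 2 bits per weight", 2.0),
    ("FP16", 16.0),
]:
    print(f"{label:28s} {packed_size_gb(N_PARAMS, bits):6.2f} GB")
```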

HF Collection: https://huggingface.co/collections/prism-ml/bonsai
Demo Repo: https://github.com/PrismML-Eng/Bonsai-demo
Architecture: Qwen3ForCausalLM (for the unpacked version)

Why Bonsai Cannot Be Fine-Tuned with Unsloth (or Standard PEFT)

1. 1-Bit Ternary Weights Are Incompatible with LoRA

Property	Standard Models (Qwen, Gemma, Llama)	Bonsai
Weight precision	FP16/BF16/FP32	1-bit ternary (-1, 0, +1)
Quantization	4-bit (bnb) or 8-bit	Custom 1-bit kernels
Unsloth support	✅ Yes	❌ No
LoRA/QLoRA	✅ Works	❌ No base format PEFT can load (needs FP16/BF16 or bnb 4-/8-bit)
bitsandbytes	✅ Compatible	❌ Incompatible

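The core issue: LoRA fine-tuning adds small, trainable low-rank matrices (A and B, typically FP16/BF16) alongside frozen base weights, and the training framework must be able to run the frozen base layer in the forward and backward pass. Bonsai's base weights are stored in a custom 1-bit format that:

Cannot be dequantized to FP16 by bitsandbytes, PEFT, or Unsloth, so gradients cannot flow through the base layer to the adapters
Requires PrismML's proprietary CUDA kernels for inference
Is not stored in a weight format that AutoModelForCausalLM can load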
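To make the dependency on a readable base weight concrete, here is a minimal pure-Python sketch of a LoRA forward pass. It is an illustration of the general technique, not PEFT's or Unsloth's implementation; the dimensions, the `alpha` value, and the helper names are arbitrary choices for the example. Note that the frozen matrix `W` has to be available as ordinary floating-point numbers for the base path to be computed at all.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / r) * B (A x); only A and B would receive gradients."""
    r = len(A)                       # adapter rank = number of rows in A
    base = matvec(W, x)              # frozen base path
    delta = matvec(B, matvec(A, x))  # low-rank trainable path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

random.seed(0)
d_in, d_out, r = 8, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen, float
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero-init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero the adapter contributes nothing yet.
assert lora_forward(x, W, A, B, alpha=16) == matvec(W, x)

trainable = r * (d_in + d_out)
frozen = d_in * d_out
print(f"trainable adapter params: {trainable}, frozen base params: {frozen}")
```

If the framework cannot produce `W` (or an equivalent dequantized view of it), neither the base path nor the gradient with respect to `A` and `B` can be computed.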

2. No Unsloth 4-bit Conversion Exists

We searched the Unsloth model catalog thoroughly:

Unsloth HF namespace: https://huggingface.co/unsloth
Search terms: "bonsai", "prism", "ternary"
Result: ZERO Bonsai models in the Unsloth catalog

There are no `-bnb-4bit` or `-unsloth-bnb-4bit` style conversions for Bonsai because the 1-bit format is fundamentally different from the 4-bit NF4/FP4 quantization that Unsloth and bitsandbytes use.
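The catalog search described above could be repeated programmatically. The following is a sketch that assumes the `huggingface_hub` package is installed and that network access is available; the helper names are hypothetical. `find_matches` works offline on any list of repo ids, while `search_unsloth_namespace` queries the Hub through `HfApi.list_models`.

```python
SEARCH_TERMS = ("bonsai", "prism", "ternary")

def find_matches(model_ids, terms=SEARCH_TERMS):
    """Return the model ids that contain any search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [m for m in model_ids if any(t in m.lower() for t in lowered)]

def search_unsloth_namespace(terms=SEARCH_TERMS):
    """Query the Hub for each term within the `unsloth` namespace (needs network)."""
    from huggingface_hub import HfApi  # imported lazily so find_matches works offline

    api = HfApi()
    hits = set()
    for term in terms:
        for model in api.list_models(author="unsloth", search=term):
            hits.add(model.id)
    return sorted(hits)
```

Calling `search_unsloth_namespace()` returns a sorted list of matching repo ids; an empty list would correspond to the "zero Bonsai models" result reported above.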

3. Available Bonsai Variants on HF

Variant	Size	Fine-Tunable?	Notes
`prism-ml/Bonsai-1B`	~1GB	❌ No	1-bit weights, custom inference only
`prism-ml/Bonsai-8B`	~1GB packed	❌ No	Same 1-bit format
`prism-ml/Bonsai-8B-unpacked`	~15GB	⚠️ Maybe*	Qwen3 architecture, but weights may still be ternary

*The "unpacked" variant lists Qwen3ForCausalLM in its config, but its weight tensors appear to remain ternary-encoded. If so, a standard from_pretrained() call will either fail or load unusable weights, because the weight files do not follow the standard serialization format.
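One way to probe the "Maybe" above, assuming the unpacked tensors can be read into plain Python numbers at all (which is itself unverified here), is to test whether a tensor contains only the values -s, 0, and +s for a single scale s. The helper below is a hypothetical sketch, not a PrismML tool. As a side note, ~15GB for 8B parameters works out to roughly 1.9 bytes per parameter, which is about what 16-bit storage would occupy.

```python
def looks_ternary(values, tol=1e-6):
    """True if every value is approximately -s, 0, or +s for one scale s >= 0."""
    scale = max((abs(v) for v in values), default=0.0)
    if scale == 0.0:
        return True  # an empty or all-zero tensor is trivially ternary
    # Each value must sit close to 0 or close to +/- scale.
    return all(min(abs(v), abs(abs(v) - scale)) <= tol * scale for v in values)

# Hypothetical flattened tensors for illustration only.
print(looks_ternary([0.5, -0.5, 0.0, 0.5]))  # values drawn from {-0.5, 0, +0.5}
print(looks_ternary([0.1, 0.5, -0.3]))       # ordinary dense floats
```

A tensor that passes this check is numerically ternary even if it is stored in a 16-bit container, in which case adapters would still sit on top of a ternary-valued base.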

4. PrismML's Training Stack

PrismML has not (as of May 2026) released:

An open-source fine-tuning framework for Bonsai
A conversion tool from 1-bit → standard FP16
LoRA adapter support
Integration with Hugging Face TRL, PEFT, or Unsloth

The Bonsai-demo repository only shows inference examples, not training.

What ARE the Options for Extremely Lightweight Models?

If your goal is to fine-tune a very small model on a T4 GPU with minimal VRAM, these models are supported by Unsloth:

Model	Params	4-bit Size	T4 Batch Size	Unsloth Support
LFM2.5-1.2B	1.2B	~1GB	8	✅ Excellent
Qwen3.5-0.8B	0.8B	~0.5GB	8	✅ Excellent
Qwen3.5-2B	2B	~1.2GB	4-8	✅ Excellent
Gemma-4 E2B	~2B dense	~7.6GB	1	✅ Tight but works

These models are already small enough to fine-tune on a single T4 at the batch sizes listed above. They offer a size-to-capability trade-off comparable to or better than Bonsai's, with full training support.
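For the supported models, a typical Unsloth setup looks roughly like the sketch below. It assumes the `unsloth` package and a CUDA GPU; the LoRA hyperparameters are common defaults rather than tuned values, and the `target_modules` names are an assumption that fits Llama/Qwen-style layers and may need adjusting for other architectures. The repo id is deliberately left as a parameter: take it from the Unsloth model catalog.

```python
# LoRA settings: illustrative defaults, not tuned for any model in the table.
LORA_KWARGS = dict(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    # Assumed projection names for Llama/Qwen-style blocks.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def load_for_lora(model_name: str, max_seq_length: int = 2048):
    """Load a 4-bit base model with Unsloth and attach LoRA adapters."""
    from unsloth import FastLanguageModel  # lazy import: requires a CUDA GPU

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        load_in_4bit=True,  # bitsandbytes 4-bit base weights
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer
```

The returned model and tokenizer can then be passed to a TRL trainer; batch size on a T4 should start from the values in the table and be lowered if memory runs out.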

Future Possibility

If PrismML releases:

A standard FP16/FP32 checkpoint of Bonsai (even if larger)
Or a Bonsai → standard format converter
Or adds Bonsai to the Unsloth model catalog

...then we can create a notebook. Until then, Bonsai fine-tuning on Unsloth/TRL/PEFT is not possible.

Sources

PrismML Bonsai Collection: https://huggingface.co/collections/prism-ml/bonsai
Bonsai Demo (inference only): https://github.com/PrismML-Eng/Bonsai-demo
Unsloth Model Catalog: https://unsloth.ai/docs/get-started/unsloth-model-catalog
byteiota article on PrismML's 1-bit Bonsai: https://byteiota.com/prismml-1-bit-bonsai-llm-14x-smaller-8x-faster/

Last updated: May 2026