
⚠️ Bonsai (PrismML): Training Limitation Report

Status: ❌ NOT SUPPORTED by Unsloth for Fine-Tuning

What is Bonsai?

Bonsai by PrismML is an extremely lightweight LLM family that uses 1-bit ternary quantization, with weights restricted to -1, 0, and +1. The 8B-parameter model compresses to approximately 1GB, which makes it one of the smallest 8B-class models available.
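The size claim can be checked with quick arithmetic. A strict ternary code carries log2(3) ≈ 1.58 bits of information per weight, so a ~1GB file for 8B parameters implies storage close to 1 bit per weight. The sketch below is a minimal estimate under stated assumptions. It uses the nominal 8B parameter count and decimal gigabytes, and it ignores per-group scales, embeddings kept at higher precision, and file metadata.

```python
import math

def weight_storage_gb(num_params: float, bits_per_weight: float) -> float:
    """Decimal gigabytes (1 GB = 1e9 bytes) needed for the raw weights only.

    Assumption: no scales, no higher-precision embeddings, no file metadata.
    """
    return num_params * bits_per_weight / 8 / 1e9

PARAMS_8B = 8e9  # nominal parameter count taken from the model name

for label, bits in [
    ("binary, 1 bit/weight", 1.0),
    ("ternary, log2(3) bits/weight (minimum)", math.log2(3)),
    ("ternary, packed at 2 bits/weight", 2.0),
    ("BF16/FP16, 16 bits/weight", 16.0),
]:
    print(f"{label:40s} {weight_storage_gb(PARAMS_8B, bits):6.2f} GB")
```

At 16 bits per weight the estimate is 16e9 bytes, or about 14.9 GiB. That would be consistent with the ~15GB listed below for the unpacked variant if that variant stores its weights as BF16.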


Why Bonsai Cannot Be Fine-Tuned with Unsloth (or Standard PEFT)

1. 1-Bit Ternary Weights Are Incompatible with LoRA

| Property | Standard Models (Qwen, Gemma, Llama) | Bonsai |
|---|---|---|
| Weight precision | FP16/BF16/FP32 | 1-bit ternary (-1, 0, +1) |
| Quantization | 4-bit (bitsandbytes) or 8-bit | Custom 1-bit kernels |
| Unsloth support | ✅ Yes | ❌ No |
| LoRA/QLoRA | ✅ Works | ❌ Requires FP16 base weights |
| bitsandbytes | ✅ Compatible | ❌ Incompatible |

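The core issue is how LoRA works. LoRA fine-tuning adds a small trainable low-rank update, the product of two FP16/BF16 matrices B and A, to each frozen base weight matrix. The frozen base weights still take part in both the forward and the backward pass, so the framework must be able to run them as a standard differentiable layer. Bonsai's base weights are stored in a custom 1-bit format that:

  • Cannot be dequantized to FP16 in a way that supports gradient flow
  • Requires PrismML's proprietary CUDA kernels for inference
  • Is not stored in a weight format that AutoModelForCausalLM can load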
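The sketch below shows what a standard LoRA stack expects from the base layer. It computes y = W0·x + (alpha / r)·B·(A·x), where W0 is a dense float matrix that stays frozen and only A and B are trained. The ternary-to-float step (`dequantize_ternary`, with one hypothetical scale per output row) is an assumption for illustration. PrismML's actual storage layout is not public, and obtaining a dense W0 like this from Bonsai's files is exactly the step that current tooling lacks. The sketch uses pure Python with no PyTorch, so it covers only the forward computation.

```python
import random

def dequantize_ternary(codes, row_scales):
    """Hypothetical: map ternary codes {-1, 0, +1} to floats with one scale per row."""
    return [[s * c for c in row] for row, s in zip(codes, row_scales)]

def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(w0, a, b, x, alpha, r):
    """y = W0 x + (alpha / r) * B (A x). W0 is frozen; only A and B would get gradients."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y0 + scale * u for y0, u in zip(base, update)]

d_in, d_out, r = 4, 3, 2
random.seed(0)
codes = [[random.choice((-1, 0, 1)) for _ in range(d_in)] for _ in range(d_out)]
w0 = dequantize_ternary(codes, row_scales=[0.5, 0.25, 1.0])
A = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, random init
B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init, so the adapter starts as a no-op
x = [1.0, -2.0, 0.5, 3.0]

y = lora_forward(w0, A, B, x, alpha=16, r=r)
print(y == matvec(w0, x))  # True: with B = 0 the output equals the frozen base output
```

During training, gradients with respect to the layer input also pass through W0 (as W0ᵀ times the output gradient). For that reason the base weights cannot simply be skipped, even though they are never updated.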

2. No Unsloth 4-bit Conversion Exists

We searched the Unsloth model catalog thoroughly.

There are no unsloth-bnb-4bit or unsloth-gemma-4bit style conversions for Bonsai. The 1-bit format is fundamentally different from the standard NF4/FP4 4-bit quantization that Unsloth and bitsandbytes use.
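The following sketch makes the format mismatch concrete. It uses a hypothetical 2-bit packing of ternary codes, four weights per byte, and then reads the same bytes the way a loader expecting two 4-bit codes per byte would. This is not PrismML's actual layout, which is not documented here. Real bitsandbytes NF4 storage also carries block-wise scales that are omitted here. The point is only that identical bytes decode to unrelated values when the loader assumes the wrong layout.

```python
# Hypothetical 2-bit code assignment for ternary weights (not PrismML's real format).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {v: k for k, v in ENCODE.items()}

def pack_ternary(codes):
    """Pack ternary codes four per byte, lowest bits first; unused slots stay 0b00."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= ENCODE[c] << (2 * j)
        out.append(byte)
    return bytes(out)

def unpack_ternary(packed, n):
    """Inverse of pack_ternary; n trims the padding in the last byte."""
    codes = []
    for byte in packed:
        for j in range(4):
            codes.append(DECODE[(byte >> (2 * j)) & 0b11])
    return codes[:n]

def read_as_nibbles(packed):
    """What a loader expecting two 4-bit codes per byte (low nibble first) would see."""
    return [n for byte in packed for n in (byte & 0x0F, byte >> 4)]

codes = [1, 0, -1, 1, -1, -1]
packed = pack_ternary(codes)
print(list(packed))                              # [97, 10]
print(unpack_ternary(packed, len(codes)))        # [1, 0, -1, 1, -1, -1]
print(read_as_nibbles(packed))                   # [1, 6, 10, 0]
```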

3. Available Bonsai Variants on HF

| Variant | Size | Fine-Tunable? | Notes |
|---|---|---|---|
| prism-ml/Bonsai-1B | ~1GB | ❌ No | 1-bit weights, custom inference only |
| prism-ml/Bonsai-8B | ~1GB packed | ❌ No | Same 1-bit format |
| prism-ml/Bonsai-8B-unpacked | ~15GB | ⚠️ Maybe* | Qwen3 architecture, but weights may still be ternary |

*The "unpacked" variant lists Qwen3ForCausalLM in its config, but its weight tensors appear to still be ternary-encoded. The weight files also use a custom serialization format, so a standard from_pretrained() call is expected to fail or produce garbage output.
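One way to check what the unpacked files actually contain is to read the safetensors header directly. The header is an 8-byte little-endian length followed by a JSON table of tensor dtypes, shapes, and data offsets. You can then count the distinct values in one row of a weight matrix. The sketch below assumes the files are standard .safetensors. If they use a custom serialization, the header parse will fail, which is itself informative. A ternary matrix with one scale per output row would show at most three distinct values per row, or four if both +0.0 and -0.0 appear. The demo file, the tensor name w, and all helper names are hypothetical. No real Bonsai file or tensor names are assumed.

```python
import json
import os
import struct
import tempfile

def read_safetensors_header(path):
    """Return (tensor table, byte offset where tensor data starts)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header, 8 + header_len

def distinct_values_in_first_row(path, tensor_name):
    """Count distinct raw BF16 bit patterns in the first row of a 2-D tensor."""
    header, data_start = read_safetensors_header(path)
    info = header[tensor_name]
    if info["dtype"] != "BF16":
        raise ValueError(f"{tensor_name} is {info['dtype']}, expected BF16")
    row_len = info["shape"][-1]
    begin, _ = info["data_offsets"]  # offsets are relative to the data section
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        raw = f.read(2 * row_len)  # BF16 is 2 bytes per element, row-major
    return len(set(struct.unpack(f"<{row_len}H", raw)))

def write_demo_file(path):
    """Hypothetical 2x4 BF16 tensor holding row_scale * {-1, 0, +1}."""
    # BF16 bit patterns: 0x3F00 = 0.5, 0xBF00 = -0.5, 0x3E80 = 0.25, 0xBE80 = -0.25, 0x0000 = 0.0
    values = [0x3F00, 0x0000, 0xBF00, 0x3F00, 0x3E80, 0xBE80, 0x0000, 0x3E80]
    data = struct.pack(f"<{len(values)}H", *values)
    header = json.dumps(
        {"w": {"dtype": "BF16", "shape": [2, 4], "data_offsets": [0, len(data)]}}
    ).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)) + header + data)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    write_demo_file(demo_path)
    header, _ = read_safetensors_header(demo_path)
    for name, info in header.items():
        print(name, info["dtype"], info["shape"])  # w BF16 [2, 4]
    first_row_distinct = distinct_values_in_first_row(demo_path, "w")
    print(first_row_distinct)  # 3
```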

4. PrismML's Training Stack

PrismML has not (as of May 2026) released:

  • An open-source fine-tuning framework for Bonsai
  • A conversion tool from 1-bit → standard FP16
  • LoRA adapter support
  • Integration with Hugging Face TRL, PEFT, or Unsloth

The Bonsai-demo repository only shows inference examples, not training.


What ARE the Options for Extremely Lightweight Models?

If your goal is to fine-tune a very small model on a T4 GPU with minimal VRAM, the following models are supported by Unsloth:

| Model | Params | 4-bit Size | T4 Batch Size | Unsloth Support |
|---|---|---|---|---|
| LFM2.5-1.2B | 1.2B | ~1GB | 8 | ✅ Excellent |
| Qwen3.5-0.8B | 0.8B | ~0.5GB | 8 | ✅ Excellent |
| Qwen3.5-2B | 2B | ~1.2GB | 4-8 | ✅ Excellent |
| Gemma-4 E2B | ~2B dense | ~7.6GB | 1 | ✅ Tight but works |

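Apart from Gemma-4 E2B, these models are small enough to fine-tune on a T4 with batch sizes of 4 to 8. Unlike Bonsai, all of them have full training support. The trade-off is parameter count. Bonsai fits 8B parameters into roughly 1GB, while most of these models reach a similar 4-bit footprint by having far fewer parameters.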
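Part of why LoRA keeps VRAM low on these models is that gradients and optimizer state are needed only for the adapter matrices. For one adapted linear layer, A is r × d_in and B is d_out × r, so LoRA adds r·(d_in + d_out) trainable parameters. The base layer has d_in·d_out frozen parameters. The sketch below does that count. The 2048 × 2048 layer shape and the rank r = 16 are hypothetical; real shapes depend on each model's config and on which modules are targeted.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

def lora_fraction(d_in: int, d_out: int, r: int) -> float:
    """Adapter parameters as a fraction of the frozen base layer's d_in * d_out weights."""
    return lora_param_count(d_in, d_out, r) / (d_in * d_out)

# Hypothetical square projection; real layer shapes come from the model config.
print(lora_param_count(2048, 2048, 16))  # 65536
print(lora_fraction(2048, 2048, 16))     # 0.015625
```

So at this rank the adapter is about 1.6% of the layer it modifies, and only that 1.6% needs gradients and optimizer state.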


Future Possibility

If any of the following happens:

  1. PrismML releases a standard FP16/FP32 checkpoint of Bonsai (even if larger)
  2. PrismML releases a Bonsai → standard format converter
  3. Bonsai is added to the Unsloth model catalog

...then we can create a notebook. Until then, Bonsai fine-tuning on Unsloth/TRL/PEFT is not possible.



Last updated: May 2026