---
title: Bonsai 1-bit GPU
emoji: 🌿
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
suggested_hardware: l40sx1
pinned: true
short_description: Run 1-bit Bonsai LLMs on GPUs
models:
- prism-ml/Bonsai-8B-gguf
- prism-ml/Bonsai-4B-gguf
- prism-ml/Bonsai-1.7B-gguf
---
# Bonsai Demo
Interactive demo for **[Bonsai](https://huggingface.co/collections/prism-ml/bonsai)** — the first commercially viable 1-bit LLMs, by **[PrismML](https://prismml.com)**.
Bonsai models run at **true 1-bit precision** — every weight is a single bit. An 8B model fits in **1.15 GB**, a 1.7B model in just **240 MB**. Small enough to run in a browser, on a phone, or on any laptop — while remaining competitive with full-precision models on benchmarks.
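As a rough back-of-envelope check on these sizes (a sketch that ignores embeddings, norms, and any per-group scale overhead, which is why the published figures are slightly larger):

```python
def one_bit_size_gb(n_params: float) -> float:
    # At true 1-bit precision each weight occupies a single bit,
    # so n_params bits = n_params / 8 bytes.
    return n_params / 8 / 1e9

print(one_bit_size_gb(8e9))    # → 1.0  (GB; published size is 1.15 GB)
print(one_bit_size_gb(1.7e9))  # ≈ 0.21 GB, i.e. ~213 MB (published: 240 MB)
```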
> **This demo will be available for a limited time (approximately 1–2 weeks).** Enjoy it while it lasts!
## Highlights
Bonsai-8B fits in **1.15 GB** (14x smaller than FP16) and generates at **~330 tok/s** on an L40S (6.3x faster than FP16). It scores a 70.5 average across 6 benchmark tasks, competitive with full-precision 8B models.
## Models
| Model | Size | GGUF | MLX |
|---|---|---|---|
| **Bonsai-8B** | 1.15 GB | [prism-ml/Bonsai-8B-gguf](https://huggingface.co/prism-ml/Bonsai-8B-gguf) | [prism-ml/Bonsai-8B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-8B-mlx-1bit) |
| **Bonsai-4B** | 570 MB | [prism-ml/Bonsai-4B-gguf](https://huggingface.co/prism-ml/Bonsai-4B-gguf) | [prism-ml/Bonsai-4B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-4B-mlx-1bit) |
| **Bonsai-1.7B** | 240 MB | [prism-ml/Bonsai-1.7B-gguf](https://huggingface.co/prism-ml/Bonsai-1.7B-gguf) | [prism-ml/Bonsai-1.7B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-1.7B-mlx-1bit) |
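The GGUF builds in the table above can be run locally with llama.cpp; a minimal sketch, assuming a recent llama.cpp build whose `-hf` flag pulls the model directly from the Hugging Face Hub (quantization filenames inside each repo may differ):

```shell
# Serve Bonsai-8B with llama.cpp's built-in server and web UI
llama-server -hf prism-ml/Bonsai-8B-gguf --port 7860

# Or run a one-off generation from the CLI with the smallest model
llama-cli -hf prism-ml/Bonsai-1.7B-gguf -p "Write a haiku about bonsai trees."
```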
## Resources
- [1-bit Bonsai Whitepaper](https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf)
- [Google Colab notebook](https://colab.research.google.com/drive/1EzyAaQ2nwDv_1X0jaC5XiVC3ZREg9bdG?usp=sharing)
- [GitHub Demo](https://github.com/PrismML-Eng/Bonsai-demo)
- [Discord community](https://discord.gg/prismml)
- [Prism ML website](https://prismml.com)
- [Run Bonsai locally in the browser (WebGPU)](https://huggingface.co/spaces/webml-community/bonsai-webgpu)
## Privacy
- **We do not log any messages.** Chat content is never stored on the server.
- This demo uses the built-in llama-server UI, which saves your conversation history **in your browser's local storage only**. Clearing your browser cache will erase it.
- That said, **please do not submit sensitive, private, or confidential information** in your messages.
## Fair Use
We've allocated multiple GPUs to keep this demo responsive, but resources are shared across all users. Under heavy load you may experience slower responses or brief queuing. Please be mindful of usage and avoid sending large bursts of automated requests so everyone can enjoy the demo.