pashak committed
Commit 5056b70 · verified · 1 Parent(s): e28b544

Update README.md

Update demo readme with some description and links

Files changed (1)
  1. README.md +36 -12
README.md CHANGED
@@ -1,20 +1,44 @@
----
-title: Bonsai Demo
-emoji: 🌿
-colorFrom: green
-colorTo: blue
-sdk: docker
-app_port: 7860
-suggested_hardware: l40sx1
-pinned: false
----
+---
+title: Bonsai 1-bit GPU
+emoji: 🌿
+colorFrom: green
+colorTo: blue
+sdk: docker
+app_port: 7860
+suggested_hardware: l40sx1
+pinned: true
+short_description: Run 1-bit Bonsai LLMs on GPUs
+---
 
 # Bonsai Demo
 
-Interactive demo for [Bonsai](https://huggingface.co/collections/prism-ml/bonsai), end-to-end 1-bit language models by [Prism ML](https://prismml.com).
+Interactive demo for **[Bonsai](https://huggingface.co/collections/prism-ml/bonsai)**, the first commercially viable 1-bit LLMs, by **[PrismML](https://prismml.com)**.
+
+Bonsai models run at **true 1-bit precision** — every weight is a single bit. An 8B model fits in **1.15 GB**, a 1.7B model in just **240 MB**. Small enough to run in a browser, on a phone, or on any laptop — while remaining competitive with full-precision models on benchmarks.
 
 > **This demo will be available for a limited time (approximately 1–2 weeks).** Enjoy it while it lasts!
 
+## Highlights
+
+Bonsai-8B fits in **1.15 GB** (14x smaller than FP16) and generates at **~330 tok/s** on an L40S (6.3x faster than FP16). It scores a 70.5 average across 6 benchmark tasks, competitive with full-precision 8B models.
+
+## Models
+
+| Model | Size | GGUF | MLX |
+|---|---|---|---|
+| **Bonsai-8B** | 1.15 GB | [prism-ml/Bonsai-8B-gguf](https://huggingface.co/prism-ml/Bonsai-8B-gguf) | [prism-ml/Bonsai-8B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-8B-mlx-1bit) |
+| **Bonsai-4B** | 570 MB | [prism-ml/Bonsai-4B-gguf](https://huggingface.co/prism-ml/Bonsai-4B-gguf) | [prism-ml/Bonsai-4B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-4B-mlx-1bit) |
+| **Bonsai-1.7B** | 240 MB | [prism-ml/Bonsai-1.7B-gguf](https://huggingface.co/prism-ml/Bonsai-1.7B-gguf) | [prism-ml/Bonsai-1.7B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-1.7B-mlx-1bit) |
+
+## Resources
+
+- [1-bit Bonsai Whitepaper](https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf)
+- [Google Colab notebook](https://colab.research.google.com/drive/1EzyAaQ2nwDv_1X0jaC5XiVC3ZREg9bdG?usp=sharing)
+- [GitHub Demo](https://github.com/PrismML-Eng/Bonsai-demo)
+- [Discord community](https://discord.gg/prismml)
+- [Prism ML website](https://prismml.com)
+- [Run Bonsai locally in the browser (WebGPU)](https://huggingface.co/spaces/webml-community/bonsai-webgpu)
+
 ## Privacy
 
 - **We do not log any messages.** Chat content is never stored on the server.
@@ -23,4 +47,4 @@ Interactive demo for [Bonsai](https://huggingface.co/collections/prism-ml/bonsai
 
 ## Fair Use
 
-We've allocated multiple GPUs to keep this demo responsive, but resources are shared across all users. Under heavy load you may experience slower responses or brief queuing. Please be mindful of usage and avoid sending large bursts of automated requests so everyone can enjoy the demo.
+We've allocated multiple GPUs to keep this demo responsive, but resources are shared across all users. Under heavy load you may experience slower responses or brief queuing. Please be mindful of usage and avoid sending large bursts of automated requests so everyone can enjoy the demo.
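The model sizes quoted in the new README are consistent with single-bit weight packing. A quick back-of-envelope sketch (an editorial illustration, not part of the commit; it assumes the quoted sizes are decimal GB and that the gap above the pure 1-bit payload comes from components kept at higher precision, such as embeddings):

```python
def packed_1bit_gb(n_params: float) -> float:
    """Size of the weight payload in decimal GB if every weight is one bit."""
    return n_params / 8 / 1e9  # 8 weights per byte

for name, n_params, quoted_gb in [
    ("Bonsai-8B", 8e9, 1.15),
    ("Bonsai-4B", 4e9, 0.57),
    ("Bonsai-1.7B", 1.7e9, 0.24),
]:
    packed = packed_1bit_gb(n_params)
    print(f"{name}: {packed:.2f} GB packed 1-bit weights vs {quoted_gb} GB quoted")

# The "14x smaller than FP16" claim also checks out:
# FP16 stores 2 bytes per weight, so 8B params ≈ 16 GB, and 16 / 1.15 ≈ 14.
```

The packed figures (1.00, 0.50, and 0.21 GB) land just under the quoted sizes, which supports the README's framing that essentially all transformer weights are stored at one bit each.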