Space: knoxel/bitnet-b158-cpu-explorer


knoxel committed 10 days ago

Commit aec068f (verified) · 1 parent: 541bc33

Upload README.md


Files changed (1)

README.md +49 -5


@@ -1,12 +1,56 @@
 ---
-title: Bitnet B158 Cpu Explorer
-emoji: 👀
-colorFrom: blue
-colorTo: pink
 sdk: gradio
 sdk_version: 6.14.0
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: "🧬 BitNet b1.58 2B4T — CPU-Only Inference Explorer"
+emoji: 🧬
+colorFrom: indigo
+colorTo: blue
 sdk: gradio
 sdk_version: 6.14.0
 app_file: app.py
 pinned: false
+models:
+  - microsoft/bitnet-b1.58-2B-4T
+tags:
+  - bitnet
+  - 1-bit
+  - cpu-inference
+  - ternary-weights
+  - efficient-inference
+short_description: "Chat with Microsoft's 1-bit LLM on CPU — no GPU needed"
 ---
+# 🧬 BitNet b1.58 2B4T — CPU-Only Inference Explorer
+An interactive demo of **Microsoft Research's first open-source native 1-bit Large Language Model**.
+## What makes this special?
+| Feature | Detail |
+|---|---|
+| **Weights** | Ternary {-1, 0, +1} — just 1.58 bits per weight |
+| **Memory** | 0.4 GB (non-embedding) — **5-13× less** than comparable models |
+| **Energy** | 0.028 J per token — **6-9× less** than FP16 models |
+| **Quality** | 54.2% avg benchmark — competitive with Qwen2.5 1.5B (55.2%) |
+| **Training** | Trained from scratch on 4T tokens (not post-training quantized) |
+## Key insight
+Since every weight is -1, 0, or +1, the dot products inside a matrix multiplication reduce to **additions and subtractions** of activations. No weight multiplications are needed, which is a large part of why CPUs can run BitNet efficiently.
+## Demo features
+- 💬 **Chat** — Streaming conversation with live tokens/sec stats
+- 📊 **Benchmark** — Single-shot generation with memory & speed metrics
+- 📈 **Paper Results** — Published benchmark comparison table
+- 🏗️ **Architecture** — Visual explainer of how BitNet b1.58 differs from standard Transformers
+- ⚙️ **System** — Live hardware & memory stats
+## ⚠️ Performance note
+This demo uses the `transformers` library, which does **not** include the specialized `bitnet.cpp` kernels. For the paper's reported CPU latency (29 ms/token), use [bitnet.cpp](https://github.com/microsoft/BitNet) with the [GGUF weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf).
+## References
+- 📄 [Technical Report](https://arxiv.org/abs/2504.12285)
+- 🤗 [Model Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T)
+- 💻 [bitnet.cpp](https://github.com/microsoft/BitNet) (38K+ ⭐)
+- 📦 [GGUF Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
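
The "Key insight" section of the README added in this commit can be illustrated with a small sketch. This is not part of the commit and not how `bitnet.cpp` is implemented. It is a minimal pure-Python example that assumes integer activations and an unpacked ternary weight matrix, and the function names `ternary_matvec` and `reference_matvec` are hypothetical. Real kernels pack the weights and apply scaling factors, which are omitted here.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix by a vector using only adds and subtracts.

    weights: list of rows, each entry must be -1, 0, or +1 (assumed layout).
    x: list of integer activations (assumed; scaling factors are omitted).
    """
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            elif w != 0:
                raise ValueError(f"non-ternary weight: {w}")
            # 0 weight: the activation is skipped entirely
        out.append(acc)
    return out


def reference_matvec(weights, x):
    """Ordinary multiply-accumulate version, used only as a cross-check."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


W = [
    [1, 0, -1, 1],
    [-1, -1, 0, 1],
    [0, 0, 1, 0],
]
x = [3, -2, 5, 7]

result = ternary_matvec(W, x)
assert result == reference_matvec(W, x)
print(result)  # [5, 6, 5]
```

The inner loop never multiplies by a weight. Each weight only decides whether an activation is added, subtracted, or skipped, which is the property the README attributes to ternary weights.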