---
title: "🧬 BitNet b1.58 2B4T – CPU Inference (bitnet.cpp)"
emoji: ⚡
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
models:
  - microsoft/bitnet-b1.58-2B-4T
  - microsoft/bitnet-b1.58-2B-4T-gguf
tags:
  - bitnet
  - 1-bit
  - cpu-inference
  - ternary-weights
  - bitnet-cpp
  - llama-server
short_description: "1-bit LLM on CPU with bitnet.cpp – 4-10x faster"
---

# ⚡ BitNet b1.58 2B4T – CPU Inference with bitnet.cpp

The **fast** version. This Space compiles Microsoft's [bitnet.cpp](https://github.com/microsoft/BitNet) from source and runs the official GGUF model with the **I2_S lossless kernel**, achieving a **4-10× speedup** over the transformers-based version.

## Architecture

```
┌─────────────────────────────────────────────────┐
│  Gradio UI (Python)                             │
│    ↓ OpenAI-compatible streaming API            │
│  llama-server (bitnet.cpp, port 8080)           │
│    ↓ I2_S ternary kernel (additions only)       │
│  ggml-model-i2_s.gguf (1.1 GB, lossless)        │
└─────────────────────────────────────────────────┘
```

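To illustrate the "additions only" idea behind the I2_S kernel, here is a minimal Python sketch. It is not the real kernel (which is hand-optimized, bit-packed C++ in bitnet.cpp); it only shows why weights restricted to {-1, 0, +1} need no multiplier in a matrix-vector product:

```python
# Illustrative sketch only: the real I2_S kernel is bit-packed, vectorized C++.
# With ternary weights, each dot-product term is an add (+1), a subtract (-1),
# or a skip (0) -- no multiplications anywhere.

def ternary_matvec(W, x):
    """Multiplication-free matrix-vector product for ternary W."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1: add the activation
            elif w == -1:
                acc -= xi   # -1: subtract the activation
            # 0: contributes nothing, skip
        out.append(acc)
    return out

W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, 1.0]
print(ternary_matvec(W, x))  # [-0.5, 2.5]
```

The production kernel additionally packs the ternary values compactly and processes many add/subtract lanes per instruction, which is where the speedup over generic bf16 matmuls comes from.
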
## Why bitnet.cpp?

| Engine | Speed | Lossless? | Notes |
|---|---|---|---|
| **bitnet.cpp I2_S** | ~10-15 tok/s | ✅ 100% | Optimized ternary kernel |
| transformers (bf16) | ~1.4 tok/s | ✅ | No ternary optimization |
| llama.cpp TQ1_0 | ~2 tok/s | ❌ 1.4% | Quality degraded |

+
## Features
|
| 46 |
+
|
| 47 |
+
- π¬ **Streaming Chat** β Real-time conversation with live tok/s stats
|
| 48 |
+
- π **Benchmark** β Greedy generation with detailed performance metrics
|
| 49 |
+
- π **Paper Results** β Published comparison table from the technical report
|
| 50 |
+
- ποΈ **Architecture** β How ternary weights enable multiplication-free inference
|
| 51 |
+
- βοΈ **System Info** β Live CPU/memory stats
|
| 52 |
+
|
| 53 |
+
## How it works
|
| 54 |
+
|
| 55 |
+
1. **Docker build** compiles bitnet.cpp with I2_S kernel optimizations
|
| 56 |
+
2. **Container startup** launches `llama-server` on localhost:8080
|
| 57 |
+
3. **Gradio app** connects via OpenAI-compatible streaming API
|
| 58 |
+
4. **All inference** happens on CPU with addition-only matrix operations
|
| 59 |
+
|
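Step 3 can be sketched as a minimal streaming client. This is a hedged example, not the Space's actual app code: it assumes the standard OpenAI-style `/v1/chat/completions` route that llama-server exposes, SSE `data:` framing, and an arbitrary `model` name (the server hosts a single model):

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8080"  # llama-server launched at container startup

def build_request(prompt: str, stream: bool = True) -> dict:
    """OpenAI-compatible chat-completions payload."""
    return {
        "model": "bitnet-b1.58-2B-4T",  # informational; server serves one model
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def stream_chat(prompt: str):
    """Yield text deltas from the server's SSE stream."""
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            # SSE framing: payload lines start with "data: "; "[DONE]" ends the stream
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data: "):])
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                yield delta

if __name__ == "__main__":
    for piece in stream_chat("What is a ternary weight?"):
        print(piece, end="", flush=True)
```

The Gradio UI does essentially this in a loop, accumulating deltas into the chat window and timing them to report live tok/s.
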
## References

- 📄 [Technical Report](https://arxiv.org/abs/2504.12285)
- 📄 [bitnet.cpp Paper](https://arxiv.org/abs/2502.11880)
- 🤗 [Model Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T)
- 🤗 [GGUF Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
- 💻 [bitnet.cpp](https://github.com/microsoft/BitNet) (38K+ ⭐)