---
title: "🧬 BitNet b1.58 2B4T — CPU Inference (bitnet.cpp)"
emoji: ⚡
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
models:
- microsoft/bitnet-b1.58-2B-4T
- microsoft/bitnet-b1.58-2B-4T-gguf
tags:
- bitnet
- 1-bit
- cpu-inference
- ternary-weights
- bitnet-cpp
- llama-server
short_description: "1-bit LLM on CPU with bitnet.cpp — 4-10x faster"
---

# ⚡ BitNet b1.58 2B4T — CPU Inference with bitnet.cpp

The **fast** version. This Space compiles Microsoft's [bitnet.cpp](https://github.com/microsoft/BitNet) from source and runs the official GGUF model with the **I2_S lossless kernel** — achieving a **4-10× speedup** over the transformers-based version.

## Architecture

```
┌─────────────────────────────────────────────────┐
│  Gradio UI (Python)                             │
│    ↕ OpenAI-compatible streaming API            │
│  llama-server (bitnet.cpp, port 8080)           │
│    ↕ I2_S ternary kernel (additions only)       │
│  ggml-model-i2_s.gguf (1.1 GB, lossless)        │
└─────────────────────────────────────────────────┘
```

## Why bitnet.cpp?

| Engine | Speed | Lossless? | Notes |
|---|---|---|---|
| **bitnet.cpp I2_S** | ~10-15 tok/s | ✅ 100% | Optimized ternary kernel |
| transformers (bf16) | ~1.4 tok/s | ✅ | No ternary optimization |
| llama.cpp TQ1_0 | ~2 tok/s | ❌ (~1.4% loss) | Quality degraded |

## Features

- 💬 **Streaming Chat** — Real-time conversation with live tok/s stats
- 📊 **Benchmark** — Greedy generation with detailed performance metrics
- 📈 **Paper Results** — Published comparison table from the technical report
- 🏗️ **Architecture** — How ternary weights enable multiplication-free inference
- ⚙️ **System Info** — Live CPU/memory stats

## How it works

1. **Docker build** compiles bitnet.cpp with I2_S kernel optimizations
2. **Container startup** launches `llama-server` on localhost:8080
3. **Gradio app** connects via the OpenAI-compatible streaming API
4. **All inference** happens on CPU with addition-only matrix operations

## References

- 📄 [Technical Report](https://arxiv.org/abs/2504.12285)
- 📄 [bitnet.cpp Paper](https://arxiv.org/abs/2502.11880)
- 🤗 [Model Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T)
- 🤗 [GGUF Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
- 💻 [bitnet.cpp](https://github.com/microsoft/BitNet) (38K+ ⭐)