docs: add cross-link to fast bitnet.cpp version
README.md
CHANGED
@@ -22,6 +22,8 @@ short_description: "Chat with Microsoft's 1-bit LLM on CPU – no GPU needed"
 
 An interactive demo of **Microsoft Research's first open-source native 1-bit Large Language Model**.
 
+> ⚡ **Want the fast version?** See [knoxel/bitnet-cpp-explorer](https://huggingface.co/spaces/knoxel/bitnet-cpp-explorer) – same model but powered by bitnet.cpp's optimized ternary kernels (4-10× faster).
+
 ## What makes this special?
 
 | Feature | Detail |
@@ -44,13 +46,15 @@ Since weights are only -1, 0, or +1, matrix multiplication becomes pure **additi
 - **Architecture** – Visual explainer of how BitNet b1.58 differs from standard Transformers
 - ⚙️ **System** – Live hardware & memory stats
 
-##
+## Performance note
 
-This demo uses the `transformers` library, which does **not** include the specialized
+This demo uses the `transformers` library (~1.4 tok/s), which does **not** include the specialized bitnet.cpp kernels. For the paper's reported CPU latency (29ms/token = ~34 tok/s), see:
+- ⚡ [Fast version with bitnet.cpp](https://huggingface.co/spaces/knoxel/bitnet-cpp-explorer)
+- 💻 [bitnet.cpp repo](https://github.com/microsoft/BitNet) with the [GGUF weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
 
 ## References
 
-- [Technical Report](https://arxiv.org/abs/2504.12285)
+- [Technical Report](https://arxiv.org/abs/2504.12285) – BitNet b1.58 2B4T
+- [bitnet.cpp Paper](https://arxiv.org/abs/2502.11880) – Optimized inference kernels
 - 🤗 [Model Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T)
 - 💻 [bitnet.cpp](https://github.com/microsoft/BitNet) (38K+ ⭐)
-- 📦 [GGUF Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
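
Reviewer note: the added performance note equates 29ms/token with roughly 34 tok/s. The conversion can be checked with a minimal Python sketch. The helper names `tokens_per_second` and `ms_per_token` are hypothetical and are not part of the Space or of bitnet.cpp; the two input figures are the ones quoted in the README text, not new measurements.

```python
def tokens_per_second(latency_ms_per_token: float) -> float:
    """Convert per-token latency (milliseconds) to throughput (tokens/second)."""
    if latency_ms_per_token <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms_per_token


def ms_per_token(throughput_tok_per_s: float) -> float:
    """Convert throughput (tokens/second) to per-token latency (milliseconds)."""
    if throughput_tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / throughput_tok_per_s


if __name__ == "__main__":
    # 29 ms/token is the CPU latency the README attributes to the paper.
    print(f"{tokens_per_second(29.0):.1f} tok/s")
    # ~1.4 tok/s is the figure the README gives for the transformers path.
    print(f"{ms_per_token(1.4):.1f} ms/token")
```

The two functions are inverses of each other, so either figure in the note can be cross-checked from the other unit.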