Commit 7039448 by knoxel · verified · 1 Parent(s): 70d10ec

docs: add cross-link to fast bitnet.cpp version

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -22,6 +22,8 @@ short_description: "Chat with Microsoft's 1-bit LLM on CPU — no GPU needed"
 
  An interactive demo of **Microsoft Research's first open-source native 1-bit Large Language Model**.
 
+ > ⚡ **Want the fast version?** See [knoxel/bitnet-cpp-explorer](https://huggingface.co/spaces/knoxel/bitnet-cpp-explorer) — same model but powered by bitnet.cpp's optimized ternary kernels (4-10× faster).
+
  ## What makes this special?
 
  | Feature | Detail |
@@ -44,13 +46,15 @@ Since weights are only -1, 0, or +1, matrix multiplication becomes pure **additi
  - 🏗️ **Architecture** — Visual explainer of how BitNet b1.58 differs from standard Transformers
  - ⚙️ **System** — Live hardware & memory stats
 
- ## ⚠️ Performance note
+ ## Performance note
 
- This demo uses the `transformers` library, which does **not** include the specialized `bitnet.cpp` kernels. For the paper's reported CPU latency (29ms/token), use [bitnet.cpp](https://github.com/microsoft/BitNet) with the [GGUF weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf).
+ This demo uses the `transformers` library (~1.4 tok/s), which does **not** include the specialized bitnet.cpp kernels. For the paper's reported CPU latency (29ms/token = ~34 tok/s), see:
+ - ⚡ [Fast version with bitnet.cpp](https://huggingface.co/spaces/knoxel/bitnet-cpp-explorer)
+ - 💻 [bitnet.cpp repo](https://github.com/microsoft/BitNet) with the [GGUF weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
 
  ## References
 
- - 📄 [Technical Report](https://arxiv.org/abs/2504.12285)
+ - 📄 [Technical Report](https://arxiv.org/abs/2504.12285) — BitNet b1.58 2B4T
+ - 📄 [bitnet.cpp Paper](https://arxiv.org/abs/2502.11880) — Optimized inference kernels
  - 🤗 [Model Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T)
  - 💻 [bitnet.cpp](https://github.com/microsoft/BitNet) (38K+ ⭐)
- - 📦 [GGUF Weights](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf)
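
Reviewer note: the second hunk header quotes the README's point that weights restricted to -1, 0, or +1 remove the multiplications from a matrix product. The sketch below illustrates only that idea in plain Python. It is not how bitnet.cpp implements its kernels (those work on packed weights), it ignores the scaling factors a real model applies, and the function name `ternary_matvec` is hypothetical.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix by a vector using only additions and subtractions.

    weights: list of rows, each entry must be -1, 0, or +1 (illustrative layout,
             not the packed format a real kernel would use).
    x:       list of floats with the same length as each row.
    """
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
            # 0 weight: the activation is skipped entirely
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(W, x)
```

Here `y` matches the ordinary dot products of each row with `x`, but the inner loop never multiplies. This is the property the faster Space linked in this commit relies on, though its actual speed comes from the optimized kernels rather than from a loop like this one.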