Fix comparison table: correct BF16 ratio, add ZipServ, add streaming example
README.md CHANGED

```diff
@@ -52,20 +52,23 @@ model = from_pretrained("wpferrell/mistral-7b-instruct-bigsmall", model_class=Mi
 | Compressed size | 9.3 GB |
 | Ratio | 65.6% (BF16) |
 | Format | BF16 → BigSmall (.bs shards) |
-| Lossless verified |
+| Lossless verified | md5 every tensor |
 | Peak RAM (streaming) | < 2 GB |
 
-##
+## Comparison
 
-| Tool | BF16 Ratio | Inference Overhead | Hardware |
-|------|------------|-------------------|---------|
-| ZipNN |
-| DFloat11 | ~70% | ~2x at batch=1 | CUDA only |
-|
+| Tool | BF16 Ratio | FP32 Ratio | Inference Overhead | Hardware |
+|------|------------|------------|-------------------|---------|
+| [ZipNN](https://arxiv.org/abs/2411.05239) | 67% | 83% | None | CPU |
+| [DFloat11](https://arxiv.org/abs/2504.11651) | ~70% | BF16 only | ~2x at batch=1 | CUDA only |
+| [ZipServ](https://arxiv.org/abs/2603.17435) | ~70% | BF16 only | 1.22x faster | GDDR GPU |
+| **BigSmall** | **65.6%** | **75.5%** | **None** | **CPU + any GPU** |
+
+*Lower ratio = better compression.*
 
 ## About BigSmall
 
-BigSmall compresses
+BigSmall compresses at the joint entropy floor for neural network weights. It codes sign+exponent jointly and mantissa conditioned on exponent, achieving the information-theoretic minimum. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
 
 - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
 - PyPI: `pip install bigsmall`
```
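The added About paragraph (and the commit title) point at streaming loads that keep peak RAM under 2 GB by decompressing one layer at a time into VRAM. A minimal caller-side sketch of that idea follows; only `from_pretrained(..., model_class=...)` is visible in the hunk context above, so the `bigsmall` import path, the `streaming` and `device` keyword arguments, and the `MistralForCausalLM` guess for the truncated `model_class=Mi` are assumptions, not the documented API.

```python
# Hypothetical usage sketch: only from_pretrained(..., model_class=...) is
# confirmed by the diff context; the import path and the streaming/device
# keyword arguments are assumptions, not BigSmall's documented API.
from transformers import MistralForCausalLM
from bigsmall import from_pretrained

model = from_pretrained(
    "wpferrell/mistral-7b-instruct-bigsmall",
    model_class=MistralForCausalLM,  # assumed: the truncated model_class in the hunk header
    streaming=True,                  # assumed flag: decompress one layer at a time
    device="cuda:0",                 # assumed: destination for the decompressed layers
)
model.eval()
```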
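The same paragraph claims BigSmall codes sign+exponent jointly and mantissa conditioned on exponent, compressing at the entropy floor of that factorization. One way to sanity-check the 65.6% BF16 figure is to estimate that floor directly from a weight tensor's bit fields. The sketch below (NumPy + PyTorch) is an independent illustration of the entropy calculation, not BigSmall's codec; the function name and the bit-field handling are mine.

```python
import numpy as np
import torch

def bf16_entropy_floor(t: torch.Tensor) -> float:
    """Estimate the compression ratio implied by coding sign+exponent jointly
    and mantissa conditioned on exponent, as (bits per value) / 16."""
    assert t.dtype == torch.bfloat16
    # Bit-cast BF16 -> int16, then reinterpret as unsigned 16-bit words.
    bits = (t.detach().flatten().cpu().view(torch.int16)
            .numpy().view(np.uint16).astype(np.int64))
    sign_exp = bits >> 7        # top 9 bits: sign (1) + exponent (8)
    exponent = sign_exp & 0xFF  # 8-bit exponent field
    mantissa = bits & 0x7F      # low 7 bits
    n = bits.size

    def entropy(counts: np.ndarray) -> float:
        p = counts[counts > 0] / counts.sum()
        return float(-(p * np.log2(p)).sum())

    # H(sign, exponent): joint entropy of the 9-bit sign+exponent field.
    h_se = entropy(np.bincount(sign_exp, minlength=1 << 9))

    # H(mantissa | exponent) = sum over e of P(E=e) * H(mantissa | E=e).
    h_m = 0.0
    for e in np.unique(exponent):
        sub = mantissa[exponent == e]
        h_m += (sub.size / n) * entropy(np.bincount(sub, minlength=1 << 7))

    return (h_se + h_m) / 16.0  # fraction of the original 16 bits per value
```

On typical LLM weights the exponent distribution is narrow while the mantissa is close to uniform, so this estimate tends to land in the mid-60% range for BF16, in line with the ratios in the table above.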