wpferrell committed
Commit 131e635 · verified · 1 Parent(s): 4a6740d

Fix comparison table: correct BF16 ratio, add ZipServ, add streaming example

Files changed (1):
  1. README.md +11 -8
README.md CHANGED
@@ -52,20 +52,23 @@ model = from_pretrained("wpferrell/mistral-7b-instruct-bigsmall", model_class=Mi
 | Compressed size | 9.3 GB |
 | Ratio | 65.6% (BF16) |
 | Format | BF16 → BigSmall (.bs shards) |
-| Lossless verified | md5 every tensor |
+| Lossless verified | md5 every tensor |
 | Peak RAM (streaming) | < 2 GB |
 
-## vs other compression tools
+## Comparison
 
-| Tool | BF16 Ratio | Inference Overhead | Hardware |
-|------|------------|-------------------|---------|
-| ZipNN | ~83% | None | CPU |
-| DFloat11 | ~70% | ~2x at batch=1 | CUDA only |
-| **BigSmall** | **59.8%** | **None** | **CPU + GPU** |
+| Tool | BF16 Ratio | FP32 Ratio | Inference Overhead | Hardware |
+|------|------------|------------|-------------------|---------|
+| [ZipNN](https://arxiv.org/abs/2411.05239) | 67% | 83% | None | CPU |
+| [DFloat11](https://arxiv.org/abs/2504.11651) | ~70% | BF16 only | ~2x at batch=1 | CUDA only |
+| [ZipServ](https://arxiv.org/abs/2603.17435) | ~70% | BF16 only | 1.22x faster | GDDR GPU |
+| **BigSmall** | **65.6%** | **75.5%** | **None** | **CPU + any GPU** |
+
+*Lower ratio = better compression.*
 
 ## About BigSmall
 
-BigSmall compresses neural network weights at the Shannon entropy floor. It detects float format automatically (FP32, BF16, FP16, FP8, FP4) and applies the optimal lossless codec per tensor. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
+BigSmall compresses at the joint entropy floor for neural network weights. It codes sign+exponent jointly and mantissa conditioned on exponent, achieving the information-theoretic minimum. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
 
 - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
 - PyPI: `pip install bigsmall`
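The rewritten "About BigSmall" paragraph claims the codec reaches the joint entropy floor by coding sign+exponent together and mantissa conditioned on exponent. Here is a minimal sketch of what that floor means, assuming exactly that split; `bf16_entropy_floor` is a hypothetical helper that measures the empirical floor for one tensor, not part of the `bigsmall` package:

```python
import numpy as np
import torch

def bf16_entropy_floor(t: torch.Tensor) -> float:
    """Bits/weight an ideal coder needs under the stated model:
    H(sign, exponent) + H(mantissa | exponent)."""
    # Reinterpret the bf16 payload as raw 16-bit words.
    bits = (
        t.to(torch.bfloat16).flatten().contiguous()
         .view(torch.int16).numpy().astype(np.uint16)
    )
    sign_exp = bits >> 7                    # top 9 bits: sign + 8-bit exponent
    exponent = (bits >> 7) & 0xFF           # exponent field alone
    mantissa = bits & 0x7F                  # low 7 mantissa bits

    def H(x: np.ndarray) -> float:          # empirical entropy, bits/symbol
        p = np.bincount(x) / x.size
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    joint = exponent.astype(np.int64) * 128 + mantissa.astype(np.int64)
    # Chain rule: H(mantissa | exponent) = H(exponent, mantissa) - H(exponent)
    return H(sign_exp) + H(joint) - H(exponent)

w = torch.randn(1_000_000)                  # stand-in for a real weight tensor
print(f"floor ≈ {bf16_entropy_floor(w):.2f} bits/weight (raw bf16 = 16)")
```

On typical weight tensors most of the savings come from the heavily peaked exponent distribution; the mantissa bits are close to incompressible, which is consistent with the sub-70% BF16 ratios in the comparison table.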
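The spec table's "Peak RAM (streaming) < 2 GB" row follows from the loader design the paragraph describes: only one layer's weights are ever decompressed on the host at a time. A minimal sketch of that loop, assuming one `.bs` shard per layer; `stream_load_layers` and its `decompress` callback are hypothetical stand-ins, not the actual `bigsmall` API:

```python
from typing import Callable, Dict, Iterable

import torch

def stream_load_layers(
    layers: Iterable[torch.nn.Module],
    decompress: Callable[[int], Dict[str, torch.Tensor]],
    device: str = "cuda",
) -> None:
    """Materialize a model one transformer layer at a time, so at most
    one layer's decompressed weights are resident in host RAM."""
    for i, layer in enumerate(layers):
        state = decompress(i)           # decode shard i to CPU tensors
        layer.load_state_dict(state)    # fill this layer's parameters
        layer.to(device)                # copy the layer into VRAM
        del state                       # free the host copy before shard i+1
```

Peak host memory is then bounded by the largest single layer rather than the whole checkpoint, which is how a 7B-class model fits through a < 2 GB RAM budget.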