wpferrell committed
Commit 131e635 · verified · 1 Parent(s): 4a6740d

Fix comparison table: correct BF16 ratio, add ZipServ, add streaming example

Files changed (1):
  1. README.md +11 -8
README.md CHANGED
@@ -52,20 +52,23 @@ model = from_pretrained("wpferrell/mistral-7b-instruct-bigsmall", model_class=Mi
 | Compressed size | 9.3 GB |
 | Ratio | 65.6% (BF16) |
 | Format | BF16 → BigSmall (.bs shards) |
-| Lossless verified | md5 every tensor |
+| Lossless verified | md5 every tensor |
 | Peak RAM (streaming) | < 2 GB |
 
-## vs other compression tools
+## Comparison
 
-| Tool | BF16 Ratio | Inference Overhead | Hardware |
-|------|------------|-------------------|---------|
-| ZipNN | ~83% | None | CPU |
-| DFloat11 | ~70% | ~2x at batch=1 | CUDA only |
-| **BigSmall** | **59.8%** | **None** | **CPU + GPU** |
+| Tool | BF16 Ratio | FP32 Ratio | Inference Overhead | Hardware |
+|------|------------|------------|-------------------|---------|
+| [ZipNN](https://arxiv.org/abs/2411.05239) | 67% | 83% | None | CPU |
+| [DFloat11](https://arxiv.org/abs/2504.11651) | ~70% | BF16 only | ~2x at batch=1 | CUDA only |
+| [ZipServ](https://arxiv.org/abs/2603.17435) | ~70% | BF16 only | 1.22x faster | GDDR GPU |
+| **BigSmall** | **65.6%** | **75.5%** | **None** | **CPU + any GPU** |
+
+*Lower ratio = better compression.*
 
 ## About BigSmall
 
-BigSmall compresses neural network weights at the Shannon entropy floor. It detects float format automatically (FP32, BF16, FP16, FP8, FP4) and applies the optimal lossless codec per tensor. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
+BigSmall compresses at the joint entropy floor for neural network weights. It codes sign+exponent jointly and mantissa conditioned on exponent, achieving the information-theoretic minimum. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
 
 - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
 - PyPI: `pip install bigsmall`
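The rewritten "About BigSmall" paragraph claims the codec reaches the joint entropy floor by coding sign+exponent together and mantissa conditioned on exponent. Here is a minimal sketch of what that floor means, assuming exactly that split; `bf16_entropy_floor` is a hypothetical helper that measures the empirical floor for one tensor, not part of the `bigsmall` package:

```python
import numpy as np
import torch

def bf16_entropy_floor(t: torch.Tensor) -> float:
    """Bits/weight an ideal coder needs under the stated model:
    H(sign, exponent) + H(mantissa | exponent)."""
    # Reinterpret the bf16 payload as raw 16-bit words.
    bits = (
        t.to(torch.bfloat16).flatten().contiguous()
         .view(torch.int16).numpy().astype(np.uint16)
    )
    sign_exp = bits >> 7                    # top 9 bits: sign + 8-bit exponent
    exponent = (bits >> 7) & 0xFF           # exponent field alone
    mantissa = bits & 0x7F                  # low 7 mantissa bits

    def H(x: np.ndarray) -> float:          # empirical entropy, bits/symbol
        p = np.bincount(x) / x.size
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    joint = exponent.astype(np.int64) * 128 + mantissa.astype(np.int64)
    # Chain rule: H(mantissa | exponent) = H(exponent, mantissa) - H(exponent)
    return H(sign_exp) + H(joint) - H(exponent)

w = torch.randn(1_000_000)                  # stand-in for a real weight tensor
print(f"floor ≈ {bf16_entropy_floor(w):.2f} bits/weight (raw bf16 = 16)")
```

On typical weight tensors most of the savings come from the heavily peaked exponent distribution; the mantissa bits are close to incompressible, which is consistent with the sub-70% BF16 ratios in the comparison table.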
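The spec table's "Peak RAM (streaming) < 2 GB" row follows from the loader design the paragraph describes: only one layer's weights are ever decompressed on the host at a time. A minimal sketch of that loop, assuming one `.bs` shard per layer; `stream_load_layers` and its `decompress` callback are hypothetical stand-ins, not the actual `bigsmall` API:

```python
from typing import Callable, Dict, Iterable

import torch

def stream_load_layers(
    layers: Iterable[torch.nn.Module],
    decompress: Callable[[int], Dict[str, torch.Tensor]],
    device: str = "cuda",
) -> None:
    """Materialize a model one transformer layer at a time, so at most
    one layer's decompressed weights are resident in host RAM."""
    for i, layer in enumerate(layers):
        state = decompress(i)           # decode shard i to CPU tensors
        layer.load_state_dict(state)    # fill this layer's parameters
        layer.to(device)                # copy the layer into VRAM
        del state                       # free the host copy before shard i+1
```

Peak host memory is then bounded by the largest single layer rather than the whole checkpoint, which is how a 7B-class model fits through a < 2 GB RAM budget.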