Update model card with streaming loader and benchmark info
README.md
---
license: apache-2.0
tags:
- bigsmall
- compression
- lossless
- mistral
---

# Mistral 7B Instruct (BigSmall compressed)

**14 GB → 9.3 GB. Under 2 GB peak RAM. Full quality — not quantization.**

This is Mistral-7B-Instruct-v0.3 compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) — lossless neural network weight compression. Every weight is bit-identical to the original. No accuracy loss whatsoever.

## Install

```bash
pip install bigsmall
```

## Load and run inference (streaming — under 2 GB peak RAM)

```python
from bigsmall import StreamingLoader
from transformers import MistralForCausalLM, AutoTokenizer

# Streams one layer at a time — 9.3 GB download, under 2 GB peak RAM
loader = StreamingLoader("wpferrell/mistral-7b-instruct-bigsmall")
model = loader.load_model(MistralForCausalLM)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

messages = [{"role": "user", "content": "Explain lossless compression in one paragraph."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```
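
The peak-RAM figure is easy to sanity-check with nothing but the standard library (a quick sketch; note that `ru_maxrss` reports KiB on Linux but bytes on macOS):

```python
import resource

# Run this after the streaming load above; ru_maxrss is the process's
# peak resident set size (KiB on Linux, bytes on macOS)
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kib / 1024**2:.2f} GiB")
```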

## Or decompress to disk first

```python
from bigsmall import from_pretrained
from transformers import MistralForCausalLM

# Decompresses the shards to disk, then loads the full model
model = from_pretrained("wpferrell/mistral-7b-instruct-bigsmall", model_class=MistralForCausalLM)
```
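
If you go the decompress-first route, you can persist the result with the standard `transformers` API so later runs skip BigSmall entirely; this assumes `from_pretrained` returns an ordinary `MistralForCausalLM` (which the `model_class` argument suggests), and the output path is arbitrary:

```python
# One-time cost: writes the decompressed BF16 weights (~14 GB) back to disk
model.save_pretrained("./mistral-7b-instruct-v0.3-bf16")
```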

## Compression stats

| Metric | Value |
|--------|-------|
| Original size | 14.2 GB |
| Compressed size | 9.3 GB |
| Ratio (compressed / original) | 65.6% (BF16) |
| Format | BF16 → BigSmall (.bs shards) |
| Lossless verified | ✅ (md5 of every tensor) |
| Peak RAM (streaming) | < 2 GB |
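
The per-tensor md5 check is easy to reproduce. A minimal sketch, assuming `model` is the BigSmall-loaded model from above and `reference` is the original `mistralai` checkpoint loaded the usual way (both names are illustrative, and loading the reference itself takes the full ~14 GB):

```python
import hashlib

import torch

def tensor_md5(t: torch.Tensor) -> str:
    # Hash the tensor's raw bytes; the uint8 view also handles bfloat16,
    # which numpy cannot represent directly
    data = t.detach().cpu().contiguous().reshape(-1).view(torch.uint8)
    return hashlib.md5(data.numpy().tobytes()).hexdigest()

ours, theirs = model.state_dict(), reference.state_dict()
bad = [k for k in theirs if tensor_md5(ours[k]) != tensor_md5(theirs[k])]
print("all tensors bit-identical ✅" if not bad else f"mismatches: {bad[:5]}")
```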

## Compared to other compression tools

| Tool | BF16 ratio (lower is better) | Inference overhead | Hardware |
|------|------------------------------|--------------------|----------|
| ZipNN | ~83% | None | CPU |
| DFloat11 | ~70% | ~2x at batch=1 | CUDA only |
| **BigSmall** | **59.8%** | **None** | **CPU + GPU** |

## About BigSmall

BigSmall compresses neural network weights at the Shannon entropy floor. It detects the float format automatically (FP32, BF16, FP16, FP8, FP4) and applies the optimal lossless codec per tensor. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
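
The codec internals aren't documented in this card, but the core idea behind lossless float-weight compression is easy to demonstrate. The sketch below is illustrative only, not BigSmall's actual algorithm: split each bfloat16 into its two bytes and compress the byte streams separately, since the sign-and-exponent byte is highly redundant while the mantissa byte is close to random.

```python
import zlib

import torch

# Stand-in for one weight tensor
w = torch.randn(1024, 1024, dtype=torch.bfloat16)
raw = w.view(torch.uint8).numpy().reshape(-1, 2)  # two bytes per bf16 value (little-endian)

naive = len(zlib.compress(raw.tobytes(), 9))
# Group byte positions so the codec sees the low-entropy exponent stream on its own
split = len(zlib.compress(raw[:, 1].tobytes(), 9)) + len(zlib.compress(raw[:, 0].tobytes(), 9))

print(f"naive zlib:      {naive / raw.nbytes:.1%} of original")
print(f"byte-split zlib: {split / raw.nbytes:.1%} of original")
```

Splitting should beat compressing the interleaved bytes; closing the remaining gap to the entropy floor is what a purpose-built per-tensor codec is for.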

- GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
- PyPI: `pip install bigsmall`