wpferrell committed
Commit 4a6740d · verified · 1 parent: 9dd2f01

Update model card with streaming loader and benchmark info

Files changed (1)
  1. README.md +51 -21
README.md CHANGED
@@ -1,41 +1,71 @@
  ---
  license: apache-2.0
  tags:
- - bigsmall
- - compression
- - lossless
- - mistral
+ - bigsmall
+ - compression
+ - lossless
+ - mistral
  ---
- # Mistral 7B Instruct v0.3 (BigSmall compressed)

- Mistral 7B Instruct v0.3 compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) — lossless neural network weight compression.
+ # Mistral 7B Instruct (BigSmall compressed)

- **14 GB → 9.3 GB. Bit-for-bit identical weights. No quality loss.**
+ **14 GB → 9.3 GB. Under 2 GB peak RAM. Full quality — not quantization.**

- ## Usage
+ This is Mistral-7B-Instruct-v0.3 compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) — lossless neural network weight compression. Every weight is bit-identical to the original. No accuracy loss.
+
+ ## Install
+
+ ```bash
  pip install bigsmall
+ ```
+
+ ## Load and run inference (streaming, under 2 GB peak RAM)

  ```python
- import bigsmall
- from transformers import AutoModelForCausalLM, AutoTokenizer
+ from bigsmall import StreamingLoader
+ from transformers import MistralForCausalLM, AutoTokenizer

- state_dict = bigsmall.from_pretrained("wpferrell/mistral-7b-instruct-bigsmall")
- model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3", state_dict=state_dict)
+ # Streams one layer at a time: 9.3 GB download, under 2 GB peak RAM
+ loader = StreamingLoader("wpferrell/mistral-7b-instruct-bigsmall")
+ model = loader.load_model(MistralForCausalLM)
  tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
+
+ messages = [{"role": "user", "content": "Explain lossless compression in one paragraph."}]
+ inputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
+ outputs = model.generate(inputs, max_new_tokens=200)
+ print(tokenizer.decode(outputs[0]))
  ```

- Or stream layer-by-layer with under 2GB peak RAM:
+ ## Or decompress to disk first
+
  ```python
- with bigsmall.StreamingLoader("model.bs", device="cuda") as loader:
-     for layer_idx, tensors in loader.iter_layers():
-         pass
+ from bigsmall import from_pretrained
+ from transformers import MistralForCausalLM
+ model = from_pretrained("wpferrell/mistral-7b-instruct-bigsmall", model_class=MistralForCausalLM)
  ```

  ## Compression stats
- - Original: ~14.2 GB (BF16)
- - Compressed: ~9.3 GB
- - Ratio: ~65.6% (lossless)
- - All tensors md5-verified bit-identical
+
+ | Metric | Value |
+ |--------|-------|
+ | Original size | 14.2 GB |
+ | Compressed size | 9.3 GB |
+ | Ratio (compressed / original) | 65.6% |
+ | Format | BF16 → BigSmall (.bs shards) |
+ | Lossless verified | ✅ md5 every tensor |
+ | Peak RAM (streaming) | < 2 GB |
+
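+ The lossless claim is directly checkable. Below is a minimal sketch of the per-tensor check (illustrative only, not a BigSmall API; it assumes both the original and the round-tripped weights fit in RAM):
+
+ ```python
+ import hashlib
+ import torch
+ from transformers import MistralForCausalLM
+ from bigsmall import from_pretrained
+
+ def tensor_md5(t: torch.Tensor) -> str:
+     # Hash the raw bytes; view(torch.uint8) also covers bf16, which numpy can't hold
+     data = t.detach().cpu().contiguous().view(torch.uint8).numpy().tobytes()
+     return hashlib.md5(data).hexdigest()
+
+ # Sketch: loads two full copies of the model, so it needs plenty of RAM
+ ref = MistralForCausalLM.from_pretrained(
+     "mistralai/Mistral-7B-Instruct-v0.3", torch_dtype=torch.bfloat16
+ ).state_dict()
+ got = from_pretrained("wpferrell/mistral-7b-instruct-bigsmall", model_class=MistralForCausalLM).state_dict()
+
+ for name, t in ref.items():
+     assert tensor_md5(t) == tensor_md5(got[name]), f"mismatch: {name}"
+ print("all tensors bit-identical")
+ ```
+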
+ ## vs other compression tools
+
+ | Tool | BF16 ratio (lower is better) | Inference overhead | Hardware |
+ |------|------------------------------|--------------------|----------|
+ | ZipNN | ~83% | None | CPU |
+ | DFloat11 | ~70% | ~2x at batch=1 | CUDA only |
+ | **BigSmall** | **59.8%** | **None** | **CPU + GPU** |

  ## About BigSmall
- pip install bigsmall | https://github.com/wpferrell/Bigsmall
+
+ BigSmall compresses neural network weights at the Shannon entropy floor. It detects the float format automatically (FP32, BF16, FP16, FP8, FP4) and applies the optimal lossless codec per tensor. The streaming loader decompresses one transformer layer at a time directly into VRAM — making 7B+ models accessible on hardware that couldn't otherwise load them.
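+
+ The exact codec is internal to BigSmall, but the intuition is easy to demo. In trained BF16 weights the sign/exponent byte plane is highly redundant (values cluster in a narrow dynamic range) while the mantissa plane is near-random, so splitting the planes and entropy-coding each separately approaches the entropy floor. A rough sketch of that idea, with zlib standing in for a real entropy coder:
+
+ ```python
+ import zlib
+ import numpy as np
+
+ # Toy stand-in for trained weights: small values clustered near zero
+ w = (np.random.randn(1_000_000) * 0.02).astype(np.float32)
+ bits = (w.view(np.uint32) >> 16).astype(np.uint16)  # bf16 = top 16 bits of fp32 (truncating)
+ hi = (bits >> 8).astype(np.uint8)    # sign + high exponent bits: low entropy
+ lo = (bits & 0xFF).astype(np.uint8)  # low exponent bit + mantissa: near-random
+
+ ratio = lambda b: len(zlib.compress(b, 9)) / len(b)
+ print(f"sign/exponent plane: {ratio(hi.tobytes()):.2f}")  # well below 1.0
+ print(f"mantissa plane:      {ratio(lo.tobytes()):.2f}")  # close to 1.0
+ ```
+
+ Only the redundant plane shrinks, which is why choosing a codec per tensor matters: different dtypes and weight distributions put the redundancy in different bit planes.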
+
+ - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
+ - PyPI: `pip install bigsmall`