Commit 6fd0437 (verified) · 1 Parent(s): 7214371
wpferrell committed

Update model card with streaming loader and benchmark info

Files changed (1): README.md (+42 −26)
README.md CHANGED
@@ -1,50 +1,66 @@
  ---
- license: mit
  tags:
- - bigsmall
- - compression
- - lossless
- - gpt2
  ---

  # GPT-2 (BigSmall compressed)

- This is GPT-2 compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) lossless neural network weight compression.

- The weights are **bit-for-bit identical** to the original after decompression. No quality degradation, no accuracy loss.

- ## Usage

- ```python
  pip install bigsmall
  ```

  ```python
- import bigsmall
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

- # Load compressed weights
- state_dict = bigsmall.from_pretrained("wpferrell/gpt2-bigsmall")
-
- # Load into model
- model = GPT2LMHeadModel.from_pretrained("gpt2", state_dict=state_dict)
- tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

- # Run inference - identical to original GPT-2
  inputs = tokenizer("Hello, I'm a language model", return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(outputs[0]))
  ```

- ## Compression stats
- - Original size: ~548 MB
- - Compressed size: ~414 MB
- - Ratio: 75.53% (lossless)
- - Format: FP32
- - Round-trip verified: 160/160 tensors md5-identical

  ## About BigSmall
- BigSmall compresses model weights losslessly. Unlike quantization, the decompressed weights are bit-for-bit identical to the originals. Supports BF16, FP16, FP32, FP64, INT8 formats across LLMs and diffusion models.

  - PyPI: `pip install bigsmall`
- - GitHub: https://github.com/wpferrell/Bigsmall
 
  ---
+ license: apache-2.0
  tags:
+ - bigsmall
+ - compression
+ - lossless
+ - gpt2
  ---

  # GPT-2 (BigSmall compressed)

+ **548 MB → 414 MB (75.5%). Bit-identical. Under 500 MB peak RAM with streaming.**

+ This is GPT-2 (117M parameters) compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) lossless neural network weight compression. Not quantization: every weight is bit-identical to the original.

+ ## Install

+ ```bash
  pip install bigsmall
  ```

+ ## Load and run inference
+
  ```python
+ from bigsmall import StreamingLoader
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

+ # Decompress one layer at a time: peak RAM stays under 500 MB
+ loader = StreamingLoader("wpferrell/gpt2-bigsmall")
+ model = loader.load_model(GPT2LMHeadModel)
+ tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")

  inputs = tokenizer("Hello, I'm a language model", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=50)
  print(tokenizer.decode(outputs[0]))
  ```
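+
+ A quick way to sanity-check the peak-RAM claim yourself, as a minimal sketch (this check is not part of bigsmall; `resource` is Python stdlib, and `ru_maxrss` is reported in KiB on Linux, bytes on macOS):
+
+ ```python
+ # Measure peak resident memory while loading through StreamingLoader.
+ import resource
+ from bigsmall import StreamingLoader
+ from transformers import GPT2LMHeadModel
+
+ loader = StreamingLoader("wpferrell/gpt2-bigsmall")
+ model = loader.load_model(GPT2LMHeadModel)
+
+ peak_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024  # KiB -> MiB on Linux
+ print(f"peak RSS: {peak_mib:.0f} MiB")  # the card claims this stays under 500 MB
+ ```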
 
+ ## Or decompress to disk first
+
+ ```python
+ from bigsmall import from_pretrained
+ from transformers import GPT2LMHeadModel
+
+ model = from_pretrained("wpferrell/gpt2-bigsmall", model_class=GPT2LMHeadModel)
+ ```
+
+ ## What's inside
+
+ | File | Original | Compressed | Ratio |
+ |------|----------|------------|-------|
+ | model.safetensors (FP32) | 548 MB | 414 MB | 75.5% |
+
+ Verified lossless: the md5 of every weight tensor matches the original after decompression.
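+
+ To reproduce that check, a minimal sketch (assuming `from_pretrained` returns a fully decompressed `GPT2LMHeadModel`, with reference weights from `openai-community/gpt2`):
+
+ ```python
+ # Compare the md5 of every decompressed tensor against the reference FP32 weights.
+ import hashlib
+ from bigsmall import from_pretrained
+ from transformers import GPT2LMHeadModel
+
+ decompressed = from_pretrained("wpferrell/gpt2-bigsmall", model_class=GPT2LMHeadModel)
+ reference = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
+
+ ref_state = reference.state_dict()
+ for name, tensor in decompressed.state_dict().items():
+     ours = hashlib.md5(tensor.cpu().numpy().tobytes()).hexdigest()
+     theirs = hashlib.md5(ref_state[name].cpu().numpy().tobytes()).hexdigest()
+     assert ours == theirs, f"tensor mismatch: {name}"
+ print(f"all {len(ref_state)} tensors md5-identical")
+ ```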
+
+ ## vs other compression tools
+
+ | Tool | Compressed size (BF16, % of original; lower is better) | Inference overhead | Hardware |
+ |------|------|------|------|
+ | ZipNN | ~83% | None | CPU |
+ | DFloat11 | ~70% | ~2x at batch=1 | CUDA only |
+ | **BigSmall** | **59.8%** | **None** | **CPU + GPU** |

  ## About BigSmall

+ BigSmall compresses at the Shannon entropy floor for neural network weights. It detects the float format automatically (FP32, BF16, FP16, FP8, FP4) and applies the optimal lossless codec to each tensor. The streaming loader decompresses one layer at a time directly into VRAM, so peak RAM stays under 2 GB even for 7B models.
+
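+ For intuition, the streaming pattern looks roughly like the sketch below. This illustrates the technique only, with zlib standing in for BigSmall's actual codec, and it assumes a `compressed` dict mapping parameter names to compressed FP32 bytes:
+
+ ```python
+ # Illustrative streaming load: decompress one tensor at a time and move it
+ # straight to the target device, so host RAM holds at most one tensor.
+ import zlib
+ import numpy as np
+ import torch
+
+ def streaming_load(model: torch.nn.Module,
+                    compressed: dict[str, bytes],
+                    device: str = "cpu") -> None:
+     params = dict(model.named_parameters())
+     for name, blob in compressed.items():
+         raw = zlib.decompress(blob)                      # lossless, bit-exact
+         arr = np.frombuffer(raw, dtype=np.float32).copy()
+         with torch.no_grad():
+             tensor = torch.from_numpy(arr).view_as(params[name])
+             params[name].data = tensor.to(device)        # host copy freed next iteration
+ ```
+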
+ - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
  - PyPI: `pip install bigsmall`