Abstract
ZipNN, a lossless compression method tailored to neural networks, reduces model size by a third or more (and by over 50% in some cases), improves compression speed, and saves substantial network traffic.
With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them. While there is a vast model-compression literature on deleting parts of the model weights for faster inference, we investigate a more traditional type of compression, one that represents the model in a compact form and is coupled with a decompression algorithm that returns it to its original form and size: namely, lossless compression. We present ZipNN, a lossless compression method tailored to neural networks. Somewhat surprisingly, we show that specialized lossless compression can yield significant network and storage reductions on popular models, often saving 33% and at times over 50% of the model size. We investigate the source of model compressibility and introduce compression variants tailored to models that further increase the effectiveness of compression. On popular models (e.g., Llama 3), ZipNN shows space savings that are over 17% better than vanilla compression while also improving compression and decompression speeds by 62%. We estimate that these methods could save over an exabyte per month of network traffic downloaded from a large model hub like Hugging Face.
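To make the idea concrete, here is a minimal sketch of the kind of model-aware transform the abstract alludes to: grouping the sign/exponent bytes of BF16 weights into their own stream before entropy coding, since those bytes are highly skewed across a model's parameters while the mantissa bytes are close to random. The function names, the single length-prefix framing, and the use of Zstandard are illustrative assumptions, not ZipNN's actual format.

```python
import struct

import numpy as np
import zstandard as zstd  # pip install zstandard


def compress_bf16(raw: bytes, level: int = 3) -> bytes:
    """Compress raw little-endian BF16 weight bytes via byte grouping."""
    assert len(raw) % 2 == 0, "BF16 values are 2 bytes each"
    arr = np.frombuffer(raw, dtype=np.uint8)
    # Byte 0 of each value is the mantissa byte; byte 1 holds sign + exponent.
    mantissa, exponent = arr[0::2], arr[1::2]
    cctx = zstd.ZstdCompressor(level=level)
    ce = cctx.compress(exponent.tobytes())  # skewed stream: compresses well
    cm = cctx.compress(mantissa.tobytes())  # near-random stream: barely shrinks
    # Hypothetical framing: 8-byte length prefix for the exponent stream.
    return struct.pack("<Q", len(ce)) + ce + cm


def decompress_bf16(blob: bytes) -> bytes:
    """Invert compress_bf16, re-interleaving the two byte streams."""
    (n,) = struct.unpack_from("<Q", blob)
    dctx = zstd.ZstdDecompressor()
    exponent = np.frombuffer(dctx.decompress(blob[8:8 + n]), dtype=np.uint8)
    mantissa = np.frombuffer(dctx.decompress(blob[8 + n:]), dtype=np.uint8)
    out = np.empty(2 * exponent.size, dtype=np.uint8)
    out[0::2], out[1::2] = mantissa, exponent  # restore original byte order
    return out.tobytes()
```

The transform is exactly invertible, so `decompress_bf16(compress_bf16(w)) == w` holds for any BF16 byte string `w`; the intuition behind the split is that a trained model's exponents concentrate on a handful of values, so grouping them lets the entropy coder exploit that skew rather than seeing it diluted by interleaved mantissa bytes.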
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding (2026)
- Float8@2bits: Entropy Coding Enables Data-Free Model Compression (2026)
- Single-Stage Huffman Encoder for ML Compression (2026)
- FlashOptim: Optimizers for Memory Efficient Training (2026)
- ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling (2026)
- Seq2Seq2Seq: Lossless Data Compression via Discrete Latent Transformers and Reinforcement Learning (2026)
- Proxy Compression for Language Modeling (2026)
Get this paper in your agent:

```
hf papers read 2411.05239
```

Don't have the latest CLI?

```
curl -LsSf https://hf.co/cli/install.sh | bash
```