# DMX Delta for Qwen2.5-3B-Instruct
This repository contains a DMX-encoded delta for Qwen2.5-3B-Instruct, enabling near-lossless reconstruction of the full model from a compatible base checkpoint.
DMX reduces storage requirements by 55–80% while preserving model quality (perplexity changes of only +0.03–0.16%), using structure-aware integer transformations rather than generic byte-level compression.
Unlike traditional compression, DMX operates at the model level, storing structured weight deltas that can be deterministically reconstructed. This enables efficient distribution and versioning of model variants without duplicating full checkpoints.
Key properties:
- Near-lossless reconstruction: verified roundtrip accuracy
- No retraining required: works on pretrained safetensors files
- Deterministic decode: exact or bounded-error recovery
- Delta-based storage: distribute only what changed
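The delta idea can be sketched in a few lines. The toy below is illustrative only, not the actual DMX on-disk format or its structure-aware transforms: it stores the difference between a fine-tuned tensor and its base as per-tensor-scaled int16 values, and reconstruction adds the dequantized difference back to the base.

```python
import numpy as np

def encode_delta(base: np.ndarray, tuned: np.ndarray):
    """Quantize the weight difference to int16 with a per-tensor scale."""
    delta = tuned - base
    scale = float(np.abs(delta).max()) / np.iinfo(np.int16).max
    if scale == 0.0:
        scale = 1.0  # tensor unchanged; any scale decodes to zeros
    q = np.round(delta / scale).astype(np.int16)
    return q, scale

def decode_delta(base: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct the tuned tensor: base + dequantized delta."""
    return base + q.astype(np.float32) * scale

# Simulate a base checkpoint and a small fine-tuning update.
rng = np.random.default_rng(0)
base = rng.standard_normal((256, 256)).astype(np.float32)
tuned = base + 0.01 * rng.standard_normal((256, 256)).astype(np.float32)

q, scale = encode_delta(base, tuned)
recon = decode_delta(base, q, scale)
rel_l2 = float(np.linalg.norm(recon - tuned) / np.linalg.norm(tuned))
print(f"relative L2 error: {rel_l2:.2e}")  # bounded by the int16 step size
```

Because the error of each element is bounded by half the quantization step, the whole-tensor reconstruction error stays small, which is why the int16 tier is "near-lossless" rather than exact.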
DMX extends delta compression into a system for managing model evolution, with support for efficient chaining and adaptive rebasing to maintain high compression efficiency across model families.
## Files

| File | Size | Precision | Savings vs Full Model |
|---|---|---|---|
| `instruct.dmxd` | 2.88 GB | int16 (near-lossless) | 78.8% |
| `instruct-int32.dmxd` | 4.39 GB | int32 (practically lossless) | 67.7% |
| Full model (reference) | 13.59 GB | – | – |
## Verified Reconstruction Quality

| Tier | Cosine Similarity | Relative L2 Error | Max Tensor Error |
|---|---|---|---|
| int16 | 0.9999999680 | 3.5e-4 | 1.4e-3 |
| int32 | 1.0000000007 | 3.2e-9 | 2.98e-8 |
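The metrics in this table follow standard definitions and can be checked independently (the exact per-tensor aggregation DMX reports is an assumption here): cosine similarity between flattened original and reconstructed tensors, and relative L2 error ‖reconstructed − original‖ / ‖original‖.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened to vectors."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rel_l2_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Relative L2 error of a reconstruction against the original."""
    return float(np.linalg.norm(reconstructed - original) / np.linalg.norm(original))

# Demo on synthetic weights with a small reconstruction perturbation.
rng = np.random.default_rng(1)
w = rng.standard_normal((128, 128)).astype(np.float32)
w_recon = w + 1e-4 * rng.standard_normal((128, 128)).astype(np.float32)

print(cosine_similarity(w, w_recon))  # very close to 1.0
print(rel_l2_error(w, w_recon))       # on the order of 1e-4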
## How to Use

### 1. Install DMX

```bash
pip install dmx-compress transformers
```
### 2. Download the base model and export it to a single safetensors file

```bash
python -c "
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file
import torch
m = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-3B', torch_dtype=torch.float32)
save_file({k: v.clone() for k, v in m.state_dict().items()}, 'qwen2.5-3b-base.safetensors')
"
```
### 3. Download and apply the DMX delta

```bash
# Download the delta (2.9 GB instead of 13.6 GB)
huggingface-cli download Senat1/dmx-qwen2.5-3b-instruct-delta instruct.dmxd

# Reconstruct the full Instruct model
dmx delta-reconstruct qwen2.5-3b-base.safetensors instruct.dmxd qwen2.5-3b-instruct.safetensors
```
### 4. Load the reconstructed model

```python
from safetensors.torch import load_file

weights = load_file("qwen2.5-3b-instruct.safetensors")
# Load into your framework of choice
```
## Base Model

This delta requires Qwen/Qwen2.5-3B as the base checkpoint. The delta is locked to this specific base: reconstruction will fail if a different base is used.
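DMX's actual base-locking mechanism is not documented here, so as a practical precaution you can checksum your exported base file before spending time on reconstruction and compare it against a digest published alongside the delta. The helper below is a generic sketch, not part of the DMX CLI:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Small self-contained demo on a temporary file.
import os, tempfile
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"demo")
    tmp = f.name
digest = file_sha256(tmp)
print(digest)
os.unlink(tmp)

# Usage against the real export:
# digest = file_sha256("qwen2.5-3b-base.safetensors")
```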
## Note on Multi-Shard Models

Qwen2.5-3B is normally distributed as two shards on Hugging Face. This delta was created from a merged single-file export. Multi-shard delta support (automatic per-shard matching) is on the DMX roadmap.
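Sharded Hugging Face checkpoints ship a `model.safetensors.index.json` that maps each tensor name to the shard file storing it; a merged single-file export loads every shard listed there and saves the union of tensors to one file. A minimal sketch of the shard-discovery step (the tensor names and shard filenames below follow the Hub convention but are illustrative, not Qwen's actual index):

```python
import json

# Illustrative shard index in the Hugging Face Hub layout.
index_json = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""

index = json.loads(index_json)
# Unique shard files referenced by the index, in order.
shards = sorted(set(index["weight_map"].values()))
print(shards)
# A merged export loads each shard in `shards` and saves all tensors
# into a single safetensors file, which this delta was built against.
```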
## About DMX

DMX is a structure-aware neural network weight compression format. It achieves 67–87% delta compression by exploiting structural relationships between weight tensors, enabling efficient distribution of model variants as small diffs from a shared base.
Patent pending. MIT License. © 2026 William J. Riley.