# DMX Delta for Qwen2.5-3B-Instruct
This repository contains a DMX-encoded delta for Qwen2.5-3B-Instruct, enabling near-lossless reconstruction of the full model from a compatible base checkpoint.
DMX reduces storage requirements by 55–80% while preserving model quality (perplexity changes of only +0.03–0.16%), using structure-aware integer transformations rather than generic byte-level compression.
Unlike traditional compression, DMX operates at the model level, storing structured weight deltas that can be deterministically reconstructed. This enables efficient distribution and versioning of model variants without duplicating full checkpoints.
Key properties:
- Near-lossless reconstruction: verified roundtrip accuracy
- No retraining required: works on pretrained safetensors files
- Deterministic decode: exact or bounded-error recovery
- Delta-based storage: distribute only what changed
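The delta idea can be sketched in a few lines. The toy below is illustrative only, not the actual DMX on-disk format or its structure-aware transforms: it stores the difference between a fine-tuned tensor and its base as per-tensor-scaled int16 values, and reconstruction adds the dequantized difference back to the base.

```python
import numpy as np

def encode_delta(base: np.ndarray, tuned: np.ndarray):
    """Quantize the weight difference to int16 with a per-tensor scale."""
    delta = tuned - base
    scale = float(np.abs(delta).max()) / np.iinfo(np.int16).max
    if scale == 0.0:
        scale = 1.0  # tensor unchanged; any scale decodes to zeros
    q = np.round(delta / scale).astype(np.int16)
    return q, scale

def decode_delta(base: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct the tuned tensor: base + dequantized delta."""
    return base + q.astype(np.float32) * scale

# Simulate a base checkpoint and a small fine-tuning update.
rng = np.random.default_rng(0)
base = rng.standard_normal((256, 256)).astype(np.float32)
tuned = base + 0.01 * rng.standard_normal((256, 256)).astype(np.float32)

q, scale = encode_delta(base, tuned)
recon = decode_delta(base, q, scale)
rel_l2 = float(np.linalg.norm(recon - tuned) / np.linalg.norm(tuned))
print(f"relative L2 error: {rel_l2:.2e}")  # bounded by the int16 step size
```

Because the error of each element is bounded by half the quantization step, the whole-tensor reconstruction error stays small, which is why the int16 tier is "near-lossless" rather than exact.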
DMX extends delta compression into a system for managing model evolution, with support for efficient chaining and adaptive rebasing to maintain high compression efficiency across model families.
## Files

| File | Size | Precision | Savings vs Full Model |
|---|---|---|---|
| `instruct.dmxd` | 2.88 GB | int16 (near-lossless) | 78.8% |
| `instruct-int32.dmxd` | 4.39 GB | int32 (practically lossless) | 67.7% |
| Full model (reference) | 13.59 GB | – | – |
## Verified Reconstruction Quality

| Tier | Cosine Similarity | Relative L2 Error | Max Tensor Error |
|---|---|---|---|
| int16 | 0.9999999680 | 3.5e-4 | 1.4e-3 |
| int32 | 1.0000000007 | 3.2e-9 | 2.98e-8 |
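The metrics in this table follow standard definitions and can be checked independently (the exact per-tensor aggregation DMX reports is an assumption here): cosine similarity between flattened original and reconstructed tensors, and relative L2 error ‖reconstructed − original‖ / ‖original‖.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened to vectors."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rel_l2_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Relative L2 error of a reconstruction against the original."""
    return float(np.linalg.norm(reconstructed - original) / np.linalg.norm(original))

# Demo on synthetic weights with a small reconstruction perturbation.
rng = np.random.default_rng(1)
w = rng.standard_normal((128, 128)).astype(np.float32)
w_recon = w + 1e-4 * rng.standard_normal((128, 128)).astype(np.float32)

print(cosine_similarity(w, w_recon))  # very close to 1.0
print(rel_l2_error(w, w_recon))       # on the order of 1e-4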
## How to Use

### 1. Install DMX

```bash
pip install dmx-compress transformers
```
### 2. Download the base model and export it to a single safetensors file

```bash
python -c "
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file
import torch
m = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-3B', torch_dtype=torch.float32)
save_file({k: v.clone() for k, v in m.state_dict().items()}, 'qwen2.5-3b-base.safetensors')
"
```
### 3. Download and apply the DMX delta

```bash
# Download the delta (2.9 GB instead of 13.6 GB)
huggingface-cli download Senat1/dmx-qwen2.5-3b-instruct-delta instruct.dmxd

# Reconstruct the full Instruct model
dmx delta-reconstruct qwen2.5-3b-base.safetensors instruct.dmxd qwen2.5-3b-instruct.safetensors
```
### 4. Load the reconstructed model

```python
from safetensors.torch import load_file

weights = load_file("qwen2.5-3b-instruct.safetensors")
# Load into your framework of choice
```
## Base Model

This delta requires Qwen/Qwen2.5-3B as the base checkpoint. The delta is locked to this specific base: reconstruction will fail if a different base is used.
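DMX's actual base-locking mechanism is not documented here, so as a practical precaution you can checksum your exported base file before spending time on reconstruction and compare it against a digest published alongside the delta. The helper below is a generic sketch, not part of the DMX CLI:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Small self-contained demo on a temporary file.
import os, tempfile
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"demo")
    tmp = f.name
digest = file_sha256(tmp)
print(digest)
os.unlink(tmp)

# Usage against the real export:
# digest = file_sha256("qwen2.5-3b-base.safetensors")
```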
## Note on Multi-Shard Models

Qwen2.5-3B is normally distributed as two shards on Hugging Face. This delta was created from a merged single-file export. Multi-shard delta support (automatic per-shard matching) is on the DMX roadmap.
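Sharded Hugging Face checkpoints ship a `model.safetensors.index.json` that maps each tensor name to the shard file storing it; a merged single-file export loads every shard listed there and saves the union of tensors to one file. A minimal sketch of the shard-discovery step (the tensor names and shard filenames below follow the Hub convention but are illustrative, not Qwen's actual index):

```python
import json

# Illustrative shard index in the Hugging Face Hub layout.
index_json = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""

index = json.loads(index_json)
# Unique shard files referenced by the index, in order.
shards = sorted(set(index["weight_map"].values()))
print(shards)
# A merged export loads each shard in `shards` and saves all tensors
# into a single safetensors file, which this delta was built against.
```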
## About DMX

DMX is a structure-aware neural network weight compression format. It achieves 67–87% delta compression by exploiting structural relationships between weight tensors, enabling efficient distribution of model variants as small diffs from a shared base.
Patent pending. MIT License. © 2026 William J. Riley.