DMX Delta for Qwen2.5-3B-Instruct

This repository contains a DMX-encoded delta for Qwen2.5-3B-Instruct, enabling near-lossless reconstruction of the full model from a compatible base checkpoint.

DMX reduces storage requirements by 55-80% while preserving model quality (perplexity shifts of only +0.03% to +0.16%), using structure-aware integer transformations rather than generic byte-level compression.

Unlike traditional compression, DMX operates at the model level: it stores structured weight deltas that can be deterministically reconstructed. This enables efficient distribution and versioning of model variants without duplicating full checkpoints.

Key properties:

  • Near-lossless reconstruction: verified roundtrip accuracy
  • No retraining required: works on pretrained safetensors
  • Deterministic decode: exact or bounded-error recovery
  • Delta-based storage: distribute only what changed

DMX extends delta compression into a system for managing model evolution, with support for efficient chaining and adaptive rebasing to maintain high compression efficiency across model families.
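The idea behind delta-based storage can be sketched with a toy quantized-delta codec. Note that the scheme below (per-tensor symmetric int16 scaling of the weight difference) is an illustrative assumption, not DMX's actual encoding:

```python
import numpy as np

def encode_delta(base: np.ndarray, target: np.ndarray, dtype=np.int16):
    """Toy encoder: store (target - base) as scaled integers."""
    delta = target.astype(np.float64) - base.astype(np.float64)
    scale = float(np.abs(delta).max()) / np.iinfo(dtype).max
    if scale == 0.0:
        scale = 1.0  # identical tensors: any scale decodes to a zero delta
    q = np.round(delta / scale).astype(dtype)
    return q, scale

def decode_delta(base: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    """Deterministic decode: base plus dequantized delta."""
    return (base.astype(np.float64) + q.astype(np.float64) * scale).astype(base.dtype)
```

Storing `q` as int16 instead of the full float32 target halves the bytes per element before any entropy coding, and the reconstruction error is bounded by half the quantization step.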

Files

| File | Size | Precision | Savings vs Full Model |
|---|---|---|---|
| instruct.dmxd | 2.88 GB | int16 (near-lossless) | 78.8% |
| instruct-int32.dmxd | 4.39 GB | int32 (practically lossless) | 67.7% |
| Full model (reference) | 13.59 GB | - | - |

Verified Reconstruction Quality

| Tier | Cosine Similarity | RelL2 Error | Max Tensor Error |
|---|---|---|---|
| int16 | 0.9999999680 | 3.5e-4 | 1.4e-3 |
| int32 | 1.0000000007 | 3.2e-9 | 2.98e-8 |
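The table's metrics can be reproduced per tensor with standard definitions (assumed here: cosine similarity over flattened tensors and relative L2 error against the reference checkpoint):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rel_l2_error(approx: np.ndarray, ref: np.ndarray) -> float:
    """||approx - ref||_2 / ||ref||_2 in float64."""
    a, r = approx.astype(np.float64), ref.astype(np.float64)
    return float(np.linalg.norm(a - r) / np.linalg.norm(r))

def max_abs_error(approx: np.ndarray, ref: np.ndarray) -> float:
    """Largest elementwise absolute deviation."""
    return float(np.max(np.abs(approx.astype(np.float64) - ref.astype(np.float64))))
```

Run these over each tensor pair from `load_file(...)` to compare a reconstruction against the official checkpoint.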

How to Use

1. Install DMX

pip install dmx-compress transformers

2. Download base model and export to single safetensors

python -c "
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file
import torch

m = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-3B', torch_dtype=torch.float32)
save_file({k: v.clone() for k, v in m.state_dict().items()}, 'qwen2.5-3b-base.safetensors')
"

3. Download and apply DMX delta

# Download the delta (2.9 GB instead of 13.6 GB)
huggingface-cli download Senat1/dmx-qwen2.5-3b-instruct-delta instruct.dmxd

# Reconstruct the full Instruct model
dmx delta-reconstruct qwen2.5-3b-base.safetensors instruct.dmxd qwen2.5-3b-instruct.safetensors

4. Load reconstructed model

from safetensors.torch import load_file

weights = load_file("qwen2.5-3b-instruct.safetensors")
# Load into your framework of choice

Base Model

This delta requires Qwen/Qwen2.5-3B as the base checkpoint. The delta is locked to this specific base; reconstruction will fail if a different base is used.
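DMX's base-matching mechanism is not documented here, but a plain SHA-256 of the exported base file is a generic way to confirm that two machines produced byte-identical base exports before attempting reconstruction (a general-purpose sketch, not a DMX feature):

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()
```

Compare the hex digest of your `qwen2.5-3b-base.safetensors` against a known-good export; any mismatch means the bases differ.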

Note on Multi-Shard Models

Qwen2.5-3B is normally distributed as two shards on Hugging Face. This delta was created from a merged single-file export. Multi-shard delta support (automatic per-shard matching) is on the DMX roadmap.

About DMX

DMX is a structure-aware neural network weight compression format. It achieves 67-87% delta compression by exploiting structural relationships between weight tensors, enabling efficient distribution of model variants as small diffs from a shared base.

Patent Pending. MIT License. (c) 2026 William J. Riley.
