# Qwen2.5B Quantum 4-bit (MV2 Format)

This is the quantum-quantized 4-bit version of Qwen2.5-0.5B-Instruct, optimized for the Oklo-Reactor inference system.

## Model Details

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | ~0.5B |
| Quantization | TQ4_1S (TurboQuant 4-bit with 1-bit signs) |
| Format | MV2 (Model/ModelVariant 2) v3.1 |
| Features | zstd compression, Ed25519 signed |
| Size | ~184 MB (compressed) / ~243 MB (uncompressed) |
| Bits per Weight | ~4.3 BPW |
| Compression Ratio | 1.30x vs raw TQ4_1S |
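As a quick sanity check, the quoted compression ratio can be recovered from the two sizes above (both figures are rounded, so the result only approximates the stated ~1.30x):

```python
# Rounded sizes from the table above, in MB
compressed_mb = 184
uncompressed_mb = 243

ratio = uncompressed_mb / compressed_mb
print(f"{ratio:.2f}x")  # → 1.32x, close to the quoted ~1.30x
```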

## What is MV2 Format?

MV2 (Model/ModelVariant 2) is a unified binary format developed for the Oklo-Reactor ecosystem. It provides:

- Single-file distribution: all model components in one atomic file
- Streaming compression: zstd compression reduces size by ~30%
- Cryptographic signatures: Ed25519 signatures for authenticity verification
- Memory-mapped loading: zero-copy tensor access for instant startup
- Self-contained: includes config, tokenizer, and all tensors
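The memory-mapped loading idea can be sketched in a few lines: map the file once, then hand out views into it instead of copying tensor bytes into fresh buffers. This is an illustration only; `map_region` and its offsets are hypothetical, not the real `oklo_skill` internals:

```python
import mmap

def map_region(path, offset, length):
    """Return a zero-copy view of `length` bytes at `offset` (sketch).

    Illustrative stand-in for MV2's memory-mapped loading: slicing a
    memoryview over an mmap produces a view, not a copy of the bytes.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mm)[offset:offset + length]
```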

### MV2 Format Versions

| Version | Features | File Extension |
|---|---|---|
| v3.0 | Basic format | .mv2 |
| v3.1 | + Compression, signatures, delta updates | .mv2 |

## Files in this Repository

### Primary File (Recommended)

| File | Size | Description |
|---|---|---|
| Qwen2.5B-quantum-4bit-v3.1.mv2 | ~184 MB | Complete model with compression and signatures |

### Extracted Components (For Reference)

These files are provided for users who need individual components; they can also be regenerated from the MV2 file with:

```bash
python3 oklo-skill/python/oklo_skill/mv2_format_ext.py extract \
    Qwen2.5B-quantum-4bit-v3.1.mv2 ./extracted/
```

| File | Size | Description |
|---|---|---|
| config_flattened.json | ~1 KB | Model architecture configuration |
| tokenizer.json | ~6.7 MB | Tokenizer vocabulary and merges |
| tokenizer_config.json | ~7 KB | Tokenizer settings |
| quantum_metadata.json | ~100 KB | Tensor metadata and index |

### Example Files

| File | Size | Purpose |
|---|---|---|
| examples/skill-capsule.mv2 | ~68 MB | Example LoRA skill capsule (futurecoder training) |
| examples/dataset.mv2 | (coming soon) | Example training dataset |

## Usage

### Using with Oklo-Reactor

```bash
# Download the model
huggingface-cli download bbearforever/quantum-Qwen2.5-4-bit \
    Qwen2.5B-quantum-4bit-v3.1.mv2 \
    --local-dir ./models

# Start Oklo-Server
cargo run --release -p oklo-server

# The server will auto-detect and load the MV2 file
```

### Using with Python (oklo-skill)

```python
from oklo_skill import SkillRuntime

# Load base model from MV2
runtime = SkillRuntime(
    base_model="./models/Qwen2.5B-quantum-4bit-v3.1.mv2",
    cache_size=4,
)

# Run inference
result = runtime.infer(
    "Explain quantum computing in simple terms",
    temperature=0.7,
    max_tokens=500,
)
print(result.response)
```

### Extracting MV2 to a Directory

```bash
# Extract if you need individual files
python3 -c "
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    loader.extract_to_directory('./extracted')
    print('Extracted to ./extracted')
"
```

### Loading with Direct Tensor Access

```python
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    # List all tensors
    tensors = loader.list_tensors()
    print(f"Model has {len(tensors)} tensors")

    # Get a specific tensor
    tensor_data = loader.get_tensor('model_layers_0_self_attn_q_proj_weight')
    print(f"Tensor size: {len(tensor_data)} bytes")
```

## Technical Details

### TQ4_1S Quantization

TQ4_1S (TurboQuant 4-bit with 1-bit signs) achieves high compression through:

- Walsh-Hadamard Transform: preconditioning for better quantization
- Lloyd-Max Quantization: optimized 4-bit centroids per block
- Packed Signs: 1-bit signs stored separately
- Block-wise Scales: per-256-element scale factors
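The block-wise idea (a per-block scale, 4-bit magnitudes, separately stored signs) can be sketched as below. This is a toy uniform quantizer for illustration only; real TQ4_1S uses Lloyd-Max centroids after a Walsh-Hadamard transform, which this sketch does not implement:

```python
def quantize_block(values, levels=15):
    """Uniformly quantize one block to 4-bit magnitudes + 1-bit signs.

    Toy stand-in for TQ4_1S: a plain uniform grid instead of the
    Lloyd-Max centroids and Walsh-Hadamard preconditioning used by
    the real format.
    """
    scale = max(abs(v) for v in values) / levels or 1.0  # per-block scale
    mags = [min(levels, round(abs(v) / scale)) for v in values]  # 0..15
    signs = [0 if v >= 0 else 1 for v in values]  # packed separately
    return scale, mags, signs

def dequantize_block(scale, mags, signs):
    """Reconstruct the block from scale, magnitudes, and sign bits."""
    return [(-m if s else m) * scale for m, s in zip(mags, signs)]
```

The separate sign plane is what lets the 4 magnitude bits cover the full dynamic range, at a cost of ~1 extra bit per weight (hence the ~4.3 BPW figure above).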

### MV2 v3.1 Binary Format

```text
[0:6]     Magic: "OKLOMV"
[6:8]     Format Type: 0x0003 (Quantum Model)
[8:12]    Version: 0x00030001 (3.1)
[12:16]   Header Size (uint32)
[16:18]   Feature Flags:
              Bit 0: zstd compression
              Bit 1: Ed25519 signature
              Bit 2: Delta update support
              Bit 3: Memory-mapped layout
[18:20]   Compression Level: 0-22
[20:84]   Signature (64 bytes, if signed)
[84:N]    JSON Header
[N:]      Binary Data (compressed tensors)
```
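The fixed-size prefix above can be parsed with a few lines of `struct`. Field names follow the layout table, and little-endian byte order is an assumption (the table does not specify it), so treat this as a sketch rather than the reference reader:

```python
import struct

def parse_mv2_prefix(buf):
    """Parse the fixed 20-byte MV2 v3.1 prefix described above (sketch).

    Assumes little-endian encoding of the multi-byte fields.
    """
    if buf[0:6] != b"OKLOMV":
        raise ValueError("not an MV2 file")
    fmt_type, version, header_size, flags, level = struct.unpack_from(
        "<HIIHH", buf, 6
    )
    return {
        "format_type": fmt_type,                      # 0x0003 = Quantum Model
        "version": (version >> 16, version & 0xFFFF), # 0x00030001 -> (3, 1)
        "header_size": header_size,
        "compressed": bool(flags & 0b0001),           # bit 0: zstd
        "signed": bool(flags & 0b0010),               # bit 1: Ed25519
        "compression_level": level,                   # 0-22
    }
```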

### Tensor Structure

The model contains 290 quantized tensors:

- 24 transformer layers
- Each layer: Q, K, V, O projections + MLP (gate, up, down)
- Layer norms and embeddings
- LM head
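One breakdown consistent with the stated count, assuming Qwen2-style layers (q/k/v projections carry biases, and the LM head is tied to the token embedding); this is a plausible reconstruction, not a dump of the actual tensor index:

```python
# Hypothetical accounting for the 290 tensors, assuming Qwen2-style layers
attn = 4        # q, k, v, o projection weights
attn_bias = 3   # q, k, v biases (Qwen2 attention uses biases)
mlp = 3         # gate, up, down projection weights
norms = 2       # input and post-attention layer norms

per_layer = attn + attn_bias + mlp + norms  # 12 tensors per layer
total = 24 * per_layer + 2                  # + token embedding + final norm
print(total)  # → 290 (LM head tied to the embedding, so not stored twice)
```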

## Performance

| Metric | FP16 | TQ4_1S | Reduction |
|---|---|---|---|
| Model Size | ~1 GB | ~184 MB | 5.4x |
| Load Time | ~2 s | ~0.5 s | 4x |
| VRAM Usage | ~1 GB | ~250 MB | 4x |
| Quality Loss | – | <1% perplexity increase | Minimal |

## Oklo-Reactor Ecosystem

This model is part of the Oklo-Reactor project, a self-contained LLM interface with:

- Quantum-state quantization: bespoke 4-bit compression
- Micromodel LoRA adapters: ~10-50 MB skill capsules
- Self-distillation: knowledge absorption into the base model
- Skill System: hot-swap adapters at runtime
- Memory Capsules: .mv2 format with full-text search
- Edge Deployment: ESP32 with 3-bit LogQuant

### Related MV2 Formats

| Format Type | Extension | Purpose |
|---|---|---|
| Dataset | .mv2 | Training data (memvid format) |
| Skill | .mv2 | LoRA adapter capsules |
| Quantum Model | .mv2 | Base quantized models (this file) |
| Delta Patch | .mv2 | Incremental model updates |

## Verification

### Check File Integrity

```bash
# Show MV2 file information
python3 -c "
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    print(f'Version: {loader.header.version}')
    print(f'Features: {loader.header.features}')
    print(f'Tensors: {len(loader._tensor_index)}')
    print(f'Compression: {loader.get_compression_ratio():.2f}x')
"
```

### Verify Signature

```bash
python3 oklo-skill/python/oklo_skill/mv2_format_ext.py info \
    Qwen2.5B-quantum-4bit-v3.1.mv2 --verify
```

## Citation

```bibtex
@software{oklo_reactor,
  title = {Oklo-Reactor: Self-Contained LLM Interface with Quantum-State Quantization},
  author = {bbearforever},
  year = {2026},
  url = {https://github.com/bbearforever/oklo-reactor}
}
```

## License

Apache-2.0

## Acknowledgments

- Base model: Qwen2.5-0.5B-Instruct by Alibaba Cloud
- Quantization: TQ4_1S format developed for Oklo-Reactor
- Format: MV2 unified binary format