# Qwen2.5B Quantum 4-bit (MV2 Format)

This is the quantum-quantized 4-bit version of Qwen2.5-0.5B-Instruct, optimized for the Oklo-Reactor inference system.

## Model Details

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | ~0.5B |
| Quantization | TQ4_1S (TurboQuant 4-bit with 1-bit signs) |
| Format | MV2 (Model/ModelVariant 2) v3.1 |
| Features | zstd compression, Ed25519 signed |
| Size | ~184 MB (compressed) / ~243 MB (uncompressed) |
| Bits per Weight | ~4.3 BPW |
| Compression Ratio | 1.30x vs raw TQ4_1S |
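As a quick sanity check, the quoted compression ratio can be recovered from the two sizes above (both figures are rounded, so the result only approximates the stated ~1.30x):

```python
# Rounded sizes from the table above, in MB
compressed_mb = 184
uncompressed_mb = 243

ratio = uncompressed_mb / compressed_mb
print(f"{ratio:.2f}x")  # → 1.32x, close to the quoted ~1.30x
```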

## What is MV2 Format?

MV2 (Model/ModelVariant 2) is a unified binary format developed for the Oklo-Reactor ecosystem. It provides:

- Single-file distribution: all model components in one atomic file
- Streaming compression: zstd compression reduces size by ~30%
- Cryptographic signatures: Ed25519 signatures for authenticity verification
- Memory-mapped loading: zero-copy tensor access for instant startup
- Self-contained: includes config, tokenizer, and all tensors
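The memory-mapped loading idea can be sketched in a few lines: map the file once, then hand out views into it instead of copying tensor bytes into fresh buffers. This is an illustration only; `map_region` and its offsets are hypothetical, not the real `oklo_skill` internals:

```python
import mmap

def map_region(path, offset, length):
    """Return a zero-copy view of `length` bytes at `offset` (sketch).

    Illustrative stand-in for MV2's memory-mapped loading: slicing a
    memoryview over an mmap produces a view, not a copy of the bytes.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mm)[offset:offset + length]
```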

### MV2 Format Versions

| Version | Features | File Extension |
|---|---|---|
| v3.0 | Basic format | .mv2 |
| v3.1 | + Compression, signatures, delta updates | .mv2 |

## Files in this Repository

### Primary File (Recommended)

| File | Size | Description |
|---|---|---|
| Qwen2.5B-quantum-4bit-v3.1.mv2 | ~184 MB | Complete model with compression and signatures |

### Extracted Components (For Reference)

These files are provided for users who need individual components; they can also be regenerated from the MV2 file with:

```bash
python3 oklo-skill/python/oklo_skill/mv2_format_ext.py extract \
    Qwen2.5B-quantum-4bit-v3.1.mv2 ./extracted/
```

| File | Size | Description |
|---|---|---|
| config_flattened.json | ~1 KB | Model architecture configuration |
| tokenizer.json | ~6.7 MB | Tokenizer vocabulary and merges |
| tokenizer_config.json | ~7 KB | Tokenizer settings |
| quantum_metadata.json | ~100 KB | Tensor metadata and index |

### Example Files

| File | Size | Purpose |
|---|---|---|
| examples/skill-capsule.mv2 | ~68 MB | Example LoRA skill capsule (futurecoder training) |
| examples/dataset.mv2 | (coming soon) | Example training dataset |

## Usage

### Using with Oklo-Reactor

```bash
# Download the model
huggingface-cli download bbearforever/quantum-Qwen2.5-4-bit \
    Qwen2.5B-quantum-4bit-v3.1.mv2 \
    --local-dir ./models

# Start Oklo-Server
cargo run --release -p oklo-server

# The server will auto-detect and load the MV2 file
```

### Using with Python (oklo-skill)

```python
from oklo_skill import SkillRuntime

# Load base model from MV2
runtime = SkillRuntime(
    base_model="./models/Qwen2.5B-quantum-4bit-v3.1.mv2",
    cache_size=4,
)

# Run inference
result = runtime.infer(
    "Explain quantum computing in simple terms",
    temperature=0.7,
    max_tokens=500,
)
print(result.response)
```

### Extracting MV2 to a Directory

```bash
# Extract if you need individual files
python3 -c "
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    loader.extract_to_directory('./extracted')
    print('Extracted to ./extracted')
"
```

### Loading with Direct Tensor Access

```python
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    # List all tensors
    tensors = loader.list_tensors()
    print(f"Model has {len(tensors)} tensors")

    # Get a specific tensor
    tensor_data = loader.get_tensor('model_layers_0_self_attn_q_proj_weight')
    print(f"Tensor size: {len(tensor_data)} bytes")
```

## Technical Details

### TQ4_1S Quantization

TQ4_1S (TurboQuant 4-bit with 1-bit signs) achieves high compression through:

- Walsh-Hadamard Transform: preconditioning for better quantization
- Lloyd-Max Quantization: optimized 4-bit centroids per block
- Packed Signs: 1-bit signs stored separately
- Block-wise Scales: per-256-element scale factors
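The block-wise idea (a per-block scale, 4-bit magnitudes, separately stored signs) can be sketched as below. This is a toy uniform quantizer for illustration only; real TQ4_1S uses Lloyd-Max centroids after a Walsh-Hadamard transform, which this sketch does not implement:

```python
def quantize_block(values, levels=15):
    """Uniformly quantize one block to 4-bit magnitudes + 1-bit signs.

    Toy stand-in for TQ4_1S: a plain uniform grid instead of the
    Lloyd-Max centroids and Walsh-Hadamard preconditioning used by
    the real format.
    """
    scale = max(abs(v) for v in values) / levels or 1.0  # per-block scale
    mags = [min(levels, round(abs(v) / scale)) for v in values]  # 0..15
    signs = [0 if v >= 0 else 1 for v in values]  # packed separately
    return scale, mags, signs

def dequantize_block(scale, mags, signs):
    """Reconstruct the block from scale, magnitudes, and sign bits."""
    return [(-m if s else m) * scale for m, s in zip(mags, signs)]
```

The separate sign plane is what lets the 4 magnitude bits cover the full dynamic range, at a cost of ~1 extra bit per weight (hence the ~4.3 BPW figure above).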

### MV2 v3.1 Binary Format

```text
[0:6]     Magic: "OKLOMV"
[6:8]     Format Type: 0x0003 (Quantum Model)
[8:12]    Version: 0x00030001 (3.1)
[12:16]   Header Size (uint32)
[16:18]   Feature Flags:
              Bit 0: zstd compression
              Bit 1: Ed25519 signature
              Bit 2: Delta update support
              Bit 3: Memory-mapped layout
[18:20]   Compression Level: 0-22
[20:84]   Signature (64 bytes, if signed)
[84:N]    JSON Header
[N:]      Binary Data (compressed tensors)
```
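The fixed-size prefix above can be parsed with a few lines of `struct`. Field names follow the layout table, and little-endian byte order is an assumption (the table does not specify it), so treat this as a sketch rather than the reference reader:

```python
import struct

def parse_mv2_prefix(buf):
    """Parse the fixed 20-byte MV2 v3.1 prefix described above (sketch).

    Assumes little-endian encoding of the multi-byte fields.
    """
    if buf[0:6] != b"OKLOMV":
        raise ValueError("not an MV2 file")
    fmt_type, version, header_size, flags, level = struct.unpack_from(
        "<HIIHH", buf, 6
    )
    return {
        "format_type": fmt_type,                      # 0x0003 = Quantum Model
        "version": (version >> 16, version & 0xFFFF), # 0x00030001 -> (3, 1)
        "header_size": header_size,
        "compressed": bool(flags & 0b0001),           # bit 0: zstd
        "signed": bool(flags & 0b0010),               # bit 1: Ed25519
        "compression_level": level,                   # 0-22
    }
```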

### Tensor Structure

The model contains 290 quantized tensors:

- 24 transformer layers
- Each layer: Q, K, V, O projections + MLP (gate, up, down)
- Layer norms and embeddings
- LM head
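One breakdown consistent with the stated count, assuming Qwen2-style layers (q/k/v projections carry biases, and the LM head is tied to the token embedding); this is a plausible reconstruction, not a dump of the actual tensor index:

```python
# Hypothetical accounting for the 290 tensors, assuming Qwen2-style layers
attn = 4        # q, k, v, o projection weights
attn_bias = 3   # q, k, v biases (Qwen2 attention uses biases)
mlp = 3         # gate, up, down projection weights
norms = 2       # input and post-attention layer norms

per_layer = attn + attn_bias + mlp + norms  # 12 tensors per layer
total = 24 * per_layer + 2                  # + token embedding + final norm
print(total)  # → 290 (LM head tied to the embedding, so not stored twice)
```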

## Performance

| Metric | FP16 | TQ4_1S | Reduction |
|---|---|---|---|
| Model Size | ~1 GB | ~184 MB | 5.4x |
| Load Time | ~2 s | ~0.5 s | 4x |
| VRAM Usage | ~1 GB | ~250 MB | 4x |
| Quality Loss | – | <1% perplexity increase | Minimal |

## Oklo-Reactor Ecosystem

This model is part of the Oklo-Reactor project, a self-contained LLM interface with:

- Quantum-state quantization: bespoke 4-bit compression
- Micromodel LoRA adapters: ~10-50 MB skill capsules
- Self-distillation: knowledge absorption into the base model
- Skill System: hot-swap adapters at runtime
- Memory Capsules: .mv2 format with full-text search
- Edge Deployment: ESP32 with 3-bit LogQuant

### Related MV2 Formats

| Format Type | Extension | Purpose |
|---|---|---|
| Dataset | .mv2 | Training data (memvid format) |
| Skill | .mv2 | LoRA adapter capsules |
| Quantum Model | .mv2 | Base quantized models (this file) |
| Delta Patch | .mv2 | Incremental model updates |

## Verification

### Check File Integrity

```bash
# Show MV2 file information
python3 -c "
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    print(f'Version: {loader.header.version}')
    print(f'Features: {loader.header.features}')
    print(f'Tensors: {len(loader._tensor_index)}')
    print(f'Compression: {loader.get_compression_ratio():.2f}x')
"
```

### Verify Signature

```bash
python3 oklo-skill/python/oklo_skill/mv2_format_ext.py info \
    Qwen2.5B-quantum-4bit-v3.1.mv2 --verify
```

## Citation

```bibtex
@software{oklo_reactor,
  title = {Oklo-Reactor: Self-Contained LLM Interface with Quantum-State Quantization},
  author = {bbearforever},
  year = {2026},
  url = {https://github.com/bbearforever/oklo-reactor}
}
```

## License

Apache-2.0

## Acknowledgments

- Base model: Qwen2.5-0.5B-Instruct by Alibaba Cloud
- Quantization: TQ4_1S format developed for Oklo-Reactor
- Format: MV2 unified binary format