# Qwen2.5B Quantum 4-bit (MV2 Format)
This is the quantum-quantized 4-bit version of Qwen2.5-0.5B-Instruct, optimized for the Oklo-Reactor inference system.
## Model Details
| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | ~0.5B |
| Quantization | TQ4_1S (TurboQuant 4-bit with 1-bit signs) |
| Format | MV2 (Model/ModelVariant 2) v3.1 |
| Features | zstd compression, Ed25519 signed |
| Size | ~184 MB (compressed) / ~243 MB (uncompressed) |
| Bits per Weight | ~4.3 BPW |
| Compression Ratio | 1.30x vs raw TQ4_1S |
## What is the MV2 Format?
MV2 (Model/ModelVariant 2) is a unified binary format developed for the Oklo-Reactor ecosystem. It provides:
- Single-file distribution: All model components in one atomic file
- Streaming compression: zstd compression reduces size by ~30%
- Cryptographic signatures: Ed25519 signatures for authenticity verification
- Memory-mapped loading: Zero-copy tensor access for instant startup
- Self-contained: Includes config, tokenizer, and all tensors
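The memory-mapped loading idea can be illustrated with a minimal, generic sketch (this is not the Oklo-Reactor loader; the file and names below are made up for illustration): the file is mapped into the address space and tensor regions are exposed as `memoryview` slices, so no bytes are copied until they are actually read.

```python
import mmap
import os
import tempfile

# Stand-in tensor bytes written to a temporary file for the demo.
payload = b"\x01\x02\x03\x04" * 1024
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only; the OS pages data in lazily.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm)[0:4096]  # zero-copy slice over the mapping
    assert view[:4] == b"\x01\x02\x03\x04"
    view.release()
    mm.close()

os.remove(path)
```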
### MV2 Format Versions
| Version | Features | File Extension |
|---|---|---|
| v3.0 | Basic format | .mv2 |
| v3.1 | + Compression, Signatures, Delta updates | .mv2 |
## Files in this Repository

### Primary File (Recommended)
| File | Size | Description |
|---|---|---|
| `Qwen2.5B-quantum-4bit-v3.1.mv2` | ~184 MB | Complete model with compression and signatures |
### Extracted Components (For Reference)

These files are included for users who need individual components. They can be extracted from the MV2 file using:

```bash
python3 oklo-skill/python/oklo_skill/mv2_format_ext.py extract \
    Qwen2.5B-quantum-4bit-v3.1.mv2 ./extracted/
```
| File | Size | Description |
|---|---|---|
| `config_flattened.json` | ~1 KB | Model architecture configuration |
| `tokenizer.json` | ~6.7 MB | Tokenizer vocabulary and merges |
| `tokenizer_config.json` | ~7 KB | Tokenizer settings |
| `quantum_metadata.json` | ~100 KB | Tensor metadata and index |
### Example Files

| File | Size | Purpose |
|---|---|---|
| `examples/skill-capsule.mv2` | ~68 MB | Example LoRA skill capsule (futurecoder training) |
| `examples/dataset.mv2` | (coming soon) | Example training dataset |
## Usage

### Using with Oklo-Reactor

```bash
# Download the model
huggingface-cli download bbearforever/quantum-Qwen2.5-4-bit \
    Qwen2.5B-quantum-4bit-v3.1.mv2 \
    --local-dir ./models

# Start Oklo-Server
cargo run --release -p oklo-server

# The server will auto-detect and load the MV2 file
```
### Using with Python (oklo-skill)

```python
from oklo_skill import SkillRuntime

# Load base model from MV2
runtime = SkillRuntime(
    base_model="./models/Qwen2.5B-quantum-4bit-v3.1.mv2",
    cache_size=4,
)

# Run inference
result = runtime.infer(
    "Explain quantum computing in simple terms",
    temperature=0.7,
    max_tokens=500,
)
print(result.response)
```
### Extracting MV2 to a Directory

```bash
# Extract if you need individual files
python3 -c "
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    loader.extract_to_directory('./extracted')
print('Extracted to ./extracted')
"
```
### Loading with Direct Tensor Access

```python
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    # List all tensors
    tensors = loader.list_tensors()
    print(f"Model has {len(tensors)} tensors")

    # Get a specific tensor
    tensor_data = loader.get_tensor('model_layers_0_self_attn_q_proj_weight')
    print(f"Tensor size: {len(tensor_data)} bytes")
```
## Technical Details

### TQ4_1S Quantization
TQ4_1S (TurboQuant 4-bit with 1-bit signs) achieves high compression through:
- Walsh-Hadamard Transform: Preconditioning for better quantization
- Lloyd-Max Quantization: Optimized 4-bit centroids per block
- Packed Signs: 1-bit signs stored separately
- Block-wise Scales: Per-256-element scale factors
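The sign/magnitude idea behind the steps above can be sketched in a few lines of NumPy. This is an illustration only: uniform 4-bit levels stand in for the Lloyd-Max centroids, the Walsh-Hadamard preconditioning step is omitted, and the function names are hypothetical.

```python
import numpy as np

def quantize_block(x):
    """Sketch: 4-bit magnitude codes + packed 1-bit signs + one scale per block."""
    signs = np.packbits(x < 0)                     # 1 bit per element
    mags = np.abs(x)
    scale = float(mags.max()) / 15 or 1.0          # 4-bit codes span 0..15
    codes = np.round(mags / scale).astype(np.uint8)
    return signs, codes, scale

def dequantize_block(signs, codes, scale):
    sign = np.where(np.unpackbits(signs)[: len(codes)] == 1, -1.0, 1.0)
    return sign * codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
block = rng.standard_normal(256).astype(np.float32)  # one 256-element block
signs, codes, scale = quantize_block(block)
restored = dequantize_block(signs, codes, scale)

# Rounding error is bounded by half a quantization step.
assert np.max(np.abs(restored - block)) <= scale / 2 + 1e-6
```

Storage for the block in this sketch is 256×4 bits of codes, 256 sign bits, and one scale, which is roughly where the ~4.3 bits-per-weight figure in the table comes from once per-block overhead is amortized.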
### MV2 v3.1 Binary Format

```text
[0:6]    Magic: "OKLOMV"
[6:8]    Format Type: 0x0003 (Quantum Model)
[8:12]   Version: 0x00030001 (3.1)
[12:16]  Header Size (uint32)
[16:18]  Feature Flags:
           Bit 0: zstd compression
           Bit 1: Ed25519 signature
           Bit 2: Delta update support
           Bit 3: Memory-mapped layout
[18:20]  Compression Level: 0-22
[20:84]  Signature (64 bytes, if signed)
[84:N]   JSON Header
[N:]     Binary Data (compressed tensors)
```
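The fixed-offset fields of this layout can be read with a few lines of `struct` unpacking. A sketch, assuming little-endian encoding (the byte order is not specified above, so treat it as an assumption) and a hypothetical `parse_mv2_header` helper:

```python
import struct

MAGIC = b"OKLOMV"

def parse_mv2_header(buf: bytes) -> dict:
    # Offsets follow the v3.1 layout shown above; little-endian is assumed.
    if buf[0:6] != MAGIC:
        raise ValueError("not an MV2 file")
    fmt_type, = struct.unpack_from("<H", buf, 6)
    version, = struct.unpack_from("<I", buf, 8)
    header_size, = struct.unpack_from("<I", buf, 12)
    flags, = struct.unpack_from("<H", buf, 16)
    level, = struct.unpack_from("<H", buf, 18)
    return {
        "format_type": fmt_type,
        "version": (version >> 16, version & 0xFFFF),  # 0x00030001 -> (3, 1)
        "header_size": header_size,
        "compressed": bool(flags & 0b0001),  # Bit 0: zstd compression
        "signed": bool(flags & 0b0010),      # Bit 1: Ed25519 signature
        "compression_level": level,
    }

# Synthetic header bytes, built only to demonstrate the parser.
demo = MAGIC + struct.pack("<HIIHH", 0x0003, 0x00030001, 128, 0b0011, 19)
hdr = parse_mv2_header(demo)
assert hdr["version"] == (3, 1) and hdr["compressed"] and hdr["signed"]
```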
### Tensor Structure
The model contains 290 quantized tensors:
- 24 transformer layers
- Each layer: Q, K, V, O projections + MLP (gate, up, down)
- Layer norms and embeddings
- LM head
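The 290 figure is consistent with a Qwen2-style layout if each layer contributes 12 tensors and the LM head is weight-tied to the token embedding. This accounting is an assumption on my part, offered only as a sanity check:

```python
# Hypothetical accounting of the 290 tensors (assumes Qwen2-style layers:
# Q/K/V/O projection weights, Q/K/V biases, gate/up/down MLP weights, and
# two RMSNorm weights per layer; LM head tied to the token embedding).
attn_weights, attn_biases, mlp_weights, norms = 4, 3, 3, 2
per_layer = attn_weights + attn_biases + mlp_weights + norms  # 12
layers = 24
shared = 2  # token embedding + final norm
total = layers * per_layer + shared
print(total)  # -> 290
```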
## Performance
| Metric | FP16 | TQ4_1S | Reduction |
|---|---|---|---|
| Model Size | ~1 GB | ~184 MB | 5.4x |
| Load Time | ~2s | ~0.5s | 4x |
| VRAM Usage | ~1 GB | ~250 MB | 4x |
| Quality Loss | - | <1% perplexity | Minimal |
## Oklo-Reactor Ecosystem
This model is part of the Oklo-Reactor project, a self-contained LLM interface with:

- Quantum-state quantization: Bespoke 4-bit compression
- Micromodel LoRA adapters: ~10-50 MB skill capsules
- Self-distillation: Knowledge absorption into the base model
- Skill System: Hot-swap adapters at runtime
- Memory Capsules: `.mv2` format with full-text search
- Edge Deployment: ESP32 with 3-bit LogQuant
## Related MV2 Formats

| Format Type | Extension | Purpose |
|---|---|---|
| Dataset | `.mv2` | Training data (memvid format) |
| Skill | `.mv2` | LoRA adapter capsules |
| Quantum Model | `.mv2` | Base quantized models (this file) |
| Delta Patch | `.mv2` | Incremental model updates |
## Verification

### Check File Integrity

```bash
# Show MV2 file information
python3 -c "
from oklo_skill.mv2_format_ext import QuantumMV2LoaderExt

with QuantumMV2LoaderExt('Qwen2.5B-quantum-4bit-v3.1.mv2') as loader:
    print(f'Version: {loader.header.version}')
    print(f'Features: {loader.header.features}')
    print(f'Tensors: {len(loader._tensor_index)}')
    print(f'Compression: {loader.get_compression_ratio():.2f}x')
"
```
### Verify Signature

```bash
python3 oklo-skill/python/oklo_skill/mv2_format_ext.py info \
    Qwen2.5B-quantum-4bit-v3.1.mv2 --verify
```
## Citation

```bibtex
@software{oklo_reactor,
  title  = {Oklo-Reactor: Self-Contained LLM Interface with Quantum-State Quantization},
  author = {bbearforever},
  year   = {2026},
  url    = {https://github.com/bbearforever/oklo-reactor}
}
```
## License
Apache-2.0
## Acknowledgments
- Base model: Qwen2.5-0.5B-Instruct by Alibaba Cloud
- Quantization: TQ4_1S format developed for Oklo-Reactor
- Format: MV2 unified binary format