ROLV Primitive©
Sparse matrix operator for Mixture-of-Experts AI inference.
5–103× faster than cuBLAS/MKL. Up to 99% energy reduction. Bit-identical outputs.
Test any HuggingFace MoE model — no upload required.
What is this
Modern frontier AI models — DeepSeek-V3, Llama-4, Kimi-K2, Qwen3, Mixtral — use
Mixture-of-Experts (MoE) architecture. Each token activates only a small fraction
of the model's experts (typically 8 of 256 in DeepSeek-V3). The inactive experts
produce zero outputs. Standard libraries — NVIDIA cuBLAS, Intel MKL, cuSPARSE —
multiply those zeros anyway.
ROLV Primitive© skips them. Outputs are mathematically identical. The speedup is real.
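The idea can be illustrated with a small numpy sketch (a toy model of the principle, not ROLV's implementation): when the inactive experts' blocks are all zero, skipping them changes nothing in the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_out, d_in = 8, 4, 6   # toy sizes; DeepSeek-V3 routes 8 of 256 experts
active = [1, 5]                    # experts the router selected for this token

# Stacked expert weights; inactive experts contribute all-zero blocks
W = np.zeros((n_experts, d_out, d_in))
for e in active:
    W[e] = rng.standard_normal((d_out, d_in))

x = rng.standard_normal(d_in)

# Dense path: multiplies every expert block, zeros included
dense_out = np.stack([W[e] @ x for e in range(n_experts)])

# Zero-skipping path: touch only the active experts
sparse_out = np.zeros_like(dense_out)
for e in active:
    sparse_out[e] = W[e] @ x

assert np.array_equal(dense_out, sparse_out)  # bit-identical, not just close
```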
The INT_MAX finding
cuSPARSE cannot benchmark the full DeepSeek-V3 or Kimi-K2 stacked expert matrix.
The matrix is 256 experts × 2048 × 7168 = 3,758,096,384 elements, which exceeds
INT_MAX (2,147,483,647). cuSPARSE overflows silently and returns a submatrix result.
Every published cuSPARSE benchmark on these models is reporting a fraction of the
full computation. ROLV Primitive© handles the full matrix natively.
Published finding: doi.org/10.5281/zenodo.19221455
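The arithmetic behind the overflow is easy to reproduce. The sketch below (plain Python/numpy, not cuSPARSE itself) shows the element count exceeding a signed 32-bit index, and what silently wrapping it into int32 produces:

```python
import numpy as np

n_experts, d_ffn, d_model = 256, 2048, 7168   # DeepSeek-V3 stacked gate_proj
elements = n_experts * d_ffn * d_model
INT_MAX = 2**31 - 1                            # 2,147,483,647

print(elements)            # 3758096384
print(elements > INT_MAX)  # True: unaddressable with a 32-bit index

# Forcing the count through int32, as a 32-bit indexing path effectively does,
# wraps silently to a meaningless value instead of raising an error:
wrapped = int(np.array([elements], dtype=np.int64).astype(np.int32)[0])
print(wrapped)             # -536870912
```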
Verified results
482 SHA-256 verified cases on real downloaded model weights, across 7 hardware platforms. Independent validation by the University of Miami Frost Institute for Data Science and Computing is currently underway.

| Model | Layer | Sparsity | vs dense | vs cuSPARSE |
|---|---|---|---|---|
| DeepSeek-V3 | gate_proj | 87.5% | 5.58× | overflow* |
| Llama-4-Scout | gate_proj | 92.2% | 9.54× | 103× |
| Kimi-K2 | gate_proj | 93.8% | 8.97× | overflow* |
| Mixtral-8×22B | gate_proj | 87.5% | 5.39× | 109× |
| Mixtral-8×7B | gate_proj | 87.5% | 5.21× | 76× |
| OLMoE-1B-7B | gate_proj | 87.5% | 5.58× | confirmed |
| OLMoE-1B-7B | up_proj | 87.5% | 5.74× | confirmed |
| OLMoE-1B-7B | down_proj | 87.5% | 5.67× | confirmed |
| Peak (99% sparsity) | down_proj | 99% | 46× | — |

CPU (Intel i7, Windows, 8 threads): 28/28 PASS, ATOL=0.0000, all real weights.
ARM (Google Axion): 5.12× vs MKL confirmed.
*cuSPARSE INT_MAX overflow — see finding above.
Quick start
Step 1 — Download the wheel for your platform
From Releases:
| Platform | Python | Wheel filename |
|---|---|---|
| Windows 64-bit | 3.13 | rolvprimitive-1.0.0-cp313-none-win_amd64.whl |
| Windows 64-bit | 3.11 | rolvprimitive-1.0.0-cp311-none-win_amd64.whl |
| Linux x86_64 | 3.12 | rolvprimitive-1.0.0-cp312-cp312-linux_x86_64.whl |
| Any / Anaconda | any | rolvprimitive-1.0.0-py3-none-any.whl |
Step 2 — Install
```bash
pip install rolvprimitive-1.0.0-cp313-none-win_amd64.whl     # Windows py3.13
pip install rolvprimitive-1.0.0-cp312-cp312-linux_x86_64.whl # Linux py3.12
pip install rolvprimitive-1.0.0-py3-none-any.whl             # Anaconda / any
```
Step 3 — Run the benchmark
The script downloads model weights directly from HuggingFace to your own machine. Nothing is uploaded. You benchmark on your own hardware.
```bash
pip install torch scipy psutil transformers accelerate huggingface_hub einops tqdm

# DeepSeek-V3 shapes — no download, uses real dimensions (fastest start):
python scripts/benchmark.py --model deepseek-shapes

# OLMoE real weights — ~7 GB download, CPU or GPU:
python scripts/benchmark.py --model olmoe

# Mixtral-8x7B — ~26 GB, GPU recommended:
python scripts/benchmark.py --model mixtral-8x7b

# Any HuggingFace MoE model by ID:
python scripts/benchmark.py --model mistralai/Mixtral-8x22B-v0.1

# CPU only:
python scripts/benchmark.py --model olmoe --device cpu

# Custom iterations and batch size:
python scripts/benchmark.py --model olmoe --iterations 2000 --batch 2000

# Multiple models:
python scripts/benchmark.py --model deepseek-shapes,olmoe
```
Available model shortcuts
| Shortcut | Model | Download |
|---|---|---|
| deepseek-shapes | DeepSeek-V3 real dimensions (synthetic weights) | none |
| olmoe | allenai/OLMoE-1B-7B-0924 | ~7 GB |
| mixtral-8x7b | mistralai/Mixtral-8x7B-v0.1 | ~26 GB |
| mixtral-8x22b | mistralai/Mixtral-8x22B-v0.1 | ~87 GB |
| phi35moe | microsoft/Phi-3.5-MoE-instruct | ~16 GB |
| deepseek-moe | deepseek-ai/deepseek-moe-16b-base | ~32 GB |
| qwen2moe | Qwen/Qwen1.5-MoE-A2.7B | ~6 GB |
| jamba | ai21labs/Jamba-1.5-Mini | ~24 GB |
| auto | DeepSeek shapes + OLMoE | ~7 GB |
| any HF model ID | e.g. mistralai/Mistral-7B-v0.1 | varies |
What the benchmark measures
Per ROLV Benchmark Harness Prerequisites & Standards v2.0:
- Hardware detection banner — CPU, GPU, RAM, VRAM, backend, energy source
- 4 SHA-256 hashes per case — W, X, baseline output, ROLV output
- 4 error metrics — max/mean absolute and relative error (raw FP32)
- ATOL correctness check — column-normalised, threshold 0.05
- Perturbation test — proves live computation, not a cached result
- Speed — build_ms, ms/iter, speedup× and %, vs both dense and sparse
- Energy — joules and watts via pynvml (NVIDIA), pyrsmi (AMD), or proxy
- FLOPs reduction, tok/s gain, TTFT reduction — all vs vendor baseline
- ROLVswitch™ strategy printed per case
- RSMT™ threshold printed per case
- CSV output — all results saved to rolv_results.csv
- Disk cleanup after each model — no disk exhaustion
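Two items in the list above, the SHA-256 hashes and the column-normalised ATOL check, can be sketched in plain numpy (assumed mechanics for illustration, not the harness's exact code):

```python
import hashlib
import numpy as np

def sha256_of(arr: np.ndarray) -> str:
    # Hash the raw buffer: any bit flip in W, X, or an output changes the digest.
    return hashlib.sha256(np.ascontiguousarray(arr).tobytes()).hexdigest()

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6)).astype(np.float32)
X = rng.standard_normal((5, 6)).astype(np.float32)

baseline = W @ X.T
candidate = baseline.copy()          # stand-in for the operator under test

print(sha256_of(W)[:16], sha256_of(X)[:16])   # 2 of the 4 per-case hashes

# Column-normalised ATOL check (threshold 0.05, per the list above)
scale = np.maximum(np.abs(baseline).max(axis=0), 1e-12)
assert (np.abs(candidate - baseline) / scale).max() <= 0.05
```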
Use in your own code
```python
import torch
from rolvprimitive import ROLVHybrid

# Your MoE expert weight matrix — any source
W = your_expert_weight         # shape: (out_features, in_features)

# Build once at model load time — ROLVswitch™ auto-selects a strategy
op = ROLVHybrid(W, batch=1000)
print(op._strategy)            # see which path was selected

# Use at inference time
out = op.apply(X)              # X: (batch, in_features) — identical to W @ X.T
```
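For a quick offline sanity check of the bit-identical claim, you can compare against a dense baseline. The numpy sketch below uses toy sizes, with the ROLV call left as a comment since it needs the installed wheel:

```python
import numpy as np

def dense_ref(W, X):
    # Dense baseline the operator must reproduce: per the snippet above,
    # op.apply(X) is documented as identical to W @ X.T.
    return W @ X.T

rng = np.random.default_rng(0)
W = np.zeros((512, 256), dtype=np.float32)
W[:64] = rng.standard_normal((64, 256))        # 87.5% of rows are zero
X = rng.standard_normal((100, 256)).astype(np.float32)

baseline = dense_ref(W, X)                     # shape (512, 100)
# op = ROLVHybrid(W, batch=100)                # needs the rolvprimitive wheel
# assert np.array_equal(op.apply(X), baseline) # bit-identical check
```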
Post your results
Run the benchmark and share your output. Include your hardware, model, and numbers.
- Reddit: r/LocalLLaMA · r/MachineLearning
- HuggingFace: huggingface.co/rolv-ai
- GitHub Discussions: open a thread here
Citation
```bibtex
@misc{heggenhougen2026rolv,
  title  = {ROLV Primitive: A Sparse Matrix Operator for Mixture-of-Experts Inference},
  author = {Heggenhougen, Rolv Eitrem},
  year   = {2026},
  doi    = {10.5281/zenodo.19221455},
  url    = {https://doi.org/10.5281/zenodo.19221455}
}
```
License
Free for personal and research use. Commercial use requires a license. Commercial use includes inference APIs, cloud services, enterprise software, and any business deployment.
Commercial licensing: rolv@rolv.ai | rolv.ai
ROLV LLC · 445 NE 12th Ave · Fort Lauderdale FL 33301
ROLV Primitive© · RSMT™ · ROLVswitch™ · 3 Patents Pending
Copyright © 2025-2026 ROLV LLC · All rights reserved
rolv@rolv.ai · rolv.ai