Outlier-10B-V3.2

Ternary mixture-of-experts overlay on Qwen/Qwen2.5-7B-Instruct. 23B total effective parameters, 10B active per forward pass.

TL;DR

  • Architecture: Outlier ternary MoE overlay on frozen Qwen 2.5 7B base
  • Parameters: 23B total, 10B active per forward (sparse routing)
  • MMLU: ~76% — [CLAIM]
  • License: Apache 2.0

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Outlier-Ai/Outlier-10B-V3.2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, torch_dtype="auto"
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))

For consumer Apple Silicon inference, use the MLX or GGUF tiers instead of loading the full-precision checkpoint.

Benchmarks

Metric   Value   Provenance
MMLU     ~76%    [CLAIM]: historical smoke test only (limit=570; 12,498 of 14,042 samples evaluated). Full-sample re-verification pending.

Rule 66 provenance labels:

  • [VERIFIED] — full source JSON with config.limit=None, n-samples complete, model_args present, reproducible from commit SHA.
  • [INCOMPLETE] — number exists on disk but provenance fields are stripped; cannot be cited publicly.
  • [CLAIM] — historical smoke-test value pending full re-verification on cluster.
  • [PENDING] — benchmark scheduled; results expected by a specific date.
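The label rules above can be written down as a small decision function. This is an illustrative sketch only: the field names (`config_limit`, `n_samples`, `model_args`, `commit_sha`, `status`) are assumed stand-ins for whatever the eval-result JSON actually contains, not a documented schema.

```python
def provenance_label(result: dict, expected_n: int) -> str:
    """Classify one benchmark record into a Rule 66 provenance label.

    Field names are hypothetical; adapt to the real result schema.
    """
    # Scheduled but not yet run.
    if result.get("status") == "scheduled":
        return "[PENDING]"
    # Full provenance: no sample limit, complete n, model_args and commit SHA.
    if (
        result.get("config_limit") is None
        and result.get("n_samples") == expected_n
        and result.get("model_args")
        and result.get("commit_sha")
    ):
        return "[VERIFIED]"
    # A number exists on disk but provenance fields are stripped.
    if result.get("value") is not None and not result.get("model_args"):
        return "[INCOMPLETE]"
    # Anything else: historical smoke-test value awaiting re-verification.
    return "[CLAIM]"
```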

Notes

Known issue: config.json references Qwen2MoEForCausalLM in auto_map but modeling_outlier_moe.py defines OutlierMoEForCausalLM. Load with trust_remote_code=True or use Outlier-70B-V3.2 for production.
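Until a fixed config ships, the mismatch can also be worked around by editing the downloaded config.json so that auto_map points at the class the modeling file actually defines. A minimal sketch, assuming the affected target string is the one named above (inspect your local config.json first; the exact auto_map keys may differ):

```python
import json

def patch_auto_map(config_path: str) -> dict:
    """Rewrite stale auto_map targets in a downloaded config.json.

    One-time local workaround; assumes the file has already been
    fetched (e.g. via huggingface_hub snapshot_download).
    """
    with open(config_path) as f:
        config = json.load(f)
    auto_map = config.get("auto_map", {})
    for key, target in auto_map.items():
        # Point the stale class name at the class modeling_outlier_moe.py defines.
        if target.endswith("Qwen2MoEForCausalLM"):
            auto_map[key] = target.replace(
                "Qwen2MoEForCausalLM", "OutlierMoEForCausalLM"
            )
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```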

Architecture

  • Base backbone: Qwen/Qwen2.5-7B-Instruct (frozen during distillation)
  • MoE overlay: ternary delta experts ({-1, 0, +1} + per-row fp16 scale) with top-K routing
  • Expert layers: varies by variant
  • Experts per layer: 8 routed + 1 shared
  • Top-k routing: 2
  • Context: inherits Qwen 2.5's 32,768 tokens
  • Expert paging: three-tier memory (SRAM / DRAM / NVMe) on 70B+

Ternary-weight arithmetic ({-1, 0, +1}) reduces a matmul to a stream of additions and subtractions — no multiplications — enabling consumer hardware to run flagship-scale models at usable speeds.
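A small NumPy illustration of that claim: for a ternary row, the dot product reduces to "sum the inputs where the weight is +1, subtract where it is -1", followed by the single per-row scale. Shapes and values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8).astype(np.float32)            # input activations
w = rng.integers(-1, 2, size=(4, 8)).astype(np.int8)     # ternary weight rows
scale = rng.standard_normal(4).astype(np.float32)        # per-row fp16-style scale

# Reference path: ordinary dense matmul.
dense = scale * (w.astype(np.float32) @ x)

# Multiplication-free path for the inner loop: add where w = +1,
# subtract where w = -1, skip where w = 0, then apply the row scale.
mul_free = np.array(
    [x[row == 1].sum() - x[row == -1].sum() for row in w],
    dtype=np.float32,
) * scale

assert np.allclose(dense, mul_free)
```

The inner loop needs no multiplications; the one remaining multiply per output element is the row scale.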

Patents filed

Three provisional patents filed April 2026 (61 claims total) covering ternary MoE weight composition, expert paging, and specialist merging techniques.

Known limitations

  • Calibration and full-sample MMLU re-verification are queued for cluster time; numbers labeled [CLAIM] above are historical smoke-test values awaiting verification.
  • Outlier's ternary MoE overlay is research-grade; use the consumer tier (Nano / Lite / Compact / Max) for production local inference.
  • Qwen 2.5 tokenizer + chat template apply; no custom tokenizer.
  • English-tuned. Multilingual performance inherits the base model and is not separately optimized.

Citation

@misc{outlier2026,
  author       = {Kerr, Matt},
  title        = {Outlier: Ternary Mixture-of-Experts for Consumer Hardware},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Outlier-Ai}}
}
