Outlier-70B-V3.2

Ternary mixture-of-experts overlay on Qwen/Qwen2.5-32B-Instruct. 70B total effective parameters, 32B active per forward pass.

TL;DR

  • Architecture: Outlier ternary MoE overlay on frozen Qwen 2.5 32B base
  • Parameters: 70B total, 32B active per forward (sparse routing)
  • MMLU: 81.49% — [VERIFIED]
  • License: Apache 2.0

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Outlier-Ai/Outlier-70B-V3.2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, torch_dtype="auto"
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=200)[0]))

For consumer Apple Silicon inference, use the MLX or GGUF tiers instead of this repository.

Benchmarks

Metric | Value | Provenance
MMLU | 81.49% | [VERIFIED] — full-sample, n=14,042, stderr 0.00314; source: cloud_sprint_day12/results/70b_v3_2_mmlu_full.json (62 task results, full provenance retained)

Rule 66 provenance labels:

  • [VERIFIED] — full source JSON with config.limit=None, n-samples complete, model_args present, reproducible from commit SHA.
  • [INCOMPLETE] — number exists on disk but provenance fields are stripped; cannot be cited publicly.
  • [CLAIM] — historical smoke-test value pending full re-verification on cluster.
  • [PENDING] — benchmark scheduled; results expected by a specific date.
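The four labels above reduce to a simple decision over the fields of a results file. A minimal sketch of that decision, assuming illustrative field names (config.limit, n_samples_complete, model_args, commit_sha, provenance_stripped) rather than the project's actual schema:

```python
def provenance_label(result: dict) -> str:
    """Classify a benchmark result per the Rule 66 scheme.

    Field names here are illustrative assumptions, not the real layout
    of the result JSON files referenced in the table above.
    """
    cfg = result.get("config", {})
    fully_verified = (
        cfg.get("limit", "absent") is None   # full-sample run, no limit
        and result.get("n_samples_complete")  # all samples present
        and "model_args" in cfg               # model args recorded
        and "commit_sha" in result            # reproducible from commit
    )
    if fully_verified:
        return "[VERIFIED]"
    if "value" not in result:
        return "[PENDING]"    # scheduled, no number yet
    if result.get("provenance_stripped"):
        return "[INCOMPLETE]"  # number exists, provenance gone
    return "[CLAIM]"           # historical value awaiting re-verification

verified = {
    "config": {"limit": None, "model_args": "pretrained=..."},
    "n_samples_complete": True, "commit_sha": "deadbeef", "value": 0.8149,
}
print(provenance_label(verified))
```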

Architecture

  • Base backbone: Qwen/Qwen2.5-32B-Instruct (frozen during distillation)
  • MoE overlay: ternary delta experts ({-1, 0, +1} + per-row fp16 scale) with top-K routing
  • Expert layers: varies by variant
  • Experts per layer: 8 routed + 1 shared
  • Top-k routing: 2
  • Context: inherits Qwen 2.5's 32,768 tokens
  • Expert paging: three-tier memory (SRAM / DRAM / NVMe) on 70B+
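The routing scheme above (8 routed experts plus 1 always-on shared expert, top-2 selection) can be sketched in NumPy. This is a stand-in for the real overlay: the router weights and shapes are made up for illustration, and only the gating math is shown.

```python
import numpy as np

def route_top2(hidden, router_w, k=2):
    """Select top-k routed experts per token and softmax their gates."""
    logits = hidden @ router_w                       # (tokens, n_routed)
    topk = np.argsort(logits, axis=-1)[:, -k:]       # indices of top-k experts
    gates = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)            # normalize over the k chosen
    return topk, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 16))    # 4 tokens, toy d_model=16
router_w = rng.standard_normal((16, 8))  # router over 8 routed experts
topk, gates = route_top2(hidden, router_w)
# The shared expert always applies; routed experts are mixed by gate:
#   out = shared(h) + sum_i gates[i] * expert[topk[i]](h)
```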

Ternary-weight arithmetic ({-1, 0, +1}) reduces a matmul to a stream of additions and subtractions — no multiplications — enabling consumer hardware to run flagship-scale models at usable speeds.
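The multiply-free claim is easy to verify numerically: with weights restricted to {-1, 0, +1}, a matvec is just "add the inputs where w=+1, subtract where w=-1", followed by one per-row scale. A minimal NumPy sketch (not the production kernel):

```python
import numpy as np

def ternary_matvec(w_ternary, scale, x):
    """Multiply-free matvec: w in {-1, 0, +1}, per-row scale factor."""
    pos = (w_ternary == 1)
    neg = (w_ternary == -1)
    # Additions where w=+1, subtractions where w=-1 — no multiplies
    # inside the accumulation, only the final per-row scale.
    acc = np.where(pos, x, 0.0).sum(axis=1) - np.where(neg, x, 0.0).sum(axis=1)
    return scale * acc

rng = np.random.default_rng(1)
w = rng.integers(-1, 2, size=(8, 32))             # ternary weight matrix
scale = rng.standard_normal(8).astype(np.float16).astype(np.float64)
x = rng.standard_normal(32)
# Matches the ordinary dense matmul, since w = pos - neg:
assert np.allclose(ternary_matvec(w, scale, x), scale * (w @ x))
```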

Patents filed

Three provisional patents filed April 2026 (61 claims total) covering ternary MoE weight composition, expert paging, and specialist merging techniques.

Known limitations

  • Calibration and full-sample MMLU re-verification are queued for cluster time; any numbers labeled [CLAIM] are historical smoke-test values awaiting verification.
  • Outlier's ternary MoE overlay is research-grade — use the consumer tier (Nano / Lite / Compact / Max) for production local inference.
  • Qwen 2.5 tokenizer + chat template apply; no custom tokenizer.
  • English-tuned. Multilingual performance inherits the base model and is not separately optimized.

Citation

@misc{outlier2026,
  author       = {Kerr, Matt},
  title        = {Outlier: Ternary Mixture-of-Experts for Consumer Hardware},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Outlier-Ai}}
}
