# Outlier-10B-V3.2
Ternary mixture-of-experts overlay on Qwen/Qwen2.5-7B-Instruct. 23B total effective parameters, 10B active per forward pass.
## TL;DR
- Architecture: Outlier ternary MoE overlay on frozen Qwen 2.5 7B base
- Parameters: 23B total, 10B active per forward (sparse routing)
- MMLU: ~76% [CLAIM]
- License: Apache 2.0
## Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Outlier-Ai/Outlier-10B-V3.2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, torch_dtype="auto"
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=200)[0]))
```
For consumer Apple Silicon inference, use the MLX or GGUF tiers:
- Outlier-Ai/Outlier-Lite-7B-MLX-4bit
- Outlier-Ai/Outlier-Compact-14B-MLX-4bit
- Outlier-Ai/Outlier-Max-32B-GGUF
## Benchmarks
| Metric | Value | Provenance |
|---|---|---|
| MMLU | ~76% | [CLAIM]: historical smoke test only (limit=570; n=12,498 of 14,042). Full-sample re-verification pending. |
Rule 66 provenance labels:
- `[VERIFIED]`: full source JSON with `config.limit=None`, n-samples complete, `model_args` present, reproducible from commit SHA.
- `[INCOMPLETE]`: number exists on disk but provenance fields are stripped; cannot be cited publicly.
- `[CLAIM]`: historical smoke-test value pending full re-verification on cluster.
- `[PENDING]`: benchmark scheduled; results expected by a specific date.
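The labeling rules above can be sketched as a small classifier over a result dict. All field names here (`config.limit`, `model_args`, `commit_sha`, `n_samples_complete`, `smoke_test`) are illustrative stand-ins, not the repository's actual result schema:

```python
# Hypothetical sketch of the Rule 66 labeling logic.
def provenance_label(result: dict) -> str:
    cfg = result.get("config", {})
    verified = (
        cfg.get("limit", 0) is None           # full run, no sample cap
        and result.get("n_samples_complete")  # all samples present
        and "model_args" in cfg               # model_args recorded
        and "commit_sha" in result            # reproducible from commit SHA
    )
    if verified:
        return "[VERIFIED]"
    if result.get("smoke_test"):              # historical smoke-test value
        return "[CLAIM]"
    if "value" in result:                     # number exists, provenance stripped
        return "[INCOMPLETE]"
    return "[PENDING]"                        # scheduled, no number yet

full = {"config": {"limit": None, "model_args": "..."},
        "n_samples_complete": True, "commit_sha": "abc123", "value": 77.8}
smoke = {"config": {"limit": 570}, "smoke_test": True, "value": 76.0}
print(provenance_label(full), provenance_label(smoke))  # [VERIFIED] [CLAIM]
```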
## Notes
Known issue: `config.json` references `Qwen2MoEForCausalLM` in `auto_map`, but `modeling_outlier_moe.py` defines `OutlierMoEForCausalLM`. Load with `trust_remote_code=True` or use Outlier-70B-V3.2 for production.
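If patching a local copy instead, the fix is a one-line edit so the `auto_map` entry names the class the file actually defines. A sketch, assuming the standard `transformers` `auto_map` convention of `module.ClassName`:

```json
{
  "auto_map": {
    "AutoModelForCausalLM": "modeling_outlier_moe.OutlierMoEForCausalLM"
  }
}
```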
## Architecture
- Base backbone: Qwen/Qwen2.5-7B-Instruct (frozen during distillation)
- MoE overlay: ternary delta experts ({-1, 0, +1} + per-row fp16 scale) with top-K routing
- Expert layers: varies by variant
- Experts per layer: 8 routed + 1 shared
- Top-k routing: 2
- Context: inherits Qwen 2.5's 32,768 tokens
- Expert paging: three-tier memory (SRAM / DRAM / NVMe) on 70B+
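A minimal numerical sketch of the routing described above (8 routed experts plus 1 always-on shared expert, top-2 gating). The renormalized-softmax gating and the expert shapes are assumptions for illustration, not the repository's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_routed, top_k, d = 8, 2, 16

# Illustrative experts: one weight matrix per routed expert, plus a shared
# expert that every token passes through regardless of the router.
expert_weights = [rng.standard_normal((d, d)).astype(np.float32) * 0.1
                  for _ in range(n_routed)]
shared_weight = np.eye(d, dtype=np.float32)

token = rng.standard_normal(d).astype(np.float32)
router_logits = rng.standard_normal(n_routed).astype(np.float32)

# Top-2 selection, then renormalize the selected gates to sum to 1.
top = np.argsort(router_logits)[-top_k:]
gates = np.exp(router_logits[top] - router_logits[top].max())
gates /= gates.sum()

out = token @ shared_weight  # shared expert always contributes
for g, i in zip(gates, top):
    out = out + g * (token @ expert_weights[i])
```

Only 2 of the 8 routed expert matmuls run per token, which is where the 10B-active / 23B-total gap comes from.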
Ternary-weight arithmetic ({-1, 0, +1}) reduces a matmul to a stream of additions and subtractions — no multiplications — enabling consumer hardware to run flagship-scale models at usable speeds.
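A sketch of why: with weights restricted to {-1, 0, +1} plus a per-row scale, each inner product needs only additions and subtractions, and the result matches the dequantized dense matmul:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 8
w = rng.integers(-1, 2, size=(d_out, d_in)).astype(np.int8)  # ternary {-1, 0, +1}
scale = rng.random(d_out).astype(np.float16) + np.float16(0.5)  # per-row fp16 scale
x = rng.standard_normal(d_in).astype(np.float32)

# Multiplication-free form: add inputs where w == +1, subtract where w == -1,
# then apply the single per-row scale.
y_addsub = scale.astype(np.float32) * np.array(
    [x[row == 1].sum() - x[row == -1].sum() for row in w], dtype=np.float32
)

# Reference dense matmul with the dequantized weights.
y_dense = (scale.astype(np.float32)[:, None] * w.astype(np.float32)) @ x
assert np.allclose(y_addsub, y_dense, atol=1e-5)
```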
## Patents filed
Three provisional patents filed April 2026 (61 claims total) covering ternary MoE weight composition, expert paging, and specialist merging techniques.
## Known limitations
- Calibration and full-sample MMLU re-verification are queued for cluster time; numbers labeled `[CLAIM]` are historical smoke-test values awaiting verification.
- Outlier's ternary MoE overlay is research-grade; use the consumer tier (Nano / Lite / Compact / Max) for production local inference.
- Qwen 2.5 tokenizer + chat template apply; no custom tokenizer.
- English-tuned. Multilingual performance inherits the base model and is not separately optimized.
## See also
- V3.2 family:
  - Outlier-10B-V3.2
  - Outlier-40B-V3.2 ([VERIFIED] MMLU 77.80%)
  - Outlier-70B-V3.2 ([VERIFIED] MMLU 81.49%)
  - Outlier-150B-V3.2
- V3.3 preview:
## Citation
```bibtex
@misc{outlier2026,
  author = {Kerr, Matt},
  title = {Outlier: Ternary Mixture-of-Experts for Consumer Hardware},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Outlier-Ai}}
}
```
## Links
- Website: https://outlier.host
- GitHub: https://github.com/Outlier-host/Outlier
- All models: https://huggingface.co/Outlier-Ai
- Consumer Edition collection: https://huggingface.co/collections/Outlier-Ai/outlier-consumer-edition-69e2fb4a0df119ea1747275e
- Server V3.2 collection: https://huggingface.co/collections/Outlier-Ai/outlier-server-v32-69e2fb4b71984614b3c7a4a3
- Research collection: https://huggingface.co/collections/Outlier-Ai/outlier-research-69e2fb3a71984614b3c7a279