# Outlier-70B-V3.3

Ternary MoE overlay on Qwen/Qwen2.5-32B-Instruct. 68B total parameters, 32B active per forward pass.

Status: [VERIFIED] · Role: research / server-scale · Shipping Mac tier? No; use Lite / Compact / Max in the Outlier desktop app for local inference on your Mac.
## What it is

Outlier is an "Apple of local AI" platform. The shipping desktop app runs curated Qwen tiers today; on the research side, we train ternary mixture-of-experts overlays on top of frozen Qwen bases to push MMLU-per-GB at larger scales. This repo holds the research checkpoint used for the numbers below.
Highlights:
- Overlay on a frozen Qwen/Qwen2.5-32B-Instruct backbone: the shared full-precision path acts as the quality anchor, while ternary experts {−1, 0, +1} specialize by domain with per-row fp16 scales
- Alpha-fix refinement: 32B-active MoE routing with learned per-expert scalar gates (a 15 KB overlay recovered V4 regressions and added +1.61pp on 70B)
- Apache 2.0: weights, code, and distributed runtimes are all Apache 2.0 throughout the chain
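The ternary-with-scale scheme from the first bullet can be sketched in a few lines. The thresholding rule and shapes below are illustrative assumptions, not the exact training recipe:

```python
import numpy as np

def ternarize(w: np.ndarray, thresh: float = 0.05):
    """Quantize a weight matrix to {-1, 0, +1} with a per-row fp16 scale.

    Illustrative scheme: zero out small entries, keep signs elsewhere,
    and pick each row's scale as the mean magnitude of its kept entries.
    """
    q = np.where(np.abs(w) > thresh, np.sign(w), 0.0).astype(np.int8)
    kept = np.abs(w) * (q != 0)
    denom = np.maximum((q != 0).sum(axis=1), 1)
    scale = (kept.sum(axis=1) / denom).astype(np.float16)  # one scale per row
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximation of the original row-scaled weights."""
    return scale[:, None].astype(np.float32) * q

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(4, 8))
q, scale = ternarize(w)
print(q.dtype, scale.dtype)  # int8 float16
```

The storage win is the point: each weight shrinks to 2 bits (practically, int8 here) plus one fp16 scalar per row.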
## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Outlier-Ai/Outlier-70B-V3.3"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,
    torch_dtype="auto",
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```
For Apple Silicon local inference, use the shipping MLX tiers instead:

- Outlier-Lite-7B-MLX-4bit: Qwen 2.5 7B AWQ, 71.30 tok/s / 4.47 GB on M1 Ultra
- Outlier-Compact-14B-MLX-4bit: Qwen 2.5 14B AWQ, 37.26 tok/s / 8.24 GB on M1 Ultra
## Benchmarks (MMLU)
| Metric | Value | n | Stderr | Harness | Date | Status |
|---|---|---|---|---|---|---|
| MMLU (this model) | 83.10% | 14042 | ~0.30% | v0.4.9.1 | 2026-04-13 | [VERIFIED] |
MMLU vs. base Qwen (honest comparison):
| Outlier | Base Qwen | Delta |
|---|---|---|
| 83.10% (70B V3.3) | 83.3% (Qwen 2.5 32B) | −0.20pp (tied) |

Read: Outlier MoE overlays underperform or tie base Qwen on raw MMLU. The product thesis is MMLU per GB of RAM, not raw MMLU; see GROUND_TRUTH v12 §2.6.
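The MMLU-per-GB framing is just quality divided by resident footprint. A small helper makes it concrete; the RAM figures below reuse the Lite/Compact numbers from this card, while the MMLU scores attached to them are hypothetical placeholders for illustration only:

```python
def mmlu_per_gb(mmlu_pct: float, ram_gb: float) -> float:
    """Quality-per-footprint metric: MMLU points per GB of resident RAM."""
    return mmlu_pct / ram_gb

# RAM figures from this card; the MMLU values here are HYPOTHETICAL placeholders.
tiers = {
    "Lite-7B (4.47 GB)":     mmlu_per_gb(70.0, 4.47),
    "Compact-14B (8.24 GB)": mmlu_per_gb(77.0, 8.24),
}
for tier, score in sorted(tiers.items(), key=lambda kv: -kv[1]):
    print(f"{tier}: {score:.2f} MMLU pts/GB")
```

Under this metric, a smaller tier can win even with a lower raw score, which is the trade-off the product thesis optimizes for.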
Provenance labels (Rule 66):

- [VERIFIED]: full source JSON, `config.limit=None`, complete n-samples, `model_args` present, reproducible from commit SHA
- [SUPERSEDED YYYY-MM-DD]: replaced by a newer measurement; retained for audit
- [INCOMPLETE]: number exists on disk but provenance fields are stripped
- [CLAIM]: reported but not independently confirmed
## Secondary benchmarks (cloud, Day 13 [VERIFIED])
| Model | HellaSwag | ARC-C | ARC-E | Winogrande | TruthfulQA |
|---|---|---|---|---|---|
| 70B V3.3 | 85.95% | 73.46% | 91.62% | 81.29% | 67.12% |
Harness: lm-evaluation-harness v0.4.9.1, n=14,042. Source: `~/v4_cloud_sprint_day13/sprint003_artifacts/results/`.
## Architecture

- Base backbone: Qwen/Qwen2.5-32B-Instruct, frozen during distillation
- Overlay: ternary delta experts ({−1, 0, +1} + per-row fp16 scale), top-k routing
- Experts per layer: 8 routed + 1 shared, top-k = 2
- Context: inherits Qwen/Qwen2.5's 32,768 tokens
- Total / active params: 68B / 32B
- Alpha-fix overlay: 280 per-expert scalar gates, trained in 18 min on one B200; +1.61pp MMLU on 70B (V3.2 81.49% → V3.3 83.10%); 15 KB overlay file
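The alpha-fix idea (one learned scalar per expert, applied to that expert's output before the router's weighted sum) can be sketched as follows; the shapes, gate values, and combination rule are assumptions for illustration:

```python
import numpy as np

def moe_combine(expert_outs: np.ndarray,
                router_weights: np.ndarray,
                alpha: np.ndarray) -> np.ndarray:
    """Combine top-k expert outputs with router weights and per-expert
    scalar gates (the alpha-fix idea: one learned scalar per expert).

    expert_outs:    (k, d) outputs of the k routed experts
    router_weights: (k,)   softmaxed router scores for those experts
    alpha:          (k,)   learned per-expert scalar gates
    """
    gated = alpha[:, None] * expert_outs               # rescale each expert
    return (router_weights[:, None] * gated).sum(axis=0)

rng = np.random.default_rng(0)
outs = rng.normal(size=(2, 4))    # top-k = 2 experts, hidden dim 4
w = np.array([0.7, 0.3])          # router weights
alpha = np.array([1.05, 0.92])    # small learned corrections near 1.0
y = moe_combine(outs, w, alpha)
print(y.shape)  # (4,)
```

Because the gates are a handful of scalars per layer rather than weight matrices, the whole overlay is a few hundred values, which is why the overlay file is tiny.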
Ternary arithmetic reduces a matmul to a stream of additions and subtractions (no multiplications), which is what makes overlays at this scale feasible to run outside a datacenter (once the `ssd_stream` engine is wired; that's a v1.5+ sprint, tracked in the registry).
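That reduction can be seen directly: with ternary weights, each output element is "add the inputs where the weight is +1, subtract where it is −1," then rescale by the row's fp16 scale. A minimal sketch (naming and shapes are illustrative):

```python
import numpy as np

def ternary_matvec(q: np.ndarray, scale: np.ndarray, x: np.ndarray) -> np.ndarray:
    """y = diag(scale) @ (q @ x) with q in {-1, 0, +1}: each row needs only
    additions and subtractions of x entries, never a multiply by a weight."""
    y = np.empty(q.shape[0], dtype=np.float32)
    for i in range(q.shape[0]):
        pos = x[q[i] == 1].sum()    # additions
        neg = x[q[i] == -1].sum()   # subtractions
        y[i] = np.float32(scale[i]) * (pos - neg)
    return y

q = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
scale = np.array([0.5, 2.0], dtype=np.float16)
x = np.array([3.0, 4.0, 1.0], dtype=np.float32)
print(ternary_matvec(q, scale, x))  # [ 1. 10.]
```

The single fp16 multiply per output row is the only multiplication left, so the inner loop is pure accumulation, which is cheap on CPUs and amenable to streaming weights from disk.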
## What we are not claiming

- We do NOT match frontier cloud models (GPT-5, Claude Opus 4.7, Gemini 3 Pro) on pure MMLU
- We are NOT beating base Qwen on raw MMLU at most scales (we tie at 70B; we regress at 10B / 40B / 150B)
- We are NOT currently shipping MoE tiers in the desktop app; the app ships curated Qwen (Nano / Lite / Compact / Max). MoE tiers ship when the `ssd_stream` engine is wired and the scale beats Qwen at the same RAM budget
- We are NOT claiming "Outlier-branded MMLU" on raw Qwen shipping tiers; the numbers above apply to THIS overlay checkpoint only (Rule 138)
## Known limitations

- Overlay weights load via `transformers` with `trust_remote_code=True`; outlier-engine is the reference runtime (separate package)
- Chat template: inherits Qwen/Qwen2.5-32B-Instruct (no custom tokenizer)
- English-tuned. Multilingual behavior inherits the base model; not separately optimized
- Server-scale MoE path requires `ssd_stream` paging to fit in consumer RAM; NOT yet wired ([EXISTS BUT UNWIRED] per GROUND_TRUTH v12 §11)
- Expert paging + AWQ base quantization land in the v1.5 engine sprint
## Patents filed

Three U.S. provisional patents filed April 2026 (61 claims total):

- Ternary MoE weight composition on frozen bases (#64/026,886)
- Expert paging + memory hierarchy (#64/030,368)
- Specialist branch-train-mix merging for binary-weight experts (#64/034,028)

Non-provisional deadline: April 3–9, 2027.
## Citation

```bibtex
@misc{outlier2026,
  author       = {Kerr, Matt},
  title        = {Outlier: A Local AI Platform for Apple Silicon},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Outlier-Ai}}
}
```
## License

Apache 2.0 throughout: base weights (Qwen team), overlay weights, and distributed runtime. See LICENSE.
## Attribution

Base model by the Qwen team at Alibaba, released under Apache 2.0. Outlier adds MLX / GGUF / quantization work on top and distributes under the same license. All credit for capability belongs to the upstream Qwen team; we make it fast and easy to run on Mac.
- Qwen team: https://qwenlm.github.io/
- Qwen 2.5 release paper: https://arxiv.org/abs/2412.15115
## Links

- Website: https://outlier.host
- Desktop app: download the latest DMG (one installer, four tiers, offline-first)
- GitHub: https://github.com/Outlier-host/Outlier
- All Outlier models: https://huggingface.co/Outlier-Ai
- Consumer Edition collection: https://huggingface.co/collections/Outlier-Ai/outlier-consumer-edition-69e2fb4a0df119ea1747275e
- Research V3.x collection: https://huggingface.co/collections/Outlier-Ai/outlier-research-69e2fb3a71984614b3c7a279
- Server V3.2 collection: https://huggingface.co/collections/Outlier-Ai/outlier-server-v32-69e2fb4b71984614b3c7a4a3