# Outlier
Ternary Mixture-of-Experts language models on consumer hardware.
Outlier is an open-source ternary-quantized Mixture-of-Experts runtime and a small family of models, trained by one founder on a single-GPU pipeline. Routed experts are stored as {-1, 0, +1} ternary weights at ~1.6 bits per weight; a frozen full-precision base model acts as the shared expert, with top-2 routing per layer. Licensed Apache 2.0.
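To make the "~1.6 bits per weight" figure concrete, here is a minimal sketch of one way ternary weights can be stored: each {-1, 0, +1} value as a 2-bit code, four codes per byte, with a per-tensor scale. This is illustrative only, not Outlier's actual on-disk format; 2 bits/weight is an upper bound, and entropy coding can approach log2(3) ≈ 1.58 bits.

```python
import numpy as np

# Code -> value table; code 3 is unused padding and decodes to 0.
DECODE = np.array([-1.0, 0.0, 1.0, 0.0], dtype=np.float32)

def pack_ternary(w):
    """Pack a 1-D array of {-1, 0, +1} ints into bytes, 4 weights per byte."""
    w = np.asarray(w)
    codes = (w + 1).astype(np.uint8)            # map {-1, 0, 1} -> {0, 1, 2}
    codes = np.pad(codes, (0, (-len(codes)) % 4)).reshape(-1, 4)
    packed = (codes[:, 0] | (codes[:, 1] << 2) |
              (codes[:, 2] << 4) | (codes[:, 3] << 6))
    return packed.astype(np.uint8), len(w)

def unpack_ternary(packed, n):
    """Recover float weights from packed bytes."""
    b = np.asarray(packed, dtype=np.uint8)
    codes = np.stack([(b >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)
    return DECODE[codes[:n]]

w = np.array([-1, 0, 1, 1, -1, 0, 0, 1])
packed, n = pack_ternary(w)
assert np.array_equal(unpack_ternary(packed, n), w.astype(np.float32))
```

A real loader would add a quantization scale per row or per tensor; the round trip above only shows the bit-level layout.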
Built solo in under 14 days on consumer hardware plus spot GPUs. Total compute spend under $700. Three U.S. provisional patents filed.
## Honest benchmark status
We use lm-evaluation-harness v0.4.9.1, 5-shot MMLU, bfloat16. Every number on this card has a sample size, a stderr, and a status. When a number is preliminary, we say so. When a number can't be reproduced from a saved source file, we say that too.
| Model | MMLU | Sample size | Status |
|---|---|---|---|
| Outlier-10B V3.2 | 76.19% | n ≈ 13,693 (limit=570/subtask), stderr 0.0037 | Verified — backed by source JSON |
| Outlier-40B V3.2 | reserved | — | Re-running on cloud GPU, ~48h |
| Outlier-70B V3.2 | reserved | — | Re-running on cloud GPU, ~48h |
| Outlier-150B V3.2 | reserved | — | Re-running on cloud GPU, ~48h |
On the 40B / 70B / 150B numbers: a previous version of this card showed 77.82% / 81.46% / 84.49% MMLU at full sample. Those numbers came from a training-cluster session whose original eval JSON files were not preserved when the cluster was decommissioned. We chose not to publish numbers we cannot reproduce from a saved source file. The full-sample re-runs are in progress; verified numbers will appear here within 48 hours.
The 10B is the one we can defend in writing today. It is backed by v3_2_10b_moe_mmlu_2026-04-09T21-24-07.012991.json (preserved in our local backup), acc=0.7619, acc_stderr=0.00365, n≈13,693.
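Verification here means recomputing the headline number from the saved results file rather than trusting a reported summary. The sketch below shows that check against a tiny synthetic stand-in; the key layout (`"results"` → per-task `"acc,none"`) follows lm-evaluation-harness v0.4.x output, but treat the exact schema, task names, and field names as assumptions.

```python
import json
import math

def pooled_accuracy(accs, sizes):
    """Sample-size-weighted mean accuracy across MMLU subtasks."""
    total = sum(sizes.values())
    return sum(accs[t] * sizes[t] for t in accs) / total

# Tiny synthetic stand-in for a saved lm-eval results JSON (two subtasks).
blob = json.loads("""{
  "results": {
    "mmlu_abstract_algebra": {"acc,none": 0.70},
    "mmlu_anatomy":          {"acc,none": 0.80}
  },
  "n-samples": {
    "mmlu_abstract_algebra": {"effective": 100},
    "mmlu_anatomy":          {"effective": 135}
  }
}""")

accs  = {t: v["acc,none"] for t, v in blob["results"].items()}
sizes = {t: v["effective"] for t, v in blob["n-samples"].items()}
n = sum(sizes.values())
acc = pooled_accuracy(accs, sizes)
stderr = math.sqrt(acc * (1 - acc) / n)  # binomial approximation
print(f"acc={acc:.4f}, stderr={stderr:.5f}, n={n}")
```

Run against the real preserved JSON, the same pooling should reproduce acc=0.7619 and a stderr near 0.00365 from the per-subtask entries.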
## What actually exists today
- Open-source engine at github.com/mattkerr09/outlier-engine — Apache 2.0. Ternary MoE loader, three-tier paged cache, MPS + CPU backends, lm-eval compatible.
- First-party V3.2 inference verified on Apple Silicon — Outlier-10B V3.2 generates coherent text in paged mode on a 64 GB Mac Studio M1 Ultra (52 GB peak RSS, ~0.1 tok/s steady state). Non-paged 10B runs at ~13.5 tok/s on the same hardware.
- GPU-resident expert dequantization — A patched modeling file (modeling_outlier_150b_rexmoe.py) materializes ternary experts to bf16 at load time, achieving ~56× speedup vs the original CPU→GPU dequant path on a single B200. The 70B and 40B repos are pending the same patch — port in progress.
- Alive Model (TTT) module — 224 alpha scalars per user, profile switch in 0.259 ms, routing-prediction accuracy of 99.7% on the 10B. Quality lift on domain tasks is unvalidated pending an alpha-loader fix; we are not claiming it yet.
- Three U.S. provisional patents filed — #64/026,886 (April 3, 2026), #64/030,368 (April 6, 2026), #64/034,028 (April 9, 2026).
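The dequantization speedup above comes from doing the ternary-to-dense expansion once at load time instead of per token. The sketch below illustrates the idea with a 256-entry byte lookup table, so a whole expert dequantizes with one gather; this is a CPU/NumPy illustration only, not the actual modeling_outlier_150b_rexmoe.py patch, which runs on GPU and emits bfloat16.

```python
import numpy as np

# Build a byte -> 4 weights table once (2-bit codes: 0 -> -1, 1 -> 0, 2 -> +1).
_DEC = np.array([-1.0, 0.0, 1.0, 0.0], dtype=np.float32)
LUT = np.stack([_DEC[(np.arange(256) >> s) & 0b11] for s in (0, 2, 4, 6)], axis=1)

def materialize_expert(packed, scale, shape):
    """Expand packed ternary bytes into a dense scaled weight matrix.

    Done once at load time; the dense matrix is reused for every token,
    which is what removes the per-token dequant from the hot path.
    """
    w = LUT[np.asarray(packed, dtype=np.uint8)].reshape(-1)
    return (scale * w[: shape[0] * shape[1]]).reshape(shape)

# 8 codes packed low-bits-first into 2 bytes, expanded to a 2x4 matrix.
packed = np.array([0b10_01_00_10, 0b00_00_10_01], dtype=np.uint8)
W = materialize_expert(packed, scale=0.5, shape=(2, 4))
```

The trade is the one the bullet describes: materialized experts cost full bf16 memory on the GPU, but inference never touches the packed representation again.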
## What we are not claiming
We do not match or beat frontier models on raw MMLU. Kimi K2.5, GLM-5, Claude Opus, Gemini 3 Pro, and GPT-5 class models all score higher than our 10B V3.2 on pure MMLU.
Our working hypothesis — being tested right now — is that ternary MoE with paged inference lets a consumer-hardware user run a model with more total parameters in less RAM than a dense model at Q4 quantization, at comparable quality. That is a claim about compression and scaling, not benchmark dominance. We will publish the comparison numbers as they land.
## Status
Pre-launch. Public release planned for Tuesday, April 14, 2026. V3.2 model weights are private until then; the engine and the 10B verified number are already public.
## Links
- Website: outlier.host
- Engine: github.com/mattkerr09/outlier-engine (Apache 2.0)
- Patents: US Provisional #64/026,886 · #64/030,368 · #64/034,028
- Contact: matt@outlier.host
- Built by: Matt Kerr · Kerr & Company LLC · Grand Rapids, MI
## Changelog
- April 11, 2026 (evening): Removed unverified four-row MMLU table (76.19 / 77.82 / 81.46 / 84.49 at n≈14,042 with "stderr < 0.004"). A forensic audit confirmed that only the 10B number was backed by a preserved source file; the 40B, 70B, and 150B JSONs lived only on a training cluster that was decommissioned before they could be saved. Re-running on cloud GPU now.
- April 11, 2026 (afternoon): An earlier, well-intentioned editing session updated this card with the four-row table based on a chat-style status report from the training cluster; the source files were not verified at the time. We are documenting that mistake here so the failure mode is public.