
Recent Activity

  • ur-dad-matt updated a model about 16 hours ago: Outlier-Ai/Outlier-40B
  • ur-dad-matt updated a model about 16 hours ago: Outlier-Ai/Outlier-10B-V2
  • ur-dad-matt updated a model about 16 hours ago: Outlier-Ai/Outlier-10B

Organization Card

Outlier

Ternary Mixture-of-Experts language models on consumer hardware.

Outlier is an open-source ternary-quantized Mixture-of-Experts runtime and a small family of models trained by one founder on a single-GPU pipeline. Routed experts are stored as {-1, 0, +1} ternary values at ~1.6 bits per weight, while a frozen full-precision base model acts as the shared expert. Routing is top-2 per layer. Everything is licensed Apache 2.0.
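The ~1.6 bits-per-weight figure falls out of base-3 packing: five ternary digits fit in one byte, since 3^5 = 243 ≤ 256, giving 8/5 = 1.6 bits per weight. A minimal sketch of the idea (this is illustrative, not the engine's actual storage layout):

```python
import numpy as np

def pack_ternary(w):
    """Pack ternary weights {-1, 0, +1} into bytes, 5 trits per byte."""
    trits = (np.asarray(w, dtype=np.int8) + 1).astype(np.uint8)  # map to {0, 1, 2}
    pad = (-len(trits)) % 5
    trits = np.concatenate([trits, np.zeros(pad, dtype=np.uint8)])
    powers = np.array([1, 3, 9, 27, 81], dtype=np.uint8)
    packed = (trits.reshape(-1, 5) * powers).sum(axis=1).astype(np.uint8)
    return packed, len(w)

def unpack_ternary(packed, n):
    """Inverse of pack_ternary: recover the first n ternary weights."""
    digits = packed.astype(np.int32)[:, None] // np.array([1, 3, 9, 27, 81]) % 3
    return (digits.reshape(-1)[:n] - 1).astype(np.int8)

w = np.random.choice([-1, 0, 1], size=1000).astype(np.int8)
packed, n = pack_ternary(w)
assert np.array_equal(unpack_ternary(packed, n), w)
print(8 * len(packed) / n)  # → 1.6 bits per weight
```

Real layouts also carry per-expert or per-channel scales; the routed experts' storage cost is dominated by the packed trits.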

Built solo in under 14 days on consumer hardware plus spot GPUs. Total compute spend under $700. Three U.S. provisional patents filed.

Honest benchmark status

We use lm-evaluation-harness v0.4.9.1, 5-shot MMLU, bfloat16. Every number on this card has a sample size, a stderr, and a status. When a number is preliminary, we say so. When a number can't be reproduced from a saved source file, we say that too.

| Model | MMLU | Sample size | Status |
|---|---|---|---|
| Outlier-10B V3.2 | 76.19% | n ≈ 13,693 (limit=570/subtask), stderr 0.0037 | Verified — backed by source JSON |
| Outlier-40B V3.2 | reserved | | Re-running on cloud GPU, ~48h |
| Outlier-70B V3.2 | reserved | | Re-running on cloud GPU, ~48h |
| Outlier-150B V3.2 | reserved | | Re-running on cloud GPU, ~48h |

On the 40B / 70B / 150B numbers: A previous version of this card showed 77.82% / 81.46% / 84.49% MMLU at full sample. Those numbers came from a training-cluster session whose original eval JSON files were not preserved when the cluster was decommissioned. We chose not to publish numbers we cannot reproduce from a saved source file. The full-sample re-runs are running now. Verified numbers will appear here within 48 hours.

The 10B is the one we can defend in writing today. It is backed by v3_2_10b_moe_mmlu_2026-04-09T21-24-07.012991.json (preserved in our local backup), acc=0.7619, acc_stderr=0.00365, n≈13,693.
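The reported stderr is consistent with a simple binomial approximation, sqrt(p(1 − p)/n). This is only a sanity check, since lm-evaluation-harness aggregates errors per subtask rather than pooling all samples, but the pooled figure lands where it should:

```python
import math

acc = 0.7619   # accuracy reported in the saved eval JSON
n = 13_693     # reported sample size
stderr = math.sqrt(acc * (1 - acc) / n)
print(round(stderr, 4))  # ≈ 0.0036, consistent with the reported acc_stderr of 0.00365
```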

What actually exists today

  • Open-source engine at github.com/mattkerr09/outlier-engine — Apache 2.0. Ternary MoE loader, three-tier paged cache, MPS + CPU backends, lm-eval compatible.
  • First-party V3.2 inference verified on Apple Silicon — Outlier-10B V3.2 generates coherent text in paged mode on a 64 GB Mac Studio M1 Ultra (52 GB peak RSS, ~0.1 tok/s steady state). Non-paged 10B runs at ~13.5 tok/s on the same hardware.
  • GPU-resident expert dequantization — A patched modeling file (modeling_outlier_150b_rexmoe.py) materializes ternary experts to bf16 at load time, achieving ~56× speedup vs the original CPU→GPU dequant path on a single B200. The 70B and 40B repos are pending the same patch — port in progress.
  • Alive Model (TTT) module — 224 alpha scalars per user, profile switch in 0.259 ms, routing predictor at 99.7% on 10B. Quality lift on domain tasks is unvalidated pending an alpha-loader fix; we are not claiming it yet.
  • Three U.S. provisional patents filed — #64/026,886 (April 3, 2026), #64/030,368 (April 6, 2026), #64/034,028 (April 9, 2026).
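The load-time dequantization idea above can be sketched as follows. The function name and the per-expert scale are illustrative assumptions, not the actual API of modeling_outlier_150b_rexmoe.py, and float32 stands in for bf16 (which numpy lacks); the point is that the ternary-to-float conversion happens once at load rather than on every forward pass:

```python
import numpy as np

def dequantize_expert(trits, scale):
    """Materialize one ternary expert to floating point at load time.

    trits: int8 array in {-1, 0, +1}; scale: per-expert float learned at
    quantization time. Running this once at load means later matmuls
    avoid the per-step CPU->GPU dequantization path entirely.
    """
    return trits.astype(np.float32) * np.float32(scale)

trits = np.random.choice([-1, 0, 1], size=(4, 8)).astype(np.int8)
w = dequantize_expert(trits, scale=0.02)
assert w.dtype == np.float32
```

The trade-off is memory: materialized bf16 experts cost 16 bits/weight on the GPU instead of ~1.6 bits packed, which is why this path suits a single large-memory accelerator like a B200.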

What we are not claiming

We do not match or beat frontier models on raw MMLU. Kimi K2.5, GLM-5, Claude Opus, Gemini 3 Pro, and GPT-5 class models all score higher than our 10B V3.2 on pure MMLU.

Our working hypothesis — being tested right now — is that ternary MoE with paged inference lets a consumer-hardware user run a larger-capacity model, at comparable quality, in less RAM than a dense Q4-quantized model would need. That is a claim about compression and scaling, not benchmark dominance. We will publish the comparison numbers as they land.
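As a back-of-envelope illustration of that hypothesis (the `gib` helper, the 4.5 bits/weight for Q4 including scale overhead, and the choice to ignore router, embeddings, and KV cache are all our assumptions here, not measured figures):

```python
def gib(params_b, bits):
    """Approximate weight storage in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits / 8 / 2**30

ternary_experts = gib(40, 1.6)   # routed experts at ~1.6 bits/weight
dense_q4 = gib(40, 4.5)          # dense Q4 with typical scale overhead
print(f"{ternary_experts:.1f} GiB vs {dense_q4:.1f} GiB")  # → "7.5 GiB vs 21.0 GiB"
```

Paged inference then lets even the resident fraction shrink further, at the throughput cost noted for the 10B above.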

Status

Pre-launch. Public release planned for Tuesday, April 14, 2026. V3.2 model weights are private until then; the engine and the 10B verified number are already public.

Links

Changelog

  • April 11, 2026 (evening): Removed unverified four-row MMLU table (76.19 / 77.82 / 81.46 / 84.49 at n≈14,042 with "stderr < 0.004"). A forensic audit confirmed that only the 10B number was backed by a preserved source file; the 40B, 70B, and 150B JSONs lived only on a training cluster that was decommissioned before they could be saved. Re-running on cloud GPU now.
  • April 11, 2026 (afternoon): An earlier, well-intentioned editing session updated this card with the four-row table based on a chat-style status report from the training cluster. We did not verify the source files at the time. We are documenting that mistake here so the failure mode is public.
