docs: V3.3 lineup update

---
title: Outlier-Ai
---

# Outlier-Ai

**Ternary-quantized Mixture-of-Experts for consumer hardware. 3 patents filed. 14 days solo from zero to 150B.**

Outlier is a research project building dense LLM-quality models on top of Qwen2.5 via ternary-quantized delta MoE experts. The architecture stores weights as `{-1, 0, +1}` (~1.58 bits) plus a per-row fp16 scale, achieving a 6×–8× memory reduction over fp16 while preserving accuracy.
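
As a concrete illustration of the storage scheme above, here is a minimal ternary quantizer in NumPy. It is a sketch, not the project's actual code: the fixed `deadzone` fraction stands in for the adaptive deadzone calibration used in the real pipeline.

```python
import numpy as np

def ternary_quantize(w, deadzone=0.5):
    """Quantize each row of w to {-1, 0, +1} plus a per-row fp16 scale.

    Entries smaller than `deadzone * mean(|row|)` snap to 0; the rest
    keep their sign. Signs are stored as int8 here, but only carry
    ~1.58 bits of information each, so they pack to 2 bits per weight.
    """
    scale = np.abs(w).mean(axis=1, keepdims=True).astype(np.float16)  # per-row scale
    threshold = deadzone * scale.astype(np.float32)                   # deadzone cutoff
    signs = np.where(np.abs(w) <= threshold, 0.0, np.sign(w)).astype(np.int8)
    return signs, scale

def dequantize(signs, scale):
    """Recover an fp32 approximation: sign * per-row scale."""
    return signs.astype(np.float32) * scale.astype(np.float32)
```

Packing the signs at 2 bits each, with one fp16 scale per row of amortized overhead, is what would bring the footprint from 16 bits/weight down into the 6×–8× reduction range quoted above.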

## Model lineup

| Model | MMLU | Context | Status | Effective params |
|---|---|---|---|---|
| [Outlier-10B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-10B-V3.2) | — | 32K | research preview | ~23B |
| [Outlier-40B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-40B-V3.2) | 77.80% | 32K | production | ~30B |
| [Outlier-70B-V3.3](https://huggingface.co/Outlier-Ai/Outlier-70B-V3.3) † | **83.10%** | **128K** | **production (new)** | ~40B |
| [Outlier-150B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-150B-V3.2) | 84.46% | 32K | production | ~150B |

† V3.3 is the V3.2 base weights + a 280-scalar trained alpha overlay (15 KB) + YaRN 4× context extension. **Same weights as V3.2, +1.61pp MMLU, 4× longer context.**

## Architecture

- **Base:** Qwen2.5 family (7B / 14B / 32B / 72B for 10B / 40B / 70B / 150B respectively)
- **MoE delta:** Ternary-quantized expert weights stored as `int8 sign × fp16 per-row scale`, summed with the shared base FFN output via per-expert alpha contribution scalars
- **Routing:** Per-layer router (top-k = 2, n_experts = 8 typically)
- **150B special:** Cross-layer expert sharing (ReXMoE) – 88 unique experts shared across 44 routers via 11 groups × 4 PSR variants
- **Training:** CAKLD (combined adaptive knowledge distillation) loss, alpha-gated delta updates, frozen base
- **Quantization:** Tequila adaptive deadzone for ternary, LoTA-QAF for activation quantization
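
To make the interaction of these pieces concrete, here is a toy single-token forward pass combining the router, the ternary deltas, and the alpha scalars. The shapes, names, and softmax-over-top-k gating are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def moe_delta_forward(x, base_ffn, experts, router_w, alphas, top_k=2):
    """y = base_ffn(x) + sum over top-k experts of gate * alpha_e * (x @ W_e.T).

    Each expert weight is stored as (int8 signs, fp16 per-row scale);
    alphas holds the per-expert contribution scalars.
    """
    logits = x @ router_w                      # router logits, one per expert
    top = np.argsort(logits)[::-1][:top_k]     # indices of the top-k experts
    gates = np.exp(logits[top])
    gates = gates / gates.sum()                # softmax over the selected experts

    y = base_ffn(x)                            # shared dense base path (frozen)
    for g, e in zip(gates, top):
        signs, scale = experts[e]              # ternary delta weights
        w_e = signs.astype(np.float32) * scale.astype(np.float32)
        y = y + alphas[e] * g * (x @ w_e.T)    # alpha-gated delta contribution
    return y
```

With an identity `base_ffn` and random ternary experts, the output is the input token plus a small alpha-gated delta, which is the intended behavior: the frozen base does the heavy lifting and the cheap ternary experts steer it.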

## Patents (filed)

1. **Per-channel ternary scale recalibration** – adaptive per-output-channel scaling for ternary weights
2. **Cross-layer expert sharing (ReXMoE)** – used in Outlier-150B
3. **Alpha contribution overlay** – the V3.3 fix; 280 trained scalars recover a 1.34pp MMLU regression on 70B with 250,000× fewer trainable parameters than full LoRA
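
The third item is small enough to sketch end to end. Assuming the per-expert alphas live in a flat name-to-scalar mapping (the key format and function name here are hypothetical, not the project's API), applying the V3.3 overlay amounts to a keyed replacement of 280 values while every weight tensor stays untouched:

```python
def apply_alpha_overlay(alphas, overlay):
    """Return a copy of `alphas` with the trained overlay values swapped in.

    `alphas` maps names like "layer3.expert5" to a float scalar; the
    overlay carries only the retrained scalars (280 for 70B V3.3), so
    all base and delta weights are left as-is.
    """
    updated = dict(alphas)
    for key, value in overlay.items():
        if key not in updated:
            raise KeyError(f"overlay key {key!r} not found in model alphas")
        updated[key] = float(value)
    return updated
```

Because the overlay is just scalars plus keys, it serializes to a few kilobytes, which is consistent with the 15 KB artifact size quoted in the lineup table.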

## Tagline

> Built in 14 days on $900 and a Mac Studio.

The full Outlier project went from a blank repo to a 150B model with verified MMLU in April 2026, built by a single developer running cloud sprints between Mac Studio sessions. Total cloud spend through V3.3: ~$300. Total wall clock: 14 days.

## Resources

- [Paper draft (arXiv)](#) – code 396SXN cs.LG (pending submission)
- [outlier.host](https://outlier.host)
- [GitHub: Outlier-host/outlier](https://github.com/Outlier-host/outlier)
- [v10 ground truth](https://github.com/Outlier-host/outlier/blob/main/OUTLIER_GROUND_TRUTH_v10.md) – single source of truth for every benchmark number

## License

All Outlier model weights and code are released under Apache 2.0.