Commit d6f1e2a (verified) by ur-dad-matt · Parent: 6441b36

docs: V3.3 lineup update

Files changed (1): README.md (+52 lines, added)
---
title: Outlier-Ai
---

# Outlier-Ai

**Ternary-quantized Mixture-of-Experts for consumer hardware. 3 patents filed. 14 days solo from zero to 150B.**

Outlier is a research project building dense-LLM-quality models on top of Qwen2.5 via ternary-quantized delta MoE experts. The architecture stores weights as `{-1, 0, +1}` (~1.58 bits) plus a per-row fp16 scale, achieving a 6×–8× memory reduction over fp16 while preserving accuracy.
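The ternary scheme above can be sketched in a few lines. This is a minimal illustration, not the project's actual quantizer: the absmean per-row scale and the fixed deadzone fraction are assumptions (the repo's Tequila deadzone, described below, is adaptive).

```python
import numpy as np

def ternary_quantize(W: np.ndarray, deadzone_frac: float = 0.05):
    """Quantize each row of W to {-1, 0, +1} int8 signs plus one fp16 scale.

    deadzone_frac is a hypothetical fixed threshold: entries with magnitude
    below deadzone_frac * max|row| are rounded to 0.
    """
    absW = np.abs(W)
    scale = absW.mean(axis=1, keepdims=True).astype(np.float16)   # per-row fp16 scale
    dead = deadzone_frac * absW.max(axis=1, keepdims=True)        # deadzone cutoff
    signs = np.where(absW < dead, 0, np.sign(W)).astype(np.int8)  # {-1, 0, +1}
    return signs, scale

def dequantize(signs: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an fp32 approximation: sign * per-row scale."""
    return signs.astype(np.float32) * scale.astype(np.float32)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
signs, scale = ternary_quantize(W)
assert set(np.unique(signs)) <= {-1, 0, 1}
assert signs.dtype == np.int8 and scale.dtype == np.float16
```

Stored naively as int8 signs, this is 8 bits per weight; a packed encoding approaching log2(3) ≈ 1.58 bits per weight (plus the per-row fp16 scales) is what makes the 6×–8× reduction over 16-bit weights plausible.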

## Model lineup

| Model | MMLU | Context | Status | Effective params |
|---|---|---|---|---|
| [Outlier-10B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-10B-V3.2) | – | 32K | research preview | ~23B |
| [Outlier-40B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-40B-V3.2) | 77.80% | 32K | production | ~30B |
| [Outlier-70B-V3.3](https://huggingface.co/Outlier-Ai/Outlier-70B-V3.3) ⭐ | **83.10%** | **128K** | **production (new)** | ~40B |
| [Outlier-150B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-150B-V3.2) | 84.46% | 32K | production | ~150B |

⭐ V3.3 is the V3.2 base weights + a 280-scalar trained alpha overlay (15 KB) + YaRN 4× context extension. **Same weights as V3.2, +1.61pp MMLU, 4× longer context.**

## Architecture

- **Base:** Qwen2.5 family (7B / 14B / 32B / 72B for 10B / 40B / 70B / 150B respectively)
- **MoE delta:** Ternary-quantized expert weights stored as `int8 sign × fp16 per-row scale`, summed with the shared base FFN output via per-expert alpha contribution scalars
- **Routing:** Per-layer router (typically top-k = 2, n_experts = 8)
- **150B special:** Cross-layer expert sharing (ReXMoE) – 88 unique experts shared across 44 routers via 11 groups × 4 PSR variants
- **Training:** CAKLD (combined adaptive knowledge distillation) loss, alpha-gated delta updates, frozen base
- **Quantization:** Tequila adaptive deadzone for ternary weights, LoTA-QAF for activation quantization
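A toy forward pass tying the bullets together: the shared base FFN output plus alpha-weighted, gated top-k ternary expert deltas. The softmax-over-top-k gating and every name here are illustrative assumptions, not the project's code:

```python
import numpy as np

def expert_delta(x: np.ndarray, signs: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """One ternary expert: dequantize int8 signs * fp16 per-row scale on the fly."""
    W = signs.astype(np.float32) * scale.astype(np.float32)
    return x @ W.T

def moe_delta_layer(x, base_ffn, experts, alphas, router_logits, top_k=2):
    """Shared base FFN output + sum of gated, alpha-scaled expert deltas."""
    out = base_ffn(x)
    top = np.argsort(router_logits)[-top_k:]                   # top-k expert ids
    gates = np.exp(router_logits[top] - router_logits[top].max())
    gates /= gates.sum()                                       # softmax over top-k
    for gate, eid in zip(gates, top):
        signs, scale = experts[eid]
        out = out + alphas[eid] * gate * expert_delta(x, signs, scale)
    return out

d = 16
rng = np.random.default_rng(1)
x = rng.standard_normal(d).astype(np.float32)
experts = [(rng.integers(-1, 2, (d, d)).astype(np.int8),        # ternary signs
            np.full((d, 1), 0.02, dtype=np.float16))            # per-row scales
           for _ in range(8)]
y = moe_delta_layer(x, base_ffn=lambda v: v, experts=experts,
                    alphas=np.ones(8, dtype=np.float32),
                    router_logits=rng.standard_normal(8).astype(np.float32))
assert y.shape == (d,)
```

With the base frozen and experts ternary, only the alpha scalars (and, during training, the expert deltas) carry new information on top of Qwen2.5.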

## Patents (filed)

1. **Per-channel ternary scale recalibration** – adaptive per-output-channel scaling for ternary weights
2. **Cross-layer expert sharing (ReXMoE)** – used in Outlier-150B
3. **Alpha contribution overlay** – the V3.3 fix; 280 trained scalars recover a 1.34pp MMLU regression on 70B with 250,000× fewer trainable parameters than full LoRA
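The ReXMoE arithmetic from the 150B bullet checks out: 11 groups × 4 PSR variants = 44 routers, and 11 groups × 8 experts = 88 unique experts. One plausible router-to-pool mapping is sketched below; that consecutive routers share a group is an assumption, and how a PSR variant modulates its group's experts is not described here:

```python
ROUTERS_PER_GROUP = 4      # the 4 PSR variants per group
EXPERTS_PER_GROUP = 8      # 88 experts / 11 groups
N_GROUPS = 11

def expert_pool(router_idx: int) -> list[int]:
    """Map one of the 44 routers to its group's 8 shared expert ids."""
    assert 0 <= router_idx < N_GROUPS * ROUTERS_PER_GROUP
    group = router_idx // ROUTERS_PER_GROUP   # assumption: consecutive routers share
    start = group * EXPERTS_PER_GROUP
    return list(range(start, start + EXPERTS_PER_GROUP))

assert expert_pool(0) == expert_pool(3)            # same group, different PSR variant
assert expert_pool(43) == list(range(80, 88))      # last router sees experts 80..87
assert len({e for r in range(44) for e in expert_pool(r)}) == 88
```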

## Tagline

> Built in 14 days on $900 and a Mac Studio.

The full Outlier project went from a blank repo to a 150B model with verified MMLU in April 2026, built by a single developer running cloud sprints between Mac Studio sessions. Total cloud spend through V3.3: ~$300. Total wall clock: 14 days.

## Resources

- 📄 [Paper draft (arXiv)](#) – code 396SXN cs.LG (pending submission)
- 🌐 [outlier.host](https://outlier.host)
- 💻 [GitHub: Outlier-host/outlier](https://github.com/Outlier-host/outlier)
- 📊 [v10 ground truth](https://github.com/Outlier-host/outlier/blob/main/OUTLIER_GROUND_TRUTH_v10.md) – the single source of truth for every benchmark number

## License

All Outlier model weights and code are released under the Apache 2.0 license.