Outlier-40B
⚠️ Legacy model — superseded by Outlier-40B-V3.2
This is an early Outlier release. It is kept publicly available for historical reference and reproducibility. New users should prefer the V3.2 model linked above.
What this was
Outlier-40B was an early Outlier release based on Qwen2.5-14B-Instruct, with 14.8B total parameters (a legacy dense model; the "40B" label predates the current naming scheme). It has been superseded by the V3.2 architecture, which brings significant improvements in training methodology and runtime efficiency.
What's new in V3.2
- Zero-delta expert initialization (faster convergence)
- CAKLD distillation training
- Three-tier paged runtime
- Cross-layer expert prefetch
- Alpha-only TTT for personalization
See Outlier-40B-V3.2 for the latest.
Architecture
Outlier uses a shared expert + ternary delta expert architecture:
- Shared expert: The full base model serves as a shared dense expert
- Ternary delta experts: Additional experts stored at 1.58 bits/weight using ternary quantization ({-1, 0, +1})
- Dense-Sparse-Dense (DSD) layer pattern: Alternating dense and sparse layers for efficient compute
- Zero-delta initialization: Experts initialized to zero so training begins from the base model
- Top-2 routing: Each token activates the shared expert plus the top-2 ternary delta experts
- Three-tier paged runtime: GPU → CPU → disk paging for consumer hardware deployment
- Cross-layer expert prefetch: Prefetches next-layer experts during current-layer compute
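The shared-expert-plus-delta design above can be sketched in a few lines. This is a minimal illustration, not the model's actual implementation: the module name, dimensions, threshold, and the straight-through-style ternarizer are all assumptions made for the example. It shows the key property of zero-delta initialization: because every delta expert starts at zero, the layer's output is exactly the base (shared) expert's output at step 0, and training begins from the base model.

```python
import torch
import torch.nn as nn

class TernaryDeltaMoE(nn.Module):
    """Hypothetical sketch of a shared + ternary delta expert layer.

    Names and shapes are illustrative assumptions, not the released code.
    """

    def __init__(self, d_model=64, d_ff=128, num_experts=4, top_k=2):
        super().__init__()
        # Shared expert: stands in for the base model's dense FFN.
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.router = nn.Linear(d_model, num_experts)
        # Zero-delta initialization: experts start as exact zeros.
        self.delta_up = nn.Parameter(torch.zeros(num_experts, d_model, d_ff))
        self.delta_down = nn.Parameter(torch.zeros(num_experts, d_ff, d_model))
        self.top_k = top_k

    @staticmethod
    def ternarize(w, threshold=0.05):
        # Map weights to {-1, 0, +1} times a per-tensor scale (~1.58 bits/weight).
        scale = w.abs().mean().clamp(min=1e-8)
        return torch.sign(w) * (w.abs() > threshold * scale).float() * scale

    def forward(self, x):  # x: [tokens, d_model]
        out = self.shared(x)  # shared dense expert is always active
        gates = torch.softmax(self.router(x), dim=-1)
        topv, topi = gates.topk(self.top_k, dim=-1)  # top-2 delta experts per token
        for k in range(self.top_k):
            idx, g = topi[:, k], topv[:, k:k + 1]
            up = self.ternarize(self.delta_up[idx])      # [tokens, d_model, d_ff]
            down = self.ternarize(self.delta_down[idx])  # [tokens, d_ff, d_model]
            h = torch.einsum("td,tdf->tf", x, up).relu()
            out = out + g * torch.einsum("tf,tfd->td", h, down)
        return out
```

With the deltas at zero, `ternarize` returns zeros, so the routed contribution vanishes and the layer reproduces the shared expert exactly; the deltas only add capacity as training moves them away from zero.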
License
Apache 2.0. The base model (Qwen2.5-14B-Instruct) was created by Alibaba Cloud and is used under its original license terms.
Built by
Matt Kerr · Kerr & Company LLC · Grand Rapids, MI