---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-279M (Archived)

> **Note:** This model is part of a scaling study. The 279M model reached a final val_loss of **3.188**, virtually identical to the 4× larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The **510M** model is the best-performing Chimera variant.
>
> **For inference and testing, use [Auron-510M](https://huggingface.co/nyxia/Auron-510M) (val_loss 3.035).**

| Model | Params | Final Val Loss | Status |
|-------|--------|----------------|--------|
| Auron-279M | 279M | 3.188 | Archived |
| **[Auron-510M](https://huggingface.co/nyxia/Auron-510M)** | **510M** | **3.035** | **Best** |
| Auron-1.1B | 1.1B | 3.180 | Archived |

**Paper:** [Auron](https://github.com/Fy-/Auron) | **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron) | **Blog:** [HuggingFace](https://huggingface.co/blog/nyxia/auron)

## Architecture

- **Type:** Chimera (4 bottom + 4×3 top = 16 virtual layers)
- **Dim:** 1024, head_dim=64, expand_v=2
- **Params:** 279M (123M unique + 155M embedding)
- **Trained:** 250K steps, 5B tokens, WSD schedule

## Usage

```python
from ouro import load_model, generate

# This checkpoint is archived; load the best-performing 510M variant instead.
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of [Soulkyn](https://soulkyn.com).
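
The Chimera layer layout above (4 bottom blocks run once, 4 shared top blocks re-applied 3 times, for 16 virtual layers) can be sketched as a block schedule. This is a minimal illustrative sketch, not the actual Auron implementation; the function name and parameters are assumptions.

```python
def chimera_schedule(n_bottom=4, n_top=4, n_loops=3):
    """Return the sequence of unique-block indices executed in one forward pass.

    Bottom blocks run once; the top blocks are re-applied n_loops times
    (Ouroboros-style weight sharing), so the virtual depth is
    n_bottom + n_top * n_loops while only n_bottom + n_top unique
    parameter sets exist.
    """
    bottom = list(range(n_bottom))                 # unique blocks 0..3, run once
    top = list(range(n_bottom, n_bottom + n_top))  # unique blocks 4..7, shared
    return bottom + top * n_loops

schedule = chimera_schedule()
print(len(schedule))       # 16 virtual layers
print(len(set(schedule)))  # 8 unique blocks
```

With the card's numbers this yields 16 layer applications backed by 8 unique blocks, which is where the "16 virtual" figure comes from.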