---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-510M

**Auron** is a family of Chimera hybrid GDN-Attention language models with Ouroboros weight sharing.

**Paper:** [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron)
**Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron)

## Architecture

- **Type:** Chimera (`ChimeraConfig`)
- **Dim:** 1536
- **Layers:** 16 virtual
- **Params:** 510,217,280 (510M)
- **Vocab:** 151,936 (Qwen 3 tokenizer)
- **Context:** 2048 tokens
- **Topology:** 4 unique bottom + 4×3 shared top
- **GDN:Attn ratio:** 3:1 (every 4th layer is attention)
- **Virtual equivalent:** ~1,020,434,560 params

## Training Curves

![Training Curves](training_curves.png)

## Training

- **Step:** 249,000
- **Data:** Mixed (75% FineWeb-Edu, 18% StarCoder, 5% FineMath, 2% UltraChat)
- **Optimizer:** Muon + AdamW (decoupled embedding LR)
- **Schedule:** WSD (Warmup-Stable-Decay)

## Usage

```bash
git clone https://github.com/Fy-/Auron && cd Auron && rye sync
```

```python
from ouro import load_model, generate

model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```

## Sampling

Defaults: T=0.7, top_k=20, top_p=0.9, rep_pen=1.0, presence_pen=1.5. Ouroboros weight sharing requires a presence penalty >= 1.5 to prevent attractor wells.

## Links

- **Paper:** [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron/blob/master/Auron_chimera_topology_paper.pdf)
- **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron)
- **Models:** [huggingface.co/nyxia](https://huggingface.co/nyxia)

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of the [Soulkyn](https://soulkyn.com) project.
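
## Topology Sketch

A minimal sketch of the layer schedule implied by the topology listed under Architecture: 4 unique bottom layers followed by a block of 4 shared layers reused 3 times, yielding 16 virtual layers from 8 physical ones (hence the ~2× virtual parameter equivalent). This is an illustrative assumption about the mapping, not the repository's actual API; the function name and the exact attention placement are hypothetical.

```python
# Hypothetical Ouroboros schedule: 4 unique bottom layers, then a
# 4-layer shared block repeated 3 times (16 virtual / 8 physical layers).
N_UNIQUE, N_SHARED, N_REPEATS = 4, 4, 3
N_VIRTUAL = N_UNIQUE + N_SHARED * N_REPEATS  # 16

def physical_layer(v: int) -> int:
    """Map a virtual layer index (0..15) to a physical layer index (0..7)."""
    if v < N_UNIQUE:
        return v  # unique bottom layers pass through unchanged
    return N_UNIQUE + (v - N_UNIQUE) % N_SHARED  # shared top block cycles

schedule = [physical_layer(v) for v in range(N_VIRTUAL)]
print(schedule)  # [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]

# 3:1 GDN:Attn ratio, assuming attention sits on every 4th virtual layer:
kinds = ["attn" if (v + 1) % 4 == 0 else "gdn" for v in range(N_VIRTUAL)]
print(kinds.count("gdn"), kinds.count("attn"))  # 12 4
```

Only 8 distinct weight sets are ever stored, so each forward pass traverses 16 layers while parameters stay at the 510M physical count.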