---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-510M

**Auron** is a family of Chimera hybrid GDN-Attention language models with Ouroboros weight sharing.

**Paper:** [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron)
**Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron)

## Architecture

- **Type:** Chimera (`ChimeraConfig`)
- **Dim:** 1536
- **Layers:** 16 virtual
- **Params:** 510,217,280 (510M)
- **Vocab:** 151,936 (Qwen 3 tokenizer)
- **Context:** 2048 tokens
- **Topology:** 4 unique bottom + 4×3 shared top
- **GDN:Attn ratio:** 3:1 (every 4th layer is attention)
- **Virtual equivalent:** ~1,020,434,560 params

## Training Curves

![Training Curves](training_curves.png)

## Training

- **Step:** 249,000
- **Data:** Mixed (75% FineWeb-Edu, 18% StarCoder, 5% FineMath, 2% UltraChat)
- **Optimizer:** Muon + AdamW (decoupled embedding LR)
- **Schedule:** WSD (Warmup-Stable-Decay)

## Usage

```bash
git clone https://github.com/Fy-/Auron && cd Auron && rye sync
```

```python
from ouro import load_model, generate

model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```

## Sampling

Defaults: T=0.7, top_k=20, top_p=0.9, rep_pen=1.0, presence_pen=1.5. Ouroboros weight sharing requires a presence penalty >= 1.5 to prevent attractor wells.

## Links

- **Paper:** [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron/blob/master/Auron_chimera_topology_paper.pdf)
- **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron)
- **Models:** [huggingface.co/nyxia](https://huggingface.co/nyxia)

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of the [Soulkyn](https://soulkyn.com) project.
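
## Topology Sketch

A minimal sketch of the layer schedule implied by the topology listed under Architecture: 4 unique bottom layers followed by a block of 4 shared layers reused 3 times, yielding 16 virtual layers from 8 physical ones (hence the ~2× virtual parameter equivalent). This is an illustrative assumption about the mapping, not the repository's actual API; the function name and the exact attention placement are hypothetical.

```python
# Hypothetical Ouroboros schedule: 4 unique bottom layers, then a
# 4-layer shared block repeated 3 times (16 virtual / 8 physical layers).
N_UNIQUE, N_SHARED, N_REPEATS = 4, 4, 3
N_VIRTUAL = N_UNIQUE + N_SHARED * N_REPEATS  # 16

def physical_layer(v: int) -> int:
    """Map a virtual layer index (0..15) to a physical layer index (0..7)."""
    if v < N_UNIQUE:
        return v  # unique bottom layers pass through unchanged
    return N_UNIQUE + (v - N_UNIQUE) % N_SHARED  # shared top block cycles

schedule = [physical_layer(v) for v in range(N_VIRTUAL)]
print(schedule)  # [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]

# 3:1 GDN:Attn ratio, assuming attention sits on every 4th virtual layer:
kinds = ["attn" if (v + 1) % 4 == 0 else "gdn" for v in range(N_VIRTUAL)]
print(kinds.count("gdn"), kinds.count("attn"))  # 12 4
```

Only 8 distinct weight sets are ever stored, so each forward pass traverses 16 layers while parameters stay at the 510M physical count.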