---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-279M (Archived)

> **Note:** This model is part of a scaling study. The 279M model reached a final val_loss of **3.188**, virtually identical to the 4× larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The **510M** model is the best-performing Chimera variant.
>
> **For inference and testing, use [Auron-510M](https://huggingface.co/nyxia/Auron-510M) (val_loss 3.035).**

| Model | Params | Final Val Loss | Status |
|-------|--------|----------------|--------|
| Auron-279M | 279M | 3.188 | Archived |
| **[Auron-510M](https://huggingface.co/nyxia/Auron-510M)** | **510M** | **3.035** | **Best** |
| Auron-1.1B | 1.1B | 3.180 | Archived |

**Paper:** [Auron](https://github.com/Fy-/Auron) | **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron) | **Blog:** [HuggingFace](https://huggingface.co/blog/nyxia/auron)

## Architecture

- **Type:** Chimera (4 bottom + 4×3 top = 16 virtual layers)
- **Dim:** 1024, head_dim=64, expand_v=2
- **Params:** 279M (123M unique + 155M embedding)
- **Trained:** 250K steps, 5B tokens, WSD schedule

## Usage

```python
from ouro import load_model, generate

# This checkpoint is archived; load the best-performing 510M variant instead.
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of [Soulkyn](https://soulkyn.com).
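
The Chimera layer layout above (4 bottom blocks run once, 4 shared top blocks re-applied 3 times, for 16 virtual layers) can be sketched as a block schedule. This is a minimal illustrative sketch, not the actual Auron implementation; the function name and parameters are assumptions.

```python
def chimera_schedule(n_bottom=4, n_top=4, n_loops=3):
    """Return the sequence of unique-block indices executed in one forward pass.

    Bottom blocks run once; the top blocks are re-applied n_loops times
    (Ouroboros-style weight sharing), so the virtual depth is
    n_bottom + n_top * n_loops while only n_bottom + n_top unique
    parameter sets exist.
    """
    bottom = list(range(n_bottom))                 # unique blocks 0..3, run once
    top = list(range(n_bottom, n_bottom + n_top))  # unique blocks 4..7, shared
    return bottom + top * n_loops

schedule = chimera_schedule()
print(len(schedule))       # 16 virtual layers
print(len(set(schedule)))  # 8 unique blocks
```

With the card's numbers this yields 16 layer applications backed by 8 unique blocks, which is where the "16 virtual" figure comes from.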