Update README.md
README.md
CHANGED
@@ -3,12 +3,16 @@ license: apache-2.0
 ---
 # Zaya1-8B

-Zaya1-8B is a small mixture of experts language model trained end-to-end by Zyphra. Zaya1-8B sets a new standard of intelligence efficiency for its parameter count through a combination of novel architecture
+Zaya1-8B is a small mixture of experts language model with 760M active parameters and 8.4B total parameters trained end-to-end by Zyphra. Zaya1-8B sets a new standard of intelligence efficiency for its parameter count through a combination of novel architecture and innovations in pretraining and post-training.

 Zaya1-8B excels at detailed long-form reasoning, especially for mathematical and coding tasks. It punches heavily above its weight in these regimes and due to its inference efficiency and small size can be highly effective in test-time compute harnesses.

 Due to its small total parameter count, Zaya1-8B can also be deployed on-device for local LLM applications.

+Learn more in our [technical report](-/link) and [blog](-/link)
+
+This is the post-trained reasoning version of Zaya1-8B. The pretraining base can be found [here](https://huggingface.co/Zyphra/ZAYA1-reasoning-base)
+
 ## Performance

 Zaya1-8B performs extremely strongly, especially in challenging mathematical, reasoning, and coding benchmarks. Zaya1-8B is competitive with models several times its own size.
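As a rough illustration of the on-device claim, the two parameter counts added in this change (760M active, 8.4B total) can be turned into a back-of-the-envelope estimate. This is only a sketch: the storage precisions and the helper names `active_fraction` and `weight_memory_gb` are assumptions made for illustration and do not come from the README, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Parameter counts as stated in the README; everything else is an assumption.
ACTIVE_PARAMS = 760e6   # parameters used per token
TOTAL_PARAMS = 8.4e9    # parameters that must be stored

# Hypothetical storage precisions in bytes per weight (illustrative only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def active_fraction(active: float, total: float) -> float:
    """Share of the stored parameters that participate in one forward pass."""
    return active / total


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return total_params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active fraction: {frac:.1%}")
    for name, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{name}: ~{gb:.1f} GB of weights")
```

Under these assumptions, per-token compute scales with the active count while memory scales with the total count, which is why the README cites the total parameter count when discussing on-device deployment.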