Update README.md
README.md CHANGED

@@ -3,7 +3,13 @@ license: apache-2.0
---


+## Performance

+Zaya1-8B performs very strongly, especially on challenging mathematical, reasoning, and coding benchmarks, and is competitive with models several times its size.
+
+
+
+First, we compare Zaya1-8B against the SOTA Qwen3 and Qwen3.5 model series of approximately the same parameter count, as well as the recently released Gemma4 models; second, we compare against a variety of larger open-weight models.

### In-class comparison against open-source reasoning models

@@ -23,6 +29,19 @@ license: apache-2.0
| Agentic | BFCL-v4 | 39.22 | 49.7 | 45.2 | 31.7 |
| Agentic | τ² | 43.12 | 52.9 | 82.1 | 37.7 |

-\* Gemma-4-E4B-it includes 4B additional embedding parameters as part of its total.

-
+### Scaling comparison against larger open-source reasoning models
+
+
+| Model | Active | Total | AIME'26 | HMMT'26 | LCB-v6 | IFEval | GPQA-D | MMLU-Pro |
+|---|---:|---:|---:|---:|---:|---:|---:|---:|
+| Zaya1-8B | 0.7B | 8B | 89.1 | 71.6 | 63.8 | 85.8 | 71.0 | 74.2 |
+| Arcee-Trinity-Mini | 3B | 26B | 59.6 | 36.9 | 33.3 | 62.0 | 46.8 | 70.6 |
+| N3-Nano-30B | 3B | 30B | 90.1 | 75.5 | 64.6 | 92.8 | 75.1 | 78.9 |
+| OLMo-3.1-32B-Think | 32B | 32B | 78.9 | 50.6 | 58.3 | 93.2 | 59.6 | 75.8 |
+| Qwen3-Next-80B-A3B-Think | 3B | 80B | 90.2 | 79.3 | 67.8 | 88.5 | 76.7 | 82.6 |
+| Intellect-3 | 12B | 106B | 86.3 | 72.2 | 66.8 | 81.2 | 74.6 | 82.3 |
+| Mistral-Small-4-119B | 6B | 119B | 86.4 | 70.6 | 57.9 | 84.0 | 77.2 | 81.6 |
+
+
+All numbers are produced by the Zyphra evaluation harness. Models are ordered by total parameter count.