BerenMillidge committed · Commit 6cd227b · verified · 1 Parent(s): e9dcba3

Update README.md

Files changed (1): README.md +21 -2
README.md CHANGED
@@ -3,7 +3,13 @@ license: apache-2.0
 ---
 
 
+## Performance
 
+Zaya1-8B performs strongly, especially on challenging mathematical, reasoning, and coding benchmarks, and is competitive with models several times its size.
+
+![zaya1_scaling_barchart_v3](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/K7EnqZ1nYX_OJBVqvs8tM.png)
+
+We first compare Zaya1-8B to the SOTA Qwen3 and Qwen3.5 model series at approximately the same parameter count, as well as the recently released Gemma4 models, and then to a variety of larger open-weight models.
 
 ### In-class comparison against open-source reasoning models
 
@@ -23,6 +29,19 @@ license: apache-2.0
 | Agentic | BFCL-v4 | 39.22 | 49.7 | 45.2 | 31.7 |
 | Agentic | τ² | 43.12 | 52.9 | 82.1 | 37.7 |
 
-\* Gemma-4-E4B-it includes 4B additional embedding parameters as part of its total.
-
-> RW: Zaya1-8B numbers in this draft table are from the math+code+TTC soup checkpoint before final behavioral RL and should be refreshed after the final checkpoint is selected.
+
+### Scaling comparison against larger open-source reasoning models
+
+| Model | Active | Total | AIME'26 | HMMT'26 | LCB-v6 | IFEval | GPQA-D | MMLU-Pro |
+|---|---:|---:|---:|---:|---:|---:|---:|---:|
+| Zaya1-8B | 0.7B | 8B | 89.1 | 71.6 | 63.8 | 85.8 | 71.0 | 74.2 |
+| Arcee-Trinity-Mini | 3B | 26B | 59.6 | 36.9 | 33.3 | 62.0 | 46.8 | 70.6 |
+| N3-Nano-30B | 3B | 30B | 90.1 | 75.5 | 64.6 | 92.8 | 75.1 | 78.9 |
+| OLMo-3.1-32B-Think | 32B | 32B | 78.9 | 50.6 | 58.3 | 93.2 | 59.6 | 75.8 |
+| Qwen3-Next-80B-A3B-Think | 3B | 80B | 90.2 | 79.3 | 67.8 | 88.5 | 76.7 | 82.6 |
+| Intellect-3 | 12B | 106B | 86.3 | 72.2 | 66.8 | 81.2 | 74.6 | 82.3 |
+| Mistral-Small-4-119B | 6B | 119B | 86.4 | 70.6 | 57.9 | 84.0 | 77.2 | 81.6 |
+
+All numbers were obtained with the Zyphra evaluation harness. Models are ordered by total parameter count.
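As a quick illustration of the ordering claim above, the following minimal Python sketch checks that the scaling-comparison rows ascend by total parameter count (model names and counts are copied from the table; the `param_count_to_billions` helper is hypothetical and not part of any Zyphra tooling):

```python
# Sketch: verify the scaling-comparison table is ordered by total parameter count.
# Row data is copied from the table above; the parsing helper is illustrative only.

def param_count_to_billions(s: str) -> float:
    """Convert a string like '0.7B' or '119B' to a float number of billions."""
    return float(s.rstrip("B"))

rows = [
    ("Zaya1-8B", "0.7B", "8B"),
    ("Arcee-Trinity-Mini", "3B", "26B"),
    ("N3-Nano-30B", "3B", "30B"),
    ("OLMo-3.1-32B-Think", "32B", "32B"),
    ("Qwen3-Next-80B-A3B-Think", "3B", "80B"),
    ("Intellect-3", "12B", "106B"),
    ("Mistral-Small-4-119B", "6B", "119B"),
]

totals = [param_count_to_billions(total) for _, _, total in rows]
assert totals == sorted(totals)  # rows ascend by total parameter count
```

Note that the ordering is by total parameters, not active parameters: sorting by the active column would place Zaya1-8B (0.7B active) first but reorder the rest.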