---
license: apache-2.0
---

### In-class comparison against open-source reasoning models

| Category | Benchmark | Zaya1-8B<br>(0.7B / 8.0B) | Qwen3-4B-Thinking-2507<br>(4.0B / 4.0B) | Qwen3.5-4B<br>(4.0B / 4.0B) | Gemma-4-E4B-it<br>(4.0B / 8.0B*) |
|---|---|---:|---:|---:|---:|
| Math | AIME'26 | 89.1 | 77.5 | 84.5 | 50.3 |
| Math | HMMT Feb.'26 | 71.6 | 60.8 | 63.6 | 32.1 |
| Math | IMO-AnswerBench | 59.3 | 50.9 | 48.7 | 27.3 |
| Math | APEX-shortlist | 32.2 | 16.9 | -- | 6.1 |
| Code | LiveCodeBench-v6 | 65.8 | 54.2 | -- | 54.2 |
| Knowledge | GPQA-Diamond | 71.0 | 66.5 | 76.2 | 57.4 |
| Knowledge | MMLU-Pro | 74.2 | 74.3 | 79.1 | 70.2 |
| Instruction | IFEval | 85.58 | 86.8 | 89.8 | 88.50 |
| Instruction | IFBench | 52.56 | 52.9 | 59.2 | 42.67 |
| Style & chat | EQBench | 72.95 | 79.6 | 79.5 | 80.15 |
| Style & chat | Creative Writing v3 | 62.97 | 58.6 | 72.9 | 83.75 |
| Agentic | BFCL-v4 | 39.22 | 49.7 | 45.2 | 31.7 |
| Agentic | τ² | 43.12 | 52.9 | 82.1 | 37.7 |

\* Gemma-4-E4B-it includes 4B additional embedding parameters as part of its total.
> **RW:** Zaya1-8B numbers in this draft table come from the math+code+TTC soup checkpoint, before final behavioral RL; refresh them once the final checkpoint is selected.