---
license: apache-2.0
---
# ZAYA1-8B

ZAYA1-8B is a small mixture-of-experts language model with 760M active parameters and 8.4B total parameters, trained end-to-end by Zyphra. ZAYA1-8B sets a new standard of intelligence efficiency for its parameter count through a combination of novel architecture and innovations in pretraining and post-training.

ZAYA1-8B excels at detailed long-form reasoning, especially on mathematical and coding tasks. It punches well above its weight in these regimes, and its inference efficiency and small size make it highly effective in test-time-compute harnesses.

Due to its small total parameter count, ZAYA1-8B can also be deployed on-device for local LLM applications.

Learn more in our [technical report](-/link) and [blog](-/link).

This is the post-trained reasoning version of ZAYA1-8B. The pretraining base can be found [here](https://huggingface.co/Zyphra/ZAYA1-reasoning-base).

## Performance

ZAYA1-8B performs extremely strongly, especially on challenging mathematical, reasoning, and coding benchmarks, and is competitive with models several times its size.

![zaya1_scaling_barchart_v3](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/K7EnqZ1nYX_OJBVqvs8tM.png)

First, we compare ZAYA1-8B to the SOTA Qwen3 and Qwen3.5 model series of approximately the same parameter count, as well as to the recently released Gemma4 models; second, we compare it to a variety of larger open-weights models.

### In-class comparison against open-source reasoning models

| Category | Benchmark | ZAYA1-8B<br>(0.7B / 8.0B) | Qwen3-4B-Thinking-2507<br>(4.0B / 4.0B) | Qwen3.5-4B<br>(4.0B / 4.0B) | Gemma-4-E4B-it<br>(4.0B / 8.0B*) |
|---|---|---:|---:|---:|---:|
| Math | AIME'26 | 89.1 | 77.5 | 84.5 | 50.3 |
| Math | HMMT Feb.'26 | 71.6 | 60.8 | 63.6 | 32.1 |

| Model | Active | Total | AIME'26 | HMMT'26 | LCB-v6 | IFEval | GPQA-D | MMLU-Pro |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| ZAYA1-8B | 0.7B | 8B | 89.1 | 71.6 | 63.8 | 85.8 | 71.0 | 74.2 |
| Arcee-Trinity-Mini | 3B | 26B | 59.6 | 36.9 | 33.3 | 62.0 | 46.8 | 70.6 |
| N3-Nano-30B | 3B | 30B | 90.1 | 75.5 | 64.6 | 92.8 | 75.1 | 78.9 |
| OLMo-3.1-32B-Think | 32B | 32B | 78.9 | 50.6 | 58.3 | 93.2 | 59.6 | 75.8 |

All numbers are run on the Zyphra evaluation harness. Models are ordered by total parameter count.

### Prerequisites
We recommend installing the following libraries in a fresh Python environment (tested with Python 3.12).

To use ZAYA1-8B, install the `zaya1` branch of our fork of the `vllm` library (the command will trigger a full build of vLLM from source):
```bash
pip install "vllm @ git+https://github.com/Zyphra/vllm.git@zaya1"
```

You will also need our fork of the `transformers` library:
```bash
pip install "transformers @ git+https://github.com/Zyphra/transformers.git@zaya1"
```
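
If you want to run the model directly through `transformers` rather than serving it with vLLM, the sketch below shows one way this could look. It is an illustrative assumption, not a documented API: it presumes the fork registers ZAYA1 with `AutoModelForCausalLM` and bundles a chat template with the tokenizer.
```python
# Hypothetical sketch: direct generation via the Zyphra transformers fork.
# Assumes the fork wires ZAYA1 into AutoModelForCausalLM and ships a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Zyphra/ZAYA1-8B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/ZAYA1-8B", torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello. How is it going?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```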

### Deployment
To start the vLLM server, run the following command:
```bash
vllm serve Zyphra/ZAYA1-8B --port 8010 \
    --mamba-cache-dtype float32 --dtype bfloat16 \
    --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser zaya_xml
```

Once the server is up, you can query the model with `curl`, as in the following example:
```bash
curl http://localhost:8010/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zyphra/ZAYA1-8B",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello. How is it going?"}
        ]
    }'
```
 
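The serve command above also enables tool calling (`--enable-auto-tool-choice --tool-call-parser zaya_xml`). The sketch below shows how a tool call might look through the same endpoint; the `get_weather` tool schema is purely illustrative, and any JSON-schema tool definition works the same way:
```python
# Sketch: tool calling via the OpenAI-compatible endpoint.
# The get_weather tool is illustrative, not part of the model or server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8010/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Zyphra/ZAYA1-8B",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model decides to call the tool, the parsed call appears here.
print(response.choices[0].message.tool_calls)
```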