kyr0 committed 16 days ago

Commit 09d16cd · verified · 1 parent: 69c303a

Add files using upload-large-folder tool

Files changed (10)

.gitattributes +1 -0
README.md +76 -0
chat_template.jinja +205 -0
config.json +55 -0
generation_config.json +10 -0
model-00001-of-00002.safetensors +3 -0
model-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +0 -0
tokenizer.json +3 -0
tokenizer_config.json +25 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,76 @@

+---
+license: apache-2.0
+library_name: mlx
+base_model: Zyphra/ZAYA1-8B
+tags:
+- mlx
+pipeline_tag: text-generation
+---
+# MLX convert - 8 bit quant
+This model requires the following PR to be merged and included in an mlx-lm release: https://github.com/ml-explore/mlx-lm/pull/1261
+You can either clone my fork (https://github.com/kyr0/mlx-lm/tree/feat/zaya-support) or wait until the MLX LM team merges mainline support for this model.
+# Model Card for ZAYA1-base
+ZAYA1 is an 800M active/8.3B total parameter MoE model, and the first trained entirely end-to-end on AMD’s hardware, software, and networking stack.
+Our ZAYA1 base model's benchmark performance is extremely competitive with the SoTA Qwen3 series of models of comparable scale, and it outperforms comparable Western open-source models such as SmolLM3 and Phi4. ZAYA1-base excels especially at complex and challenging mathematical and STEM reasoning tasks, nearly matching the performance of SoTA Qwen3 thinking models under high pass@k settings even prior to explicit post-training for reasoning, and it exceeds other strong reasoning models such as Phi4-reasoning and DeepSeek-R1-Distill.
+Details of our pretraining efforts, hardware specific optimizations, and ZAYA1 base model benchmarks are described in the [accompanying technical report](https://arxiv.org/abs/2511.17127).
+## Model Details
+ZAYA1's architecture includes several innovations developed at Zyphra. These include:
+- **Compressed Convolutional Attention (CCA)**: [This novel attention](https://arxiv.org/abs/2510.04476) mechanism performs attention entirely in the latent space, enabling significant reductions in parameter count, prefill compute, and KV cache size compared to alternative attention mechanisms, while also being more performant in loss/flop.
+- **ZAYA1 Router**: The ZAYA1 router makes fundamental improvements to the linear router used in almost all existing large-scale MoE models. The ZAYA1 router replaces the linear layer with a down-projection followed by a depth-mixing EDA layer and then a three-layer MLP per expert, adding significant nonlinear expressivity to the router.
+- **Residual Scaling**: We add learnable scalar gates and biases to the residual stream and the outputs of each block. This provides a lightweight method to allow the model to carefully control its own norm and degree of forgetting across depth.
+![zaya_arch](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Ih8RnOPNbtRzaVcH16ar-.png)
+ZAYA1-base uses the [Gemma3](https://ai.google.dev/gemma/terms) tokenizer.
+## Performance
+ZAYA1-base performs extremely competitively against other base models of a similar and even greater scale.
+![mmlu_pro_vs_ttft](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/nyWieuzXks9H4GM71XAzn.png)
+![Screenshot 2025-11-20 at 00.44.44](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/tsdgc4KWWs4SXfo4orOp4.png)
+## Quick start
+### Prerequisites
+To use ZAYA1, install the `zaya` branch of our fork of the `transformers` library, which is based on `transformers` v4.57.1:
+```bash
+pip install "transformers @ git+https://github.com/Zyphra/transformers.git@zaya"
+```
+The command above relies on the requirements for `transformers` v4.57.1 already being installed in your environment. If you're installing in a fresh Python environment, you may want to specify an extra, such as `[dev-torch]`, to install all the dependencies:
+```bash
+pip install "transformers[dev-torch] @ git+https://github.com/Zyphra/transformers.git@zaya"
+```
+### Inference
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+tokenizer = AutoTokenizer.from_pretrained("Zyphra/ZAYA1-base")
+model = AutoModelForCausalLM.from_pretrained("Zyphra/ZAYA1-base", device_map="cuda", dtype=torch.bfloat16)
+input_text = "What factors contributed to the fall of the Roman Empire?"
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids, max_new_tokens=100)
+print(tokenizer.decode(outputs[0]))
+```

chat_template.jinja ADDED

	@@ -0,0 +1,205 @@

+{% macro render_extra_keys(json_dict, handled_keys) %}
+    {%- if json_dict is mapping %}
+        {%- for json_key in json_dict if json_key not in handled_keys %}
+            {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
+                {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
+            {%- else %}
+                {{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
+            {%- endif %}
+        {%- endfor %}
+    {%- endif %}
+{% endmacro %}
+{%- set enable_thinking = enable_thinking if enable_thinking is defined else True %}
+{# TODO: set truncate to true for deployment & agent evals. Keep on for SFT. #}
+{%- set truncate_history_thinking = truncate_history_thinking if truncate_history_thinking is defined else False %}
+{{- bos_token }}
+{%- set ns = namespace(last_user_idx = -1) %}
+{%- set loop_messages = messages %}
+{%- for m in loop_messages %}
+  {%- if m["role"] == "user" %}
+    {%- set ns.last_user_idx = loop.index0 %}
+  {%- endif %}
+{%- endfor %}
+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set system_message = "" %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+{%- if not tools is defined %}
+    {%- set tools = [] %}
+{%- endif %}
+{# Recompute last_user_idx relative to loop_messages after handling system #}
+{%- set ns = namespace(last_user_idx = -1) %}
+{%- for m in loop_messages %}
+  {%- if m["role"] == "user" %}
+    {%- set ns.last_user_idx = loop.index0 %}
+  {%- endif %}
+{%- endfor %}
+{%- if system_message is defined %}
+    {{- "<|im_start|>system\n" + system_message }}
+{%- else %}
+    {%- if tools is iterable and tools | length > 0 %}
+        {{- "<|im_start|>system\n" }}
+    {%- endif %}
+{%- endif %}
+{%- if tools is iterable and tools | length > 0 %}
+    {%- if system_message is defined and system_message | length > 0 %}
+        {{- "\n\n" }}
+    {%- endif %}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n" }}
+    {{- "<tools>" }}
+    {%- for tool in tools %}
+        {%- if tool.function is defined %}
+            {%- set tool = tool.function %}
+        {%- endif %}
+        {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
+        {%- if tool.description is defined %}
+            {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
+        {%- endif %}
+        {{- '\n<parameters>' }}
+        {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
+            {%- for param_name, param_fields in tool.parameters.properties|items %}
+                {{- '\n<parameter>' }}
+                {{- '\n<name>' ~ param_name ~ '</name>' }}
+                {%- if param_fields.type is defined %}
+                    {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
+                {%- endif %}
+                {%- if param_fields.description is defined %}
+                    {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
+                {%- endif %}
+                {%- if param_fields.enum is defined %}
+                    {{- '\n<enum>' ~ (param_fields.enum | tojson | safe) ~ '</enum>' }}
+                {%- endif %}
+                {%- set handled_keys = ['name', 'type', 'description', 'enum'] %}
+                {{- render_extra_keys(param_fields, handled_keys) }}
+                {{- '\n</parameter>' }}
+            {%- endfor %}
+        {%- endif %}
+        {% set handled_keys = ['type', 'properties', 'required'] %}
+        {{- render_extra_keys(tool.parameters, handled_keys) }}
+        {%- if tool.parameters is defined and tool.parameters.required is defined %}
+            {{- '\n<required>' ~ (tool.parameters.required | tojson | safe) ~ '</required>' }}
+        {%- endif %}
+        {{- '\n</parameters>' }}
+        {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
+        {{- render_extra_keys(tool, handled_keys) }}
+        {{- '\n</function>' }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<zyphra_tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</zyphra_tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <zyphra_tool_call></zyphra_tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+{%- endif %}
+{%- if system_message is defined %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if tools is iterable and tools | length > 0 %}
+        {{- '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in loop_messages %}
+    {%- if message.role == "assistant" %}
+        {# Add reasoning content into the content field for unified processing below. #}
+        {%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
+            {%- set content = "<think>\n" ~ message.reasoning_content ~ "\n</think>\n\n" ~ (message.content | default('', true)) %}
+        {%- else %}
+            {%- set content = message.content | default('', true) %}
+            {%- if content is string -%}
+                {# Allow downstream logic to take care of broken thought; only handle coherent reasoning here. #}
+                {%- if '<think>' not in content and '</think>' not in content -%}
+                    {%- set content = "<think>\n</think>\n\n" ~ content -%}
+                {%- endif -%}
+            {%- else -%}
+                {%- set content = content -%}
+            {%- endif -%}
+        {%- endif %}
+        {%- if message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
+            {# Assistant message has tool calls. #}
+            {{- '<|im_start|>assistant\n' }}
+                {%- set include_content = not (truncate_history_thinking and loop.index0 < ns.last_user_idx) %}
+                {%- if content is string and content | trim | length > 0 %}
+                    {%- if include_content %}
+                        {{- (content | trim) ~ '\n\n' -}}
+                    {%- else %}
+                        {%- set c = (content | string) %}
+                        {%- if '</think>' in c %}
+                            {# Keep only content after the last closing think. Also generation prompt causes this. #}
+                            {%- set c = c.split('</think>')[-1] %}
+                        {%- elif '<think>' in c %}
+                            {# If <think> was opened but never closed, drop the trailing think segment #}
+                            {%- set c = c.split('<think>')[0] %}
+                        {%- endif %}
+                        {%- set c = "<think>\n</think>\n\n" ~ c | trim %}
+                        {%- if c | length > 0 %}
+                            {{- c ~ '\n' -}}
+                        {%- endif %}
+                    {%- endif %}
+                {%- else %}
+                    {{- "<think>\n</think>\n\n" -}}
+                {%- endif %}
+                {%- for tool_call in message.tool_calls %}
+                    {%- if tool_call.function is defined %}
+                        {%- set tool_call = tool_call.function %}
+                    {%- endif %}
+                    {{- '<zyphra_tool_call>\n<function=' ~ tool_call.name ~ '>\n' -}}
+                        {%- if tool_call.arguments is defined %}
+                            {%- for args_name, args_value in tool_call.arguments|items %}
+                                {{- '<parameter=' ~ args_name ~ '>\n' -}}
+                                    {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                                {{- args_value ~ '\n</parameter>\n' -}}
+                            {%- endfor %}
+                        {%- endif %}
+                    {{- '</function>\n</zyphra_tool_call>\n' -}}
+                {%- endfor %}
+                {{- '<|im_end|>\n' }}
+        {%- else %}
+            {# Assistant message doesn't have tool calls. #}
+            {%- if not (truncate_history_thinking and loop.index0 < ns.last_user_idx) %}
+                {{- '<|im_start|>assistant\n' ~ (content | default('', true) | string | trim) ~ '<|im_end|>\n' }}
+            {%- else %}
+                {%- set c = (content | default('', true) | string) %}
+                {%- if '<think>' in c and '</think>' in c %}
+                    {%- set c = "<think>\n</think>\n\n" ~ (c.split('</think>')[-1] | trim) %}
+                {%- endif %}
+                {%- set c = c | trim %}
+                {%- if c | length > 0 %}
+                    {{- '<|im_start|>assistant\n' ~ c ~ '<|im_end|>\n' }}
+                {%- else %}
+                    {{- '<|im_start|>assistant\n<|im_end|>\n' }}
+                {%- endif %}
+            {%- endif %}
+        {%- endif %}
+    {%- elif message.role == "user" or message.role == "system" %}
+        {{- '<|im_start|>' + message.role + '\n' }}
+        {%- set content = message.content | string %}
+        {{- content }}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user\n' }}
+        {%- endif %}
+        {{- '<zyphra_tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</zyphra_tool_response>\n' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {%- if enable_thinking %}
+        {{- '<|im_start|>assistant\n<think>\n' }}
+    {%- else %}
+        {{- '<|im_start|>assistant\n<think>\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
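
To make the template's output easier to picture, here is a minimal Python sketch of its simplest path: no tools, string-only messages, and `add_generation_prompt` enabled. The function `render_prompt` and the constants are illustrative names, not part of any library, and the exact whitespace is my reading of the template, so it should be compared against the real Jinja rendering before being relied on.

```python
BOS = "<bos>"  # bos_token as listed in tokenizer_config.json
EMPTY_THINK = "<think>\n</think>\n\n"


def render_prompt(messages, enable_thinking=True):
    """Approximate the template's no-tools path for plain string messages."""
    system = ""
    if messages and messages[0]["role"] == "system":
        system, messages = messages[0]["content"], messages[1:]
    # As read from the template, a system turn is emitted even when it is empty.
    out = BOS + "<|im_start|>system\n" + system + "<|im_end|>\n"
    for m in messages:
        role, content = m["role"], m["content"]
        if role == "tool":
            # Tool responses use <zyphra_tool_response> wrapping, not covered here.
            raise ValueError("tool messages are outside this sketch")
        if role == "assistant":
            # Past assistant turns without think tags get an empty think block.
            if "<think>" not in content and "</think>" not in content:
                content = EMPTY_THINK + content
            content = content.strip()
        out += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    # Generation prompt: open a think block, or close it at once if thinking is off.
    out += "<|im_start|>assistant\n<think>\n"
    if not enable_thinking:
        out += "</think>\n\n"
    return out


prompt = render_prompt([{"role": "user", "content": "Hello"}])
```

Passing `enable_thinking=False` appends the closing `</think>` immediately, which mirrors the template's second generation-prompt branch.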

config.json ADDED

	@@ -0,0 +1,55 @@

+{
+    "activation_func": "swiglu",
+    "activation_func_fp8_input_store": false,
+    "add_bias_linear": false,
+    "architectures": [
+        "ZayaForCausalLM"
+    ],
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "bias_activation_fusion": true,
+    "bos_token_id": 2,
+    "cca": true,
+    "dtype": "bfloat16",
+    "eos_token_id": 106,
+    "ffn_hidden_size": 4096,
+    "gated_linear_unit": true,
+    "head_dim": 128,
+    "hidden_size": 2048,
+    "kv_channels": 128,
+    "lm_head_bias": false,
+    "mamba_cache_dtype": "float32",
+    "max_position_embeddings": 131072,
+    "model_type": "zaya",
+    "moe_router_topk": 1,
+    "norm_epsilon": 1e-05,
+    "normalization": "RMSNorm",
+    "num_attention_heads": 8,
+    "num_experts": 16,
+    "num_hidden_layers": 80,
+    "num_key_value_heads": 2,
+    "num_query_groups": 2,
+    "pad_token_id": 0,
+    "partial_rotary_factor": 0.5,
+    "quantization": {
+        "group_size": 64,
+        "bits": 8,
+        "mode": "affine"
+    },
+    "quantization_config": {
+        "group_size": 64,
+        "bits": 8,
+        "mode": "affine"
+    },
+    "residual_in_fp32": true,
+    "rope_scaling": false,
+    "rope_theta": 5000000,
+    "scale_residual_merge": true,
+    "sliding_window": null,
+    "transformers_version": "4.57.1",
+    "use_cache": true,
+    "vocab_size": 262272,
+    "zaya_mlp_expansion": 256,
+    "zaya_use_eda": true,
+    "zaya_use_mod": true
+}
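
The `quantization` block above (`group_size` 64, `bits` 8, `mode` `affine`) can be illustrated with a small pure-Python sketch. It assumes the group-wise affine scheme described in the MLX documentation, where each group of weights is stored as integers plus one scale and one bias. The helper names `quantize_group` and `dequantize_group` are hypothetical, and the real kernels operate on packed tensors rather than Python lists.

```python
def quantize_group(weights, bits=8):
    """Map one group of floats to integers in [0, 2**bits - 1] plus a scale and a bias."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Fall back to a scale of 1.0 for a constant group to avoid dividing by zero.
    scale = (hi - lo) / levels or 1.0
    q = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo


def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats: w ~= scale * q + bias."""
    return [scale * v + bias for v in q]


# One group of 64 weights, matching group_size in config.json.
group = [i / 10 - 3.2 for i in range(64)]
q, scale, bias = quantize_group(group, bits=8)
restored = dequantize_group(q, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

With rounding to the nearest level, the reconstruction error per weight stays within about half of `scale`, which is why smaller groups (tighter min/max ranges) generally lose less precision.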

generation_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 2,
+  "eos_token_id": 106,
+  "pad_token_id": 0,
+  "temperature": 1.0,
+  "top_k": -1,
+  "top_p": 0.95,
+  "transformers_version": "4.57.1"
+}
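
As a rough illustration of what `top_p: 0.95` means for sampling, the sketch below applies nucleus filtering to a toy probability list. The function name `top_p_filter` is illustrative, and treating `top_k: -1` as "disabled" is an assumption based on a convention used by some inference stacks, not something this file states. A `temperature` of 1.0 leaves the distribution unchanged, so it is not modeled here.

```python
def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


# Toy distribution over four token ids.
filtered = top_p_filter([0.7, 0.2, 0.06, 0.04], top_p=0.95)
```

Here the least likely token is dropped because the first three already cover at least 95% of the probability mass.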

model-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2387137c04539ff12521b6636657c245a708fcaad26e26d9d09ebefad72c9bc8
+size 5368501086

model-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a35aaeb0a7bcb653d736ca1ea901147b3cb7a8c054f1a1bafe47379cdc5af1fc
+size 4062798556
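
The two pointer files above carry enough information to estimate the download size. The sketch below parses the Git LFS pointer format and sums the shard sizes. The helper `parse_lfs_pointer` is a hypothetical name, and the per-parameter figures rest on two assumptions: the 8.3B total parameter count quoted in the README, and a 16-bit scale plus a 16-bit bias stored per group of 64 weights.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


SHARD_POINTERS = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:2387137c04539ff12521b6636657c245a708fcaad26e26d9d09ebefad72c9bc8\n"
    "size 5368501086\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a35aaeb0a7bcb653d736ca1ea901147b3cb7a8c054f1a1bafe47379cdc5af1fc\n"
    "size 4062798556\n",
]

total_bytes = sum(parse_lfs_pointer(p)["size"] for p in SHARD_POINTERS)
total_gib = total_bytes / 2**30

# Nominal cost of 8-bit affine quantization: 8 bits per weight plus an
# assumed 16-bit scale and 16-bit bias shared by each group of 64 weights.
nominal_bits = 8 + (16 + 16) / 64

# Bits per parameter implied by the files, assuming 8.3B parameters in total.
implied_bits = total_bytes * 8 / 8.3e9
```

The implied figure comes out somewhat above the nominal one, which would be consistent with some tensors being kept at higher precision, though the commit itself does not say which.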

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:73aeac3336eaf0c25d1a803e7b21ca4e90f974cd17f189b8aee0eb7e958bede5
+size 33385480

tokenizer_config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "backend": "tokenizers",
+  "boi_token": "<start_of_image>",
+  "bos_token": "<bos>",
+  "clean_up_tokenization_spaces": false,
+  "eoi_token": "<end_of_image>",
+  "eos_token": "<|im_end|>",
+  "image_token": "<image_soft_token>",
+  "is_local": true,
+  "local_files_only": false,
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "model_specific_special_tokens": {
+    "boi_token": "<start_of_image>",
+    "eoi_token": "<end_of_image>",
+    "image_token": "<image_soft_token>"
+  },
+  "pad_token": "<pad>",
+  "processor_class": "Gemma3Processor",
+  "sp_model_kwargs": null,
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "GemmaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
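
The very large `model_max_length` above is not a real context limit. It equals `int(1e30)`, the placeholder that `transformers` writes when a tokenizer has no maximum length configured. A small sketch of how a caller might derive a usable limit from the two config files follows; the helper `effective_context` is a hypothetical name, and falling back to `max_position_embeddings` is an assumption about sensible usage rather than documented behavior of this model.

```python
# Values copied from tokenizer_config.json and config.json in this commit.
MODEL_MAX_LENGTH = 1000000000000000019884624838656
MAX_POSITION_EMBEDDINGS = 131072


def effective_context(model_max_length, max_position_embeddings):
    """Use the smaller limit, so the placeholder never wins over the model's own bound."""
    return min(model_max_length, max_position_embeddings)


is_placeholder = MODEL_MAX_LENGTH == int(1e30)
context_tokens = effective_context(MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS)
```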