Upload ZAYA1-8B-MXFP4 via jang-tools

Files changed (17)

.gitattributes +1 -0
README.md +110 -0
chat_template.jinja +205 -0
config.json +66 -0
generation_config.json +7 -0
jang_config.json +28 -0
model-00001-of-00006.safetensors +3 -0
model-00002-of-00006.safetensors +3 -0
model-00003-of-00006.safetensors +3 -0
model-00004-of-00006.safetensors +3 -0
model-00005-of-00006.safetensors +3 -0
model-00006-of-00006.safetensors +3 -0
model.safetensors.index.json +0 -0
osaurus-x-banner.png +0 -0
special_tokens_map.json +33 -0
tokenizer.json +3 -0
tokenizer_config.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,110 @@

+---
+license: apache-2.0
+library_name: mlx
+base_model: Zyphra/ZAYA1-8B
+base_model_relation: quantized
+pipeline_tag: text-generation
+tags:
+  - zaya
+  - mixture-of-experts
+  - hybrid-attention
+  - cca-attention
+  - mlx
+  - apple-silicon
+  - reasoning
+  - tool-use
+  - quantized
+  - mxfp4
+  - jang
+  - osaurus
+quantization_config:
+  family: mxfp4
+  profile: MXFP4
+  group_size: 32
+  expert_layout: split_switch_mlp
+---
+<p align="center"><img src="osaurus-x-banner.png" width="100%" alt="OsaurusAI"/></p>
+# ZAYA1-8B-MXFP4
+Quantized **Zyphra/ZAYA1-8B** for Apple Silicon runtimes.
+| | |
+|---|---|
+| Source | [Zyphra/ZAYA1-8B](https://huggingface.co/Zyphra/ZAYA1-8B) |
+| License | Apache-2.0, inherited from upstream |
+| Format | MXFP4 |
+| Bundle size | 5.48 GiB |
+| Tensor keys | 1965 |
+| Expert layout | Pre-stacked `zaya_block.experts.switch_mlp` |
+| Runtime status | Generation coherence not verified: the coherence report for the quantized runtime bundle did not pass. Published as a format/runtime bundle pending downstream ZAYA runtime validation. |
+## Important Runtime Note
+ZAYA is not a stock `mlx_lm` architecture: it alternates CCA attention layers
+and top-1 MoE layers. Use this bundle only with a ZAYA-aware MLX/JANG runtime
+that implements the ZAYA CCA state contract and the converted pre-stacked
+expert layout.
+## Architecture Summary
+- 80 decoder layers: 40 CCA attention layers and 40 top-1 MoE layers
+- Hidden size 2048, 16 query heads, 2 KV heads, head dim 128
+- CCA state per attention layer: standard KV plus `conv_state [B,1280,2]`
+  and `prev_hs [B,2048]`
+- 16 routed experts per MoE layer, top-1 routing with MOD skip route
+- Context length 131072, `rope_theta=5000000`
+## Quantization
+Linear layers use 4-bit affine quantization, embeddings use 8-bit affine quantization, and router/CCA state tensors pass through unquantized.
+Minimum set of tensors kept unquantized in this first release:
+- `conv_qk.*`, `temp`, norms, residual scaling, router path, biases, and
+  balancing biases are preserved as float tensors.
+- Embeddings and `lm_head` use 8-bit affine in the prepared bundles.
+- `jangtq_runtime.safetensors` is not applicable to MXFP4.
+`mxtq_bits`:
+```json
+null
+```
+## Bundle Verification
+- Safetensor headers scanned.
+- Source tensor coverage checked.
+- Converted bundles checked for `local_experts` removal.
+- Converted expert tensors checked for pre-stacked `switch_mlp` layout.
+- JANGTQ sidecars checked for the Swift runtime contract.
+- Runtime coherence status recorded above.
+## Runtime Smoke Tests
+Before production use, run short deterministic prompts through the exact target
+runtime:
+- `What is 2+2? Answer with only the number.`
+- `What is the capital of France? Answer with one word.`
+- One chat-template prompt with thinking disabled.
+- One chat-template prompt with thinking enabled and enough output budget for
+  the final answer.
+The first public bundle release records bundle integrity and runtime contract
+checks. Full generation quality depends on a ZAYA-aware runtime implementation.
+## Summary (translated from Korean)
+This bundle is a quantized build of Zyphra/ZAYA1-8B for Apple Silicon MLX/JANG runtimes. Use it only with a runtime that correctly implements ZAYA's CCA attention state and MoE routing.
+## Files
+- `config.json` carries `weight_format=mxfp4` and
+  `zaya_expert_layout=split_switch_mlp`.
+- `jang_config.json` carries `cache_subtype=zaya_cca`.
+- Tokenizer files and `chat_template.jinja` are preserved from the upstream
+  source snapshot.
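The two deterministic prompts in the README's "Runtime Smoke Tests" list can be scripted. What follows is a minimal sketch rather than anything shipped in this commit: `run_smoke_tests` and the `generate` callable are hypothetical names, and the caller is assumed to supply `generate` from whichever ZAYA-aware runtime is being validated. The chat-template prompts with thinking on and off are not covered here.

```python
from typing import Callable, List, Tuple

# (prompt, expected answer) pairs taken from the README's smoke-test list.
SMOKE_PROMPTS: List[Tuple[str, str]] = [
    ("What is 2+2? Answer with only the number.", "4"),
    ("What is the capital of France? Answer with one word.", "Paris"),
]


def run_smoke_tests(generate: Callable[[str], str]) -> List[Tuple[str, str, bool]]:
    """Run each prompt through `generate` and compare the trimmed output.

    `generate` is a hypothetical, caller-supplied function that maps a prompt
    to the runtime's final answer text (any thinking block already removed).
    """
    results = []
    for prompt, expected in SMOKE_PROMPTS:
        # Tolerate surrounding whitespace and a trailing period.
        answer = generate(prompt).strip().rstrip(".")
        results.append((prompt, answer, answer.lower() == expected.lower()))
    return results
```

A runtime that returns incoherent text for either prompt would show up as a `False` in the third tuple field, which is a cheap first signal before any longer evaluation.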

chat_template.jinja ADDED

	@@ -0,0 +1,205 @@

+{% macro render_extra_keys(json_dict, handled_keys) %}
+    {%- if json_dict is mapping %}
+        {%- for json_key in json_dict if json_key not in handled_keys %}
+            {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
+                {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
+            {%- else %}
+                {{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
+            {%- endif %}
+        {%- endfor %}
+    {%- endif %}
+{% endmacro %}
+{%- set enable_thinking = enable_thinking if enable_thinking is defined else True %}
+{# TODO: set truncate to true for deployment & agent evals. Keep on for SFT. #}
+{%- set truncate_history_thinking = truncate_history_thinking if truncate_history_thinking is defined else False %}
+{{- bos_token }}
+{%- set ns = namespace(last_user_idx = -1) %}
+{%- set loop_messages = messages %}
+{%- for m in loop_messages %}
+  {%- if m["role"] == "user" %}
+    {%- set ns.last_user_idx = loop.index0 %}
+  {%- endif %}
+{%- endfor %}
+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set system_message = "" %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+{%- if not tools is defined %}
+    {%- set tools = [] %}
+{%- endif %}
+{# Recompute last_user_idx relative to loop_messages after handling system #}
+{%- set ns = namespace(last_user_idx = -1) %}
+{%- for m in loop_messages %}
+  {%- if m["role"] == "user" %}
+    {%- set ns.last_user_idx = loop.index0 %}
+  {%- endif %}
+{%- endfor %}
+{%- if system_message is defined %}
+    {{- "<|im_start|>system\n" + system_message }}
+{%- else %}
+    {%- if tools is iterable and tools | length > 0 %}
+        {{- "<|im_start|>system\n" }}
+    {%- endif %}
+{%- endif %}
+{%- if tools is iterable and tools | length > 0 %}
+    {%- if system_message is defined and system_message | length > 0 %}
+        {{- "\n\n" }}
+    {%- endif %}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n" }}
+    {{- "<tools>" }}
+    {%- for tool in tools %}
+        {%- if tool.function is defined %}
+            {%- set tool = tool.function %}
+        {%- endif %}
+        {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
+        {%- if tool.description is defined %}
+            {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
+        {%- endif %}
+        {{- '\n<parameters>' }}
+        {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
+            {%- for param_name, param_fields in tool.parameters.properties|items %}
+                {{- '\n<parameter>' }}
+                {{- '\n<name>' ~ param_name ~ '</name>' }}
+                {%- if param_fields.type is defined %}
+                    {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
+                {%- endif %}
+                {%- if param_fields.description is defined %}
+                    {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
+                {%- endif %}
+                {%- if param_fields.enum is defined %}
+                    {{- '\n<enum>' ~ (param_fields.enum | tojson | safe) ~ '</enum>' }}
+                {%- endif %}
+                {%- set handled_keys = ['name', 'type', 'description', 'enum'] %}
+                {{- render_extra_keys(param_fields, handled_keys) }}
+                {{- '\n</parameter>' }}
+            {%- endfor %}
+        {%- endif %}
+        {% set handled_keys = ['type', 'properties', 'required'] %}
+        {{- render_extra_keys(tool.parameters, handled_keys) }}
+        {%- if tool.parameters is defined and tool.parameters.required is defined %}
+            {{- '\n<required>' ~ (tool.parameters.required | tojson | safe) ~ '</required>' }}
+        {%- endif %}
+        {{- '\n</parameters>' }}
+        {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
+        {{- render_extra_keys(tool, handled_keys) }}
+        {{- '\n</function>' }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<zyphra_tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</zyphra_tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <zyphra_tool_call></zyphra_tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+{%- endif %}
+{%- if system_message is defined %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if tools is iterable and tools | length > 0 %}
+        {{- '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in loop_messages %}
+    {%- if message.role == "assistant" %}
+        {# Add reasoning content into the content field for unified processing below. #}
+        {%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
+            {%- set content = "<think>\n" ~ message.reasoning_content ~ "\n</think>\n\n" ~ (message.content | default('', true)) %}
+        {%- else %}
+            {%- set content = message.content | default('', true) %}
+            {%- if content is string -%}
+                {# Allow downstream logic to take care of broken thought, only handle coherent reasoning here. #}
+                {%- if '<think>' not in content and '</think>' not in content -%}
+                    {%- set content = "<think>\n</think>\n\n" ~ content -%}
+                {%- endif -%}
+            {%- else -%}
+                {%- set content = content -%}
+            {%- endif -%}
+        {%- endif %}
+        {%- if message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
+            {# Assistant message has tool calls. #}
+            {{- '<|im_start|>assistant\n' }}
+                {%- set include_content = not (truncate_history_thinking and loop.index0 < ns.last_user_idx) %}
+                {%- if content is string and content | trim | length > 0 %}
+                    {%- if include_content %}
+                        {{- (content | trim) ~ '\n\n' -}}
+                    {%- else %}
+                        {%- set c = (content | string) %}
+                        {%- if '</think>' in c %}
+                            {# Keep only content after the last closing think. Also generation prompt causes this. #}
+                            {%- set c = c.split('</think>')[-1] %}
+                        {%- elif '<think>' in c %}
+                            {# If <think> was opened but never closed, drop the trailing think segment #}
+                            {%- set c = c.split('<think>')[0] %}
+                        {%- endif %}
+                        {%- set c = "<think>\n</think>\n\n" ~ c | trim %}
+                        {%- if c | length > 0 %}
+                            {{- c ~ '\n' -}}
+                        {%- endif %}
+                    {%- endif %}
+                {%- else %}
+                    {{- "<think>\n</think>\n\n" -}}
+                {%- endif %}
+                {%- for tool_call in message.tool_calls %}
+                    {%- if tool_call.function is defined %}
+                        {%- set tool_call = tool_call.function %}
+                    {%- endif %}
+                    {{- '<zyphra_tool_call>\n<function=' ~ tool_call.name ~ '>\n' -}}
+                        {%- if tool_call.arguments is defined %}
+                            {%- for args_name, args_value in tool_call.arguments|items %}
+                                {{- '<parameter=' ~ args_name ~ '>\n' -}}
+                                    {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                                {{- args_value ~ '\n</parameter>\n' -}}
+                            {%- endfor %}
+                        {%- endif %}
+                    {{- '</function>\n</zyphra_tool_call>\n' -}}
+                {%- endfor %}
+                {{- '<|im_end|>\n' }}
+        {%- else %}
+            {# Assistant message doesn't have tool calls. #}
+            {%- if not (truncate_history_thinking and loop.index0 < ns.last_user_idx) %}
+                {{- '<|im_start|>assistant\n' ~ (content | default('', true) | string | trim) ~ '<|im_end|>\n' }}
+            {%- else %}
+                {%- set c = (content | default('', true) | string) %}
+                {%- if '<think>' in c and '</think>' in c %}
+                    {%- set c = "<think>\n</think>\n\n" ~ (c.split('</think>')[-1] | trim) %}
+                {%- endif %}
+                {%- set c = c | trim %}
+                {%- if c | length > 0 %}
+                    {{- '<|im_start|>assistant\n' ~ c ~ '<|im_end|>\n' }}
+                {%- else %}
+                    {{- '<|im_start|>assistant\n<|im_end|>\n' }}
+                {%- endif %}
+            {%- endif %}
+        {%- endif %}
+    {%- elif message.role == "user" or message.role == "system" %}
+        {{- '<|im_start|>' + message.role + '\n' }}
+        {%- set content = message.content | string %}
+        {{- content }}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user\n' }}
+        {%- endif %}
+        {{- '<zyphra_tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</zyphra_tool_response>\n' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {%- if enable_thinking %}
+        {{- '<|im_start|>assistant\n<think>\n' }}
+    {%- else %}
+        {{- '<|im_start|>assistant\n<think>\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
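The template above instructs the model to emit tool calls as `<zyphra_tool_call>` blocks containing one `<function=...>` element and zero or more `<parameter=...>` elements. A minimal sketch of how a client might parse that text back into structured calls is shown below. It is an illustration only, not the `zaya_xml` parser named in `config.json`; `parse_tool_calls` is a hypothetical helper, and it assumes well-formed output with each parameter value on its own lines, as the template renders it.

```python
import re

# One <zyphra_tool_call> block wrapping a single <function=NAME> element.
TOOL_CALL_RE = re.compile(
    r"<zyphra_tool_call>\s*<function=([^>\n]+)>\n(.*?)</function>\s*</zyphra_tool_call>",
    re.DOTALL,
)
# One <parameter=NAME> element; the value may span multiple lines.
PARAM_RE = re.compile(r"<parameter=([^>\n]+)>\n(.*?)\n</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list:
    """Return a list of {"name": ..., "arguments": {...}} dicts found in `text`."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        arguments = {key: value for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls
```

All argument values come back as strings. The template JSON-encodes mapping and sequence arguments when rendering history, so a caller that needs typed values could attempt `json.loads` on each value and fall back to the raw string.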

config.json ADDED

	@@ -0,0 +1,66 @@

+{
+  "activation_func": "swiglu",
+  "activation_func_fp8_input_store": false,
+  "add_bias_linear": false,
+  "architectures": [
+    "ZayaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bias_activation_fusion": true,
+  "bos_token_id": 2,
+  "cca": true,
+  "cca_num_q_heads": 8,
+  "dtype": "bfloat16",
+  "eos_token_id": 106,
+  "ffn_hidden_size": 4096,
+  "gated_linear_unit": true,
+  "hidden_size": 2048,
+  "kv_channels": 128,
+  "lm_head_bias": false,
+  "mamba_cache_dtype": "float32",
+  "max_position_embeddings": 131072,
+  "model_type": "zaya",
+  "moe_router_topk": 1,
+  "norm_epsilon": 1e-05,
+  "normalization": "RMSNorm",
+  "num_attention_heads": 16,
+  "num_experts": 16,
+  "num_hidden_layers": 80,
+  "num_key_value_heads": 2,
+  "num_query_groups": 2,
+  "pad_token_id": 0,
+  "partial_rotary_factor": 0.5,
+  "residual_in_fp32": true,
+  "rope_scaling": false,
+  "rope_theta": 5000000,
+  "scale_residual_merge": true,
+  "sliding_window": null,
+  "transformers_version": "4.57.1",
+  "use_cache": true,
+  "vocab_size": 262272,
+  "zaya_mlp_expansion": 256,
+  "zaya_use_eda": true,
+  "zaya_use_mod": true,
+  "weight_format": "mxfp4",
+  "zaya_expert_layout": "split_switch_mlp",
+  "tie_word_embeddings": true,
+  "quantization": {
+    "bits": 4,
+    "group_size": 32,
+    "mode": "affine",
+    "embed_bits": 8,
+    "router_bits": 16,
+    "expert_layout": "split_switch_mlp"
+  },
+  "capabilities": {
+    "reasoning_parser": "qwen3",
+    "tool_parser": "zaya_xml",
+    "think_in_template": true,
+    "supports_tools": true,
+    "supports_thinking": true,
+    "family": "zaya",
+    "modality": "text",
+    "cache_type": "hybrid"
+  }
+}
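A few quantities follow arithmetically from the fields in this `config.json`. The sketch below derives them under stated assumptions: `derived_dims` is a hypothetical helper, the count of 40 attention layers comes from the README rather than the config, `partial_rotary_factor` is assumed to apply to the head dimension, and the cache estimate assumes a plain 2-byte KV cache with 2 KV heads. A real CCA runtime may store its cache differently, and the extra `conv_state` and `prev_hs` tensors are not included.

```python
# Values copied from the config.json diff above.
config = {
    "hidden_size": 2048,
    "kv_channels": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 2,
    "partial_rotary_factor": 0.5,
    "max_position_embeddings": 131072,
}


def derived_dims(cfg, attention_layers=40, bytes_per_element=2):
    """Derive head and cache sizes; see the lead-in for the assumptions made."""
    head_dim = cfg["kv_channels"]
    # Keys and values each hold num_key_value_heads * head_dim elements per token.
    kv_per_token_per_layer = 2 * cfg["num_key_value_heads"] * head_dim
    kv_bytes = (
        kv_per_token_per_layer
        * attention_layers
        * bytes_per_element
        * cfg["max_position_embeddings"]
    )
    return {
        "q_width": cfg["num_attention_heads"] * head_dim,
        "rotary_dims": int(head_dim * cfg["partial_rotary_factor"]),
        "kv_elements_per_token_per_layer": kv_per_token_per_layer,
        "kv_gib_full_context": kv_bytes / 2**30,
    }


dims = derived_dims(config)
print(dims)
```

Under these assumptions the query width equals `hidden_size` (16 × 128 = 2048), and a full 131072-token context would need about 5 GiB of KV cache per sequence, which is comparable to the weight bundle itself.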

generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 2,
+  "eos_token_id": 106,
+  "pad_token_id": 0,
+  "transformers_version": "4.57.1"
+}

jang_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "version": 2,
+  "weight_format": "mxfp4",
+  "profile": "MXFP4",
+  "cache_subtype": "zaya_cca",
+  "source_model": {
+    "name": "ZAYA1-8B",
+    "org": "Zyphra",
+    "architecture": "zaya"
+  },
+  "expert_layout": "split_switch_mlp",
+  "quantization": {
+    "method": "affine",
+    "group_size": 32,
+    "bits": 4,
+    "embed_bits": 8
+  },
+  "capabilities": {
+    "reasoning_parser": "qwen3",
+    "tool_parser": "zaya_xml",
+    "think_in_template": true,
+    "supports_tools": true,
+    "supports_thinking": true,
+    "family": "zaya",
+    "modality": "text",
+    "cache_type": "hybrid"
+  }
+}
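The `quantization` block records 4-bit affine weights with `group_size` 32. Group-wise affine quantization carries per-group metadata, so the effective storage cost is higher than the nominal bit width. The sketch below estimates it, assuming each group stores one 16-bit scale and one 16-bit bias; that layout is an assumption for illustration and is not confirmed by this diff. `effective_bits_per_weight` is a hypothetical helper.

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Nominal bits plus the per-group scale and bias amortized over the group."""
    return bits + (scale_bits + bias_bits) / group_size


# Linear layers: "bits": 4, "group_size": 32 from jang_config.json.
print(effective_bits_per_weight(4, 32))  # 5.0
# Embeddings: "embed_bits": 8, assuming the same group size of 32.
print(effective_bits_per_weight(8, 32))  # 9.0
```

With a smaller group the overhead grows (for example, group size 16 would add 2 bits per weight under the same assumption), which is the usual trade-off between quantization accuracy and size.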

model-00001-of-00006.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2a6c1353ba873dc0e90ef202a26243b11b9f034f6cb6b07bbb37afb99f4437c
+size 1018018312
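Each `.safetensors` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object ID, and the byte size of the real file. A minimal sketch of reading such a pointer and checking a downloaded shard against it follows. `parse_lfs_pointer` and `verify_file` are hypothetical helper names, and only the standard library is used.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so a ~1 GB shard is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b2a6c1353ba873dc0e90ef202a26243b11b9f034f6cb6b07bbb37afb99f4437c\n"
    "size 1018018312\n"
)
```

Checking the size first is a cheap way to reject truncated downloads before paying for the full hash.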

model-00002-of-00006.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3720d5fb64d76525cf3c25a6c3cb4ae0edabc3585fd21877472ddf9d60137008
+size 1006642912

model-00003-of-00006.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02084bebef2348c97b0085bad90ee4e9fcfa53bad66511624c1c9584129b6a73
+size 1006642944

model-00004-of-00006.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:af7aa7765edef974db93ae47313db89722e71a0014f4c5e37c0d422ed3b74ba4
+size 1006642944

model-00005-of-00006.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b1ef2438c86cac955f9b8c0069a18dc485f36500c24074c2de0d5578a7894199
+size 1006642944

model-00006-of-00006.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1c5c7ea17cf64d3aec7a3e22ded4c9ee065b6feaf1551145d65e6fb8ce06f9e
+size 805314552

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff

osaurus-x-banner.png ADDED

special_tokens_map.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:29299fe0a5dfb9f41cacc71436a714a412e1163858fbf7085b84adbf9544133a
+size 33385481
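The README's "Bundle size" figure of 5.48 GiB can be cross-checked against the `size` fields of the LFS pointers in this commit. The sketch below sums the six weight shards and `tokenizer.json`. The small JSON, Jinja, and PNG files have no sizes shown in the diff and are ignored, so this is a lower bound on the full bundle.

```python
# Byte sizes copied from the LFS pointers in this commit.
SHARD_SIZES = [
    1018018312,  # model-00001-of-00006.safetensors
    1006642912,  # model-00002-of-00006.safetensors
    1006642944,  # model-00003-of-00006.safetensors
    1006642944,  # model-00004-of-00006.safetensors
    1006642944,  # model-00005-of-00006.safetensors
    805314552,   # model-00006-of-00006.safetensors
]
TOKENIZER_SIZE = 33385481  # tokenizer.json

weights_bytes = sum(SHARD_SIZES)
total_bytes = weights_bytes + TOKENIZER_SIZE

print(f"weights only:   {weights_bytes / 2**30:.2f} GiB")  # 5.45 GiB
print(f"with tokenizer: {total_bytes / 2**30:.2f} GiB")    # 5.48 GiB
```

The weights alone come to about 5.45 GiB, so the README figure appears to include the tokenizer as well.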

tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff