Commit e58d7f7 by Osaurus-AI · verified · 1 Parent(s): 2a6dbb0

Initial upload: ZAYA1-VL-8B-MXFP4 from Zyphra/ZAYA1-VL-8B

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,122 @@
+ ---
+ license: apache-2.0
+ library_name: mlx
+ base_model: Zyphra/ZAYA1-VL-8B
+ base_model_relation: quantized
+ pipeline_tag: image-text-to-text
+ tags:
+ - zaya
+ - mixture-of-experts
+ - hybrid-attention
+ - cca-attention
+ - mlx
+ - apple-silicon
+ - reasoning
+ - tool-use
+ - quantized
+ - vision
+ - multimodal
+ - image-text-to-text
+ - vision-language
+ - qwen2_5_vl-vit
+ - mxfp4
+ - jang
+ - osaurus
+ quantization_config:
+   family: mxfp4
+   profile: MXFP4
+   group_size: 32
+   expert_layout: split_switch_mlp
+ ---
+
+ <p align="center"><img src="osaurus-x-banner.png" width="100%" alt="OsaurusAI"/></p>
+
+ # ZAYA1-VL-8B-MXFP4
+
+ Quantized **Zyphra/ZAYA1-VL-8B** for Apple Silicon runtimes.
+
+ | | |
+ |---|---|
+ | Source | [Zyphra/ZAYA1-VL-8B](https://huggingface.co/Zyphra/ZAYA1-VL-8B) |
+ | License | Apache-2.0, inherited from upstream |
+ | Format | MXFP4 |
+ | Modality | image+text |
+ | Bundle size | 7.12 GiB |
+ | Tensor keys | 5315 |
+ | Expert layout | Pre-stacked `zaya_block.experts.switch_mlp` |
+ | Runtime status | Generation coherence has NOT been independently verified for the quantized runtime bundle (no coherence report is available). It is published as a format/runtime bundle pending downstream ZAYA runtime validation. |
+
+ ## Important Runtime Note
+
+ This bundle requires a ZAYA-aware MLX/JANG runtime that implements the ZAYA CCA attention state contract and the converted pre-stacked expert layout.
+
+ ZAYA1-VL fuses Zyphra's text-ZAYA decoder (CCA attention + top-1 MoE) with the Qwen2.5-VL vision tower. Vision-LoRA modulates the LM trunk only at vision-token positions; text positions decode unmodified.
+
+ ## Runtime Pin Required
+
+ Zyphra's `model_type=zaya1_vl` is not yet implemented in `vmlx-swift-lm` or stock `mlx_vlm`. The bundle is **conversion-ready and structurally verified**, but image-text decoding requires either:
+
+ - Zyphra's `transformers @ git+https://github.com/Zyphra/transformers.git@zaya1-vl` fork (BF16 source-side reference), or
+ - A `Zaya1VL` MLX adapter (in development at `jang-runtime/Sources/JANG/Zaya1VL/`).
+
+ Until the MLX adapter ships, treat this bundle as a runtime-pending preview.
+
+
+ ## Architecture Summary
+
+ - 40 hybrid decoder layers: each layer has CCA attention + top-1 MoE
+ - Hidden size 2048, 8 query heads, 2 KV heads, head dim 128
+ - 16 routed experts per MoE layer, top-1 routing
+ - Vision tower: Qwen2.5-VL ViT (`hidden=1280`, `out=2048`, `patch=14`)
+ - Vision-LoRA on the LM trunk: rank-8 attn, rank-32 MLP, gated to vision tokens only
+ - Image tokens: `image_token_id=262147`, `vision_start_token_id=255999`, `vision_end_token_id=256000`
+ - Context length 32768, `rotary_base=1000000`, partial RoPE (`rope_pct=0.5` of head dim)
+
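The dimensions in the Architecture Summary imply a few derived quantities that are easy to check by hand. The sketch below is bookkeeping only: it assumes standard grouped-query-attention and partial-RoPE conventions, and CCA may change the effective tensor shapes inside the runtime. The function name `derived_dims` is illustrative and not part of the bundle.

```python
# Values copied from the Architecture Summary and config.json in this commit.
HIDDEN_SIZE = 2048
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 2
HEAD_DIM = 128
ROPE_PCT = 0.5
NUM_EXPERTS = 16
TOP_K = 1


def derived_dims():
    """Derive attention/MoE bookkeeping under standard GQA conventions.

    Assumption: query and KV widths are heads * head_dim, as in ordinary
    grouped-query attention. The CCA mechanism is not modelled here.
    """
    return {
        "query_width": NUM_QUERY_HEADS * HEAD_DIM,
        "kv_width": NUM_KV_HEADS * HEAD_DIM,
        "queries_per_kv_head": NUM_QUERY_HEADS // NUM_KV_HEADS,
        "rotary_dims_per_head": int(HEAD_DIM * ROPE_PCT),
        "active_expert_fraction": TOP_K / NUM_EXPERTS,
    }


if __name__ == "__main__":
    for name, value in derived_dims().items():
        print(f"{name}: {value}")
```

Under these assumptions the query width (8 × 128) is half the hidden size, so the attention projections are not square, and only one expert in sixteen is active per token.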
+ ## Quantization
+
+ 4-bit affine LM linears (group size 32) + 8-bit embeddings + passthrough vision tower / LoRA / router / CCA state.
+
+ Passthrough floor for the first release:
+
+ - `conv_qk.*`, `temp`, norms, residual scaling, router path, biases, and balancing biases are preserved as float tensors.
+ - Embeddings and `lm_head` use 8-bit affine in the prepared bundles.
+ - Vision tower (`vision_tower.*`) and all LoRA tensors (`*.lora_*.[01].weight`) are kept in float passthrough.
+ - `jangtq_runtime.safetensors` is not applicable to MXFP4.
+
+ `mxtq_bits`:
+
+ ```json
+ null
+ ```
+
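To make "4-bit affine, group size 32" concrete, here is a minimal pure-Python sketch of generic group-wise affine quantization. It follows the `affine` mode recorded in the config files of this commit; it is not MLX's actual kernel or bit-packing layout, and it does not model the MXFP4 element format named in the bundle title. All function names are illustrative.

```python
def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to unsigned integer codes.

    Returns (codes, scale, bias) so that value ~= code * scale + bias.
    """
    levels = (1 << bits) - 1          # 15 for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes and per-group parameters."""
    return [c * scale + bias for c in codes]


def quantize(values, group_size=32, bits=4):
    """Split a flat weight row into groups and quantize each independently."""
    if len(values) % group_size != 0:
        raise ValueError("length must be a multiple of group_size")
    return [
        quantize_group(values[i:i + group_size], bits)
        for i in range(0, len(values), group_size)
    ]


if __name__ == "__main__":
    row = [i / 10 for i in range(-16, 16)]   # 32 example weights
    codes, scale, bias = quantize(row)[0]
    restored = dequantize_group(codes, scale, bias)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"scale={scale:.4f} bias={bias:.2f} max_error={worst:.4f}")
```

Each group stores its own scale and bias, so the worst-case rounding error is bounded by half a quantization step within that group. That is why a smaller group size trades storage for accuracy.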
+ ## Bundle Verification
+
+ - Safetensor headers scanned.
+ - Source tensor coverage checked.
+ - Converted bundles checked for `local_experts` removal.
+ - Converted expert tensors checked for pre-stacked `switch_mlp` layout.
+ - JANGTQ sidecars checked for the Swift runtime contract.
+ - Vision tower + LoRA tensors verified passthrough; `image_token_id`, `vision_start_token_id`, and `vision_end_token_id` preserved.
+ - Runtime coherence status recorded above.
+
+ ## Runtime Smoke Tests
+
+ Before production use, run short deterministic prompts through the exact target runtime:
+
+ - `What is 2+2? Answer with only the number.`
+ - `What is the capital of France? Answer with one word.`
+ - One chat-template prompt with thinking disabled.
+ - One image+text prompt exercising vision-token interleave and the vision-LoRA gate.
+
+ The first public bundle release records bundle integrity and runtime contract checks. Full generation quality depends on a ZAYA-aware runtime implementation.
+
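The two text-only smoke prompts listed above can be wrapped in a small harness. This is a sketch under stated assumptions: `generate` is a hypothetical callable that takes a prompt string and returns the model's text output with deterministic decoding. No such function ships with this bundle, and the pass criteria below are one reasonable reading of "answer with only the number" and "one word".

```python
from typing import Callable, List, Tuple

# (prompt, predicate on the stripped output). Prompts are copied from the README;
# the predicates are assumptions about what counts as a passing answer.
TEXT_CHECKS = [
    ("What is 2+2? Answer with only the number.",
     lambda out: out == "4"),
    ("What is the capital of France? Answer with one word.",
     lambda out: out.lower().rstrip(".") == "paris"),
]


def run_smoke_tests(generate: Callable[[str], str]) -> List[Tuple[str, str, bool]]:
    """Run each prompt through `generate` and record (prompt, output, passed)."""
    results = []
    for prompt, passes in TEXT_CHECKS:
        output = generate(prompt).strip()
        results.append((prompt, output, passes(output)))
    return results


if __name__ == "__main__":
    # Stand-in for a real runtime, used only to show the report format.
    canned = {TEXT_CHECKS[0][0]: "4", TEXT_CHECKS[1][0]: "Paris"}
    for prompt, output, ok in run_smoke_tests(lambda p: canned[p]):
        print(f"{'PASS' if ok else 'FAIL'}: {prompt!r} -> {output!r}")
```

The chat-template and image+text prompts are left out because they depend on the runtime's message and image APIs, which this commit does not define.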
+ ## Korean Summary (translated)
+
+ This bundle is Zyphra/ZAYA1-VL-8B quantized for the Apple Silicon MLX/JANG runtime. It must only be used with a runtime that correctly implements ZAYA's CCA attention state and MoE routing. Image inputs pass through the Qwen2.5-VL ViT path, and vision-LoRA is applied only at image-token positions.
+
+ ## Files
+
+ - `config.json` carries `weight_format=mxfp4`, `zaya_expert_layout=split_switch_mlp`, and a preserved `vision_config`.
+ - `jang_config.json` carries `cache_subtype=zaya_cca`.
+ - Tokenizer files and chat template are preserved from the upstream source snapshot.
+ - `preprocessor_config.json` (Qwen2VLImageProcessor) is included for image input.
chat_template.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<|vision_start|><image><|vision_end|>\\n' }}{% endfor %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ '<|im_start|>' ~ message['role'] ~ '\n' ~ content['text'] ~ '<|im_end|>' ~ '\n' }}{% endfor %}{% elif message['role'] == 'question' %}{{ '<|im_start|>user\\n' }}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] ~ '<|im_end|>\\n' }}{% endfor %}{% else %}{{ '<|im_start|>' ~ message['role'] ~ '\\n' }}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] ~ '<|im_end|>' }}{% endfor %}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
+ }
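The one-line Jinja template above is hard to read, so the following pure-Python function mirrors its branches as they appear in this diff. It is a reading aid, not the runtime's renderer: `render_chat` is an illustrative name, and it assumes each message's `content` is a list of typed parts (`{"type": "image"}` or `{"type": "text", "text": ...}`), which is what the template's `selectattr` filters expect.

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the chat_template.json logic for list-of-parts message content."""
    out = []
    for message in messages:
        role = message["role"]
        parts = message["content"]
        texts = [p["text"] for p in parts if p.get("type") == "text"]
        if role == "user":
            # Image placeholders are emitted first, before the <|im_start|> header.
            images = [p for p in parts if p.get("type") == "image"]
            out.extend("<|vision_start|><image><|vision_end|>\n" for _ in images)
            out.extend(f"<|im_start|>{role}\n{t}<|im_end|>\n" for t in texts)
        elif role == "question":
            # The 'question' role is rendered as a user turn.
            out.append("<|im_start|>user\n")
            out.extend(f"{t}<|im_end|>\n" for t in texts)
        else:
            # Other roles (e.g. assistant): no newline after <|im_end|>.
            out.append(f"<|im_start|>{role}\n")
            out.extend(f"{t}<|im_end|>" for t in texts)
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)


if __name__ == "__main__":
    demo = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this picture."},
    ]}]
    print(render_chat(demo, add_generation_prompt=True))
```

Two details stand out when the template is written this way: image placeholders land outside the `<|im_start|>user` block, and assistant turns end without a trailing newline, unlike user turns.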
config.json ADDED
@@ -0,0 +1,87 @@
+ {
+   "activation_func": "swiglu",
+   "activation_func_fp8_input_store": false,
+   "add_bias_linear": false,
+   "apply_rope_fusion": true,
+   "ar_threshold": 1,
+   "architectures": [
+     "Zaya1VLForConditionalGeneration"
+   ],
+   "attention_bias": false,
+   "bias_activation_fusion": true,
+   "bos_token_id": 2,
+   "cca": true,
+   "clamp_temp": false,
+   "eos_token_id": 262143,
+   "ffn_hidden_size": 4096,
+   "fused_add_norm": false,
+   "gated_linear_unit": true,
+   "hidden_size": 2048,
+   "head_dim": 128,
+   "image_token_id": 262147,
+   "lm_head_bias": false,
+   "lora_rank": 0,
+   "max_position_embeddings": 32768,
+   "model_type": "zaya1_vl",
+   "moe_router_topk": 1,
+   "norm_epsilon": 1e-05,
+   "normalization": "RMSNorm",
+   "num_attention_heads": 8,
+   "num_experts": 16,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 2,
+   "num_query_groups": 2,
+   "pad_token_id": 0,
+   "padding_side": "right",
+   "projector_hidden_act": "gelu",
+   "residual_in_fp32": false,
+   "rope_pct": 0.5,
+   "rotary_base": 1000000,
+   "scale_residual_merge": true,
+   "sliding_window": null,
+   "temporal_patch_size": 1,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.57.1",
+   "use_lora_att": false,
+   "use_rope_scaling": false,
+   "vision_config": {
+     "_attn_implementation_autoset": true,
+     "hidden_size": 1280,
+     "in_chans": 3,
+     "model_type": "qwen2_5_vl",
+     "out_hidden_size": 2048,
+     "spatial_patch_size": 14,
+     "temporal_patch_size": 1,
+     "tokens_per_second": 2,
+     "torch_dtype": "bfloat16"
+   },
+   "vision_end_token_id": 256000,
+   "vision_lora": true,
+   "vision_lora_rank_attn": 8,
+   "vision_lora_rank_mlp": 32,
+   "vision_start_token_id": 255999,
+   "vocab_size": 262272,
+   "zaya_mlp_expansion": 256,
+   "zaya_use_eda": true,
+   "zaya_use_mod": true,
+   "weight_format": "mxfp4",
+   "zaya_expert_layout": "split_switch_mlp",
+   "quantization": {
+     "bits": 4,
+     "group_size": 32,
+     "mode": "affine",
+     "router_bits": 16,
+     "expert_layout": "split_switch_mlp"
+   },
+   "capabilities": {
+     "reasoning_parser": "qwen3",
+     "tool_parser": "zaya_xml",
+     "think_in_template": true,
+     "supports_tools": true,
+     "supports_thinking": false,
+     "family": "zaya1_vl",
+     "modality": "vision",
+     "cache_type": "hybrid"
+   }
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 2,
+   "eos_token_id": 262143,
+   "pad_token_id": 0,
+   "transformers_version": "4.50.0.dev0"
+ }
jang_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "version": 2,
+   "weight_format": "mxfp4",
+   "profile": "MXFP4",
+   "cache_subtype": "zaya_cca",
+   "source_model": {
+     "name": "ZAYA1-VL-8B",
+     "org": "Zyphra",
+     "architecture": "zaya1_vl"
+   },
+   "has_vision": true,
+   "expert_layout": "split_switch_mlp",
+   "quantization": {
+     "method": "affine",
+     "group_size": 32,
+     "bits": 4,
+     "embed_bits": 8,
+     "router_bits": 16
+   },
+   "capabilities": {
+     "reasoning_parser": "qwen3",
+     "tool_parser": "zaya_xml",
+     "think_in_template": true,
+     "supports_tools": true,
+     "supports_thinking": false,
+     "family": "zaya1_vl",
+     "modality": "vision",
+     "cache_type": "hybrid"
+   }
+ }
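`config.json` and `jang_config.json` both carry `quantization` and `capabilities` blocks, so a runtime or reviewer may want to confirm that they agree. The sketch below compares the keys the two blocks have in common. The helper names and the idea of running this check are assumptions, not part of the bundle's tooling; the dictionaries are copied from the two files in this commit.

```python
import json


def shared_key_mismatches(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for keys present in both dicts that differ."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


def compare_files(config_path: str, jang_path: str, block: str) -> dict:
    """Load both JSON files and compare one top-level block (e.g. 'quantization')."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(jang_path, encoding="utf-8") as f:
        jang = json.load(f)
    return shared_key_mismatches(config[block], jang[block])


# Blocks as they appear in this commit.
CONFIG_QUANT = {"bits": 4, "group_size": 32, "mode": "affine",
                "router_bits": 16, "expert_layout": "split_switch_mlp"}
JANG_QUANT = {"method": "affine", "group_size": 32, "bits": 4,
              "embed_bits": 8, "router_bits": 16}

if __name__ == "__main__":
    print(shared_key_mismatches(CONFIG_QUANT, JANG_QUANT) or "shared keys agree")
```

Only keys with the same name are compared. The two files name the quantization scheme differently (`mode` in `config.json`, `method` in `jang_config.json`), so that pair has to be checked by eye or with an explicit key mapping.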
model-00001-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ef3c7752093861769e4e29ceeb25734ac78acfb0f0f4fd931b9a83b9e27907e
+ size 1003545410
model-00002-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:383a4e2bf1abd7f5a00c94fe2da83ef1e364f4f2bd05caa8e47effae58458b37
+ size 1015709496
model-00003-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e71759090088ae99618b5b1efd9e1a3c69c27941902ecc47bfb84051d1c480c3
+ size 1014385726
model-00004-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5fd7f7cfb8c006317e8fca905a5c03e378529e4840f0930e94e0fc6f99acceac
+ size 1006642888
model-00005-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51b00e149c29e3fdeae4738ad1c91427371df0601f7bdb7111950baa1d056b63
+ size 1006642944
model-00006-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d2a5b084e5f391fa137d7dc948ee242b3475c775e65197ffbe56669c33eb74a
+ size 1006642944
model-00007-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ef47edd1e35b5f5f8f25e1702cbaf00f1badc619585d4aaa44fe25b562bbfda
+ size 1006642944
model-00008-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0dca56cbd0beeb36c6531f959c5b46582ad5ffb8b351c29dad811bb218c8475e
+ size 553653816
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
osaurus-x-banner.png ADDED
preprocessor_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "Qwen2VLImageProcessor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "max_pixels": 12845056,
+   "merge_size": 2,
+   "min_pixels": 3136,
+   "patch_size": 14,
+   "processor_class": "Zaya1VLProcessor",
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "longest_edge": 12845056,
+     "shortest_edge": 3136
+   },
+   "temporal_patch_size": 1
+ }
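The `patch_size`, `merge_size`, `min_pixels`, and `max_pixels` values above bound how many vision tokens one image can contribute. The sketch below assumes Qwen2-VL-style processing, in which each `merge_size × merge_size` block of patches becomes one LM token; whether the ZAYA1-VL processor follows exactly this rule is an assumption. It expects dimensions that are already resized to multiples of 28 and does not reproduce the processor's resizing logic.

```python
# Values copied from preprocessor_config.json in this commit.
PATCH_SIZE = 14
MERGE_SIZE = 2
MIN_PIXELS = 3136
MAX_PIXELS = 12845056


def vision_token_count(height: int, width: int) -> int:
    """Count LM-side vision tokens for an already-resized image.

    Assumption: one token per MERGE_SIZE x MERGE_SIZE block of patches,
    as in Qwen2-VL-style pipelines.
    """
    unit = PATCH_SIZE * MERGE_SIZE            # 28 pixels per merged token side
    if height % unit or width % unit:
        raise ValueError(f"dimensions must be multiples of {unit}")
    if not MIN_PIXELS <= height * width <= MAX_PIXELS:
        raise ValueError("pixel count outside [min_pixels, max_pixels]")
    patches = (height // PATCH_SIZE) * (width // PATCH_SIZE)
    return patches // (MERGE_SIZE ** 2)


if __name__ == "__main__":
    for h, w in [(56, 56), (224, 224), (1008, 1344)]:
        print(f"{h}x{w}: {vision_token_count(h, w)} tokens")
```

Under this assumption `min_pixels` (56 × 56) corresponds to 4 tokens and `max_pixels` to 16384 tokens, which is half of the 32768-token context length listed in `config.json`.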
special_tokens_map.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "answer_token": "<|im_start|>",
+   "boi_token": "<|vision_start|>",
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eoi_token": "<|vision_end|>",
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<image>",
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "video_token": "<video>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a90181b1298e5d8c2f211f15dca261650d85c1f2a5a3bbfff852a853d21bed8f
+ size 33385329
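The Git LFS pointers in this commit record the byte size of each large object, which allows a quick cross-check against the README's "Bundle size" row. The sketch below parses a pointer and sums the nine sizes shown above (eight weight shards plus `tokenizer.json`). The helper names are illustrative, and the README does not say which files its figure counts, so the comparison is only indicative.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its oid (hex digest) and size in bytes."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    _, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


# Sizes copied from the LFS pointers in this commit.
SHARD_SIZES = [
    1003545410, 1015709496, 1014385726, 1006642888,
    1006642944, 1006642944, 1006642944, 553653816,
]
TOKENIZER_SIZE = 33385329

if __name__ == "__main__":
    shards = sum(SHARD_SIZES)
    print(f"weight shards: {to_gib(shards):.2f} GiB")
    print(f"shards + tokenizer.json: {to_gib(shards + TOKENIZER_SIZE):.2f} GiB")
```

The weight shards alone come to about 7.09 GiB, and adding `tokenizer.json` gives about 7.12 GiB, which is consistent with the README figure. Files whose sizes are not shown in this diff would add slightly more.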
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff