---
license: apache-2.0
library_name: mlx
base_model: Zyphra/ZAYA1-VL-8B
base_model_relation: quantized
pipeline_tag: image-text-to-text
tags:
- zaya
- mixture-of-experts
- hybrid-attention
- cca-attention
- mlx
- apple-silicon
- reasoning
- tool-use
- quantized
- vision
- multimodal
- image-text-to-text
- vision-language
- qwen2_5_vl-vit
- mxfp4
- jang
- osaurus
quantization_config:
family: mxfp4
profile: MXFP4
group_size: 32
expert_layout: split_switch_mlp
---
<p align="center"><img src="osaurus-x-banner.png" width="100%" alt="OsaurusAI"/></p>
# ZAYA1-VL-8B-MXFP4
Quantized **Zyphra/ZAYA1-VL-8B** for Apple Silicon runtimes.
| Field | Value |
|---|---|
| Source | [Zyphra/ZAYA1-VL-8B](https://huggingface.co/Zyphra/ZAYA1-VL-8B) |
| License | Apache-2.0, inherited from upstream |
| Format | MXFP4 |
| Modality | image+text |
| Bundle size | 7.12 GiB |
| Tensor keys | 5315 |
| Expert layout | Pre-stacked `zaya_block.experts.switch_mlp` |
| Runtime status | Generation coherence not independently verified for the quantized bundle (no coherence report yet); published as a format/runtime bundle pending downstream ZAYA runtime validation. |
## Important Runtime Note
This bundle requires a ZAYA-aware MLX/JANG runtime that implements the CCA attention state contract and the converted pre-stacked expert layout.
ZAYA1-VL fuses Zyphra's text-ZAYA decoder (CCA attention + top-1 MoE) with the Qwen2.5-VL vision tower. Vision-LoRA modulates the LM trunk only at vision-token positions; text positions decode unmodified.
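As a toy illustration of that gating, the sketch below applies a LoRA update only where the token id equals `image_token_id`. Shapes, names, and the hard additive gate are assumptions for illustration, not the actual ZAYA1-VL implementation:

```python
import numpy as np

# Toy sketch of gating a LoRA update to vision-token positions only.
# Shapes, the additive form, and the hard token-id gate are assumptions
# for illustration, not the actual ZAYA1-VL implementation.
IMAGE_TOKEN_ID = 262147

def gated_lora(x, token_ids, w_base, lora_a, lora_b):
    base = x @ w_base                     # ordinary LM-trunk projection
    delta = (x @ lora_a) @ lora_b         # low-rank LoRA update
    gate = (token_ids == IMAGE_TOKEN_ID)[:, None].astype(x.dtype)
    return base + gate * delta            # text positions decode unmodified

rng = np.random.default_rng(0)
seq, d, r = 6, 32, 8                      # toy sizes; real trunk is d=2048
x = rng.standard_normal((seq, d))
ids = np.array([1, 2, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 3, 4])
out = gated_lora(x, ids, rng.standard_normal((d, d)),
                 rng.standard_normal((d, r)), rng.standard_normal((r, d)))
print(out.shape)                          # (6, 32)
```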
## Runtime Pin Required
Zyphra's `model_type=zaya1_vl` is not yet implemented in `vmlx-swift-lm` or stock `mlx_vlm`. The bundle is **conversion-ready and structurally verified**, but image-text decoding requires either:
- Zyphra's `transformers @ git+https://github.com/Zyphra/transformers.git@zaya1-vl` fork (BF16 source-side reference), or
- A `Zaya1VL` MLX adapter (in development at `jang-runtime/Sources/JANG/Zaya1VL/`).
Until the MLX adapter ships, treat this bundle as a runtime-pending preview.
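For reference, the snippet below shows the standard mlx-vlm flow that a future `Zaya1VL` adapter would plug into. Per the pin above, it will not decode correctly with stock `mlx_vlm` today:

```python
# Standard mlx-vlm flow; shown as the expected call shape only.
# It will NOT decode correctly until a zaya1_vl adapter ships.
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("OsaurusAI/ZAYA1-VL-8B-MXFP4")
config = load_config("OsaurusAI/ZAYA1-VL-8B-MXFP4")

image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

formatted_prompt = apply_chat_template(processor, config, prompt, num_images=1)
output = generate(model, processor, formatted_prompt, image)
print(output)
```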
## Architecture Summary
- 40 hybrid decoder layers: each layer has CCA attention + top-1 MoE
- Hidden size 2048, 8 query heads, 2 KV heads, head dim 128
- 16 routed experts per MoE layer, top-1 routing
- Vision tower: Qwen2.5-VL ViT (`hidden=1280`, `out=2048`, `patch=14`)
- Vision-LoRA on the LM trunk: rank-8 attn, rank-32 MLP, gated to vision tokens only
- Image tokens: `image_token_id=262147`, `vision_start=255999`, `vision_end=256000`
- Context length 32768, `rope_theta=1000000`, partial RoPE (0.5 of head dim; sketched below)
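The partial-RoPE note means rotary embeddings touch only the first 64 of each head's 128 dims. Here is a minimal numpy sketch assuming the common rotate-half formulation; the ZAYA runtime's exact layout may differ:

```python
import numpy as np

# Partial RoPE sketch: rotate the first half of each head dim, pass the
# rest through. Rotate-half layout is an assumption for illustration.
HEAD_DIM = 128
ROTARY_DIM = HEAD_DIM // 2            # 0.5 of head dim is rotated
ROPE_THETA = 1_000_000.0              # rope_theta from this card

def apply_partial_rope(x: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """x: (seq, HEAD_DIM) for one head; positions: (seq,) token indices."""
    rot, keep = x[:, :ROTARY_DIM], x[:, ROTARY_DIM:]
    half = ROTARY_DIM // 2
    inv_freq = ROPE_THETA ** (-np.arange(half) / half)    # per-pair frequencies
    angles = positions[:, None] * inv_freq[None, :]       # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = rot[:, :half], rot[:, half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, keep], axis=-1)       # unrotated half kept

q = np.random.default_rng(0).standard_normal((4, HEAD_DIM))
print(apply_partial_rope(q, np.arange(4)).shape)          # (4, 128)
```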
## Quantization
4-bit affine quantization for the LM linear layers, 8-bit affine for embeddings, and float passthrough for the vision tower, LoRA, router, and CCA-state tensors.
Passthrough floor for this first release (a name-pattern sketch closes this section):
- `conv_qk.*`, `temp`, norms, residual scaling, router path, biases, and balancing biases are preserved as float tensors.
- Embeddings and `lm_head` use 8-bit affine in the prepared bundles.
- Vision tower (`vision_tower.*`) and all LoRA tensors (`*.lora_*.[01].weight`) are kept in float passthrough.
- `jangtq_runtime.safetensors` is not applicable to MXFP4.
`mxtq_bits`: `null`.
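As a name-pattern illustration of the split above (the patterns mirror this card's passthrough list; demo tensor names other than the `switch_mlp` key are hypothetical, and the real converter may match differently):

```python
import fnmatch

# Illustrative predicate only: mirrors the passthrough list on this card.
FLOAT_PATTERNS = [
    "*conv_qk.*", "*temp*", "*norm*", "*residual*", "*router*",
    "*bias*", "vision_tower.*", "*.lora_*.[01].weight",
]

def is_float_passthrough(name: str) -> bool:
    return any(fnmatch.fnmatch(name, p) for p in FLOAT_PATTERNS)

def quant_mode(name: str) -> str:
    if is_float_passthrough(name):
        return "float"                 # preserved as float tensors
    if "embed" in name or name.startswith("lm_head"):
        return "affine-8bit"           # embeddings and lm_head
    return "mxfp4-4bit"                # remaining LM linears

# Example names; only the switch_mlp key is documented on this card.
for n in ("vision_tower.patch_embed.proj.weight",
          "model.layers.0.zaya_block.experts.switch_mlp.up_proj.weight",
          "model.layers.0.self_attn.q_proj.lora_a.0.weight",
          "lm_head.weight"):
    print(n, "->", quant_mode(n))
```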
## Bundle Verification
- Safetensor headers scanned.
- Source tensor coverage checked.
- Converted bundles checked for `local_experts` removal.
- Converted expert tensors checked for the pre-stacked `switch_mlp` layout (reproduced in the sketch after this list).
- JANGTQ sidecars checked for the Swift runtime contract.
- Vision tower and LoRA tensors verified as float passthrough; `image_token_id` and `vision_start`/`vision_end` preserved.
- Runtime coherence status recorded above.
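The key-level checks in this list can be reproduced by reading the safetensors headers directly (an 8-byte little-endian length followed by a JSON key table); the shard glob below assumes the bundle's root directory:

```python
import json
import struct
from pathlib import Path

# Scan safetensors headers: a u64 little-endian header length, then a
# JSON table of tensor keys. Verifies the local_experts removal and the
# pre-stacked switch_mlp layout described above.
keys = []
for shard in sorted(Path(".").glob("*.safetensors")):
    with open(shard, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
    keys.extend(k for k in header if k != "__metadata__")

assert not any("local_experts" in k for k in keys), "unconverted expert layout"
assert any("zaya_block.experts.switch_mlp" in k for k in keys), "missing switch_mlp"
print(f"{len(keys)} tensor keys scanned")  # this card reports 5315
```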
## Runtime Smoke Tests
Before production use, run short deterministic prompts through the exact target runtime (a minimal harness is sketched after this list):
- `What is 2+2? Answer with only the number.`
- `What is the capital of France? Answer with one word.`
- One chat-template prompt with thinking disabled.
- One image+text prompt exercising vision-token interleave and the vision-LoRA gate.
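A skeleton for the two text prompts above might look like this; `run_prompt` is a placeholder for whatever generate call the target runtime exposes, not a real API of this bundle:

```python
# Smoke-test harness skeleton. Wire run_prompt to the actual ZAYA-aware
# runtime; it is a deliberate stub here, not a real API.
def run_prompt(prompt: str) -> str:
    raise NotImplementedError("connect this to the target runtime")

CHECKS = [
    ("What is 2+2? Answer with only the number.", "4"),
    ("What is the capital of France? Answer with one word.", "Paris"),
]

for prompt, expected in CHECKS:
    out = run_prompt(prompt).strip()
    status = "ok" if expected.lower() in out.lower() else "FAIL"
    print(f"[{status}] {prompt!r} -> {out!r}")
```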
This first public release records bundle-integrity and runtime-contract checks only; full generation quality depends on a ZAYA-aware runtime implementation.
## Summary
This bundle is Zyphra/ZAYA1-VL-8B quantized for Apple Silicon MLX/JANG runtimes. Use it only with a runtime that faithfully implements ZAYA's CCA attention state and MoE routing. Image input passes through the Qwen2.5-VL ViT path, and vision-LoRA is applied only at image-token positions.
## Files
- `config.json` carries `weight_format=mxfp4`, `zaya_expert_layout=split_switch_mlp`, and a preserved `vision_config` (sanity-checked in the sketch after this list).
- `jang_config.json` carries `cache_subtype=zaya_cca`.
- Tokenizer files and chat template are preserved from the upstream source snapshot.
- `preprocessor_config.json` (Qwen2VLImageProcessor) is included for image input.
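A few lines suffice to sanity-check the fields above against a local copy of the bundle (paths assume the bundle's root directory):

```python
import json

# Check the config fields documented on this card; file paths assume
# the current directory is the downloaded bundle root.
with open("config.json") as f:
    cfg = json.load(f)
assert cfg.get("weight_format") == "mxfp4"
assert cfg.get("zaya_expert_layout") == "split_switch_mlp"
assert "vision_config" in cfg

with open("jang_config.json") as f:
    jang = json.load(f)
assert jang.get("cache_subtype") == "zaya_cca"
print("bundle config fields look consistent")
```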