OsaurusAI

ZAYA1-VL-8B-JANGTQ4

Quantized Zyphra/ZAYA1-VL-8B for Apple Silicon runtimes.

Source: Zyphra/ZAYA1-VL-8B
License: Apache-2.0, inherited from upstream
Format: JANGTQ4
Modality: image+text
Bundle size: 6.29 GiB
Tensor keys: 5315
Expert layout: Pre-stacked zaya_block.experts.switch_mlp
Runtime status: Generation coherence NOT INDEPENDENTLY PASSED for the quantized runtime bundle (coherence report missing). Published as a format/runtime bundle pending downstream ZAYA runtime validation.

Important Runtime Note

This bundle requires a ZAYA-aware JANGTQ runtime that implements CCA attention state plus pre-stacked switch_mlp TurboQuant experts.

ZAYA1-VL fuses Zyphra's text-ZAYA decoder (CCA attention + top-1 MoE) with the Qwen2.5-VL vision tower. Vision-LoRA modulates the LM trunk only at vision-token positions; text positions decode unmodified. Use this bundle only with a runtime that implements the ZAYA CCA state contract and the converted pre-stacked expert layout.
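
The gating described above can be pictured with a small pure-Python sketch. It assumes a standard additive LoRA update (base projection plus a low-rank delta) that is switched on per token position; the exact ZAYA1-VL formulation is not given here and may differ. The names `matvec`, `gated_lora_linear`, and `is_vision` are hypothetical.

```python
# Illustrative only: a standard additive LoRA update, gated per token position.
# The actual ZAYA1-VL formulation may differ; all names here are hypothetical.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def gated_lora_linear(x, base_w, lora_a, lora_b, is_vision):
    """Return base_w @ x, adding lora_b @ (lora_a @ x) only for vision tokens."""
    base = matvec(base_w, x)
    if not is_vision:
        return base  # text positions: base weights only
    delta = matvec(lora_b, matvec(lora_a, x))
    return [b + d for b, d in zip(base, delta)]

# Toy sizes: hidden=2, rank=1.
base_w = [[1.0, 0.0], [0.0, 1.0]]  # identity base projection
lora_a = [[1.0, 1.0]]              # rank x hidden (down-projection)
lora_b = [[0.5], [0.0]]            # hidden x rank (up-projection)

x = [2.0, 4.0]
text_out = gated_lora_linear(x, base_w, lora_a, lora_b, is_vision=False)
vision_out = gated_lora_linear(x, base_w, lora_a, lora_b, is_vision=True)
print(text_out, vision_out)
```

With the gate closed the output equals the base projection, which is the sense in which text positions decode unmodified.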

Runtime Pin Required

Zyphra's model_type=zaya1_vl is not yet implemented in vmlx-swift-lm or stock mlx_vlm. The bundle is conversion-ready and structurally verified, but image-text decoding requires either:

  • Zyphra's transformers fork, installed from git+https://github.com/Zyphra/transformers.git@zaya1-vl (BF16 source-side reference), or
  • A Zaya1VL MLX adapter (in development at jang-runtime/Sources/JANG/Zaya1VL/).

Until the MLX adapter ships, treat this bundle as a runtime-pending preview.

Architecture Summary

  • 40 hybrid decoder layers: each layer has CCA attention + top-1 MoE
  • Hidden size 2048, 8 query heads, 2 KV heads, head dim 128
  • 16 routed experts per MoE layer, top-1 routing
  • Vision tower: Qwen2.5-VL ViT (hidden=1280, out=2048, patch=14)
  • Vision-LoRA on the LM trunk: rank-8 attn, rank-32 MLP, gated to vision tokens only
  • Image tokens: image_token_id=262147, start=255999, end=256000
  • Context length 32768, rope_theta=1000000, partial RoPE (0.5 of head dim)
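
A few derived sizes follow directly from the numbers above. The arithmetic below is a quick sanity sketch, not a statement about how the CCA attention uses these widths internally; the variable names are illustrative, and only the numeric values come from the summary.

```python
# Values copied from the architecture summary; variable names are illustrative.
hidden_size = 2048
num_query_heads = 8
num_kv_heads = 2
head_dim = 128
partial_rotary_factor = 0.5  # fraction of head_dim that receives RoPE

q_width = num_query_heads * head_dim              # total query width
kv_width = num_kv_heads * head_dim                # total key (or value) width
queries_per_kv = num_query_heads // num_kv_heads  # query heads sharing one KV head
rotary_dim = int(head_dim * partial_rotary_factor)

print(q_width, kv_width, queries_per_kv, rotary_dim)
```

Under these numbers each KV head serves four query heads, and RoPE covers 64 of the 128 dimensions in each head.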

Quantization

4-bit MXTQ routed experts + 8-bit affine LM linears + passthrough vision tower / LoRA.

Passthrough floor as prepared for the first release (tensors that are never quantized below the precision listed):

  • conv_qk.*, temp, norms, residual scaling, router path, biases, and balancing biases are preserved as float tensors.
  • Embeddings and lm_head use 8-bit affine in the prepared bundles.
  • Vision tower (vision_tower.*) and all LoRA tensors (*.lora_*.[01].weight) are kept in float passthrough.
  • jangtq_runtime.safetensors is included in the bundle.

mxtq_bits:

{
  "routed_expert": 4,
  "attention": 8,
  "router": 16,
  "embed_tokens": 8,
  "lm_head": 8,
  "cca_conv": 16,
  "norms_residual": 16
}
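
As a rough illustration of how this table and the passthrough rules combine, the sketch below maps a tensor key to its storage class. The glob patterns for the vision tower and LoRA tensors are taken from the passthrough list; the substring checks and the example key names are assumptions for illustration and are not read from the actual bundle.

```python
import fnmatch
import json

MXTQ_BITS = json.loads("""
{
  "routed_expert": 4, "attention": 8, "router": 16,
  "embed_tokens": 8, "lm_head": 8, "cca_conv": 16, "norms_residual": 16
}
""")

def storage_for(key):
    """Return 'passthrough', a bit width, or None if the key is not covered here.

    Hypothetical helper: the matching rules are a simplification of the
    description in this card, not the converter's actual logic.
    """
    if fnmatch.fnmatchcase(key, "vision_tower.*"):
        return "passthrough"
    if fnmatch.fnmatchcase(key, "*.lora_*.[01].weight"):
        return "passthrough"
    if "conv_qk." in key:
        return MXTQ_BITS["cca_conv"]
    if "switch_mlp" in key:
        return MXTQ_BITS["routed_expert"]
    if "embed_tokens" in key:
        return MXTQ_BITS["embed_tokens"]
    if "lm_head" in key:
        return MXTQ_BITS["lm_head"]
    return None

# Example keys are made up to exercise each rule.
for name in [
    "vision_tower.blocks.0.attn.qkv.weight",
    "model.layers.0.attn.q_proj.lora_A.0.weight",
    "model.layers.0.zaya_block.experts.switch_mlp.up_proj.weight",
    "lm_head.weight",
]:
    print(name, "->", storage_for(name))
```

The passthrough checks run first so that a LoRA tensor attached to a quantized linear is never mistaken for the linear itself.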

Bundle Verification

  • Safetensor headers scanned.
  • Source tensor coverage checked.
  • Converted bundles checked for local_experts removal.
  • Converted expert tensors checked for pre-stacked switch_mlp layout.
  • JANGTQ sidecars checked for the Swift runtime contract.
  • Vision tower + LoRA tensors verified passthrough; image_token_id, vision_start/end preserved.
  • Runtime coherence status recorded above.
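
Two of the checks above (header scan and expert layout) can be approximated without loading any weights. The sketch below relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the helper names and the simple substring tests are assumptions, not the verification script actually used for this bundle.

```python
import json
import struct

def read_safetensors_keys(path):
    """Return tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [k for k in header if k != "__metadata__"]

def check_expert_layout(keys):
    """Report whether local_experts keys are gone and switch_mlp keys exist."""
    leftover = [k for k in keys if "local_experts" in k]
    stacked = [k for k in keys if "switch_mlp" in k]
    return {
        "local_experts_removed": not leftover,
        "has_switch_mlp": bool(stacked),
    }
```

A possible use is to call `read_safetensors_keys` on each shard in the bundle directory, concatenate the results, and pass the combined list to `check_expert_layout`.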

Runtime Smoke Tests

Before production use, run short deterministic prompts through the exact target runtime:

  • What is 2+2? Answer with only the number.
  • What is the capital of France? Answer with one word.
  • One chat-template prompt with thinking disabled.
  • One image+text prompt exercising vision-token interleave and the vision-LoRA gate.

The first public bundle release records bundle integrity and runtime contract checks. Full generation quality depends on a ZAYA-aware runtime implementation.
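
The two text-only prompts above lend themselves to a tiny harness. In the sketch below, `generate` stands for whatever text-generation entry point the target runtime exposes; it is assumed to take a prompt string and return the decoded reply. That interface, and the substring-based pass criterion, are assumptions. The chat-template and image+text prompts are left out because their expected outputs depend on the runtime and the chosen image.

```python
# `generate` is a placeholder for the target runtime's generation call.
# Assumed interface: generate(prompt: str) -> str.

SMOKE_PROMPTS = [
    ("What is 2+2? Answer with only the number.", "4"),
    ("What is the capital of France? Answer with one word.", "paris"),
]

def run_smoke_tests(generate):
    """Run each prompt and record whether the expected token appears."""
    results = []
    for prompt, expected in SMOKE_PROMPTS:
        reply = generate(prompt).strip().lower()
        results.append((prompt, reply, expected in reply))
    return results

def fake_generate(prompt):
    """Stand-in runtime so the harness itself can be exercised."""
    return "4" if "2+2" in prompt else "Paris"

for prompt, reply, ok in run_smoke_tests(fake_generate):
    print("PASS" if ok else "FAIL", "|", prompt, "->", reply)
```

Replacing `fake_generate` with a real runtime call, using greedy decoding, keeps the run deterministic as recommended above.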

Summary

This bundle is a quantized build of Zyphra/ZAYA1-VL-8B for Apple Silicon MLX/JANG runtimes. Use it only with a runtime that correctly implements ZAYA's CCA attention state and MoE routing. Image input goes through the Qwen2.5-VL ViT path, and vision-LoRA is applied only at image-token positions.

Files

  • config.json carries weight_format=mxtq, zaya_expert_layout=split_switch_mlp, and a preserved vision_config.
  • jang_config.json carries cache_subtype=zaya_cca.
  • Tokenizer files and chat template are preserved from the upstream source snapshot.
  • preprocessor_config.json (Qwen2VLImageProcessor) is included for image input.
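
A downloaded bundle can be checked against the fields listed above with a few lines of Python. This sketch assumes the fields are top-level keys in their JSON files, which this card does not confirm; `check_bundle_configs` and its return format are hypothetical.

```python
import json
from pathlib import Path

# Field values as listed in this card; top-level placement is an assumption.
EXPECTED = {
    "config.json": {
        "weight_format": "mxtq",
        "zaya_expert_layout": "split_switch_mlp",
    },
    "jang_config.json": {"cache_subtype": "zaya_cca"},
}

def check_bundle_configs(bundle_dir):
    """Return a list of human-readable problems; an empty list means all matched."""
    problems = []
    for filename, expected in EXPECTED.items():
        data = json.loads((Path(bundle_dir) / filename).read_text())
        for key, want in expected.items():
            if data.get(key) != want:
                problems.append(
                    f"{filename}: {key}={data.get(key)!r}, expected {want!r}"
                )
        if filename == "config.json" and "vision_config" not in data:
            problems.append("config.json: missing vision_config")
    return problems
```

Calling `check_bundle_configs` with the local bundle directory and printing any returned problems gives a quick pre-flight check before a runtime is pointed at the files.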