Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 MLX MXFP8 + Vision

An MXFP8-quantized MLX build, with a vision tower grafted from the base Qwen/Qwen3.5-9B model.

Model Details

  • Architecture: Qwen 3.5 9B (hybrid linear attention + full attention, 32 layers)
  • Quantization: MXFP8 (E4M3 with block-level scaling), group_size=32
  • Size: ~10 GB
  • Context Length: 262,144 tokens
  • Vision: Full image and video understanding (27 ViT blocks, kept in bf16)
  • Tool Use: Native function calling support
  • Thinking: Chain-of-thought reasoning mode
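The MXFP8 scheme above stores FP8 (E4M3) elements that share one power-of-two scale per block of 32 weights. The sketch below is a simplified float simulation of that idea, not the packed on-disk format; the function name and the E4M3 rounding emulation are illustrative assumptions.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite E4M3 value

def quantize_mxfp8_block(block):
    """Illustrative MXFP8 round-trip for one 32-element block:
    a shared power-of-two scale plus per-element E4M3 values.
    Simplified float simulation, not the packed storage layout."""
    amax = np.max(np.abs(block))
    if amax == 0:
        return np.zeros_like(block)
    # Shared scale: power of two aligning the block's max with the E4M3 range
    # (E4M3's largest exponent is 8).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 8)
    scaled = block / scale
    # Crude E4M3 emulation: keep 3 mantissa bits at each value's exponent,
    # with a minimum normal exponent of -6, then clamp to the E4M3 range.
    exp = np.floor(np.log2(np.maximum(np.abs(scaled), 2.0 ** -6)))
    step = 2.0 ** (exp - 3)
    q = np.clip(np.round(scaled / step) * step, -E4M3_MAX, E4M3_MAX)
    return q * scale

# Example: quantize one block of random weights and measure the error.
weights = np.random.default_rng(0).normal(size=32).astype(np.float32)
deq = quantize_mxfp8_block(weights)
err = np.max(np.abs(weights - deq))
```

Because every block reuses a single 8-bit scale, the per-weight overhead is only 8/32 = 0.25 bits on top of the 8-bit elements, which is why the ~9B-parameter model lands near 10 GB.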

Vision

The vision tower is grafted from the base Qwen/Qwen3.5-9B model (its out_hidden_size of 4096 matches the 9B text model's hidden_size) and is kept in bf16 for maximum quality.

Usage

Works with LM Studio, the MLX Python libraries, and other MLX-compatible frameworks.
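As a sketch, image inference could be run from the command line with the mlx-vlm package (an assumption: this checkpoint is not confirmed to load with mlx-vlm's stock Qwen-VL support, and the image path and prompt are placeholders):

```shell
# Install the MLX vision-language runtime (assumed dependency).
pip install mlx-vlm

# Describe a local image with this checkpoint.
python -m mlx_vlm.generate \
  --model AITRADER/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-mlx-mxfp8 \
  --image photo.jpg \
  --prompt "Describe this image." \
  --max-tokens 256
```

Note this requires Apple Silicon and will download the ~10 GB weights on first run.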
