Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 MLX MXFP8 + Vision

An MXFP8-quantized MLX build, with a vision tower grafted from the base Qwen/Qwen3.5-9B model.

Model Details

  • Architecture: Qwen 3.5 9B (hybrid linear attention + full attention, 32 layers)
  • Quantization: MXFP8 (E4M3 with block-level scaling), group_size=32
  • Size: ~10 GB
  • Context Length: 262,144 tokens
  • Vision: Full image and video understanding (27 ViT blocks, kept in bf16)
  • Tool Use: Native function calling support
  • Thinking: Chain-of-thought reasoning mode
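The MXFP8 scheme above stores FP8 (E4M3) elements that share one power-of-two scale per block of 32 weights. The sketch below is a simplified float simulation of that idea, not the packed on-disk format; the function name and the E4M3 rounding emulation are illustrative assumptions.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite E4M3 value

def quantize_mxfp8_block(block):
    """Illustrative MXFP8 round-trip for one 32-element block:
    a shared power-of-two scale plus per-element E4M3 values.
    Simplified float simulation, not the packed storage layout."""
    amax = np.max(np.abs(block))
    if amax == 0:
        return np.zeros_like(block)
    # Shared scale: power of two aligning the block's max with the E4M3 range
    # (E4M3's largest exponent is 8).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 8)
    scaled = block / scale
    # Crude E4M3 emulation: keep 3 mantissa bits at each value's exponent,
    # with a minimum normal exponent of -6, then clamp to the E4M3 range.
    exp = np.floor(np.log2(np.maximum(np.abs(scaled), 2.0 ** -6)))
    step = 2.0 ** (exp - 3)
    q = np.clip(np.round(scaled / step) * step, -E4M3_MAX, E4M3_MAX)
    return q * scale

# Example: quantize one block of random weights and measure the error.
weights = np.random.default_rng(0).normal(size=32).astype(np.float32)
deq = quantize_mxfp8_block(weights)
err = np.max(np.abs(weights - deq))
```

Because every block reuses a single 8-bit scale, the per-weight overhead is only 8/32 = 0.25 bits on top of the 8-bit elements, which is why the ~9B-parameter model lands near 10 GB.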

Vision

The vision tower is grafted from the base Qwen/Qwen3.5-9B model (its out_hidden_size of 4096 matches the 9B text model's hidden_size) and is kept in bf16 for maximum quality.

Usage

Works with LM Studio, the MLX Python libraries, and other MLX-compatible frameworks.
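As a sketch, image inference could be run from the command line with the mlx-vlm package (an assumption: this checkpoint is not confirmed to load with mlx-vlm's stock Qwen-VL support, and the image path and prompt are placeholders):

```shell
# Install the MLX vision-language runtime (assumed dependency).
pip install mlx-vlm

# Describe a local image with this checkpoint.
python -m mlx_vlm.generate \
  --model AITRADER/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-mlx-mxfp8 \
  --image photo.jpg \
  --prompt "Describe this image." \
  --max-tokens 256
```

Note this requires Apple Silicon and will download the ~10 GB weights on first run.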
