# Qwen3.5-35B-A3B-3bit — MLX VLM

3-bit quantized Qwen3.5-35B-A3B in MLX format with full vision support for Apple Silicon.

Converted with mlx-vlm to preserve the complete multimodal architecture including vision tower weights.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Qwen3.5 MoE (Mixture of Experts) |
| Total Parameters | 35B (3B active per token) |
| Experts | 256 total, 8 active per token |
| Hidden Layers | 40 |
| Quantization | 3-bit affine, group size 64 |
| Avg Bits/Weight | 3.659 |
| Model Size | ~15 GB |
| Peak Memory | ~16.5 GB |
| Vision Tower | 27 layers, 1152 hidden size, 16 attention heads |
| Total Weights | 2090 tensors (1757 language + 333 vision) |
| Pipeline | image-text-to-text |
| Conversion Tool | mlx-vlm v0.3.12 |
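As a sanity check on the figures above, the on-disk footprint can be estimated from the parameter count and the average bits per weight. This is a rough back-of-envelope sketch; the exact size also depends on metadata and any tensors kept at higher precision:

```python
# Rough size estimate: 35B weights at ~3.659 bits each, converted to gigabytes.
# Ignores metadata and mixed-precision tensors, so it is only approximate.
total_params = 35e9          # total parameters (MoE: only ~3B active per token)
avg_bits_per_weight = 3.659  # reported average after 3-bit affine quantization

size_gb = total_params * avg_bits_per_weight / 8 / 1e9
print(f"estimated size: {size_gb:.1f} GB")  # ~16 GB, in line with the ~15 GB on disk
```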

## Capabilities

- Text generation with thinking/reasoning
- Image understanding and description
- Visual question answering

## Usage

### Install

```bash
pip install mlx-vlm
```

### Generate with Image

```bash
python -m mlx_vlm.generate \
  --model andrevp/Qwen3.5-35B-A3B-3bit \
  --max-tokens 512 \
  --temperature 0.6 \
  --prompt "Describe this image in detail." \
  --image path/to/image.png
```

### Generate Text Only

```bash
python -m mlx_vlm.generate \
  --model andrevp/Qwen3.5-35B-A3B-3bit \
  --max-tokens 512 \
  --temperature 0.6 \
  --prompt "Explain quantum computing."
```

### Python API

```python
from mlx_vlm import load, generate

# Load the quantized model and its processor from the Hub (or a local path).
model, processor = load("andrevp/Qwen3.5-35B-A3B-3bit")

# With image
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.png",
    max_tokens=512,
)
print(output)

# Text only
output = generate(
    model,
    processor,
    prompt="Explain quantum computing.",
    max_tokens=512,
)
print(output)
```

## Performance

Tested on Apple Silicon (M-series):

- Generation speed: ~80 tokens/sec
- Peak memory: ~16.5 GB
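At the reported throughput, the decode time for a typical request follows directly (this ignores prompt and image processing time, which varies with input size):

```python
# Decode-time estimate at the reported throughput on M-series hardware.
tokens = 512   # --max-tokens used in the examples above
speed = 80.0   # reported ~80 tokens/sec

latency_s = tokens / speed
print(f"~{latency_s:.1f} s to generate {tokens} tokens")  # ~6.4 s
```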

## Note

The Qwen3.5 video processor requires torchvision. Image-only usage works with mlx-vlm alone, without PyTorch; install torch and torchvision only if you need video input.
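For scripts that may receive either images or video, one way to honor this dependency split is a guarded import. This is a sketch; the flag and helper names are illustrative and not part of mlx-vlm:

```python
# Video preprocessing for Qwen3.5 needs torchvision; image input does not.
# Probe for it once up front instead of failing deep inside the pipeline.
try:
    import torchvision  # noqa: F401  (only needed for video input)
    VIDEO_SUPPORT = True
except ImportError:
    VIDEO_SUPPORT = False

def check_input(path: str) -> None:
    """Refuse video files early when torchvision is absent (illustrative helper)."""
    if path.lower().endswith((".mp4", ".mov", ".avi")) and not VIDEO_SUPPORT:
        raise RuntimeError("Video input requires: pip install torch torchvision")
```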
