# Qwen3.5-35B-A3B-3bit — MLX VLM

3-bit quantized Qwen3.5-35B-A3B in MLX format with full vision support for Apple Silicon.

Converted with mlx-vlm to preserve the complete multimodal architecture including vision tower weights.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Qwen3.5 MoE (Mixture of Experts) |
| Total Parameters | 35B (3B active per token) |
| Experts | 256 total, 8 active per token |
| Hidden Layers | 40 |
| Quantization | 3-bit affine, group size 64 |
| Avg Bits/Weight | 3.659 |
| Model Size | ~15 GB |
| Peak Memory | ~16.5 GB |
| Vision Tower | 27 layers, 1152 hidden size, 16 attention heads |
| Total Weights | 2090 tensors (1757 language + 333 vision) |
| Pipeline | image-text-to-text |
| Conversion Tool | mlx-vlm v0.3.12 |
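As a sanity check on the figures above, the on-disk footprint can be estimated from the parameter count and the average bits per weight. This is a rough back-of-envelope sketch; the exact size also depends on metadata and any tensors kept at higher precision:

```python
# Rough size estimate: 35B weights at ~3.659 bits each, converted to gigabytes.
# Ignores metadata and mixed-precision tensors, so it is only approximate.
total_params = 35e9          # total parameters (MoE: only ~3B active per token)
avg_bits_per_weight = 3.659  # reported average after 3-bit affine quantization

size_gb = total_params * avg_bits_per_weight / 8 / 1e9
print(f"estimated size: {size_gb:.1f} GB")  # ~16 GB, in line with the ~15 GB on disk
```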

## Capabilities

- Text generation with thinking/reasoning
- Image understanding and description
- Visual question answering

## Usage

### Install

```bash
pip install mlx-vlm
```

### Generate with Image

```bash
python -m mlx_vlm.generate \
  --model andrevp/Qwen3.5-35B-A3B-3bit \
  --max-tokens 512 \
  --temperature 0.6 \
  --prompt "Describe this image in detail." \
  --image path/to/image.png
```

### Generate Text Only

```bash
python -m mlx_vlm.generate \
  --model andrevp/Qwen3.5-35B-A3B-3bit \
  --max-tokens 512 \
  --temperature 0.6 \
  --prompt "Explain quantum computing."
```

### Python API

```python
from mlx_vlm import load, generate

# Load the quantized model and its processor from the Hub (or a local path).
model, processor = load("andrevp/Qwen3.5-35B-A3B-3bit")

# With image
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.png",
    max_tokens=512,
)
print(output)

# Text only
output = generate(
    model,
    processor,
    prompt="Explain quantum computing.",
    max_tokens=512,
)
print(output)
```

## Performance

Tested on Apple Silicon (M-series):

- Generation speed: ~80 tokens/sec
- Peak memory: ~16.5 GB
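At the reported throughput, the decode time for a typical request follows directly (this ignores prompt and image processing time, which varies with input size):

```python
# Decode-time estimate at the reported throughput on M-series hardware.
tokens = 512   # --max-tokens used in the examples above
speed = 80.0   # reported ~80 tokens/sec

latency_s = tokens / speed
print(f"~{latency_s:.1f} s to generate {tokens} tokens")  # ~6.4 s
```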

## Note

The Qwen3.5 video processor requires torchvision. Image-only usage works with mlx-vlm alone, without PyTorch; install torch and torchvision only if you need video input.
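For scripts that may receive either images or video, one way to honor this dependency split is a guarded import. This is a sketch; the flag and helper names are illustrative and not part of mlx-vlm:

```python
# Video preprocessing for Qwen3.5 needs torchvision; image input does not.
# Probe for it once up front instead of failing deep inside the pipeline.
try:
    import torchvision  # noqa: F401  (only needed for video input)
    VIDEO_SUPPORT = True
except ImportError:
    VIDEO_SUPPORT = False

def check_input(path: str) -> None:
    """Refuse video files early when torchvision is absent (illustrative helper)."""
    if path.lower().endswith((".mp4", ".mov", ".avi")) and not VIDEO_SUPPORT:
        raise RuntimeError("Video input requires: pip install torch torchvision")
```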
