Qwen3.5-35B-A3B-3bit: MLX VLM
3-bit quantized Qwen3.5-35B-A3B in MLX format with full vision support for Apple Silicon.
Converted with mlx-vlm to preserve the complete multimodal architecture including vision tower weights.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Qwen3.5 MoE (Mixture of Experts) |
| Total Parameters | 35B (3B active per token) |
| Experts | 256 total, 8 active per token |
| Hidden Layers | 40 |
| Quantization | 3-bit affine, group size 64 |
| Avg Bits/Weight | 3.659 |
| Model Size | ~15 GB |
| Peak Memory | ~16.5 GB |
| Vision Tower | 27 layers, 1152 hidden size, 16 attention heads |
| Total Weights | 2090 (1757 language + 333 vision) |
| Pipeline | image-text-to-text |
| Conversion Tool | mlx-vlm v0.3.12 |
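The quantization figures in the table can be sanity-checked with a little arithmetic. The sketch below assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias alongside the 3-bit values (an assumption about the storage layout, not something the card states), and it treats the nominal 35B parameter count as exact. The function names are illustrative only.

```python
def affine_bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Effective bits per weight for group-wise affine quantization.

    Assumes one scale and one bias per group (16 bits each by default).
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimated_size_gib(n_params, bits_per_weight):
    """Approximate on-disk size in GiB for a given average bit width."""
    return n_params * bits_per_weight / 8 / 2**30


# Pure 3-bit, group size 64, under the assumptions above
uniform_bpw = affine_bits_per_weight(bits=3, group_size=64)

# Size implied by the reported average of 3.659 bits/weight
size_gib = estimated_size_gib(n_params=35e9, bits_per_weight=3.659)

print(f"uniform 3-bit: {uniform_bpw} bits/weight")
print(f"estimated size: {size_gib:.1f} GiB")
```

Under these assumptions a uniformly 3-bit model would average 3.5 bits per weight. The reported 3.659 is higher, which is consistent with some tensors being stored at higher precision, although the card does not break this down. The size estimate lands close to the ~15 GB listed above.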
Capabilities
- Text generation with thinking/reasoning
- Image understanding and description
- Visual question answering
Usage
Install
pip install mlx-vlm
Generate with Image
python -m mlx_vlm.generate \
  --model andrevp/Qwen3.5-35B-A3B-3bit \
  --max-tokens 512 \
  --temperature 0.6 \
  --prompt "Describe this image in detail." \
  --image path/to/image.png
Generate Text Only
python -m mlx_vlm.generate \
  --model andrevp/Qwen3.5-35B-A3B-3bit \
  --max-tokens 512 \
  --temperature 0.6 \
  --prompt "Explain quantum computing."
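If you want to drive the command line from a script, the two invocations above differ only in the optional `--image` flag. The following is a minimal sketch that builds the same argument list and runs it with `subprocess`. The helper names `build_command` and `run_cli` are hypothetical, and only the flags shown above are used.

```python
import subprocess
import sys

MODEL_ID = "andrevp/Qwen3.5-35B-A3B-3bit"


def build_command(prompt, image=None, max_tokens=512, temperature=0.6):
    """Build the argv list for `python -m mlx_vlm.generate`."""
    cmd = [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL_ID,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
    ]
    # The image flag is only added for multimodal prompts
    if image is not None:
        cmd += ["--image", str(image)]
    return cmd


def run_cli(prompt, image=None, **options):
    """Run the CLI and return whatever it wrote to stdout."""
    result = subprocess.run(
        build_command(prompt, image, **options),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout
```

Calling `run_cli("Describe this image in detail.", image="path/to/image.png")` mirrors the first command, and omitting `image` mirrors the text-only one.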
Python API
from mlx_vlm import load, generate

model, processor = load("andrevp/Qwen3.5-35B-A3B-3bit")

# With image
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.png",
    max_tokens=512,
)
print(output)

# Text only
output = generate(
    model,
    processor,
    prompt="Explain quantum computing.",
    max_tokens=512,
)
print(output)
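Loading a ~15 GB model is slow, so in a longer script it may help to load once and reuse the result for both image and text prompts. The sketch below wraps the same `load` and `generate` calls shown above. The helpers `get_model`, `build_kwargs`, and `ask` are hypothetical names, and `mlx_vlm` is imported lazily so the module can be imported on machines where it is not installed.

```python
from functools import lru_cache

MODEL_ID = "andrevp/Qwen3.5-35B-A3B-3bit"


def build_kwargs(prompt, image=None, max_tokens=512):
    """Collect the keyword arguments used in the generate() calls above."""
    kwargs = {"prompt": prompt, "max_tokens": max_tokens}
    # Only pass an image for multimodal prompts
    if image is not None:
        kwargs["image"] = image
    return kwargs


@lru_cache(maxsize=1)
def get_model():
    """Load the model and processor once and cache the pair."""
    from mlx_vlm import load  # lazy import: requires mlx-vlm
    return load(MODEL_ID)


def ask(prompt, image=None, max_tokens=512):
    """Generate a response, with or without an image."""
    from mlx_vlm import generate  # lazy import: requires mlx-vlm
    model, processor = get_model()
    return generate(model, processor, **build_kwargs(prompt, image, max_tokens))
```

With this in place, `ask("Describe this image.", image="path/to/image.png")` and `ask("Explain quantum computing.")` share one loaded model.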
Performance
Tested on Apple Silicon (M-series):
- Generation speed: ~80 tokens/sec
- Peak memory: ~16.5 GB
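Before downloading the weights, it can be worth checking whether a machine has enough memory for the ~16.5 GB peak. The sketch below reads total physical RAM through `os.sysconf` and compares it with the peak figure. The 4 GB reserve for the OS and other applications is an arbitrary assumption chosen for illustration, not a measured requirement, and the function names are hypothetical.

```python
import os

PEAK_MEMORY_GB = 16.5  # peak memory reported on this card


def total_ram_gb():
    """Total physical memory in GB (works on macOS and Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9


def has_headroom(total_gb, peak_gb=PEAK_MEMORY_GB, reserve_gb=4.0):
    """True if RAM minus an assumed reserve covers the reported peak.

    reserve_gb is a guess at what the rest of the system needs.
    """
    return total_gb - reserve_gb >= peak_gb


if __name__ == "__main__":
    ram = total_ram_gb()
    verdict = "likely enough" if has_headroom(ram) else "probably too little"
    print(f"{ram:.1f} GB RAM: {verdict} for this model")
```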
Note
The Qwen3.5 video processor requires torchvision. For image-only usage, mlx-vlm works without PyTorch. If you need video input support, install torch and torchvision.
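A script can check for the optional video dependencies up front rather than failing partway through. This is a minimal sketch using only the standard library; the function name is hypothetical, and it only tests whether the packages are importable, not whether their versions are compatible.

```python
from importlib.util import find_spec


def video_dependencies_installed():
    """Return True if both torch and torchvision can be imported."""
    return all(find_spec(name) is not None for name in ("torch", "torchvision"))


if __name__ == "__main__":
    if video_dependencies_installed():
        print("torch and torchvision found: video input should be usable")
    else:
        print("image and text only; install torch and torchvision for video")
```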