# Qwen3.5-35B-A3B-8bit (MLX)
This repository provides an MLX quantized conversion of Qwen/Qwen3.5-35B-A3B, prepared for local inference on Apple Silicon.
This is an unofficial conversion published by henrybravo. All model capabilities, limitations, and licensing inherit from the original Qwen release.
## Model summary

- Base model: Qwen/Qwen3.5-35B-A3B
- Architecture: `Qwen3_5MoeForConditionalGeneration` (`qwen3_5_moe`)
- Quantization: affine, group size 64, 8-bit weights
- Weight format: `safetensors` (8 shards)
- Intended runtime: `mlx-vlm` / MLX ecosystem on macOS
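To illustrate what "affine, group size 64, 8-bit" means in practice, here is a minimal pure-Python sketch of grouped affine quantization. It is illustrative only: MLX packs the integers and stores scales/biases in its own format, but the arithmetic per group is the same idea.

```python
import random

def affine_quantize_group(values, bits=8):
    # Map one group of weights to integers in [0, 2**bits - 1] using a
    # shared scale and bias (the group minimum) -- grouped affine quantization.
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = max(hi - lo, 1e-12) / levels
    q = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return q, scale, lo

def affine_dequantize_group(q, scale, bias):
    # Reconstruct approximate float weights from the stored integers.
    return [x * scale + bias for x in q]

random.seed(0)
group = [random.gauss(0.0, 1.0) for _ in range(64)]  # one group of 64 weights
q, scale, bias = affine_quantize_group(group)
approx = affine_dequantize_group(q, scale, bias)
# Rounding error per weight is bounded by half the group's scale.
max_err = max(abs(a - b) for a, b in zip(group, approx))
```

Smaller groups track local weight ranges more tightly at the cost of more scale/bias metadata; group size 64 is the trade-off this conversion uses.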
## Why this variant

This 8-bit build prioritizes output quality and fidelity over speed.

- Typical local memory target used in router config: about 40 GB
- Local default tested in this setup: `max_tokens = 16384`
- Expect slower generation than the 4-bit variant
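A back-of-envelope calculation suggests why the router config targets about 40 GB. The figures below are assumptions, not measurements: 35e9 parameters (inferred from the model name), plus one bf16 scale and one bf16 bias per group of 64 weights.

```python
# Rough memory estimate for the 8-bit quantized weights (illustrative).
params = 35e9            # assumed total parameter count
group_size = 64
quantized_bytes = 1      # 8 bits per weight
metadata_bytes = 2 + 2   # assumed bf16 scale + bf16 bias per group
bytes_total = params * (quantized_bytes + metadata_bytes / group_size)
weights_gib = bytes_total / 1024**3  # weights alone, before KV cache
```

That lands around 34–35 GiB for the weights alone, so a ~40 GB budget leaves headroom for the KV cache and runtime overhead during generation.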
## Install

```bash
pip install -U mlx==0.31.0 mlx-lm==0.31.0 mlx-vlm==0.4.0 transformers==5.3.0
```
For OpenAI-compatible local serving and routing on top of MLX models, see mlx-router.
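When serving through an OpenAI-compatible layer, a chat request is an ordinary JSON body. The sketch below builds one; the field names follow the OpenAI chat-completions schema, while the model id and the `max_tokens` value mirror this card — the exact route and model id your router exposes are assumptions to verify against your own config.

```python
import json

# Hypothetical request body for an OpenAI-compatible chat-completions
# endpoint (e.g. one exposed by mlx-router). Model id is an assumption.
payload = {
    "model": "Qwen3.5-35B-A3B-8bit",
    "messages": [{"role": "user", "content": "Hello, what model are you?"}],
    "max_tokens": 16384,
    "temperature": 0.0,
}
body = json.dumps(payload)
```

POST this body to the router's chat-completions route with any HTTP client.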
## Quick usage (mlx-vlm)

You can run the examples below directly with `mlx_vlm.generate`, or serve the model through mlx-router.
### Text prompt

```bash
python -m mlx_vlm.generate \
  --model /path/to/Qwen3.5-35B-A3B-8bit \
  --max-tokens 100 \
  --temperature 0.0 \
  --prompt "Hello, what model are you?"
```
### Vision prompt

```bash
python -m mlx_vlm.generate \
  --model /path/to/Qwen3.5-35B-A3B-8bit \
  --max-tokens 200 \
  --temperature 0.0 \
  --prompt "Describe this image in detail." \
  --image https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg
```
## Conversion command used

```bash
mlx_vlm.convert \
  --hf-path Qwen/Qwen3.5-35B-A3B \
  --mlx-path ~/models/models--henrybravo--Qwen3.5-35B-A3B-8bit \
  -q \
  --q-bits 8 \
  --dtype bfloat16
```
## Notes
- This repository stores MLX-converted weights; it is not a standard Transformers checkpoint.
- For best official guidance, benchmarks, and serving docs, see the upstream Qwen model card.
## Upstream references
- Base model: https://huggingface.co/Qwen/Qwen3.5-35B-A3B
- Original license file: https://huggingface.co/Qwen/Qwen3.5-35B-A3B/blob/main/LICENSE
- Qwen blog: https://qwen.ai/blog?id=qwen3.5