Qwen3.5-35B-A3B-8bit (MLX)

This repository provides an MLX quantized conversion of Qwen/Qwen3.5-35B-A3B, prepared for local inference on Apple Silicon.

This is an unofficial conversion published by henrybravo. All model capabilities, limitations, and licensing inherit from the original Qwen release.

Model summary

  • Base model: Qwen/Qwen3.5-35B-A3B
  • Architecture: Qwen3_5MoeForConditionalGeneration (qwen3_5_moe)
  • Quantization: affine, group size 64, weights at 8-bit
  • Weight format: safetensors (8 shards)
  • Intended runtime: mlx-vlm / MLX ecosystem on macOS
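
The "affine, group size 64, 8-bit" scheme above means each contiguous group of 64 weights shares one scale and one offset (zero point). A minimal NumPy sketch of that idea, assuming a simple asymmetric round-to-nearest scheme (illustrative only, not MLX's actual kernels):

```python
import numpy as np

def affine_quantize(w, group_size=64, bits=8):
    """Affine per-group quantization: one scale and one offset per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on flat groups
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min):
    """Reconstruct approximate float weights from the quantized groups."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 64).astype(np.float32)
q, scale, offset = affine_quantize(w)
w_hat = affine_dequantize(q, scale, offset).reshape(-1)
max_err = np.abs(w - w_hat).max()  # bounded by scale/2 per group
```

At 8 bits the per-weight rounding error is at most half a quantization step, which is why this build trades speed for fidelity relative to the 4-bit variant.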

Why this variant

This 8-bit build prioritizes output quality and fidelity over speed.

  • Typical local memory target used in router config: about 40 GB
  • Local default tested in this setup: max_tokens = 16384
  • Expect slower generation than the 4-bit variant
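
The ~40 GB figure can be sanity-checked from the parameter count: 8-bit weights cost one byte per parameter, plus a small per-group overhead for scales and offsets, with the remainder of the budget going to KV cache and activations at runtime. A back-of-the-envelope sketch (the 35B parameter count is taken from the model name; the 16-bit metadata assumption is illustrative):

```python
PARAMS = 35e9     # total parameters ("35B" in the model name)
GROUP_SIZE = 64   # weights per quantization group
BITS = 8          # bits per quantized weight

weight_bytes = PARAMS * BITS / 8
# each group also stores a scale and an offset; assume 2 bytes (16-bit) each
meta_bytes = (PARAMS / GROUP_SIZE) * 2 * 2
weights_gb = (weight_bytes + meta_bytes) / 1e9

print(f"quantized weights: ~{weights_gb:.1f} GB")
```

That leaves a few gigabytes of the ~40 GB router target for KV cache and activations during generation.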

Install

pip install -U mlx==0.31.0 mlx-lm==0.31.0 mlx-vlm==0.4.0 transformers==5.3.0
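
To confirm the pinned versions are what actually got installed, a quick stdlib-only check (the distribution names mirror the pip command above):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

for name in ("mlx", "mlx-lm", "mlx-vlm", "transformers"):
    print(name, installed_version(name) or "not installed")
```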

For OpenAI-compatible local serving and routing on top of MLX models, see mlx-router.

Quick usage (mlx-vlm)

You can run the examples below directly with mlx_vlm.generate, or serve the model through mlx-router.

Text prompt

python -m mlx_vlm.generate \
  --model /path/to/Qwen3.5-35B-A3B-8bit \
  --max-tokens 100 \
  --temperature 0.0 \
  --prompt "Hello, what model are you?"

Vision prompt

python -m mlx_vlm.generate \
  --model /path/to/Qwen3.5-35B-A3B-8bit \
  --max-tokens 200 \
  --temperature 0.0 \
  --prompt "Describe this image in detail." \
  --image https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg

Conversion command used

python -m mlx_vlm.convert \
  --hf-path Qwen/Qwen3.5-35B-A3B \
  --mlx-path ~/models/models--henrybravo--Qwen3.5-35B-A3B-8bit \
  -q \
  --q-bits 8 \
  --dtype bfloat16
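
Conversion writes the quantization settings into the output directory's config.json. A small check that a converted folder matches this card; the `quantization` field layout shown here is what MLX converters typically emit, but treat the exact keys as an assumption:

```python
import json

def check_quantization(config_text, expect_bits=8, expect_group=64):
    """Verify the 'quantization' block of an MLX config.json string."""
    cfg = json.loads(config_text)
    q = cfg.get("quantization", {})
    return q.get("bits") == expect_bits and q.get("group_size") == expect_group

# stand-in for open(".../Qwen3.5-35B-A3B-8bit/config.json").read()
sample = '{"model_type": "qwen3_5_moe", "quantization": {"group_size": 64, "bits": 8}}'
print(check_quantization(sample))
```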

Notes

  • This repository stores MLX-converted weights; it is not a standard Transformers checkpoint.
  • For official guidance, benchmarks, and serving documentation, see the upstream Qwen model card.

Upstream references

  • Base model: Qwen/Qwen3.5-35B-A3B
