Qwen3.5-35B-A3B-8bit (MLX)

This repository provides an MLX quantized conversion of Qwen/Qwen3.5-35B-A3B, prepared for local inference on Apple Silicon.

This is an unofficial conversion published by henrybravo. All model capabilities, limitations, and licensing inherit from the original Qwen release.

Model summary

  • Base model: Qwen/Qwen3.5-35B-A3B
  • Architecture: Qwen3_5MoeForConditionalGeneration (qwen3_5_moe)
  • Quantization: affine, group size 64, weights at 8-bit
  • Weight format: safetensors (8 shards)
  • Intended runtime: mlx-vlm / MLX ecosystem on macOS
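
The "affine, group size 64, 8-bit" scheme above means each contiguous group of 64 weights shares one scale and one offset (zero point). A minimal NumPy sketch of that idea, assuming a simple asymmetric round-to-nearest scheme (illustrative only, not MLX's actual kernels):

```python
import numpy as np

def affine_quantize(w, group_size=64, bits=8):
    """Affine per-group quantization: one scale and one offset per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on flat groups
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min):
    """Reconstruct approximate float weights from the quantized groups."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 64).astype(np.float32)
q, scale, offset = affine_quantize(w)
w_hat = affine_dequantize(q, scale, offset).reshape(-1)
max_err = np.abs(w - w_hat).max()  # bounded by scale/2 per group
```

At 8 bits the per-weight rounding error is at most half a quantization step, which is why this build trades speed for fidelity relative to the 4-bit variant.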

Why this variant

This 8-bit build prioritizes output quality and fidelity over speed.

  • Typical local memory target used in router config: about 40 GB
  • Local default tested in this setup: max_tokens = 16384
  • Expect slower generation than the 4-bit variant
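
The ~40 GB figure can be sanity-checked from the parameter count: 8-bit weights cost one byte per parameter, plus a small per-group overhead for scales and offsets, with the remainder of the budget going to KV cache and activations at runtime. A back-of-the-envelope sketch (the 35B parameter count is taken from the model name; the 16-bit metadata assumption is illustrative):

```python
PARAMS = 35e9     # total parameters ("35B" in the model name)
GROUP_SIZE = 64   # weights per quantization group
BITS = 8          # bits per quantized weight

weight_bytes = PARAMS * BITS / 8
# each group also stores a scale and an offset; assume 2 bytes (16-bit) each
meta_bytes = (PARAMS / GROUP_SIZE) * 2 * 2
weights_gb = (weight_bytes + meta_bytes) / 1e9

print(f"quantized weights: ~{weights_gb:.1f} GB")
```

That leaves a few gigabytes of the ~40 GB router target for KV cache and activations during generation.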

Install

pip install -U mlx==0.31.0 mlx-lm==0.31.0 mlx-vlm==0.4.0 transformers==5.3.0
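
To confirm the pinned versions are what actually got installed, a quick stdlib-only check (the distribution names mirror the pip command above):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

for name in ("mlx", "mlx-lm", "mlx-vlm", "transformers"):
    print(name, installed_version(name) or "not installed")
```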

For OpenAI-compatible local serving and routing on top of MLX models, see mlx-router.

Quick usage (mlx-vlm)

You can run the examples below directly with mlx_vlm.generate, or serve the model through mlx-router.

Text prompt

python -m mlx_vlm.generate \
  --model /path/to/Qwen3.5-35B-A3B-8bit \
  --max-tokens 100 \
  --temperature 0.0 \
  --prompt "Hello, what model are you?"

Vision prompt

python -m mlx_vlm.generate \
  --model /path/to/Qwen3.5-35B-A3B-8bit \
  --max-tokens 200 \
  --temperature 0.0 \
  --prompt "Describe this image in detail." \
  --image https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg

Conversion command used

python -m mlx_vlm.convert \
  --hf-path Qwen/Qwen3.5-35B-A3B \
  --mlx-path ~/models/models--henrybravo--Qwen3.5-35B-A3B-8bit \
  -q \
  --q-bits 8 \
  --dtype bfloat16
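
Conversion writes the quantization settings into the output directory's config.json. A small check that a converted folder matches this card; the `quantization` field layout shown here is what MLX converters typically emit, but treat the exact keys as an assumption:

```python
import json

def check_quantization(config_text, expect_bits=8, expect_group=64):
    """Verify the 'quantization' block of an MLX config.json string."""
    cfg = json.loads(config_text)
    q = cfg.get("quantization", {})
    return q.get("bits") == expect_bits and q.get("group_size") == expect_group

# stand-in for open(".../Qwen3.5-35B-A3B-8bit/config.json").read()
sample = '{"model_type": "qwen3_5_moe", "quantization": {"group_size": 64, "bits": 8}}'
print(check_quantization(sample))
```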

Notes

  • This repository stores MLX-converted weights; it is not a standard Transformers checkpoint.
  • For official guidance, benchmarks, and serving documentation, see the upstream Qwen model card.

Upstream references

  • Base model: Qwen/Qwen3.5-35B-A3B
