Qwen3.5-35B-A3B optimized for MLX. This quant does not support image input.

For vision support: https://huggingface.co/spicyneuron/Qwen3.5-35B-A3B-MLX-4.9bit-vision

EDIT: Updated chat template to enable better prompt caching.

## Usage

```shell
# Start server at http://localhost:8080/v1/chat/completions
uvx --from mlx-lm mlx_lm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/Qwen3.5-35B-A3B-MLX-4.8bit
```
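
Once running, the server speaks the OpenAI chat completions API, so any compatible client can query it. A minimal check with curl (the prompt and `max_tokens` value are arbitrary):

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "spicyneuron/Qwen3.5-35B-A3B-MLX-4.8bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128
  }'
```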

## Methodology

Quantized with an mlx-lm fork, drawing inspiration from Unsloth/AesSedai/ubergarm-style mixed-precision GGUFs. MLX's quantization options differ from llama.cpp's, but the principles are the same (a rough sketch follows the list below):

- Sensitive layers like MoE routing, attention, and output embeddings get higher precision
- More tolerant layers like MoE experts get lower precision
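
The fork itself isn't published as part of this card, but stock mlx-lm exposes the same idea through built-in mixed-precision recipes. A rough sketch, assuming a recent mlx-lm whose `mlx_lm.convert` supports the `--quant-predicate` flag; the source model path and recipe name are illustrative, not the exact settings used for this model:

```shell
# Illustrative only: a built-in mixed recipe that quantizes most layers
# to 3 bits while keeping more sensitive ones at 6 bits. The fork behind
# this model applies the same principle with its own per-layer rules.
uvx --from mlx-lm mlx_lm.convert \
  --hf-path Qwen/Qwen3.5-35B-A3B \
  --mlx-path ./Qwen3.5-35B-A3B-MLX-mixed \
  -q --quant-predicate mixed_3_6
```

A predicate like this encodes exactly the two bullet points above: a per-layer decision about bit width rather than one global setting.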