Qwen3.5-35B-A3B optimized for MLX. This quant does not support image input.
For vision support: https://huggingface.co/spicyneuron/Qwen3.5-35B-A3B-MLX-4.9bit-vision
EDIT: Updated chat template to enable better prompt caching.
## Usage

```shell
# Start server at http://localhost:8080/v1/chat/completions
uvx --from mlx-lm mlx_lm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/Qwen3.5-35B-A3B-MLX-4.8bit
```
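The server exposes an OpenAI-compatible chat-completions endpoint. A minimal sketch of the request shape it expects (the `build_request` helper is hypothetical, added here only for illustration):

```python
import json

# Hypothetical helper showing the payload shape for the
# OpenAI-compatible /v1/chat/completions endpoint started above.
def build_request(prompt: str) -> dict:
    return {
        "model": "spicyneuron/Qwen3.5-35B-A3B-MLX-4.8bit",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

payload = build_request("Hello!")
print(json.dumps(payload, indent=2))

# With the server running, send it with any HTTP client, e.g.:
#   requests.post("http://localhost:8080/v1/chat/completions", json=payload)
```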
## Methodology

Quantized with an mlx-lm fork, drawing inspiration from Unsloth/AesSedai/ubergarm-style mixed-precision GGUFs. MLX quantization options differ from llama.cpp's, but the principles are the same:
- Sensitive layers like MoE routing, attention, and output embeddings get higher precision
- More tolerant layers like MoE experts get lower precision
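The per-layer precision assignment above can be sketched as a simple predicate mapping parameter paths to bit widths. This is an illustrative sketch, not the actual quantization script; the layer-name patterns and bit choices below are hypothetical examples, not the exact recipe used for this model:

```python
# Illustrative mixed-precision assignment: sensitive layers get more
# bits, tolerant ones fewer. Patterns and bit widths are examples only.
def bits_for(path: str) -> int:
    """Return the quantization bit width for a parameter path."""
    # Sensitive: MoE router, attention, embeddings/output -> higher precision
    if any(key in path for key in ("mlp.gate", "attn", "embed", "lm_head")):
        return 6
    # Tolerant: MoE expert weights -> lower precision
    if "experts" in path:
        return 4
    # Everything else at a middle precision
    return 5

print(bits_for("model.layers.0.self_attn.q_proj"))       # attention -> 6
print(bits_for("model.layers.0.mlp.experts.0.up_proj"))  # expert -> 4
```

Averaged over all weights, a scheme like this lands between the extremes, which is where fractional labels such as "4.8bit" come from.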