Qwen3.5-35B-A3B optimized for MLX. This quant supports image input and requires a vision-enabled MLX server.
For the non-vision model: https://huggingface.co/spicyneuron/Qwen3.5-35B-A3B-MLX-4.8bit
EDIT: Updated chat template to enable better prompt caching.
## Usage

```shell
# Start server at http://localhost:8080/chat/completions
uvx --from mlx-vlm --with torchvision \
  mlx_vlm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/Qwen3.5-35B-A3B-MLX-4.9bit-vision
```
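Once the server is up, requests go to the chat completions endpoint shown in the comment above. The sketch below builds an OpenAI-style request payload with an image attached; the exact message schema (`image_url` content parts) and the example image URL are assumptions, so check your mlx-vlm version for the shape it accepts.

```python
import json

# Hypothetical request payload for the local server started above.
# The image_url content-part format follows the common OpenAI-compatible
# convention; verify it against your mlx-vlm version.
payload = {
    "model": "spicyneuron/Qwen3.5-35B-A3B-MLX-4.9bit-vision",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
    "max_tokens": 256,
}

body = json.dumps(payload).encode()

# To send it (requires the server from the command above to be running):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```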
## Methodology

Quantized using a custom script inspired by Unsloth/AesSedai/ubergarm style mixed-precision GGUFs. MLX quantization options differ from llama.cpp's, but the principles are the same:
- Sensitive layers like MoE routing, attention, and output embeddings get higher precision
- More tolerant layers like MoE experts get lower precision
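The split above can be sketched as a name-based rule that assigns a bit width per layer. This is a hypothetical illustration, not the actual script: the layer-name substrings and bit widths are made up for the example.

```python
# Hypothetical mixed-precision assignment rule. Layer-name patterns and
# bit widths are illustrative only, not the exact recipe used for this quant.
def bits_for(layer_name: str) -> int:
    """Pick a quantization bit width based on how sensitive a layer is."""
    # Sensitive layers: MoE router, attention projections, output embeddings.
    if any(k in layer_name for k in ("router", "attn", "lm_head", "embed")):
        return 6  # higher precision
    # Tolerant layers: MoE expert weights, which dominate the parameter count.
    if "experts" in layer_name:
        return 4  # lower precision
    return 5  # middle ground for everything else


for name in (
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.7.down_proj",
    "lm_head",
):
    print(name, "->", bits_for(name), "bits")
```

In practice a recipe like this keeps the router, attention, and embeddings near full quality while the experts, where most of the 35B parameters live, absorb the compression.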