Qwen3.5-122B-A10B optimized for MLX. This quant supports image input and requires a vision-enabled MLX server.
For the non-vision model: https://huggingface.co/spicyneuron/Qwen3.5-122B-A10B-MLX-4.6bit
EDIT: Updated chat template to enable better prompt caching.
## Usage

```shell
# Start server at http://localhost:8080/chat/completions
uvx --from mlx-vlm --with torchvision \
  mlx_vlm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/Qwen3.5-122B-A10B-MLX-4.7bit-vision
```
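Once the server is up, requests follow the OpenAI-compatible chat-completions format implied by the endpoint above. A minimal sketch in Python of a vision request body (the `image_url` content shape and placeholder data URL are assumptions, not verified against mlx-vlm):

```python
import json

# Build an OpenAI-style chat request with an image attachment.
# The base64 data URL and prompt text are placeholders.
payload = {
    "model": "spicyneuron/Qwen3.5-122B-A10B-MLX-4.7bit-vision",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64,..."}},
            ],
        }
    ],
    "max_tokens": 256,
}

# POST this JSON to http://localhost:8080/chat/completions
# (e.g. with urllib or requests); only the body is shown here.
body = json.dumps(payload)
```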
## Methodology
Quantized using a custom script inspired by Unsloth/AesSedai/ubergarm-style mixed-precision GGUFs. MLX's quantization options differ from llama.cpp's, but the principles are the same:
- Sensitive layers like MoE routing, attention, and output embeddings get higher precision
- More tolerant layers like MoE experts get lower precision
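The rule above can be sketched as a simple name-matching predicate. This is an illustration of the principle only; the layer-name patterns and bit widths below are hypothetical, not the exact recipe used for this quant:

```python
def pick_bits(layer_name: str) -> int:
    """Hypothetical mixed-precision rule: sensitive layers get more bits.

    Pattern names are illustrative, not the actual parameter names
    in Qwen3.5-122B-A10B.
    """
    sensitive = ("router", "attn", "embed", "lm_head")
    if any(key in layer_name for key in sensitive):
        return 6  # higher precision: MoE routing, attention, embeddings
    return 4      # lower precision: MoE expert weights tolerate it

# Example: an expert FFN weight vs. the MoE router
print(pick_bits("model.layers.0.mlp.experts.3.down_proj"))  # → 4
print(pick_bits("model.layers.0.mlp.router"))               # → 6
```

MLX applies per-layer quantization settings via a predicate over modules, so a rule like this maps directly onto its API.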
Base model: Qwen/Qwen3.5-122B-A10B