# Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated MLX 8-bit

An 8-bit quantized MLX version of Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated.
## Model Details
- Architecture: Qwen 3.5 27B (hybrid linear attention + full attention)
- Quantization: 8-bit affine, group_size=64
- Size: ~32 GB (down from ~55 GB bf16)
- Context Length: 262,144 tokens
- Vision: Full image and video understanding via integrated vision tower (27 ViT blocks)
- Tool Use: Native function calling support
- Thinking: Chain-of-thought reasoning mode
## Quantization Strategy
- Quantized (8-bit): all large projection weights across the 64 language-model layers (MLP, attention, and linear-attention projections)
- Kept in bf16: vision tower, embeddings, LM head, layer norms, and SSM parameters (A_log, dt_bias, conv1d)
- This split preserves vision quality and model stability while significantly reducing memory usage (a reproduction sketch follows this list)
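
As a rough illustration, a comparable 8-bit, group-size-64 conversion can be produced with mlx-lm's converter. The snippet below is a minimal sketch, assuming mlx-lm is installed; the source repo id and output path are illustrative, and the selective bf16 policy described above (vision tower, norms, SSM parameters) may require a custom quantization predicate that this sketch omits.

```python
# Minimal sketch: an 8-bit, group-size-64 MLX quantization with mlx-lm.
# The hf_path and mlx_path below are illustrative, not the exact repos.
from mlx_lm import convert

convert(
    hf_path="huihui-ai/Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated",
    mlx_path="huihui-qwen3.5-27b-mlx-8bit",
    quantize=True,
    q_bits=8,         # 8-bit affine quantization
    q_group_size=64,  # matches the group_size reported above
)
```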
## Capabilities
- Image understanding and description
- Video understanding
- Tool use / function calling (see the sketch after this list)
- Multi-step agent reasoning
- Thinking/reasoning mode
- Multilingual support
- Long context (262K tokens)
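
To make the function-calling capability concrete, here is a minimal sketch using mlx-lm and the tokenizer's chat template. The model path, tool schema, and weather tool are illustrative assumptions, not part of this release, and whether the template consumes the `tools` argument depends on the chat template shipped with the model (Qwen-family templates typically support it).

```python
# Minimal function-calling sketch. Assumptions: mlx-lm installed, a local
# model path, and a chat template that accepts a `tools` argument.
from mlx_lm import load, generate

model, tokenizer = load("huihui-qwen3.5-27b-mlx-8bit")  # illustrative path

# Hypothetical tool schema, for demonstration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)

# The model should emit a structured tool call, which the caller parses,
# executes, and feeds back as a "tool" message for the final answer.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```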
## Usage
Works with LM Studio, MLX, and other MLX-compatible frameworks.
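
For example, a basic text-generation run with mlx-lm might look like the following; the model path is illustrative, and image or video inputs would typically go through an MLX vision frontend such as mlx-vlm rather than this text-only path.

```python
# Minimal text-generation sketch with mlx-lm (model path is illustrative).
from mlx_lm import load, generate

model, tokenizer = load("huihui-qwen3.5-27b-mlx-8bit")

messages = [{"role": "user", "content": "Explain affine quantization in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
# Note: some Qwen-family templates also accept an enable_thinking flag to
# toggle the reasoning mode (an assumption; check the shipped template).

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```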