# Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated MLX MXFP8
An MXFP8 (Microscaling FP8) quantized MLX build of Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated.
## Model Details
- Architecture: Qwen 3.5 27B (hybrid linear attention + full attention)
- Quantization: MXFP8 (E4M3 with block-level scaling), group_size=32
- Size: ~29 GB
- Context Length: 262,144 tokens
- Vision: Full image and video understanding via integrated vision tower (27 ViT blocks, kept in bf16)
- Tool Use: Native function calling support
- Thinking: Chain-of-thought reasoning mode
## Why MXFP8?
MXFP8 uses a floating-point (E4M3) element format with a shared per-block scale instead of fixed-point integer quantization. This gives:
- Better handling of outlier weights (exponent absorbs magnitude)
- Lower quantization error across varying tensor ranges
- Hardware acceleration on chips with native FP8 support
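As a toy illustration of the scheme described above (not the actual MLX kernel; `fp8_e4m3_round` is a simplified emulation that ignores subnormals), each group of 32 weights shares one power-of-two scale while the individual elements are stored in E4M3:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite E4M3 value

def fp8_e4m3_round(x):
    """Round to the nearest E4M3-representable value (simplified:
    subnormals are ignored and magnitudes saturate at E4M3_MAX)."""
    out = np.zeros_like(x)
    nz = x != 0
    mag = np.abs(x[nz])
    e = np.clip(np.floor(np.log2(mag)), -6, 8)   # E4M3 normal exponent range
    m = np.round(mag / 2.0 ** e * 8) / 8         # 3 mantissa bits
    out[nz] = np.sign(x[nz]) * np.minimum(m * 2.0 ** e, E4M3_MAX)
    return out

def mxfp8_quantize(w, group_size=32):
    """Split a flat weight array into blocks of `group_size` and return
    (E4M3 elements, one power-of-two scale per block)."""
    blocks = w.reshape(-1, group_size)
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)  # assumes > 0
    # shared power-of-two scale per block (E8M0 in the OCP MX spec);
    # 8 is the maximum E4M3 exponent, so the block max lands near full scale
    scale = 2.0 ** (np.floor(np.log2(max_abs)) - 8)
    return fp8_e4m3_round(blocks / scale), scale

def mxfp8_dequantize(q, scale):
    return (q * scale).reshape(-1)
```

Because the scale is a plain power of two, dequantization is just an exponent shift, and the E4M3 exponent field absorbs per-element magnitude variation within each block, which is why outlier weights are handled better than with integer formats.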
## Capabilities
- Image understanding and description
- Video understanding
- Tool use / function calling
- Multi-step agent reasoning
- Thinking/reasoning mode
- Multilingual support
- Long context (262K tokens)
## Usage

Works with LM Studio, mlx-lm, and other MLX-compatible frameworks.
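A minimal generation sketch using the `mlx-lm` Python package. The model path below is an assumption; point it at this repo's id or your local download:

```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer (path is an assumption;
# replace with the actual Hugging Face repo id or a local directory).
model, tokenizer = load("Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated-MLX-MXFP8")

# Apply the chat template so thinking and tool-use modes behave as intended.
messages = [{"role": "user", "content": "Summarize MXFP8 in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

The vision tower is kept in bf16, so image and video inputs go through an MLX-VLM-style frontend rather than plain `mlx-lm` text generation.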