Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated MLX MXFP8

An MXFP8 (Microscaling FP8) quantized MLX conversion of Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated.

Model Details

  • Architecture: Qwen 3.5 27B (hybrid linear attention + full attention)
  • Quantization: MXFP8 (E4M3 with block-level scaling), group_size=32
  • Size: ~29 GB
  • Context Length: 262,144 tokens
  • Vision: Full image and video understanding via integrated vision tower (27 ViT blocks, kept in bf16)
  • Tool Use: Native function calling support
  • Thinking: Chain-of-thought reasoning mode

Why MXFP8?

MXFP8 stores weights in floating-point E4M3 format with a scale shared per block, rather than the fixed-point integer codes used by conventional quantization. This gives:

  • Better handling of outlier weights (exponent absorbs magnitude)
  • Lower quantization error across varying tensor ranges
  • Native hardware acceleration on modern chips
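The block-scaling idea above can be sketched in plain Python. This is a simplified model of OCP-style MXFP8 (one shared power-of-two scale per block, values rounded to E4M3 with 4 exponent and 3 mantissa bits), not the actual MLX kernel; it ignores E4M3's reserved NaN encoding:

```python
import math

E4M3_MAX = 448.0  # largest finite E4M3 value

def round_to_e4m3(x):
    # Round x (|x| <= 448 assumed) to the nearest E4M3 value:
    # 1 sign bit, 4 exponent bits, 3 mantissa bits.
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    x = abs(x)
    e = max(math.floor(math.log2(x)), -6)  # min normal exponent is -6
    step = 2.0 ** (e - 3)                  # 3 mantissa bits -> 8 steps per binade
    return sign * min(round(x / step) * step, E4M3_MAX)

def quantize_block_mxfp8(block):
    # MX: one shared power-of-two scale per block (group_size=32 in this model),
    # chosen so the block's largest magnitude fits within E4M3's range.
    amax = max(abs(v) for v in block)
    scale = 2.0 ** math.ceil(math.log2(amax / E4M3_MAX)) if amax else 1.0
    return scale, [round_to_e4m3(v / scale) for v in block]

def dequantize(scale, codes):
    return [scale * c for c in codes]
```

Because the scale is a power of two, large ("outlier") values in a block are represented exactly or near-exactly, while smaller values in the same block still keep E4M3's relative precision — the property the bullets above describe.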

Capabilities

  • Image understanding and description
  • Video understanding
  • Tool use / function calling
  • Multi-step agent reasoning
  • Thinking/reasoning mode
  • Multilingual support
  • Long context (262K tokens)

Usage

Works with LM Studio, mlx-lm, and other MLX-compatible tooling on Apple Silicon.
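A minimal generation sketch using the mlx-lm Python package (requires `pip install mlx-lm` and Apple Silicon). The repo id below is a placeholder — substitute this model's actual Hugging Face path:

```python
from mlx_lm import load, generate

# Placeholder repo id — replace with this model's actual HF path.
model, tokenizer = load("Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated-MXFP8")

# Format the request with the model's chat template.
messages = [{"role": "user", "content": "Explain MXFP8 quantization in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```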

Downloads last month: 1,319
Safetensors model size: 27B params
Tensor types: U8, U32, BF16, F32