Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated MLX 8-bit

8-bit quantized MLX version of Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated.

Model Details

  • Architecture: Qwen 3.5 27B (hybrid linear attention + full attention)
  • Quantization: 8-bit affine, group_size=64
  • Size: ~32 GB (down from ~55 GB bf16)
  • Context Length: 262,144 tokens
  • Vision: Full image and video understanding via integrated vision tower (27 ViT blocks)
  • Tool Use: Native function calling support
  • Thinking: Chain-of-thought reasoning mode

Quantization Strategy

  • Quantized (8-bit): All large projection weights in 64 language model layers (MLP, attention, linear attention)
  • Kept in bf16: Vision tower, embeddings, LM head, layer norms, SSM parameters (A_log, dt_bias, conv1d)
  • Rationale: keeping these components in bf16 preserves vision quality and numerical stability while still substantially reducing memory usage
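To illustrate what "8-bit affine, group_size=64" means in practice, here is a minimal NumPy sketch (not the actual MLX kernel): each group of 64 weights gets its own scale and zero-point, so an outlier in one group cannot degrade precision across the whole tensor.

```python
import numpy as np

def quantize_affine_8bit(w, group_size=64):
    """Affine 8-bit quantization with per-group scale/zero-point.

    Each row of the reshaped tensor is one group; the affine map
    sends [group_min, group_max] onto the 256 uint8 levels.
    """
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0           # step size per group
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Reconstruct approximate bf16/fp32 weights from stored uint8 + metadata
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale, zero_point = quantize_affine_8bit(w)
w_hat = dequantize(q, scale, zero_point).reshape(-1)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Storing one byte per weight plus per-group scale/offset metadata (instead of two bytes for bf16) is what brings the language-model layers from ~55 GB down toward ~32 GB, with the bf16 vision tower and norms accounting for the remainder.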

Capabilities

  • Image understanding and description
  • Video understanding
  • Tool use / function calling
  • Multi-step agent reasoning
  • Thinking/reasoning mode
  • Multilingual support
  • Long context (262K tokens)

Usage

Works with LM Studio, mlx-lm, and other MLX-compatible frameworks on Apple silicon.
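As a sketch for text-only generation with the mlx-lm CLI (the repo id below is illustrative, not confirmed; image/video inputs may require a vision-capable frontend such as mlx-vlm instead):

```shell
# Install the MLX LM tooling (Apple silicon required)
pip install mlx-lm

# Run generation with the 8-bit model; substitute the actual repo id
mlx_lm.generate \
  --model Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated-MLX-8bit \
  --prompt "Summarize the benefits of 8-bit quantization." \
  --max-tokens 256
```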
