Qwen3.5-27B MLX 4-bit Quantization

MLX quantization of Qwen/Qwen3.5-27B optimized for Apple Silicon.

Details

  • Base model: Qwen/Qwen3.5-27B
  • Format: MLX 4-bit (4.501 bits per weight)
  • Quantized by: TxemAI
  • Quantization tool: mlx-lm (see the conversion sketch below)

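As a point of reference, a quantization like this one can be reproduced with mlx-lm's convert API. A minimal sketch, assuming the package-level convert() entry point with its quantize and q_bits parameters; the output path is illustrative:

from mlx_lm import convert

# Fetch the BF16 base model, quantize its weights to 4 bits, and write
# an MLX-format copy to a local directory (path name is an assumption).
convert(
    "Qwen/Qwen3.5-27B",
    mlx_path="Qwen3.5-27B-4bit-MLX",
    quantize=True,
    q_bits=4,
)
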
Performance (M4 Max, 128 GB)

  • Throughput: 29.7 tokens/sec
  • Peak RAM: 15.3 GB

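The throughput figure above can be sanity-checked in a few lines. A minimal sketch, assuming generate() returns only the generated text; the timing includes prompt processing, so it slightly understates pure decode speed:

import time

from mlx_lm import load, generate

model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")
start = time.time()
text = generate(model, tokenizer, prompt="Benchmark prompt", max_tokens=256)
elapsed = time.time() - start
# Rough tokens/sec over the generated text only.
print(f"{len(tokenizer.encode(text)) / elapsed:.1f} tokens/sec")
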
Usage

from mlx_lm import load, generate

# Download (or load from cache) the 4-bit weights and tokenizer from the Hub.
model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")
response = generate(model, tokenizer, prompt="Your prompt here", verbose=True)
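
Qwen instruct checkpoints generally expect their chat template. A minimal sketch of wrapping a message accordingly, assuming the bundled tokenizer ships a chat template as upstream Qwen releases do:

from mlx_lm import load, generate

model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")
messages = [{"role": "user", "content": "Your prompt here"}]
# Render the conversation with the model's chat template before generating.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True)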

When to use this vs 8-bit

Use this 4-bit build when you want faster inference and lower RAM usage. Use the 8-bit version when you prioritize output quality, since heavier quantization trades away some fidelity.
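
As a back-of-the-envelope check on the RAM side of that trade-off, weight memory scales linearly with bits per weight; the arithmetic below uses only the figures quoted above:

# Approximate weight memory from the effective bits-per-weight figure.
params = 27e9   # parameter count
bpw = 4.501     # effective bits per weight (4-bit plus quantization metadata)
print(f"{params * bpw / 8 / 1e9:.1f} GB")  # ~15.2 GB, in line with the 15.3 GB peak; 8-bit is roughly double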
