Qwen3.5-27B MLX 4-bit Quantization

MLX quantization of Qwen/Qwen3.5-27B optimized for Apple Silicon.

Details

  • Base model: Qwen/Qwen3.5-27B
  • Format: MLX 4-bit (4.501 bits per weight)
  • Quantized by: TxemAI
  • Quantization tool: mlx-lm (see the conversion sketch below)

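As a point of reference, a quantization like this one can be reproduced with mlx-lm's convert API. A minimal sketch, assuming the package-level convert() entry point with its quantize and q_bits parameters; the output path is illustrative:

from mlx_lm import convert

# Fetch the BF16 base model, quantize its weights to 4 bits, and write
# an MLX-format copy to a local directory (path name is an assumption).
convert(
    "Qwen/Qwen3.5-27B",
    mlx_path="Qwen3.5-27B-4bit-MLX",
    quantize=True,
    q_bits=4,
)
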
Performance (M4 Max, 128 GB)

  • Throughput: 29.7 tokens/sec
  • Peak RAM: 15.3 GB

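The throughput figure above can be sanity-checked in a few lines. A minimal sketch, assuming generate() returns only the generated text; the timing includes prompt processing, so it slightly understates pure decode speed:

import time

from mlx_lm import load, generate

model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")
start = time.time()
text = generate(model, tokenizer, prompt="Benchmark prompt", max_tokens=256)
elapsed = time.time() - start
# Rough tokens/sec over the generated text only.
print(f"{len(tokenizer.encode(text)) / elapsed:.1f} tokens/sec")
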
Usage

from mlx_lm import load, generate

# Download (or load from cache) the 4-bit weights and tokenizer from the Hub.
model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")
response = generate(model, tokenizer, prompt="Your prompt here", verbose=True)
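
Qwen instruct checkpoints generally expect their chat template. A minimal sketch of wrapping a message accordingly, assuming the bundled tokenizer ships a chat template as upstream Qwen releases do:

from mlx_lm import load, generate

model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")
messages = [{"role": "user", "content": "Your prompt here"}]
# Render the conversation with the model's chat template before generating.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True)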

When to use this vs 8-bit

Use this 4-bit build when you want faster inference and lower RAM usage. Use the 8-bit version when you prioritize output quality, since heavier quantization trades away some fidelity.
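
As a back-of-the-envelope check on the RAM side of that trade-off, weight memory scales linearly with bits per weight; the arithmetic below uses only the figures quoted above:

# Approximate weight memory from the effective bits-per-weight figure.
params = 27e9   # parameter count
bpw = 4.501     # effective bits per weight (4-bit plus quantization metadata)
print(f"{params * bpw / 8 / 1e9:.1f} GB")  # ~15.2 GB, in line with the 15.3 GB peak; 8-bit is roughly double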
