Qwen3.5 MLX — Apple Silicon Optimized
MLX quantization of Qwen/Qwen3.5-27B optimized for Apple Silicon.
```python
# Requires: pip install mlx-lm
from mlx_lm import load, generate

# Download the quantized weights and tokenizer from the Hub
model, tokenizer = load("TxemAI/Qwen3.5-27B-4bit-MLX")

# Generate a completion; verbose=True streams tokens as they are produced
response = generate(model, tokenizer, prompt="Your prompt here", verbose=True)
```
Use the 4-bit version for faster inference and lower RAM usage; use the 8-bit version if you prioritize output quality.
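To make the RAM trade-off concrete, here is a rough back-of-envelope estimate of weight memory for a 27B-parameter model at each quantization level (weights only; the KV cache and runtime overhead add more on top):

```python
def approx_weight_gib(n_params: float, bits_per_param: float) -> float:
    """Rough weight-memory estimate: parameters x bits, converted to GiB."""
    return n_params * bits_per_param / 8 / 1024**3

N = 27e9  # 27B parameters
for bits in (4, 8):
    print(f"{bits}-bit: ~{approx_weight_gib(N, bits):.1f} GiB")
# 4-bit: ~12.6 GiB
# 8-bit: ~25.1 GiB
```

In practice this means the 4-bit model fits comfortably on a 32 GB machine, while the 8-bit model needs headroom closer to 32 GB or more.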
Base model: Qwen/Qwen3.5-27B