# Qwen3.5-35B-A3B-MLX-2bit

This is a 2-bit MLX quantization (2.504 bits per weight on average) of Qwen/Qwen3.5-35B-A3B, intended to fit in 16 GB of unified memory.

Quantized with mlx-lm.
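
A quantization along these lines can be produced with mlx-lm's `convert` API. The snippet below is a minimal sketch, not the exact command used for this model: the recipe that yields the 2.504-bit average (presumably some tensors kept at higher precision) is not documented here, and defaults may differ across mlx-lm versions.

```python
# Minimal sketch of a 2-bit MLX quantization with mlx-lm.
# Assumption: the published model used a mixed recipe averaging ~2.5 bits;
# the exact settings are not documented in this card.
from mlx_lm import convert

convert(
    "Qwen/Qwen3.5-35B-A3B",                # source weights on the Hugging Face Hub
    mlx_path="Qwen3.5-35B-A3B-MLX-2bit",   # output directory (hypothetical name)
    quantize=True,
    q_bits=2,         # target bit width
    q_group_size=64,  # quantization group size (mlx-lm default)
)
```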

## Usage

```python
from mlx_lm import load, generate

# Download (if needed) and load the quantized model and its tokenizer.
model, tokenizer = load("MercuriusDream/Qwen3.5-35B-A3B-MLX-2bit")

# Build a chat-formatted prompt from a single user message.
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a response; verbose=True prints the output and generation stats.
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
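
For longer replies you can stream tokens as they are produced instead of waiting for the full response. A minimal sketch using mlx-lm's `stream_generate` (the yielded response object's fields may vary across mlx-lm versions):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("MercuriusDream/Qwen3.5-35B-A3B-MLX-2bit")
messages = [{"role": "user", "content": "Explain 2-bit quantization briefly."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print each text segment as soon as it is generated.
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=256):
    print(response.text, end="", flush=True)
print()
```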