# Qwen3.5-35B-A3B-MLX-2bit
This is an MLX 2-bit quantized version (2.504 bits per parameter on average) of Qwen/Qwen3.5-35B-A3B, aimed at fitting in 16 GB of unified memory.

Quantized with `mlx-lm`.
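As a rough sanity check (assuming only the figures stated above: 35B parameters at an average of 2.504 bits each, ignoring KV cache and runtime overhead), the quantized weights alone come to about 11 GB, which leaves headroom on a 16 GB machine:

```python
# Back-of-the-envelope weight footprint for the 2-bit quantization.
params = 35e9            # parameter count from the model card
bits_per_param = 2.504   # average bits per parameter after quantization
bytes_total = params * bits_per_param / 8

print(f"{bytes_total / 1e9:.1f} GB")  # prints "11.0 GB"
```

Actual memory use at inference time will be somewhat higher once the KV cache and activations are included, so the margin below 16 GB matters.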
## Usage
```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hub
model, tokenizer = load("MercuriusDream/Qwen3.5-35B-A3B-MLX-2bit")

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Model size: 35B params
- Tensor types: BF16 · U32 · F32