Qwen3.5-35B-A3B-Text-qx64-hi-mlx

The video part was removed; no other changes were made.

For improved versions with vision support from NightmediaAI, check out our models below.

Brainwaves

         arc    arc/e  boolq  hswag  obkqa  piqa   wino
qx86-hi  0.420  0.457  0.379  0.671  0.354  0.777  0.702
qx64-hi  0.413  0.459  0.378  0.670  0.366  0.772  0.687
mxfp4    0.413  0.464  0.378  0.675  0.364  0.771  0.687

nightmedia/Qwen3.5-35B-A3B-Instruct
         arc    arc/e  boolq  ...
qx86-hi  0.554  0.670  0.891  ...

Quant    Perplexity     Max Memory
qx86-hi  4.042 ± 0.026  41.52 GB
qx64-hi  4.073 ± 0.026  32.86 GB
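As a rough illustration of the size/quality trade-off the table above describes, the numbers can be compared directly (a sketch using only the figures reported here; the variable names are illustrative, not part of any tooling):

```python
# Perplexity and memory figures for the two quants, taken from the table above.
ppl = {"qx86-hi": 4.042, "qx64-hi": 4.073}  # perplexity (lower is better)
mem = {"qx86-hi": 41.52, "qx64-hi": 32.86}  # max memory in GB

mem_saved_gb = mem["qx86-hi"] - mem["qx64-hi"]     # memory freed by qx64-hi
ppl_increase = ppl["qx64-hi"] - ppl["qx86-hi"]     # quality cost of the smaller quant
pct_smaller = 100 * mem_saved_gb / mem["qx86-hi"]  # relative size reduction

print(f"qx64-hi saves {mem_saved_gb:.2f} GB ({pct_smaller:.1f}%) "
      f"for a perplexity increase of {ppl_increase:.3f}")
```

In other words, qx64-hi trades roughly a fifth of the memory footprint for a perplexity increase of about 0.03.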

Models from NightmediaAI

nightmedia/Qwen3.5-35B-A3B-Holodeck
         arc    arc/e  boolq  hswag  obkqa  piqa   wino
qx86-hi  0.540  0.647  0.890  0.690  0.412  0.792  0.679

nightmedia/Qwen3.5-35B-A3B-Engineer
         arc    arc/e  boolq  hswag  obkqa  piqa   wino
qx64-hi  0.543  0.666  0.886  0.692  0.412  0.785  0.684
mxfp4    0.438  0.546  0.894  0.645  0.384  0.719  0.569

nightmedia/Qwen3-30B-A3B-Element11b
         arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.575  0.712  0.880  0.745  0.470  0.796  0.706
qx86-hi  0.586  0.757  0.880  0.753  0.458  0.805  0.705
qx64-hi  0.576  0.759  0.876  0.752  0.470  0.803  0.698
mxfp4    0.550  0.714  0.877  0.747  0.432  0.798  0.695
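To compare quants at a glance, the seven benchmark scores in each row can be averaged (a quick sketch over the Element11b numbers above; the dictionary and its names are illustrative, not part of any benchmark harness):

```python
# Mean benchmark score per quant for Qwen3-30B-A3B-Element11b,
# using the row values from the table above.
scores = {
    "mxfp8":   [0.575, 0.712, 0.880, 0.745, 0.470, 0.796, 0.706],
    "qx86-hi": [0.586, 0.757, 0.880, 0.753, 0.458, 0.805, 0.705],
    "qx64-hi": [0.576, 0.759, 0.876, 0.752, 0.470, 0.803, 0.698],
    "mxfp4":   [0.550, 0.714, 0.877, 0.747, 0.432, 0.798, 0.695],
}

# Print quants best-first by total score.
for quant, vals in sorted(scores.items(), key=lambda kv: -sum(kv[1])):
    print(f"{quant:8s} mean = {sum(vals) / len(vals):.3f}")
```

On these numbers qx86-hi and qx64-hi average within about a point of each other, with mxfp4 trailing.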

Older models with VL

Qwen3-VL-30B-A3B-Instruct
         arc    arc/e  boolq  hswag  obkqa  piqa   wino
qx86-hi  0.439  0.541  0.894  0.619  0.430  0.764  0.592
qx64-hi  0.454  0.544  0.893  0.618  0.428  0.749  0.590

Qwen3-VL-30B-A3B-Thinking
         arc    arc/e  boolq  hswag  obkqa  piqa   wino
qx86-hi  0.393  0.466  0.751  0.648  0.366  0.776  0.667

For a thinking model, it performs above the previous VL MoE.

Deckard formula update

This quant has been updated with the new Deckard (qx) formula for Qwen3.5, which covers some attention layers that are new to the architecture.

While the metric changes look minimal for the added weight, the model is more confident, pays better attention to detail, and the overall vibe has improved.

Old formula metrics

         arc    ...
qx86-hi  0.418  ...
qx64-hi  0.419  ...

Quant    Perplexity     Max Memory
qx86-hi  4.037 ± 0.026  37.46 GB
qx64-hi  4.079 ± 0.026  28.79 GB

More metrics will be available soon.

-G

This model, Qwen3.5-35B-A3B-Text-qx64-hi-mlx, was converted to MLX format from Qwen/Qwen3.5-35B-A3B using mlx-lm version 0.30.8.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Qwen3.5-35B-A3B-Text-qx64-hi-mlx")

prompt = "hello"

# Wrap the prompt in the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)