# Qwen3.5-35B-A3B-quantized.w4a16

This is a quantized version of Qwen/Qwen3.5-35B-A3B, a multimodal model that accepts text and images as inputs and generates text as outputs. The weights were quantized to INT4 using the GPTQ algorithm via llm-compressor, reducing the model size from 67.0 GB to 20.4 GB (a ~3.3x reduction) while maintaining 99.5% average accuracy recovery on the benchmarks below.
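The exact quantization recipe is not reproduced on this card; the sketch below shows what a typical W4A16 GPTQ run with llm-compressor looks like. The calibration dataset, sample count, and sequence length are illustrative assumptions, not the settings used to produce this checkpoint.

```python
# Sketch of a W4A16 GPTQ quantization run with llm-compressor.
# Calibration dataset, sample count, and sequence length below are
# assumptions, not the exact recipe used for this checkpoint.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3.5-35B-A3B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize Linear layers to INT4 weights / 16-bit activations,
# keeping the output head in higher precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",  # assumed calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("Qwen3.5-35B-A3B-quantized.w4a16", save_compressed=True)
tokenizer.save_pretrained("Qwen3.5-35B-A3B-quantized.w4a16")
```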

## Inference

As of 2/27/2026, this model is supported in vLLM nightly. To serve the model:

```bash
vllm serve Kbenkhaled/Qwen3.5-35B-A3B-quantized.w4a16 \
    --reasoning-parser qwen3 \
    --enable-prefix-caching
```
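Once the server is running, it exposes an OpenAI-compatible API (on port 8000 by default; adjust `base_url` if you changed it). A minimal query, for example:

```python
# Minimal chat request against the OpenAI-compatible endpoint
# exposed by `vllm serve` (assumes the default port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Kbenkhaled/Qwen3.5-35B-A3B-quantized.w4a16",
    messages=[
        {"role": "user", "content": "Give me a short introduction to large language models."}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```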

## Evaluation

The model was evaluated with lm-evaluation-harness, using 0-shot prompting with thinking mode enabled.

| Benchmark | Qwen3.5-35B-A3B | Qwen3.5-35B-A3B-quantized.w4a16 (this model) | Recovery |
|-----------|-----------------|----------------------------------------------|----------|
| GPQA Diamond | 81.31% | 81.31% | 100% |
| IFEval | 95.56% | 94.60% | 99.0% |
| MMLU-Redux | 92.51% | 91.95% | 99.4% |
| **Average** | **89.79%** | **89.29%** | **99.5%** |
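Something like the following lm-evaluation-harness call can be used to reproduce these numbers. The task names, backend arguments, and batching settings are assumptions and may need adjusting to match the exact evaluation configuration (e.g. the MMLU-Redux task variant and the thinking-mode prompt setup).

```python
# Sketch of an lm-evaluation-harness run on top of the vLLM backend.
# Task names and model_args are illustrative; the exact task variants
# and thinking-mode settings used for the table above may differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=Kbenkhaled/Qwen3.5-35B-A3B-quantized.w4a16,"
        "dtype=auto,gpu_memory_utilization=0.9"
    ),
    tasks=["gpqa_diamond_zeroshot", "ifeval"],  # assumed task names
    num_fewshot=0,
    batch_size="auto",
)
print(results["results"])
```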