# Qwen3.6-27B AWQ 4-bit (compressed-tensors)

Compressed-tensors output of GPTQ calibration of Qwen3.6-27B, with thinking and vision capabilities preserved.

## Which variant should I download?

| Stack | Recommended | Why |
|---|---|---|
| SGLang + ROCm | Native AWQ | Faster on the fused Triton AWQ GEMM than the ROCm CT MoE path |
| SGLang + NVIDIA | Native AWQ | Avoids the same Qwen3_5Moe CT loader bug seen on 35B (not yet confirmed on 27B, but same code path) |
| vLLM / autoawq / TGI on NVIDIA | Either works | CT loaders in those engines handle the gate correctly (see the sketch below) |
| Inspection / re-conversion | This repo (CT) | Raw GPTQ output from llmcompressor before the AWQ repack |
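
As a concrete example of the vLLM row, a minimal serving sketch for this CT repo follows; the flags are assumptions based on vLLM's generic CLI, not commands verified against this model.

```bash
# Hedged sketch: vLLM normally auto-detects compressed-tensors from
# config.json, so --quantization is passed only for explicitness.
vllm serve mattbucci/Qwen3.6-27B-AWQ-CT \
  --quantization compressed-tensors \
  --tensor-parallel-size 2
```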

## Model Details

| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Architecture | Qwen3.5 dense + DeltaNet hybrid + vision tower |
| Parameters | 27B |
| Format | compressed-tensors pack-quantized (W4A16, group_size=128) |
| Calibration | GPTQ via llmcompressor, 256 samples × 1024 tokens, thinking_vision recipe |
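
To confirm the pack-quantized W4A16 layout before converting or serving, you can inspect `quantization_config` in this repo's `config.json`. A minimal sketch, assuming `huggingface-cli` and `jq` are installed:

```bash
# Download just config.json and print the compressed-tensors block
# (weight bits, group_size, ignore list, etc.).
CFG=$(huggingface-cli download mattbucci/Qwen3.6-27B-AWQ-CT config.json)
jq '.quantization_config' "$CFG"
```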

For the ignore list and benchmark numbers, see the native variant's README.

## Convert to native AWQ

```bash
git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
python scripts/quantize/convert_moe_ct_to_awq.py <local_path_to_this_repo> <output_dir> --group-size 128
```
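
The converted directory should then load as a plain AWQ checkpoint. A hedged launch sketch with SGLang, matching the table above; the exact flags are assumptions rather than tested commands:

```bash
# Sketch: serve the repacked AWQ output with SGLang on 2 GPUs;
# <output_dir> is the directory produced by convert_moe_ct_to_awq.py.
python -m sglang.launch_server \
  --model-path <output_dir> \
  --quantization awq \
  --tp 2
```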

## Hardware origin

Calibrated on 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4) with ROCm 7.2 + SGLang v0.5.10 + RDNA4 patches.
