# Qwen3.6-27B AWQ 4-bit (compressed-tensors)

Compressed-tensors output of GPTQ calibration of Qwen3.6-27B, with thinking and vision capabilities preserved.

## Which variant should I download?

| Stack | Recommended | Why |
|---|---|---|
| SGLang + ROCm | Native AWQ | Faster on the fused Triton AWQ GEMM than the ROCm CT MoE path |
| SGLang + NVIDIA | Native AWQ | Avoids the same Qwen3_5Moe CT loader bug seen on 35B (not yet confirmed on 27B, but same code path) |
| vLLM / autoawq / TGI on NVIDIA | Either works | CT loaders in those engines handle the gate correctly (see the sketch below) |
| Inspection / re-conversion | This repo (CT) | Raw GPTQ output from llmcompressor before the AWQ repack |
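
As a concrete example of the vLLM row, a minimal serving sketch for this CT repo follows; the flags are assumptions based on vLLM's generic CLI, not commands verified against this model.

```bash
# Hedged sketch: vLLM normally auto-detects compressed-tensors from
# config.json, so --quantization is passed only for explicitness.
vllm serve mattbucci/Qwen3.6-27B-AWQ-CT \
  --quantization compressed-tensors \
  --tensor-parallel-size 2
```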

## Model Details

| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Architecture | Qwen3.5 dense + DeltaNet hybrid + vision tower |
| Parameters | 27B |
| Format | compressed-tensors pack-quantized (W4A16, group_size=128) |
| Calibration | GPTQ via llmcompressor, 256 samples × 1024 tokens, thinking_vision recipe |
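
To confirm the pack-quantized W4A16 layout before converting or serving, you can inspect `quantization_config` in this repo's `config.json`. A minimal sketch, assuming `huggingface-cli` and `jq` are installed:

```bash
# Download just config.json and print the compressed-tensors block
# (weight bits, group_size, ignore list, etc.).
CFG=$(huggingface-cli download mattbucci/Qwen3.6-27B-AWQ-CT config.json)
jq '.quantization_config' "$CFG"
```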

For the ignore list and benchmark numbers, see the native variant's README.

## Convert to native AWQ

```bash
git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
python scripts/quantize/convert_moe_ct_to_awq.py <local_path_to_this_repo> <output_dir> --group-size 128
```
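
The converted directory should then load as a plain AWQ checkpoint. A hedged launch sketch with SGLang, matching the table above; the exact flags are assumptions rather than tested commands:

```bash
# Sketch: serve the repacked AWQ output with SGLang on 2 GPUs;
# <output_dir> is the directory produced by convert_moe_ct_to_awq.py.
python -m sglang.launch_server \
  --model-path <output_dir> \
  --quantization awq \
  --tp 2
```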

## Hardware origin

Calibrated on 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4) with ROCm 7.2 + SGLang v0.5.10 + RDNA4 patches.
