# Qwen3.6-27B AWQ 4-bit (compressed-tensors)
Compressed-tensors output of GPTQ calibration of Qwen3.6-27B, with thinking and vision capabilities preserved.
## Which variant should I download?
| Stack | Recommended | Why |
|---|---|---|
| SGLang + ROCm | Native AWQ | The fused Triton AWQ GEMM is faster than the ROCm CT MoE path |
| SGLang + NVIDIA | Native AWQ | Avoids the Qwen3_5Moe CT loader bug seen on 35B (not yet confirmed on 27B, but it is the same code path) |
| vLLM / autoawq / TGI on NVIDIA | Either works | CT loaders in those engines handle the gate correctly |
| Inspection / re-conversion | This (CT) | Raw GPTQ output from llmcompressor before AWQ repack |
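On the stacks where the CT checkpoint loads directly (the vLLM / autoawq / TGI row above), loading looks roughly like the sketch below. This is a minimal illustration, not a tested recipe: the prompt and sampling settings are arbitrary, and vLLM normally auto-detects the compressed-tensors scheme from the checkpoint config, so the explicit `quantization` argument is optional.

```python
from vllm import LLM, SamplingParams

# Sketch: load the CT checkpoint with vLLM on NVIDIA. tensor_parallel_size=2
# is an assumption mirroring the two-GPU calibration rig, not a requirement.
llm = LLM(
    model="mattbucci/Qwen3.6-27B-AWQ-CT",
    quantization="compressed-tensors",  # usually auto-detected from config.json
    tensor_parallel_size=2,
)

params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(["Briefly explain 4-bit group quantization."], params)
print(out[0].outputs[0].text)
```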
## Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Architecture | Qwen3.5 dense+DeltaNet hybrid + vision tower |
| Parameters | 27B |
| Format | compressed-tensors pack-quantized (W4A16, group_size=128) |
| Calibration | GPTQ via llmcompressor, 256 samples × 1024 tokens, thinking_vision recipe |
For the ignore list and benchmark numbers, see the native variant's README.
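To make the pack-quantized W4A16 format in the table above concrete, here is an illustrative dequantization sketch. It assumes the 4-bit values have already been unpacked into one integer per weight; the on-disk compressed-tensors layout actually packs eight 4-bit values into each int32 word, and symmetric schemes omit the zero point. Names and shapes here are for illustration only.

```python
import torch

def dequantize_w4a16(q, scale, zero, group_size=128):
    """Recover floating-point weights from group-quantized 4-bit values.

    q:     (out_features, in_features) ints in [0, 15], already unpacked
    scale: (out_features, in_features // group_size) per-group scales
    zero:  (out_features, in_features // group_size) per-group zero points
    """
    out_f, in_f = q.shape
    q = q.reshape(out_f, in_f // group_size, group_size).float()
    w = (q - zero.unsqueeze(-1).float()) * scale.unsqueeze(-1).float()
    return w.reshape(out_f, in_f)
```

With group_size=128, every run of 128 input channels in a row shares one scale and zero point; finer groups track weight outliers better at the cost of more metadata per tensor.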
## Convert to native AWQ
```bash
git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
python scripts/quantize/convert_moe_ct_to_awq.py <local_path_to_this_repo> <output_dir> --group-size 128
```
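After conversion, the output directory can be loaded with SGLang. Below is a minimal sketch using SGLang's offline `Engine` API; `<output_dir>` is the directory produced by the conversion script above, and `tp_size=2` is an assumption matching the two-GPU setup under Hardware origin (for serving, `python -m sglang.launch_server --model-path <output_dir>` is the usual entry point).

```python
import sglang as sgl

# Sketch: load the converted native-AWQ checkpoint for offline generation.
llm = sgl.Engine(model_path="<output_dir>", tp_size=2)
print(llm.generate("Hello from the AWQ repack:", {"max_new_tokens": 64}))
llm.shutdown()
```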
## Hardware origin
Calibrated on 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4) with ROCm 7.2 + SGLang v0.5.10 + RDNA4 patches.