# Qwen3.5-27B_PTQ (W4A16 and W8A16, Post-Training Quantization)

## Overview

This repository provides W4A16 and W8A16 PTQ (Post-Training Quantized) versions of Qwen3.5-27B.

Unlike AWQ/GPTQ workflows, this model was quantized with a true data-free PTQ pipeline: no calibration dataset is needed. Quantization is applied in a single one-shot pass, making it extremely fast and simple while still maintaining strong fidelity.
## Key Highlights

- Quantization Type: PTQ (Post-Training Quantization)
- Scheme: W4A16
  - Weights: INT4 (per-channel symmetric)
  - Activations: FP16/BF16 (unchanged)
- Scheme: W8A16
  - Weights: INT8 (per-channel symmetric)
  - Activations: FP16/BF16 (unchanged)
- Calibration Dataset: ❌ None (not required)
- Method: `llmcompressor` `oneshot` pipeline
- Target Layers: Linear layers only
- Ignored Layers: `lm_head`, visual modules, `linear_attn`, `mtp`
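Per-channel symmetric quantization, as listed above, derives one scale per output channel from that channel's largest absolute weight, then rounds each weight to the nearest integer step. A minimal plain-Python sketch of the idea (illustrative only; the actual pipeline uses `llmcompressor`, and these helper names are hypothetical):

```python
def quantize_per_channel_symmetric(weights, num_bits=4):
    """Symmetric per-channel weight quantization sketch.

    `weights` is a list of rows (one row per output channel).
    Returns (quantized integer rows, per-channel scales).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 7 for INT4, 127 for INT8
    q_rows, scales = [], []
    for row in weights:
        # One scale per channel; `or 1.0` guards an all-zero row.
        scale = max(abs(w) for w in row) / qmax or 1.0
        q_rows.append(
            [max(-qmax - 1, min(qmax, round(w / scale))) for w in row]
        )
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    # Recover approximate FP weights: q * scale, channel by channel.
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_per_channel_symmetric(w, num_bits=4)
w_hat = dequantize(q, s)
```

Activations stay FP16/BF16 at inference; only the stored weights are integer, which is what the "W4A16"/"W8A16" naming denotes.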
## Quantization Details

This quant was created using a `QuantizationModifier` recipe:

- Targets: Linear layers
- Schemes: W4A16 and W8A16
- Approach: One-shot PTQ (no iterative calibration)
- Preserves: Model structure, tokenizer, and chat template
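In `llm-compressor`, a recipe like this is typically written in YAML. A sketch of what the W4A16 variant might look like (the `ignore` patterns are illustrative assumptions; exact module paths depend on the model architecture):

```yaml
# Hypothetical one-shot PTQ recipe sketch (W4A16 variant).
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: "W4A16"
      ignore: ["lm_head", "re:.*visual.*", "re:.*linear_attn.*", "re:.*mtp.*"]
```

Because the modifier is data-free, no calibration dataloader is attached; the one-shot pass computes scales directly from the weights.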
## PTQ Quality Metrics

Mean KL divergence (KLD) of the quantized model's output distribution against the full-precision baseline, averaged over token positions (lower is better):

| Scheme | Mean KLD | Total Positions |
|--------|----------|-----------------|
| W4A16  | 0.054260 | 204,700         |
| W8A16  | 0.001895 | 204,700         |
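The mean-KLD figure compares, at each token position, the baseline model's next-token distribution with the quantized model's, then averages over all positions. A small self-contained sketch of the metric (hypothetical helper names, operating on raw logits):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mean_kld(baseline_logits, quantized_logits):
    """Mean KL(P_baseline || P_quantized) across token positions."""
    total = 0.0
    for base, quant in zip(baseline_logits, quantized_logits):
        p, q = softmax(base), softmax(quant)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(baseline_logits)

# Identical logits give a KLD of exactly zero; diverging logits give a
# positive value -- smaller means the quant tracks the baseline closer.
identical = mean_kld([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
diverged = mean_kld([[1.0, 2.0, 3.0]], [[3.0, 2.0, 1.0]])
```

The W8A16 number (0.001895) being roughly 30x lower than W4A16 (0.054260) reflects the extra 4 bits of weight precision.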
## Example Usage (vLLM)

```shell
pip install -U vllm

vllm serve TheHouseOfTheDude/Qwen3.5-27B_PTQ \
  --quantization compressed-tensors \
  --tensor-parallel-size 2 \
  --dtype bfloat16
```
## Notes

- No calibration dataset required
- Extremely fast quantization pipeline
- Designed for the vLLM runtime
## Credits

- Base Model: Qwen/Qwen3.5-27B
- Quantization: TheHouseOfTheDude