# Qwen3.5-27B_PTQ (W4A16 and W8A16, Post-Training Quantization)

## Overview

This repository provides W4A16 and W8A16 PTQ (Post-Training Quantized) versions of Qwen3.5-27B.

Unlike AWQ/GPTQ workflows, this model was quantized with a true data-free PTQ pipeline: no calibration dataset is needed. Quantization is applied in a single one-shot pass, making it extremely fast and simple while still maintaining strong fidelity.
## Key Highlights

- Quantization Type: PTQ (Post-Training Quantization)
- Scheme: W4A16
  - Weights: INT4 (per-channel symmetric)
  - Activations: FP16/BF16 (unchanged)
- Scheme: W8A16
  - Weights: INT8 (per-channel symmetric)
  - Activations: FP16/BF16 (unchanged)
- Calibration Dataset: ❌ None (not required)
- Method: `llmcompressor` `oneshot` pipeline
- Target Layers: Linear layers only
- Ignored Layers: `lm_head`, visual modules, `linear_attn`, `mtp`
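Per-channel symmetric quantization, as listed above, derives one scale per output channel from that channel's largest absolute weight, then rounds each weight to the nearest integer step. A minimal plain-Python sketch of the idea (illustrative only; the actual pipeline uses `llmcompressor`, and these helper names are hypothetical):

```python
def quantize_per_channel_symmetric(weights, num_bits=4):
    """Symmetric per-channel weight quantization sketch.

    `weights` is a list of rows (one row per output channel).
    Returns (quantized integer rows, per-channel scales).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 7 for INT4, 127 for INT8
    q_rows, scales = [], []
    for row in weights:
        # One scale per channel; `or 1.0` guards an all-zero row.
        scale = max(abs(w) for w in row) / qmax or 1.0
        q_rows.append(
            [max(-qmax - 1, min(qmax, round(w / scale))) for w in row]
        )
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    # Recover approximate FP weights: q * scale, channel by channel.
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_per_channel_symmetric(w, num_bits=4)
w_hat = dequantize(q, s)
```

Activations stay FP16/BF16 at inference; only the stored weights are integer, which is what the "W4A16"/"W8A16" naming denotes.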
## Quantization Details

This quant was created using a `QuantizationModifier` recipe:

- Targets: Linear layers
- Schemes: W4A16 and W8A16
- Approach: One-shot PTQ (no iterative calibration)
- Preserves: Model structure, tokenizer, and chat template
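In `llm-compressor`, a recipe like this is typically written in YAML. A sketch of what the W4A16 variant might look like (the `ignore` patterns are illustrative assumptions; exact module paths depend on the model architecture):

```yaml
# Hypothetical one-shot PTQ recipe sketch (W4A16 variant).
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: "W4A16"
      ignore: ["lm_head", "re:.*visual.*", "re:.*linear_attn.*", "re:.*mtp.*"]
```

Because the modifier is data-free, no calibration dataloader is attached; the one-shot pass computes scales directly from the weights.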
## PTQ Quality Metrics

Mean KL divergence (KLD) of the quantized model's output distribution against the full-precision baseline, averaged over token positions (lower is better):

| Scheme | Mean KLD | Total Positions |
|--------|----------|-----------------|
| W4A16  | 0.054260 | 204,700         |
| W8A16  | 0.001895 | 204,700         |
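The mean-KLD figure compares, at each token position, the baseline model's next-token distribution with the quantized model's, then averages over all positions. A small self-contained sketch of the metric (hypothetical helper names, operating on raw logits):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mean_kld(baseline_logits, quantized_logits):
    """Mean KL(P_baseline || P_quantized) across token positions."""
    total = 0.0
    for base, quant in zip(baseline_logits, quantized_logits):
        p, q = softmax(base), softmax(quant)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(baseline_logits)

# Identical logits give a KLD of exactly zero; diverging logits give a
# positive value -- smaller means the quant tracks the baseline closer.
identical = mean_kld([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
diverged = mean_kld([[1.0, 2.0, 3.0]], [[3.0, 2.0, 1.0]])
```

The W8A16 number (0.001895) being roughly 30x lower than W4A16 (0.054260) reflects the extra 4 bits of weight precision.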
## Example Usage (vLLM)

```shell
pip install -U vllm

vllm serve TheHouseOfTheDude/Qwen3.5-27B_PTQ \
  --quantization compressed-tensors \
  --tensor-parallel-size 2 \
  --dtype bfloat16
```
## Notes

- No calibration dataset required
- Extremely fast quantization pipeline
- Designed for the vLLM runtime
## Credits

- Base Model: Qwen/Qwen3.5-27B
- Quantization: TheHouseOfTheDude