Qwen3.5-27B_PTQ (W4A16 and W8A16, Post-Training Quantization)

Overview

This repository provides W4A16 and W8A16 PTQ (post-training quantized) versions of Qwen3.5-27B.

Unlike AWQ/GPTQ workflows, which require a calibration dataset, this model was quantized with a data-free PTQ pipeline: quantization scales are computed directly from the weights in a single one-shot pass, making the process fast and simple while still maintaining strong fidelity.


Key Highlights

  • Quantization Type: PTQ (Post-Training Quantization)
  • Scheme: W4A16
    • Weights: INT4 (per-channel symmetric)
    • Activations: FP16/BF16 (unchanged)
  • Scheme: W8A16
    • Weights: INT8 (per-channel symmetric)
    • Activations: FP16/BF16 (unchanged)
  • Calibration Dataset: ❌ None (not required)
  • Method: llmcompressor.oneshot pipeline
  • Target Layers: Linear layers only
  • Ignored Layers:
    • lm_head
    • visual modules
    • linear_attn
    • mtp

Quantization Details

These quants were created using a QuantizationModifier recipe:

  • Targets: Linear layers
  • Scheme: W4A16, W8A16
  • Approach: One-shot PTQ (no iterative calibration)
  • Preserves: Model structure, tokenizer, and chat template
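To make the scheme above concrete, here is a toy sketch of per-channel symmetric weight quantization, the W4A16/W8A16 weight treatment described in this card. This is illustrative pure Python under assumed simplifications (plain nested lists, no grouping), not the llmcompressor implementation.

```python
def quantize_per_channel(weights, num_bits=4):
    """Quantize each row (output channel) to signed integers with one scale per row."""
    qmax = 2 ** (num_bits - 1) - 1  # 7 for INT4, 127 for INT8
    q_rows, scales = [], []
    for row in weights:
        # Symmetric scheme: scale from the max magnitude, zero-point fixed at 0.
        scale = max(abs(w) for w in row) / qmax or 1.0
        q_rows.append([max(-qmax - 1, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_per_channel(q_rows, scales):
    """Recover approximate weights; activations stay FP16/BF16 at runtime (the A16 part)."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

Because the scheme is symmetric, only one scale per output channel needs to be stored alongside the integer weights; no zero-points are required.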

PTQ Quality Metrics

Mean KLD is the mean KL divergence between the original and quantized models' output token distributions, averaged over the evaluated positions (lower is better).

W4A16

  • Mean KLD: 0.054260
  • Total Positions: 204,700

W8A16

  • Mean KLD: 0.001895
  • Total Positions: 204,700
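The metric in the tables above can be sketched as follows: KL(p || q) between the baseline and quantized models' next-token distributions, averaged over all evaluated token positions. This is a minimal stdlib illustration, not the exact evaluation harness used for this card.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions over the vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kld(baseline_dists, quantized_dists):
    """Average the per-position KL divergence across all token positions."""
    total = sum(kl_divergence(p, q) for p, q in zip(baseline_dists, quantized_dists))
    return total / len(baseline_dists)
```

A mean KLD of 0 would mean the quantized model reproduces the baseline's token distributions exactly; the W8A16 figure (0.001895) is accordingly much closer to lossless than W4A16 (0.054260).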

Example Usage (vLLM)

pip install -U vllm

vllm serve TheHouseOfTheDude/Qwen3.5-27B_PTQ \
    --quantization compressed-tensors \
    --tensor-parallel-size 2 \
    --dtype bfloat16

Notes

  • No calibration dataset required
  • Extremely fast quantization pipeline
  • Designed for vLLM runtime

Credits

  • Base Model: Qwen/Qwen3.5-27B
  • Quantization: TheHouseOfTheDude