Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ
Overview
This repository provides a Post-Training Quantized (PTQ) version of:
Base Fine-Tune: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Quantized By: TheHouseOfTheDude
This is a true PTQ quantization:
- No calibration dataset used
- One-shot quantization pipeline
- Fast and scalable
Quantization Details
- Scheme: W8A16
- Weights: INT8 (per-channel symmetric)
- Activations: FP16 / BF16
- Method: llmcompressor.oneshot
- Targets: Linear layers only
Ignored layers:
- lm_head
- visual modules
- linear attention
- mtp layers
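The W8A16 scheme above can be illustrated with a minimal sketch: each weight row (output channel) gets one symmetric scale (zero-point 0), values are rounded into the INT8 range, and activations are left untouched in FP16/BF16. This is a pure-Python illustration of the arithmetic, not the llmcompressor.oneshot pipeline the repo actually uses.

```python
# Per-channel symmetric INT8 weight quantization (W8A16), illustrative only.
# One scale per output channel; symmetric means the zero-point is fixed at 0.

def quantize_per_channel_symmetric(weight):
    """weight: list of rows (output channels), each a list of floats.
    Returns (INT8 rows, per-channel scales)."""
    q_rows, scales = [], []
    for row in weight:
        amax = max(abs(v) for v in row) or 1.0  # guard against all-zero rows
        scale = amax / 127.0                    # map [-amax, amax] onto [-127, 127]
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Reconstruct approximate FP weights from INT8 values and scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

# Tiny demo: two output channels with different dynamic ranges.
w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_per_channel_symmetric(w)
w_hat = dequantize(q, s)
```

Because the scale is chosen per channel, a row with large weights does not force coarse quantization onto a row with small ones.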
KLD Results
Mean KLD: 0.003698
Total Positions: 204700
Very low divergence, indicating high fidelity to the original model's output distribution.
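A mean-KLD figure like the one above is typically obtained by averaging the token-level KL divergence KL(ref || quant) between the original and quantized models' next-token distributions over all evaluated positions. A minimal stdlib sketch of that computation (the actual evaluation harness is not part of this repo):

```python
import math

def mean_kld(ref_logprob_rows, quant_logprob_rows):
    """Mean per-position KL(ref || quant).

    Each row holds log-probabilities over the vocabulary at one token
    position; KL is sum_v p_ref(v) * (log p_ref(v) - log p_quant(v)).
    """
    total, n = 0.0, 0
    for ref, quant in zip(ref_logprob_rows, quant_logprob_rows):
        total += sum(math.exp(p) * (p - q) for p, q in zip(ref, quant))
        n += 1
    return total / n

# Identical distributions give KL = 0; a slight shift gives a small positive value.
same = [[math.log(0.5), math.log(0.5)]]
shifted = [[math.log(0.6), math.log(0.4)]]
print(mean_kld(same, same), mean_kld(same, shifted))
```

A mean KLD of ~0.0037 over 204,700 positions means the quantized model's token distributions are, on average, nearly indistinguishable from the original's.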
Key Implementation Notes
- Uses AutoModelForImageTextToText to preserve VLM structure
- No calibration dataset (true PTQ)
- Includes key remapping for Transformers v5 compatibility
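Key remapping of this kind usually means rewriting state-dict key names from the old checkpoint layout to the one a newer Transformers release expects. The rule set below is purely hypothetical (the actual v5 mapping is in the repo's script), but the pattern looks like this:

```python
import re

# Hypothetical rename rules: (old-prefix pattern, new prefix).
# These are illustrative placeholders, NOT the actual v5 mapping.
RENAME_RULES = [
    (re.compile(r"^model\.visual\."), "model.vision_tower."),
]

def remap_keys(state_dict):
    """Return a copy of state_dict with old key names rewritten."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in RENAME_RULES:
            new_key = pattern.sub(replacement, new_key)
        out[new_key] = value
    return out

# Keys matching a rule are renamed; everything else passes through unchanged.
sd = {"model.visual.patch_embed.weight": 1, "lm_head.weight": 2}
print(remap_keys(sd))
```

Running the remap before serialization lets the quantized checkpoint load cleanly under the newer Transformers layout without touching tensor data.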
Usage (vLLM)
pip install -U vllm
vllm serve TheHouseOfTheDude/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ \
--quantization compressed-tensors \
--tensor-parallel-size 8 \
--dtype bfloat16
Notes
- Requires the compressed-tensors runtime
- Not loadable with vanilla transformers (compressed-tensors support is required)
- Optimized for reasoning tasks
Credits
- Jackrong (fine-tune)
- TheHouseOfTheDude (quantization)
Model tree
- Base model: Qwen/Qwen3.5-27B
- Fine-tune: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
- Quantized: TheHouseOfTheDude/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ