Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ
Overview
This repository provides a Post-Training Quantized (PTQ) version of:
Base Fine-Tune: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Quantized By: TheHouseOfTheDude
This is a true PTQ quantization:
- No calibration dataset used
- One-shot quantization pipeline
- Fast and scalable
Quantization Details
- Scheme: W8A16
- Weights: INT8 (per-channel symmetric)
- Activations: FP16 / BF16
- Method: llmcompressor.oneshot
- Targets: Linear layers only
Ignored layers:
- lm_head
- visual modules
- linear attention
- mtp layers
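The W8A16 scheme above can be illustrated with a minimal sketch: each weight row (output channel) gets one symmetric scale (zero-point 0), values are rounded into the INT8 range, and activations are left untouched in FP16/BF16. This is a pure-Python illustration of the arithmetic, not the llmcompressor.oneshot pipeline the repo actually uses.

```python
# Per-channel symmetric INT8 weight quantization (W8A16), illustrative only.
# One scale per output channel; symmetric means the zero-point is fixed at 0.

def quantize_per_channel_symmetric(weight):
    """weight: list of rows (output channels), each a list of floats.
    Returns (INT8 rows, per-channel scales)."""
    q_rows, scales = [], []
    for row in weight:
        amax = max(abs(v) for v in row) or 1.0  # guard against all-zero rows
        scale = amax / 127.0                    # map [-amax, amax] onto [-127, 127]
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Reconstruct approximate FP weights from INT8 values and scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

# Tiny demo: two output channels with different dynamic ranges.
w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_per_channel_symmetric(w)
w_hat = dequantize(q, s)
```

Because the scale is chosen per channel, a row with large weights does not force coarse quantization onto a row with small ones.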
KLD Results
Mean KLD: 0.003698
Total Positions: 204700
Very low divergence, indicating high fidelity to the original model's output distribution.
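A mean-KLD figure like the one above is typically obtained by averaging the token-level KL divergence KL(ref || quant) between the original and quantized models' next-token distributions over all evaluated positions. A minimal stdlib sketch of that computation (the actual evaluation harness is not part of this repo):

```python
import math

def mean_kld(ref_logprob_rows, quant_logprob_rows):
    """Mean per-position KL(ref || quant).

    Each row holds log-probabilities over the vocabulary at one token
    position; KL is sum_v p_ref(v) * (log p_ref(v) - log p_quant(v)).
    """
    total, n = 0.0, 0
    for ref, quant in zip(ref_logprob_rows, quant_logprob_rows):
        total += sum(math.exp(p) * (p - q) for p, q in zip(ref, quant))
        n += 1
    return total / n

# Identical distributions give KL = 0; a slight shift gives a small positive value.
same = [[math.log(0.5), math.log(0.5)]]
shifted = [[math.log(0.6), math.log(0.4)]]
print(mean_kld(same, same), mean_kld(same, shifted))
```

A mean KLD of ~0.0037 over 204,700 positions means the quantized model's token distributions are, on average, nearly indistinguishable from the original's.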
Key Implementation Notes
- Uses AutoModelForImageTextToText to preserve VLM structure
- No calibration dataset (true PTQ)
- Includes key remapping for Transformers v5 compatibility
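Key remapping of this kind usually means rewriting state-dict key names from the old checkpoint layout to the one a newer Transformers release expects. The rule set below is purely hypothetical (the actual v5 mapping is in the repo's script), but the pattern looks like this:

```python
import re

# Hypothetical rename rules: (old-prefix pattern, new prefix).
# These are illustrative placeholders, NOT the actual v5 mapping.
RENAME_RULES = [
    (re.compile(r"^model\.visual\."), "model.vision_tower."),
]

def remap_keys(state_dict):
    """Return a copy of state_dict with old key names rewritten."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in RENAME_RULES:
            new_key = pattern.sub(replacement, new_key)
        out[new_key] = value
    return out

# Keys matching a rule are renamed; everything else passes through unchanged.
sd = {"model.visual.patch_embed.weight": 1, "lm_head.weight": 2}
print(remap_keys(sd))
```

Running the remap before serialization lets the quantized checkpoint load cleanly under the newer Transformers layout without touching tensor data.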
Usage (vLLM)
pip install -U vllm
vllm serve TheHouseOfTheDude/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ \
--quantization compressed-tensors \
--tensor-parallel-size 8 \
--dtype bfloat16
Notes
- Requires the compressed-tensors runtime
- Not loadable with vanilla transformers (compressed-tensors support is required)
- Optimized for reasoning tasks
Credits
- Jackrong (fine-tune)
- TheHouseOfTheDude (quantization)
Model tree
- Base model: Qwen/Qwen3.5-27B
- Fine-tune: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
- Quantized: TheHouseOfTheDude/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ