Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ

Overview

This repository provides a Post-Training Quantized (PTQ) version of:

Base Fine-Tune: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Quantized By: TheHouseOfTheDude

This is true post-training quantization:

  • No calibration dataset used
  • One-shot quantization pipeline
  • Fast and scalable

Quantization Details

  • Scheme: W8A16
  • Weights: INT8 (per-channel symmetric)
  • Activations: FP16 / BF16
  • Method: llmcompressor.oneshot
  • Targets: Linear layers only

Ignored layers:

  • lm_head
  • visual modules
  • linear attention
  • mtp layers
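The W8A16 scheme above (INT8 weights, per-channel symmetric scales, full-precision activations) can be sketched in a few lines. This is an illustrative NumPy model of the arithmetic, not the llmcompressor internals:

```python
import numpy as np

def quantize_w8_per_channel(w: np.ndarray):
    """Per-channel symmetric INT8 quantization of a Linear weight.

    w has shape (out_features, in_features); one scale per output channel.
    """
    # Symmetric scale: map each channel's max magnitude to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # At inference the INT8 weights are rescaled and used against
    # FP16/BF16 activations (W8A16: only weights are quantized).
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_w8_per_channel(w)
w_hat = dequantize(q, s)
print(np.abs(w - w_hat).max())  # bounded by half a quantization step per channel
```

Because the scheme is symmetric and per-channel, the worst-case reconstruction error for any element is half the scale of its output channel.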

KLD Results

Mean KLD: 0.003698
Total Positions: 204700

Very low divergence → high fidelity to the original model.
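The exact evaluation script is not reproduced here, but a mean-KLD metric of this kind is typically computed per token position between the reference and quantized models' output distributions. A minimal NumPy sketch of that computation:

```python
import numpy as np

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL(ref || quant) over token positions, from raw logits.

    Both arrays have shape (positions, vocab_size).
    """
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    log_p = log_softmax(ref_logits)    # reference model
    log_q = log_softmax(quant_logits)  # quantized model
    kld = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kld.mean())

ref = np.random.randn(16, 32)
print(mean_kld(ref, ref))  # 0.0 — identical distributions
print(mean_kld(ref, ref + 0.01 * np.random.randn(16, 32)))  # small positive value
```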


Key Implementation Notes

  • Uses AutoModelForImageTextToText to preserve VLM structure
  • No calibration dataset (true PTQ)
  • Includes key remapping for Transformers v5 compatibility
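The specific key-remapping rules used for Transformers v5 are not documented in this card; the sketch below only illustrates the general technique (rewriting checkpoint state-dict keys before loading), with a hypothetical rename rule:

```python
import re

# Hypothetical rule for illustration only — the real mapping used for this
# checkpoint is not documented here.
KEY_REMAP = [
    (re.compile(r"^model\.layers\."), "model.language_model.layers."),
]

def remap_state_dict(state_dict: dict) -> dict:
    """Apply each rename rule to every checkpoint key, keeping tensors as-is."""
    remapped = {}
    for key, tensor in state_dict.items():
        for pattern, replacement in KEY_REMAP:
            key = pattern.sub(replacement, key)
        remapped[key] = tensor
    return remapped

sd = {"model.layers.0.self_attn.q_proj.weight": "tensor placeholder"}
print(remap_state_dict(sd))
```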



Usage (vLLM)

pip install -U vllm

vllm serve TheHouseOfTheDude/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ \
  --quantization compressed-tensors \
  --tensor-parallel-size 8 \
  --dtype bfloat16
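Once serving, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A stdlib-only sketch of a chat request — the port and endpoint path are assumptions about your deployment:

```python
import json
from urllib import request

payload = {
    "model": "TheHouseOfTheDude/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled_PTQ",
    "messages": [{"role": "user", "content": "Briefly explain post-training quantization."}],
    "max_tokens": 256,
}
body = json.dumps(payload).encode()

def query(url="http://localhost:8000/v1/chat/completions"):
    # POST the chat payload and return the first completion's text.
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# query()  # requires the vllm serve command above to be running
```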

Notes

  • Requires compressed-tensors runtime
  • Not compatible with vanilla transformers loading
  • Optimized for reasoning tasks

Credits

  • Jackrong (fine-tune)
  • TheHouseOfTheDude (quantization)