# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2 AutoRound W4A16

This is a 4-bit AutoRound export of `llmfan46/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2`.
## Quantization settings
- Method: AutoRound 0.9.2
- Scheme: W4A16
- Bits: 4
- Group size: 128
- Iterations: 200
- Calibration seqlen: 512
- Calibration samples: 64
- Batch size: 1
- Gradient accumulate steps: 8
- Low GPU memory mode: enabled
- Packing format: `auto_round:auto_gptq`
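For reproducibility, the settings above map onto AutoRound's Python API roughly as follows. This is an untested sketch (it needs a GPU and the full-precision source model), and the kwarg names reflect AutoRound 0.9.x; check them against your installed version before running:

```python
from transformers import Qwen3_5ForConditionalGeneration, AutoTokenizer
from auto_round import AutoRound

src = "llmfan46/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2"

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    src, torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(src, trust_remote_code=True)

# Settings taken from the list above; the output directory name is illustrative.
ar = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    iters=200,
    seqlen=512,
    nsamples=64,
    batch_size=1,
    gradient_accumulate_steps=8,
    low_gpu_mem_usage=True,
)
ar.quantize()
ar.save_quantized("qwen3.5-27b-heretic-v2-w4a16", format="auto_round:auto_gptq")
```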
## Artifact summary

- Source size: 51G
- Quantized size: 18G
- Size reduction: 64.71%
- Final export: 5 safetensors shards
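The size-reduction figure follows directly from the two sizes above; a quick sanity check, treating the rounded 51G/18G figures as exact:

```python
src_gb, quant_gb = 51, 18
reduction = 100 * (src_gb - quant_gb) / src_gb
print(f"{reduction:.2f}%")  # 64.71%
```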
AutoRound reported 400/607 modules quantized. The vision tower, many `linear_attn.in_proj_a`/`in_proj_b` layers, and `lm_head` remain unquantized.
## Perplexity benchmark

WikiText-2 test split, sliding-window next-token perplexity with `seq_len=512`, `stride=256`, `max_tokens=8192`:
| Model | PPL | Tokens scored | Load time | Eval time |
|---|---|---|---|---|
| FP source | 7.518735 | 8161 | 9.829s | 106.855s |
| AutoRound W4A16 | 7.747804 | 8161 | 5.844s | 42.065s |
Degradation vs FP: +3.0466%
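As a sanity check, the window count and scored-token total implied by these settings can be reproduced in plain Python, along with the degradation percentage. The reported 8161 is consistent with each window losing one target token to the causal next-token shift, which is the assumption made here:

```python
seq_len, stride, max_tokens = 512, 256, 8192

# Enumerate sliding windows; each window scores only the tokens that are
# new relative to the previous window, minus one lost to the causal shift.
windows, prev_end, scored = 0, 0, 0
for begin in range(0, max_tokens, stride):
    end = min(begin + seq_len, max_tokens)
    scored += (end - prev_end) - 1
    windows += 1
    prev_end = end
    if end == max_tokens:
        break

print(windows, scored)  # 31 windows, 8161 tokens scored

# Relative perplexity degradation of the quantized model vs. the FP source.
fp, quant = 7.518735, 7.747804
print(f"+{100 * (quant - fp) / fp:.4f}%")  # +3.0466%
```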
## Loading with Transformers

```python
from transformers import Qwen3_5ForConditionalGeneration, AutoTokenizer

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "PATH/TO/THIS/MODEL",
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "PATH/TO/THIS/MODEL",
    trust_remote_code=True,
)
```
## Loading with vLLM

```shell
python -m vllm.entrypoints.openai.api_server \
  --model PATH/TO/THIS/MODEL \
  --quantization gptq \
  --tensor-parallel-size 2
```
## Included benchmark/report artifacts

- `benchmark_summary.json`
- `heretic_v2_fp_ppl.json`
- `heretic_v2_quant_ppl.json`
- `HERETIC_V2_QUANTIZATION_REPORT.md`
## Notes

- The source model architecture is `Qwen3_5ForConditionalGeneration`.
- Text-only evaluation was run through the internal language-model path.
- Please follow the original model's usage terms and license expectations.