Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2 AutoRound W4A16

This is a 4-bit AutoRound export of:

llmfan46/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2

Quantization settings

  • Method: AutoRound 0.9.2
  • Scheme: W4A16
  • Bits: 4
  • Group size: 128
  • Iterations: 200
  • Calibration seqlen: 512
  • Calibration samples: 64
  • Batch size: 1
  • Gradient accumulate steps: 8
  • Low GPU memory mode: enabled
  • Packing format: auto_round:auto_gptq
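The settings above correspond roughly to the following AutoRound CLI invocation. This is a sketch only: the flag names are assumed from AutoRound 0.9.x conventions and the output directory is a placeholder, so verify both against your installed version before running.

```shell
auto-round \
  --model llmfan46/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2 \
  --bits 4 \
  --group_size 128 \
  --iters 200 \
  --seqlen 512 \
  --nsamples 64 \
  --batch_size 1 \
  --gradient_accumulate_steps 8 \
  --low_gpu_mem_usage \
  --format auto_round:auto_gptq \
  --output_dir ./quantized
```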

Artifact summary

  • Source size: 51G
  • Quantized size: 18G
  • Size reduction: 64.71%
  • Final export: 5 safetensor shards
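The quoted size reduction follows directly from the reported artifact sizes (rounded to whole gigabytes, as listed above):

```python
# Size reduction from the reported artifact sizes (GB, as quoted above).
source_gb = 51
quant_gb = 18

reduction_pct = (source_gb - quant_gb) / source_gb * 100
print(f"{reduction_pct:.2f}%")  # 64.71%, matching the reported figure
```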

AutoRound reported 400/607 modules quantized. The visual tower, many linear_attn.in_proj_a/b layers, and lm_head remained unquantized.

Perplexity benchmark

WikiText-2 test split, sliding-window next-token perplexity:

  • seq_len=512
  • stride=256
  • max_tokens=8192
Model            PPL       Tokens scored  Load time  Eval time
FP source        7.518735  8161           9.829s     106.855s
AutoRound W4A16  7.747804  8161           5.844s     42.065s

Degradation vs FP: +3.0466%
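The degradation figure is simply the relative increase in perplexity, computed from the PPL values in the table above:

```python
# Relative perplexity increase of the quantized model over the FP source.
fp_ppl = 7.518735
quant_ppl = 7.747804

degradation_pct = (quant_ppl - fp_ppl) / fp_ppl * 100
print(f"+{degradation_pct:.4f}%")  # +3.0466%
```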

Loading with Transformers

from transformers import Qwen3_5ForConditionalGeneration, AutoTokenizer

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "PATH/TO/THIS/MODEL",
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "PATH/TO/THIS/MODEL",
    trust_remote_code=True,
)

Loading with vLLM

python -m vllm.entrypoints.openai.api_server \
  --model PATH/TO/THIS/MODEL \
  --quantization gptq \
  --tensor-parallel-size 2
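Once the server is up, it exposes the standard OpenAI-compatible endpoints. A request might look like the following; the "model" field mirrors the --model path passed to the server, and the prompt is purely illustrative:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "PATH/TO/THIS/MODEL",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```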

Included benchmark/report artifacts

  • benchmark_summary.json
  • heretic_v2_fp_ppl.json
  • heretic_v2_quant_ppl.json
  • HERETIC_V2_QUANTIZATION_REPORT.md

Notes

  • The source model architecture is Qwen3_5ForConditionalGeneration.
  • Text-only evaluation was run through the internal language model path.
  • Please follow the original model's usage terms and license expectations.