Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic-GGUF

GGUF quantizations of ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic.

Qwen3.5-27B with Claude Opus 4.6 reasoning distillation, abliterated via Heretic.

Available Quants

Filename                 Quant   Size      BPW   Notes
heretic-27b-Q4_K_M.gguf  Q4_K_M  ~15.4 GB  4.92  Recommended: best balance of quality and VRAM

VRAM Requirements (Q4_K_M)

Context  VRAM    Fits on
16K      ~18 GB  RTX 3090, 4090, A5000
32K      ~19 GB  RTX 3090 Ti, A6000
65K      ~21 GB  RTX 3090 Ti, A6000

Qwen3.5's hybrid DeltaNet architecture means only ~25% of layers (the conventional-attention layers) carry a KV cache, so context scaling is far more VRAM-efficient than in a pure transformer model.
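To see why, here is a rough back-of-the-envelope estimator. The KV-head count and head dimension below are illustrative assumptions, not the model's actual config; only the 64-layer / 3:1 layout comes from this card.

```python
def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Rough FP16 KV-cache size: K and V tensors, per attention layer, per token."""
    return 2 * n_kv_heads * head_dim * bytes_per_elem * n_attn_layers * n_ctx

# 64 layers in a 3:1 DeltaNet:attention pattern -> 16 attention layers carry KV cache
hybrid = kv_cache_bytes(n_ctx=16384, n_attn_layers=64 // 4)
pure   = kv_cache_bytes(n_ctx=16384, n_attn_layers=64)
print(f"hybrid: {hybrid / 2**30:.1f} GiB, pure: {pure / 2**30:.1f} GiB")
# -> hybrid: 1.0 GiB, pure: 4.0 GiB
```

Under these assumed dimensions, caching only a quarter of the layers cuts the 16K-context KV cache by 4x, which is why the context column above grows so slowly.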

Usage with llama.cpp

llama-server \
    -m heretic-27b-Q4_K_M.gguf \
    -ngl 99 \
    --ctx-size 16384 \
    --flash-attn on \
    --jinja
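Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal stdlib-only client sketch (the host and port are assumptions matching llama-server's defaults; adjust to your setup):

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    # Minimal OpenAI-style chat request; sampling params can be added here.
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    # POST to llama-server's OpenAI-compatible chat endpoint.
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```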

With vision (mmproj)

Build the mmproj from the base model weights:

python convert_hf_to_gguf.py \
    ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic \
    --mmproj --outfile heretic-27b-mmproj-F16.gguf

llama-server \
    -m heretic-27b-Q4_K_M.gguf \
    --mmproj heretic-27b-mmproj-F16.gguf \
    -ngl 99 --ctx-size 16384 --flash-attn on --jinja
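With the mmproj loaded, images can be sent through the same chat endpoint using OpenAI-style multimodal content parts. A sketch of building such a message (the data-URI approach is an assumption about how you choose to pass the image; a PNG is assumed here):

```python
import base64

def image_message(image_path: str, question: str) -> dict:
    # Encode the image as a base64 data URI inside an OpenAI-style
    # multimodal user message: one text part, one image_url part.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```

The returned dict drops into the `messages` list of a `/v1/chat/completions` request.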

Recommended settings

temperature: 0.6
top_p: 0.95
top_k: 20
presence_penalty: 1.5
repetition_penalty: 1.05
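These can be passed per-request rather than as server flags. A sketch of the request payload (the exact field names accepted alongside the standard OpenAI ones are an assumption based on llama.cpp's server API, where repetition penalty is spelled `repeat_penalty`):

```python
# Recommended sampler settings from above, as request-payload fields.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5,
    "repeat_penalty": 1.05,  # this card's "repetition_penalty"
}

payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    **SAMPLING,
}
```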

Abliteration Stats

  • Tool: Heretic v1.2.0
  • Refusals: 13/100
  • KL Divergence: 1264

Architecture

Qwen3.5 combines Gated DeltaNet with conventional attention in a hybrid stack: 64 layers in a 3:1 pattern (three DeltaNet layers for every attention layer), 262K native context, and native multimodal vision. See the full model card for details.

Made by

Ghost (ghost-actual)
