Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic-GGUF

GGUF quantizations of ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic.

Qwen3.5-27B with Claude Opus 4.6 reasoning distillation, abliterated via Heretic.

Available Quants

Filename                 Quant   Size      BPW   Notes
heretic-27b-Q4_K_M.gguf  Q4_K_M  ~15.4 GB  4.92  Recommended: best balance of quality and VRAM

VRAM Requirements (Q4_K_M)

Context  VRAM    Fits on
16K      ~18 GB  RTX 3090, 4090, A5000
32K      ~19 GB  RTX 3090 Ti, A6000
65K      ~21 GB  RTX 3090 Ti, A6000

Qwen3.5's hybrid DeltaNet architecture means only ~25% of layers (the conventional-attention layers) carry a KV cache, so context scaling is far more VRAM-efficient than in a pure transformer model.
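To see why, here is a rough back-of-the-envelope estimator. The KV-head count and head dimension below are illustrative assumptions, not the model's actual config; only the 64-layer / 3:1 layout comes from this card.

```python
def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Rough FP16 KV-cache size: K and V tensors, per attention layer, per token."""
    return 2 * n_kv_heads * head_dim * bytes_per_elem * n_attn_layers * n_ctx

# 64 layers in a 3:1 DeltaNet:attention pattern -> 16 attention layers carry KV cache
hybrid = kv_cache_bytes(n_ctx=16384, n_attn_layers=64 // 4)
pure   = kv_cache_bytes(n_ctx=16384, n_attn_layers=64)
print(f"hybrid: {hybrid / 2**30:.1f} GiB, pure: {pure / 2**30:.1f} GiB")
# -> hybrid: 1.0 GiB, pure: 4.0 GiB
```

Under these assumed dimensions, caching only a quarter of the layers cuts the 16K-context KV cache by 4x, which is why the context column above grows so slowly.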

Usage with llama.cpp

llama-server \
    -m heretic-27b-Q4_K_M.gguf \
    -ngl 99 \
    --ctx-size 16384 \
    --flash-attn on \
    --jinja
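Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal stdlib-only client sketch (the host and port are assumptions matching llama-server's defaults; adjust to your setup):

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    # Minimal OpenAI-style chat request; sampling params can be added here.
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    # POST to llama-server's OpenAI-compatible chat endpoint.
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```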

With vision (mmproj)

Build the mmproj from the base model weights:

python convert_hf_to_gguf.py \
    ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic \
    --mmproj --outfile heretic-27b-mmproj-F16.gguf

llama-server \
    -m heretic-27b-Q4_K_M.gguf \
    --mmproj heretic-27b-mmproj-F16.gguf \
    -ngl 99 --ctx-size 16384 --flash-attn on --jinja
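With the mmproj loaded, images can be sent through the same chat endpoint using OpenAI-style multimodal content parts. A sketch of building such a message (the data-URI approach is an assumption about how you choose to pass the image; a PNG is assumed here):

```python
import base64

def image_message(image_path: str, question: str) -> dict:
    # Encode the image as a base64 data URI inside an OpenAI-style
    # multimodal user message: one text part, one image_url part.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```

The returned dict drops into the `messages` list of a `/v1/chat/completions` request.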

Recommended settings

temperature: 0.6
top_p: 0.95
top_k: 20
presence_penalty: 1.5
repetition_penalty: 1.05
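These can be passed per-request rather than as server flags. A sketch of the request payload (the exact field names accepted alongside the standard OpenAI ones are an assumption based on llama.cpp's server API, where repetition penalty is spelled `repeat_penalty`):

```python
# Recommended sampler settings from above, as request-payload fields.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5,
    "repeat_penalty": 1.05,  # this card's "repetition_penalty"
}

payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    **SAMPLING,
}
```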

Abliteration Stats

  • Tool: Heretic v1.2.0
  • Refusals: 13/100
  • KL Divergence: 1264

Architecture

Qwen3.5 combines Gated DeltaNet with conventional attention in a hybrid stack: 64 layers in a 3:1 pattern (three DeltaNet layers for every attention layer), 262K native context, and native multimodal vision. See the full model card for details.

Made by

Ghost (ghost-actual)
