Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic

A Qwen3.5-4B model with Claude Opus 4.6 reasoning distillation, abliterated via Heretic to remove safety refusals while preserving reasoning quality.

Abliteration Stats

  • Tool: Heretic v1.2.0
  • Refusals: 4/100
  • KL Divergence: 0.0680 (very low; capabilities essentially preserved)
  • Targets: attn.out_proj, mlp.down_proj
  • Trial: 191 (best of 200)

4/100 refusals with 0.068 KL divergence is about as clean as abliteration gets. The safety layer is essentially gone while the model's actual intelligence is untouched.
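For context on what that KL figure measures: the abliterated model's next-token probability distribution is compared against the original's, and a value near zero means the distributions barely moved. A minimal sketch of the metric itself (toy distributions, not actual model outputs):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) in nats between two next-token probability distributions."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0                      # terms with p=0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A small perturbation of the distribution yields a small positive KL.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.68, 0.21, 0.11])
print(kl_divergence(p, q))
```

Identical distributions give exactly 0.0; 0.068 averaged over a prompt set means the post-abliteration model predicts almost the same tokens it did before.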

Architecture

Qwen3.5 uses a hybrid of Gated DeltaNet (linear-attention) layers and conventional attention:

  • 32 layers in 3:1 pattern (3 DeltaNet → 1 full attention)
  • Native multimodal — vision built into the architecture
  • 262K native context, extensible to 1M+
  • DeltaNet layers use fixed-size recurrent state (O(1) memory regardless of context length)
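The fixed-size-state claim can be illustrated with a toy delta-rule recurrence. This is a simplification: real Gated DeltaNet adds gating/decay terms, and all names and shapes here are illustrative, not the actual implementation:

```python
import numpy as np

def deltanet_scan(q, k, v, beta):
    """Toy delta-rule scan: the (d x d) state S is the only memory carried
    across tokens, so it stays O(1) in sequence length, unlike a KV cache."""
    T, d = q.shape
    S = np.zeros((d, d))                    # fixed-size recurrent state
    out = np.empty_like(v)
    for t in range(T):
        # Delta rule: nudge S so that S @ k_t moves toward v_t.
        S = S + beta[t] * np.outer(v[t] - S @ k[t], k[t])
        out[t] = S @ q[t]                   # read out with the query
    return out

rng = np.random.default_rng(0)
T, d = 8, 4
out = deltanet_scan(rng.standard_normal((T, d)),
                    rng.standard_normal((T, d)),
                    rng.standard_normal((T, d)),
                    np.full(T, 0.5))
print(out.shape)  # state stayed (d, d) regardless of T
```

The full-attention layers (1 in 4) still keep a growing KV cache, which is why long contexts cost some VRAM but far less than a pure-attention model.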

VRAM Requirements

Precision               VRAM (16K ctx)   VRAM (64K ctx)
BF16/FP16 (this repo)   ~8 GB            ~9 GB

Fits on an RTX 3060 (12GB) in FP16 with room to spare. At Q4_K_M it runs on basically anything.
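The numbers check out on a napkin: at this scale the weights dominate VRAM, and the mostly-recurrent layer stack keeps the KV cache small. A rough estimate (the parameter count and Q4_K_M bits/weight are approximations):

```python
params = 4e9                                      # ~4B parameters
bytes_per_param = {"BF16": 2.0, "Q4_K_M": 0.56}   # Q4_K_M ~= 4.5 bits/weight
for fmt, b in bytes_per_param.items():
    print(f"{fmt}: ~{params * b / 1e9:.1f} GB for weights alone")
# BF16: ~8.0 GB, Q4_K_M: ~2.2 GB
```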

Usage

With transformers

from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic",
    trust_remote_code=True
)

With llama.cpp

Convert to GGUF (convert_hf_to_gguf.py expects a local model directory, so download the repo first, e.g. with huggingface-cli download):

python convert_hf_to_gguf.py \
    ./Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic \
    --outfile heretic-4b-F16.gguf --outtype f16

llama-quantize heretic-4b-F16.gguf heretic-4b-Q4_K_M.gguf Q4_K_M

Recommended settings

temperature: 0.6
top_p: 0.95
top_k: 20
presence_penalty: 1.5
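Served through an OpenAI-compatible endpoint (e.g. llama-server), those settings map onto the request body like this. The model name and prompt are placeholders, and top_k / presence_penalty are llama.cpp extensions of the OpenAI schema:

```python
import json

# Recommended sampling settings from above; everything else is illustrative.
payload = {
    "model": "heretic-4b-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5,
}
print(json.dumps(payload, indent=2))
```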

Base Model

avalon2244/Qwen3.5-4B-Claude-Opus-4.6-Distilled — Claude Opus 4.6 reasoning distilled into Qwen3.5-4B.

Why this exists

Small abliterated models with quality reasoning are hard to find. Most uncensored small models are either dumb, use Gemini/GPT distillation, or butcher the model during abliteration. This one keeps Claude-style chain-of-thought intact at 4/100 refusals with near-zero capability loss.

Made by

Ghost — ghost-actual

Built with Heretic by p-e-w.
