Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic

A Qwen3.5-4B model with Claude Opus 4.6 reasoning distillation, abliterated via Heretic to remove safety refusals while preserving reasoning quality.

Abliteration Stats

  • Tool: Heretic v1.2.0
  • Refusals: 4/100
  • KL Divergence: 0.0680 (very low; capabilities essentially preserved)
  • Targets: attn.out_proj, mlp.down_proj
  • Trial: 191 (best of 200)

4/100 refusals with 0.068 KL divergence is about as clean as abliteration gets. The safety layer is essentially gone while the model's actual intelligence is untouched.
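For context on what that KL figure measures: the abliterated model's next-token probability distribution is compared against the original's, and a value near zero means the distributions barely moved. A minimal sketch of the metric itself (toy distributions, not actual model outputs):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) in nats between two next-token probability distributions."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0                      # terms with p=0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A small perturbation of the distribution yields a small positive KL.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.68, 0.21, 0.11])
print(kl_divergence(p, q))
```

Identical distributions give exactly 0.0; 0.068 averaged over a prompt set means the post-abliteration model predicts almost the same tokens it did before.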

Architecture

Qwen3.5 uses a hybrid of Gated DeltaNet (linear-attention) layers and conventional attention:

  • 32 layers in 3:1 pattern (3 DeltaNet → 1 full attention)
  • Native multimodal — vision built into the architecture
  • 262K native context, extensible to 1M+
  • DeltaNet layers use fixed-size recurrent state (O(1) memory regardless of context length)
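The fixed-size-state claim can be illustrated with a toy delta-rule recurrence. This is a simplification: real Gated DeltaNet adds gating/decay terms, and all names and shapes here are illustrative, not the actual implementation:

```python
import numpy as np

def deltanet_scan(q, k, v, beta):
    """Toy delta-rule scan: the (d x d) state S is the only memory carried
    across tokens, so it stays O(1) in sequence length, unlike a KV cache."""
    T, d = q.shape
    S = np.zeros((d, d))                    # fixed-size recurrent state
    out = np.empty_like(v)
    for t in range(T):
        # Delta rule: nudge S so that S @ k_t moves toward v_t.
        S = S + beta[t] * np.outer(v[t] - S @ k[t], k[t])
        out[t] = S @ q[t]                   # read out with the query
    return out

rng = np.random.default_rng(0)
T, d = 8, 4
out = deltanet_scan(rng.standard_normal((T, d)),
                    rng.standard_normal((T, d)),
                    rng.standard_normal((T, d)),
                    np.full(T, 0.5))
print(out.shape)  # state stayed (d, d) regardless of T
```

The full-attention layers (1 in 4) still keep a growing KV cache, which is why long contexts cost some VRAM but far less than a pure-attention model.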

VRAM Requirements

Precision               VRAM (16K ctx)   VRAM (64K ctx)
BF16/FP16 (this repo)   ~8 GB            ~9 GB

Fits on an RTX 3060 (12GB) in FP16 with room to spare. At Q4_K_M it runs on basically anything.
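The numbers check out on a napkin: at this scale the weights dominate VRAM, and the mostly-recurrent layer stack keeps the KV cache small. A rough estimate (the parameter count and Q4_K_M bits/weight are approximations):

```python
params = 4e9                                      # ~4B parameters
bytes_per_param = {"BF16": 2.0, "Q4_K_M": 0.56}   # Q4_K_M ~= 4.5 bits/weight
for fmt, b in bytes_per_param.items():
    print(f"{fmt}: ~{params * b / 1e9:.1f} GB for weights alone")
# BF16: ~8.0 GB, Q4_K_M: ~2.2 GB
```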

Usage

With transformers

from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic",
    trust_remote_code=True
)

With llama.cpp

Convert to GGUF (convert_hf_to_gguf.py expects a local model directory, so download the repo first, e.g. with huggingface-cli download):

python convert_hf_to_gguf.py \
    ./Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic \
    --outfile heretic-4b-F16.gguf --outtype f16

llama-quantize heretic-4b-F16.gguf heretic-4b-Q4_K_M.gguf Q4_K_M

Recommended settings

temperature: 0.6
top_p: 0.95
top_k: 20
presence_penalty: 1.5
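Served through an OpenAI-compatible endpoint (e.g. llama-server), those settings map onto the request body like this. The model name and prompt are placeholders, and top_k / presence_penalty are llama.cpp extensions of the OpenAI schema:

```python
import json

# Recommended sampling settings from above; everything else is illustrative.
payload = {
    "model": "heretic-4b-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5,
}
print(json.dumps(payload, indent=2))
```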

Base Model

avalon2244/Qwen3.5-4B-Claude-Opus-4.6-Distilled — Claude Opus 4.6 reasoning distilled into Qwen3.5-4B.

Why this exists

Small abliterated models with quality reasoning are hard to find. Most uncensored small models are either dumb, use Gemini/GPT distillation, or butcher the model during abliteration. This one keeps Claude-style chain-of-thought intact at 4/100 refusals with near-zero capability loss.

Made by

Ghost — ghost-actual

Built with Heretic by p-e-w.
