# Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic
A Qwen3.5-4B with Claude Opus 4.6 reasoning distillation, abliterated via Heretic to remove safety refusals while preserving reasoning quality.
## Abliteration Stats
- Tool: Heretic v1.2.0
- Refusals: 4/100
- KL Divergence: 0.0680 (extremely low — capabilities fully preserved)
- Targets: `attn.out_proj`, `mlp.down_proj`
- Trial: 191 (best of 200)
4/100 refusals with 0.068 KL divergence is about as clean as abliteration gets. The safety layer is essentially gone while the model's actual intelligence is untouched.
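For context, the KL divergence here measures how far the abliterated model's next-token distributions drift from the original model's. A toy sketch of the metric itself (the distributions below are invented purely for illustration):

```python
import math

def kl(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Two nearly identical next-token distributions give a tiny KL,
# which is what a model-level figure like 0.068 indicates.
p = [0.70, 0.20, 0.10]
q = [0.68, 0.21, 0.11]
print(round(kl(p, q), 4))  # small positive value
```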
## Architecture

Qwen3.5 uses a hybrid of Gated DeltaNet and conventional attention layers:
- 32 layers in 3:1 pattern (3 DeltaNet → 1 full attention)
- Native multimodal — vision built into the architecture
- 262K native context, extensible to 1M+
- DeltaNet layers use fixed-size recurrent state (O(1) memory regardless of context length)
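The 3:1 interleaving above can be sketched as follows. This is illustrative only, not the actual model code, and the exact positions of the full-attention layers within each group of four are an assumption:

```python
# Sketch of the 3:1 hybrid layout: three Gated DeltaNet blocks,
# then one full-attention block, repeating across 32 layers.
NUM_LAYERS = 32

def layer_type(idx: int) -> str:
    # Assumed placement: every 4th layer (indices 3, 7, 11, ...) is attention.
    return "attention" if idx % 4 == 3 else "deltanet"

layout = [layer_type(i) for i in range(NUM_LAYERS)]
print(layout.count("deltanet"), layout.count("attention"))  # 24 8
```

Only the 8 attention layers grow a KV cache with context length; the 24 DeltaNet layers keep a fixed-size recurrent state, which is where the long-context memory savings come from.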
## VRAM Requirements
| Precision | VRAM (16K ctx) | VRAM (64K ctx) |
|---|---|---|
| BF16/FP16 (this repo) | ~8 GB | ~9 GB |
Fits on an RTX 3060 (12GB) in FP16 with room to spare. At Q4_K_M it runs on basically anything.
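A back-of-envelope check of those numbers, counting weights only (the 4B parameter count is taken at face value; runtime overhead and context cache are not included):

```python
# Rough weights-only memory estimate for a ~4B-parameter model.
params = 4e9

bf16_gib = params * 2 / 1024**3      # 2 bytes per weight in BF16/FP16
q4_gib = params * 4.5 / 8 / 1024**3  # Q4_K_M averages roughly 4.5 bits/weight

print(f"BF16: ~{bf16_gib:.1f} GiB, Q4_K_M: ~{q4_gib:.1f} GiB")
# BF16: ~7.5 GiB, Q4_K_M: ~2.1 GiB
```

The ~7.5 GiB weights figure plus context cache lines up with the ~8 GB entry in the table above.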
## Usage

### With transformers
```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    "ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic",
    trust_remote_code=True,
)
```
### With llama.cpp
Convert to GGUF:
```sh
python convert_hf_to_gguf.py \
  ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic \
  --outfile heretic-4b-F16.gguf --outtype f16

llama-quantize heretic-4b-F16.gguf heretic-4b-Q4_K_M.gguf Q4_K_M
```
### Recommended settings

- temperature: 0.6
- top_p: 0.95
- top_k: 20
- presence_penalty: 1.5
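For llama.cpp users, these settings map onto `llama-cli` flags roughly as below; the flag names are the stock llama.cpp ones, but double-check them against your build's `--help`:

```python
# Map the recommended sampling settings to llama.cpp's llama-cli flags.
settings = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 1.5}
flags = {
    "temperature": "--temp",
    "top_p": "--top-p",
    "top_k": "--top-k",
    "presence_penalty": "--presence-penalty",
}
cmd = " ".join(f"{flags[k]} {v}" for k, v in settings.items())
print(cmd)  # --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 1.5
```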
## Base Model
avalon2244/Qwen3.5-4B-Claude-Opus-4.6-Distilled — Claude Opus 4.6 reasoning distilled into Qwen3.5-4B.
## Why this exists
Small abliterated models with quality reasoning are hard to find. Most uncensored small models are either dumb, built on Gemini/GPT distillation, or butchered during abliteration. This one keeps Claude-style chain-of-thought intact at 4/100 refusals with near-zero capability loss.
## See also
- ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic — the big brother
## Made by
Ghost — ghost-actual
Built with Heretic by p-e-w.