
Qwen3-4B-Cybersecurity-Heretic-16bit

Qwen3-4B-Cybersecurity with refusal directions removed via Heretic abliteration



🔵 Base version (with refusals): DexopT/Qwen3-4B-Cybersecurity

Model Description

Qwen3-4B-Cybersecurity-Heretic-16bit is DexopT/Qwen3-4B-Cybersecurity with refusal directions surgically removed using Heretic v1.2.0 — a technique that identifies and subtracts the model's "refusal direction" from its residual stream without retraining.

Refusal test results: 38/50 prompts answered (76% pass rate) using custom cybersecurity-specific bad/good prompt datasets.
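The pass rate is simply the fraction of test prompts that get a substantive answer rather than a refusal. A minimal sketch of such a check (a hypothetical substring heuristic for illustration, not the exact Heretic evaluation):

```python
# Sketch of a refusal-rate check (hypothetical heuristic): a response
# counts as a refusal if it opens with a common refusal phrase.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    """Flag responses that start with a typical refusal phrase."""
    return response.strip().lower().startswith(REFUSAL_MARKERS)

def pass_rate(responses: list[str]) -> float:
    """Fraction of responses that are answered rather than refused."""
    answered = sum(not is_refusal(r) for r in responses)
    return answered / len(responses)

# Toy example: 3 answered, 1 refused.
demo = [
    "Here is a Python snippet that ...",
    "Sure. Step 1: ...",
    "I'm sorry, but I can't help with that.",
    "The WPA2 handshake works by ...",
]
print(f"{pass_rate(demo):.0%}")  # → 75%
```

In practice a stronger judge (e.g. an LLM classifier) is preferable, since models can refuse without using stock phrases.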


Model Family

| Model | Description |
|---|---|
| Qwen3-4B-Cybersecurity | Base fine-tuned model |
| Qwen3-4B-Cybersecurity-Heretic-16bit | Abliterated version (this repo) 📍 You are here |
| Qwen3-4B-Cybersecurity-GGUF | Q8_0 + Q4_K_M quantized for llama.cpp |

Abliteration Details

| Parameter | Value |
|---|---|
| Tool | Heretic v1.2.0 |
| Trials run | 30 |
| Selected trial | Trial 24 |
| Refusals after abliteration | 49 / 100 |
| Pass rate | 76% (38/50) |
| KL divergence | 0.067 |
| Bad prompts dataset | DexopT/heretic-bad-prompts |
| Good prompts dataset | DexopT/heretic-good-prompts |
| Format | 16-bit merged safetensors |

What is Heretic Abliteration?

Heretic computes the "refusal direction" — a vector in the model's residual stream that activates when the model encounters requests it refuses. It then projects this direction out of the relevant weight matrices, reducing refusals without retraining. Unlike prompt-based jailbreaks, this modifies the model weights directly.
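The core operation can be sketched as subtracting the rank-1 projection onto the refusal direction from a weight matrix that writes into the residual stream. A minimal NumPy sketch (not Heretic's actual code; the direction here is random for illustration):

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's output along direction r.

    W: (d_out, d_in) weight matrix writing into the residual stream.
    r: (d_out,) refusal direction in the residual stream.
    Returns W' = W - r (r^T W), so W' x has no component along r.
    """
    r = r / np.linalg.norm(r)          # ensure unit norm
    return W - np.outer(r, r @ W)      # subtract rank-1 projection

rng = np.random.default_rng(0)
d_out, d_in = 8, 4
W = rng.normal(size=(d_out, d_in))
r = rng.normal(size=d_out)            # stand-in refusal direction

W_abl = ablate_direction(W, r)
x = rng.normal(size=d_in)
# The ablated layer's output is orthogonal to the refusal direction.
print(abs(r @ (W_abl @ x)) < 1e-10)
```

Heretic additionally searches (per layer) over which matrices to ablate and how strongly, which is what the 30-trial optimization below explores.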

Trial Selection

Trial 24 was selected from the Pareto front of 30 trials, balancing refusal reduction against capability loss (measured as KL divergence from the base model):

| Trial | Refusals | KL divergence | Notes |
|---|---|---|---|
| 24 ✓ | 49/100 | 0.067 | Best balance (selected) |
| 18 | 53/100 | 0.063 | |
| 21 | 54/100 | 0.047 | |
| 15 | 93/100 | 0.001 | Minimal abliteration |
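Both objectives (refusal count and KL divergence) are lower-is-better, so the candidates above form a Pareto front: no trial beats another on both axes. A sketch of filtering that front, using the values from the table:

```python
# Pareto-front filtering over (refusals, kl) pairs, lower is better on
# both axes. Trial numbers and values taken from the table above.
trials = {24: (49, 0.067), 18: (53, 0.063), 21: (54, 0.047), 15: (93, 0.001)}

def pareto_front(points: dict[int, tuple[int, float]]) -> set[int]:
    """Keep trials that no other trial strictly dominates."""
    front = set()
    for tid, (ref, kl) in points.items():
        dominated = any(
            (r <= ref and k <= kl) and (r < ref or k < kl)
            for other, (r, k) in points.items() if other != tid
        )
        if not dominated:
            front.add(tid)
    return front

print(sorted(pareto_front(trials)))  # → [15, 18, 21, 24]
```

All four trials survive the filter, which is why the final pick (trial 24) comes down to a judgment call about how much KL divergence is acceptable for the lowest refusal count.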

Datasets Used

| Dataset | Role |
|---|---|
| DexopT/cyber_heretic | Fine-tuning training data |
| DexopT/heretic-bad-prompts | Heretic refusal-direction computation |
| DexopT/heretic-good-prompts | Heretic baseline computation |

Bad prompts cover: WiFi/WPA2 cracking, ransomware, rogue AP, XSS, WAF bypass, malicious macros. Good prompts cover: SQL injection, reverse shells, keyloggers, privilege escalation, AD attacks.


Usage

Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit")

messages = [
    {"role": "system", "content": "You are an expert cybersecurity assistant."},
    {"role": "user", "content": "Write a Python reverse shell payload."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

MLX (Apple Silicon)

```shell
pip install mlx-lm

mlx_lm.convert \
  --hf-path DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit \
  --mlx-path ~/models/qwen3-heretic-mlx \
  --quantize --q-bits 8

mlx_lm.chat --model ~/models/qwen3-heretic-mlx
```

LM Studio / llama.cpp / Ollama

Use the GGUF version: DexopT/Qwen3-4B-Cybersecurity-GGUF


⚠️ Disclaimer

This model is intended for educational and research purposes only. The abliteration process reduces, but does not eliminate, the model's safety behaviors. Use it responsibly and only against systems you have explicit permission to test. The authors are not responsible for any misuse.


Links

- 🔵 Base Model: DexopT/Qwen3-4B-Cybersecurity
- 📦 GGUF (Q8 + Q4): DexopT/Qwen3-4B-Cybersecurity-GGUF
- 📊 Training Dataset: DexopT/cyber_heretic
- 🔪 Heretic Tool: github.com/p-e-w/heretic
- 🔧 Original Base: unsloth/Qwen3-4B-Instruct-2507
- 🏠 Qwen3 Collection: Qwen3 on HuggingFace