# Qwen3-4B-Cybersecurity-Heretic-16bit

Qwen3-4B-Cybersecurity with refusal directions removed via Heretic abliteration.

🔵 Base version (with refusals): DexopT/Qwen3-4B-Cybersecurity
## Model Description
Qwen3-4B-Cybersecurity-Heretic-16bit is DexopT/Qwen3-4B-Cybersecurity with its refusal direction surgically removed using Heretic v1.2.0, a technique that identifies the model's "refusal direction" in the residual stream and subtracts it without retraining.
Refusal test results: 38/50 prompts answered (76% pass rate) using custom cybersecurity-specific bad/good prompt datasets.
## Model Family
| Model | Description | Link |
|---|---|---|
| Qwen3-4B-Cybersecurity | Base fine-tuned model | → |
| Qwen3-4B-Cybersecurity-Heretic-16bit | Abliterated version (this repo) | 📍 You are here |
| Qwen3-4B-Cybersecurity-GGUF | Q8_0 + Q4_K_M quantized for llama.cpp | → |
## Abliteration Details
| Parameter | Value |
|---|---|
| Tool | Heretic v1.2.0 |
| Trials run | 30 |
| Selected trial | Trial 24 |
| Refusals after abliteration | 49 / 100 |
| Pass rate | 76% (38/50) |
| KL divergence | 0.067 |
| Bad prompts dataset | DexopT/heretic-bad-prompts |
| Good prompts dataset | DexopT/heretic-good-prompts |
| Format | 16bit merged safetensors |
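The KL divergence row above measures how far the abliterated model's next-token distribution drifts from the base model's (lower means behavior is better preserved). A minimal pure-Python sketch of that metric, using made-up logits rather than real model outputs, since Heretic's exact evaluation pipeline may differ:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token logits for the same prompt (illustrative values only)
base = softmax([2.0, 1.0, 0.5, -1.0])    # base model
ablit = softmax([1.9, 1.1, 0.4, -0.9])   # abliterated model

# KL(base || abliterated): how much the abliterated distribution drifts
kl = sum(p * math.log(p / q) for p, q in zip(base, ablit))
print(f"KL divergence: {kl:.4f}")  # small value -> outputs barely changed
```

A KL divergence near zero (like the 0.067 reported above) indicates the abliterated model still produces essentially the same token distributions as the base model on benign prompts.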
## What is Heretic Abliteration?
Heretic computes the "refusal direction", a vector in the model's residual stream that activates when the model encounters requests it would refuse. It then projects this direction out of the model's weights, reducing refusals without retraining. Unlike prompt-based jailbreaks, this modifies the model weights directly.
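The projection step can be sketched in a few lines of NumPy. This is a generic illustration of directional ablation, not Heretic's actual implementation; the direction `r` and the weight matrix `W` are toy values with a hypothetical hidden size of 4:

```python
import numpy as np

# The "refusal direction" r is typically the normalized mean difference
# between activations on refused vs. answered prompts; made up here.
r = np.array([0.5, -0.5, 0.5, -0.5])
r = r / np.linalg.norm(r)

# A toy weight matrix standing in for e.g. an output projection
W = np.random.default_rng(0).normal(size=(4, 4))

# Orthogonalize W against r: subtract the component that writes into
# the refusal direction (W' = W - r r^T W).
W_ablit = W - np.outer(r, r) @ W

# After ablation, the layer can no longer write anything along r.
print(np.allclose(r @ W_ablit, 0.0))  # True
```

Applying this projection to the relevant weight matrices across layers is what removes the refusal behavior while leaving the rest of the model's computation largely intact.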
## Trial Selection
Trial 24 was selected from the Pareto front of 30 optimization trials, balancing refusal reduction against model capability (measured by KL divergence from the base model):
| Trial | Refusals | KL Divergence | Notes |
|---|---|---|---|
| 24 ✓ | 49/100 | 0.067 | Best balance — selected |
| 18 | 53/100 | 0.063 | |
| 21 | 54/100 | 0.047 | |
| 15 | 93/100 | 0.001 | Minimal abliteration |
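The trade-off in the table can be made concrete: a trial is Pareto-optimal when no other trial both refuses less *and* diverges less. A small sketch using the numbers above (`pareto_front` is a hypothetical helper, not part of Heretic's API):

```python
# Each trial maps to (refusals, KL divergence); values from the table above.
trials = {24: (49, 0.067), 18: (53, 0.063), 21: (54, 0.047), 15: (93, 0.001)}

def pareto_front(points):
    """Keep trials not dominated on both axes by any other trial."""
    front = {}
    for name, (r, kl) in points.items():
        dominated = any(
            r2 <= r and kl2 <= kl and (r2 < r or kl2 < kl)
            for n2, (r2, kl2) in points.items()
            if n2 != name
        )
        if not dominated:
            front[name] = (r, kl)
    return front

print(sorted(pareto_front(trials)))  # [15, 18, 21, 24] -- all four are Pareto-optimal
```

All four trials listed sit on the front, so the final pick among them (Trial 24 here) is a judgment call about how much KL divergence to accept for fewer refusals.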
## Datasets Used
| Dataset | Role | Link |
|---|---|---|
| DexopT/cyber_heretic | Fine-tuning training data | → |
| DexopT/heretic-bad-prompts | Heretic refusal direction computation | → |
| DexopT/heretic-good-prompts | Heretic baseline computation | → |
Bad prompts cover: WiFi/WPA2 cracking, ransomware, rogue AP, XSS, WAF bypass, malicious macros. Good prompts cover: SQL injection, reverse shells, keyloggers, privilege escalation, AD attacks.
## Usage

### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit")

messages = [
    {"role": "system", "content": "You are an expert cybersecurity assistant."},
    {"role": "user", "content": "Write a Python reverse shell payload."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
### MLX (Apple Silicon)
```bash
pip install mlx-lm

# Convert to MLX format, quantizing to 8-bit
mlx_lm.convert \
  --hf-path DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit \
  --mlx-path ~/models/qwen3-heretic-mlx \
  --quantize --q-bits 8

mlx_lm.chat --model ~/models/qwen3-heretic-mlx
```
### LM Studio / llama.cpp / Ollama
Use the GGUF version: DexopT/Qwen3-4B-Cybersecurity-GGUF
## ⚠️ Disclaimer
This model is intended for educational and research purposes only. The abliteration process reduces, but does not eliminate, the model's safety behaviors. Use responsibly and only on systems you have explicit permission to test. The authors are not responsible for any misuse.
## Links
| Resource | Link |
|---|---|
| 🔵 Base Model | DexopT/Qwen3-4B-Cybersecurity |
| 📦 GGUF (Q8 + Q4) | DexopT/Qwen3-4B-Cybersecurity-GGUF |
| 📊 Training Dataset | DexopT/cyber_heretic |
| 🔪 Heretic Tool | github.com/p-e-w/heretic |
| 🔧 Original Base | unsloth/Qwen3-4B-Instruct-2507 |
| 🏠 Qwen3 Collection | Qwen3 on HuggingFace |