# Qwen3-4B-Cybersecurity-Heretic-16bit

Qwen3-4B-Cybersecurity with refusal directions removed via Heretic abliteration.

🔵 Base version (with refusals): DexopT/Qwen3-4B-Cybersecurity
## Model Description
Qwen3-4B-Cybersecurity-Heretic-16bit is DexopT/Qwen3-4B-Cybersecurity with its refusal direction surgically removed using Heretic v1.2.0, a technique that identifies the model's "refusal direction" in the residual stream and subtracts it without retraining.
Refusal test results: 38/50 prompts answered (76% pass rate) using custom cybersecurity-specific bad/good prompt datasets.
## Model Family
| Model | Description | Link |
|---|---|---|
| Qwen3-4B-Cybersecurity | Base fine-tuned model | → |
| Qwen3-4B-Cybersecurity-Heretic-16bit | Abliterated version (this repo) | 📍 You are here |
| Qwen3-4B-Cybersecurity-GGUF | Q8_0 + Q4_K_M quantized for llama.cpp | → |
## Abliteration Details
| Parameter | Value |
|---|---|
| Tool | Heretic v1.2.0 |
| Trials run | 30 |
| Selected trial | Trial 24 |
| Refusals after abliteration | 49 / 100 |
| Pass rate | 76% (38/50) |
| KL divergence | 0.067 |
| Bad prompts dataset | DexopT/heretic-bad-prompts |
| Good prompts dataset | DexopT/heretic-good-prompts |
| Format | 16bit merged safetensors |
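The KL divergence row above measures how far the abliterated model's next-token distribution drifts from the base model's (lower means behavior is better preserved). A minimal pure-Python sketch of that metric, using made-up logits rather than real model outputs, since Heretic's exact evaluation pipeline may differ:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token logits for the same prompt (illustrative values only)
base = softmax([2.0, 1.0, 0.5, -1.0])    # base model
ablit = softmax([1.9, 1.1, 0.4, -0.9])   # abliterated model

# KL(base || abliterated): how much the abliterated distribution drifts
kl = sum(p * math.log(p / q) for p, q in zip(base, ablit))
print(f"KL divergence: {kl:.4f}")  # small value -> outputs barely changed
```

A KL divergence near zero (like the 0.067 reported above) indicates the abliterated model still produces essentially the same token distributions as the base model on benign prompts.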
## What is Heretic Abliteration?
Heretic computes the "refusal direction", a vector in the model's residual stream that activates when the model encounters requests it would refuse. It then projects this direction out of the model's weights, reducing refusals without retraining. Unlike prompt-based jailbreaks, this modifies the model weights directly.
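The projection step can be sketched in a few lines of NumPy. This is a generic illustration of directional ablation, not Heretic's actual implementation; the direction `r` and the weight matrix `W` are toy values with a hypothetical hidden size of 4:

```python
import numpy as np

# The "refusal direction" r is typically the normalized mean difference
# between activations on refused vs. answered prompts; made up here.
r = np.array([0.5, -0.5, 0.5, -0.5])
r = r / np.linalg.norm(r)

# A toy weight matrix standing in for e.g. an output projection
W = np.random.default_rng(0).normal(size=(4, 4))

# Orthogonalize W against r: subtract the component that writes into
# the refusal direction (W' = W - r r^T W).
W_ablit = W - np.outer(r, r) @ W

# After ablation, the layer can no longer write anything along r.
print(np.allclose(r @ W_ablit, 0.0))  # True
```

Applying this projection to the relevant weight matrices across layers is what removes the refusal behavior while leaving the rest of the model's computation largely intact.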
## Trial Selection
Trial 24 was selected from the Pareto front of 30 optimization trials, balancing refusal reduction against model capability (measured by KL divergence from the base model):
| Trial | Refusals | KL Divergence | Notes |
|---|---|---|---|
| 24 ✓ | 49/100 | 0.067 | Best balance — selected |
| 18 | 53/100 | 0.063 | |
| 21 | 54/100 | 0.047 | |
| 15 | 93/100 | 0.001 | Minimal abliteration |
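The trade-off in the table can be made concrete: a trial is Pareto-optimal when no other trial both refuses less *and* diverges less. A small sketch using the numbers above (`pareto_front` is a hypothetical helper, not part of Heretic's API):

```python
# Each trial maps to (refusals, KL divergence); values from the table above.
trials = {24: (49, 0.067), 18: (53, 0.063), 21: (54, 0.047), 15: (93, 0.001)}

def pareto_front(points):
    """Keep trials not dominated on both axes by any other trial."""
    front = {}
    for name, (r, kl) in points.items():
        dominated = any(
            r2 <= r and kl2 <= kl and (r2 < r or kl2 < kl)
            for n2, (r2, kl2) in points.items()
            if n2 != name
        )
        if not dominated:
            front[name] = (r, kl)
    return front

print(sorted(pareto_front(trials)))  # [15, 18, 21, 24] -- all four are Pareto-optimal
```

All four trials listed sit on the front, so the final pick among them (Trial 24 here) is a judgment call about how much KL divergence to accept for fewer refusals.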
## Datasets Used
| Dataset | Role | Link |
|---|---|---|
| DexopT/cyber_heretic | Fine-tuning training data | → |
| DexopT/heretic-bad-prompts | Heretic refusal direction computation | → |
| DexopT/heretic-good-prompts | Heretic baseline computation | → |
Bad prompts cover: WiFi/WPA2 cracking, ransomware, rogue AP, XSS, WAF bypass, malicious macros. Good prompts cover: SQL injection, reverse shells, keyloggers, privilege escalation, AD attacks.
## Usage

### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit")

messages = [
    {"role": "system", "content": "You are an expert cybersecurity assistant."},
    {"role": "user", "content": "Write a Python reverse shell payload."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
### MLX (Apple Silicon)
```bash
pip install mlx-lm

# Convert to MLX format, quantizing to 8-bit
mlx_lm.convert \
  --hf-path DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit \
  --mlx-path ~/models/qwen3-heretic-mlx \
  --quantize --q-bits 8

mlx_lm.chat --model ~/models/qwen3-heretic-mlx
```
### LM Studio / llama.cpp / Ollama
Use the GGUF version: DexopT/Qwen3-4B-Cybersecurity-GGUF
## ⚠️ Disclaimer
This model is intended for educational and research purposes only. The abliteration process reduces, but does not eliminate, the model's safety behaviors. Use responsibly and only on systems you have explicit permission to test. The authors are not responsible for any misuse.
## Links
| Resource | Link |
|---|---|
| 🔵 Base Model | DexopT/Qwen3-4B-Cybersecurity |
| 📦 GGUF (Q8 + Q4) | DexopT/Qwen3-4B-Cybersecurity-GGUF |
| 📊 Training Dataset | DexopT/cyber_heretic |
| 🔪 Heretic Tool | github.com/p-e-w/heretic |
| 🔧 Original Base | unsloth/Qwen3-4B-Instruct-2507 |
| 🏠 Qwen3 Collection | Qwen3 on HuggingFace |