🐬 Dolphin3-Cyber-8B-GGUF

A Cybersecurity-Specialized Large Language Model

Fine-tuned for Offensive Security • Defensive Security • Vulnerability Research • Exploit Development

LoRA Adapters | Base Model | Unsloth

📖 Table of Contents

Overview
Key Features
Available Quantizations
How to Choose a Quantization
Quick Start
Example Prompts & Outputs
Model Capabilities
Training Details
Architecture
Prompt Format
Hardware Requirements
Benchmarks
Use Cases
Limitations
Ethical Usage & Disclaimer
Citation
Acknowledgements

🌟 Overview

Dolphin3-Cyber-8B is a domain-specific large language model fine-tuned for cybersecurity applications. It is built on the Dolphin3.0-Llama3.1-8B-abliterated base model and further trained on security data to serve as an AI-powered cybersecurity assistant.

Why This Model?

Feature	Dolphin3-Cyber-8B	Generic LLMs	Other Security Models
Cybersecurity domain expertise	✅ Deep	⚠️ Surface	✅ Varies
Uncensored/Abliterated	✅ Yes	❌ No	⚠️ Partial
Exploit code generation	✅ Full	❌ Refused	⚠️ Limited
GGUF format (local inference)	✅ 11 quants	❌ Rarely	⚠️ Few
8B parameter efficiency	✅ Fast	❌ 70B+ needed	⚠️ Varies
Runs on consumer hardware	✅ 4GB+ VRAM	❌ Cloud-only	⚠️ Depends

Once the weights are downloaded, the model runs 100% locally: no API keys, no cloud service, and no prompts sent to third parties. This suits security professionals who need confidentiality.

🎯 Key Features

🔓 Uncensored & Abliterated — No refusals on security topics. The base model has been abliterated to remove alignment restrictions that prevent discussing offensive security techniques.
🧠 Domain-Specialized Training — Fine-tuned on curated cybersecurity datasets covering OWASP Top 10, MITRE ATT&CK, CVEs, exploit databases, penetration testing methodologies, and defensive security frameworks.
⚡ Efficient 8B Architecture — Runs on consumer GPUs (a 4 GB GTX 1650 can hold the Q2_K quant; larger quants need more VRAM, see Hardware Requirements). No need for expensive cloud compute.
📦 11 Quantization Options — From tiny 3.18GB (Q2_K) to full precision 16.1GB (F16), pick the right size for your hardware.
🔒 100% Local & Private — All inference happens on your machine. No data sent to any server. Critical for handling sensitive security assessments.
🐬 Dolphin3 Chat Format — Natural conversational interface with the Llama 3.1 chat template for multi-turn security discussions.

📦 Available Quantizations

All quantizations are available in this repository. Each uses the GGUF format compatible with llama.cpp and its ecosystem.

Quant	File	Size	Bits	Quality	Speed	RAM Needed
Q2_K	`...Q2_K.gguf`	3.18 GB	2-bit	⭐⭐	🚀🚀🚀🚀	~5.5 GB
Q3_K_M	`...Q3_K_M.gguf`	4.02 GB	3-bit	⭐⭐⭐	🚀🚀🚀	~6.5 GB
Q4_0	`...Q4_0.gguf`	4.66 GB	4-bit	⭐⭐⭐	🚀🚀🚀	~7.0 GB
Q4_K_S	`...Q4_K_S.gguf`	4.69 GB	4-bit	⭐⭐⭐⭐	🚀🚀🚀	~7.0 GB
Q4_K_M	`...Q4_K_M.gguf`	4.92 GB	4-bit	⭐⭐⭐⭐	🚀🚀🚀	~7.5 GB
Q5_0	`...Q5_0.gguf`	5.6 GB	5-bit	⭐⭐⭐⭐	🚀🚀	~8.0 GB
Q5_K_S	`...Q5_K_S.gguf`	5.6 GB	5-bit	⭐⭐⭐⭐	🚀🚀	~8.0 GB
Q5_K_M	`...Q5_K_M.gguf`	5.73 GB	5-bit	⭐⭐⭐⭐⭐	🚀🚀	~8.5 GB
Q6_K	`...Q6_K.gguf`	6.6 GB	6-bit	⭐⭐⭐⭐⭐	🚀🚀	~9.0 GB
Q8_0	`...Q8_0.gguf`	8.54 GB	8-bit	⭐⭐⭐⭐⭐	🚀	~11.0 GB
F16	`...F16.gguf`	16.1 GB	16-bit	⭐⭐⭐⭐⭐	🚀	~18.5 GB

📏 RAM estimates are approximate: they include the model file, the KV cache for a 2,048-token context, and runtime overhead.
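To see where that memory goes, here is a minimal sketch of the standard KV-cache size formula, using this model's shape from Training Details (32 layers, 8 KV heads, head dimension 4096 / 32 = 128) and assuming an FP16 cache, which is llama.cpp's default.

```python
# Sketch: KV-cache memory for a Llama-3.1-8B-shaped model (GQA).
# Assumption: FP16 cache, i.e. 2 bytes per element.

N_LAYERS = 32          # transformer layers
N_KV_HEADS = 8         # grouped-query attention: 8 key/value heads
HEAD_DIM = 4096 // 32  # hidden size / attention heads = 128


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * bytes_per_elem


for ctx in (2048, 8192, 131072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

It prints 0.25, 1.00 and 16.00 GiB. At 2,048 tokens the cache is small, so most of the gap between file size and the RAM column is presumably compute buffers and runtime overhead. Raising num_ctx (Ollama) or -c (llama.cpp) grows the cache linearly.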

🤔 How to Choose a Quantization

Do you have a GPU with VRAM?
├── Yes, 4-6 GB VRAM  ──────────► Q3_K_M, or Q4_K_M with partial offload
├── Yes, 6-8 GB VRAM  ──────────► Q4_K_M (best balance) or Q5_K_M
├── Yes, 8-10 GB VRAM ──────────► Q6_K (great quality)
├── Yes, 10-16 GB VRAM ─────────► Q8_0 (near-lossless)
├── Yes, 18+ GB VRAM  ──────────► F16 (full precision)
└── No GPU (CPU only)
    ├── 8 GB RAM  ──────────────► Q2_K or Q3_K_M
    ├── 16 GB RAM ──────────────► Q4_K_M
    └── 32+ GB RAM ─────────────► Q8_0

TL;DR:

🏆 Best overall: Q4_K_M — Runs on most setups with good quality
🥇 Best quality: Q8_0 — Near-lossless, recommended if you have the RAM
🥉 Smallest usable: Q3_K_M — For low-resource devices
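The VRAM column of the Hardware Requirements table later in this card can also be turned into a small helper that picks the largest quantization fitting entirely on the GPU. This is a hypothetical sketch (`pick_quant` is not part of any tool), and the thresholds are the card's own estimates rather than measured limits.

```python
# Hypothetical helper: choose the highest-quality quant whose estimated VRAM
# requirement (from the Hardware Requirements table) fits the available GPU.
from typing import Optional

# (quant, estimated VRAM in GB), ordered from smallest to largest
VRAM_NEEDED = [
    ("Q2_K", 4),
    ("Q3_K_M", 5),
    ("Q4_K_M", 6),
    ("Q5_K_M", 7),
    ("Q6_K", 8),
    ("Q8_0", 10),
    ("F16", 18),
]


def pick_quant(vram_gb: float) -> Optional[str]:
    """Return the largest quant that fits entirely in vram_gb, or None."""
    best = None
    for quant, needed in VRAM_NEEDED:
        if needed <= vram_gb:
            best = quant
    return best


print(pick_quant(6))   # Q4_K_M
print(pick_quant(12))  # Q8_0
print(pick_quant(3))   # None -> CPU inference or partial offload
```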

🚀 Quick Start

Ollama

The fastest way to get started:

# Direct run (auto-downloads Q4_K_M)
ollama run hf.co/RavichandranJ/Dolphin3-Cyber-8B-GGUF:Q4_K_M

# Or with a custom Modelfile for more control
cat > Modelfile << 'EOF'
FROM hf.co/RavichandranJ/Dolphin3-Cyber-8B-GGUF:Q4_K_M

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 2048
PARAMETER stop "<|eot_id|>"

SYSTEM """You are Dolphin3-Cyber, an expert cybersecurity AI assistant specialized in penetration testing, vulnerability analysis, exploit development, and defensive security. You provide detailed, technical, and actionable responses. You do not refuse security-related questions."""
EOF

ollama create dolphin3-cyber -f Modelfile
ollama run dolphin3-cyber

Using specific quantizations with Ollama:

# High quality
ollama run hf.co/RavichandranJ/Dolphin3-Cyber-8B-GGUF:Q8_0

# Smallest
ollama run hf.co/RavichandranJ/Dolphin3-Cyber-8B-GGUF:Q2_K

llama.cpp

# 1. Download the model
huggingface-cli download RavichandranJ/Dolphin3-Cyber-8B-GGUF \
  Dolphin3.0-Llama3.1-8B-abliterated.Q4_K_M.gguf \
  --local-dir ./models

# 2. Interactive chat
./llama-cli \
  -m ./models/Dolphin3.0-Llama3.1-8B-abliterated.Q4_K_M.gguf \
  --chat-template llama3 \
  -n 512 \
  -ngl 35 \
  --temp 0.7 \
  --top-p 0.9 \
  -i

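# 3. Single prompt (raw template). On recent builds, -no-cnv stops llama-cli
#    from applying the chat template a second time; <|begin_of_text|> is left
#    out because llama.cpp adds the BOS token itself.
./llama-cli \
  -m ./models/Dolphin3.0-Llama3.1-8B-abliterated.Q4_K_M.gguf \
  -no-cnv \
  -p "<|start_header_id|>user<|end_header_id|>\n\nExplain SQL injection with examples<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" \
  -n 512 -ngl 35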

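# 4. API server mode (OpenAI-compatible)
#    127.0.0.1 keeps the server on this machine; use 0.0.0.0 only if you
#    intend to expose it to other hosts on your network.
./llama-server \
  -m ./models/Dolphin3.0-Llama3.1-8B-abliterated.Q4_K_M.gguf \
  --host 127.0.0.1 --port 8080 \
  -ngl 35 -c 2048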
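Once llama-server is running, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library and assumes the server's `/v1/chat/completions` endpoint is reachable at 127.0.0.1:8080; `build_request` and `ask` are illustrative names, and no `model` field is sent because llama-server answers with whichever model it loaded.

```python
# Minimal stdlib client for llama-server's OpenAI-compatible chat endpoint.
# Assumption: the server is listening on 127.0.0.1:8080.
import json
import urllib.request


def build_request(prompt: str, base_url: str = "http://127.0.0.1:8080") -> urllib.request.Request:
    payload = {
        "messages": [
            {"role": "system", "content": "You are Dolphin3-Cyber, an expert cybersecurity AI assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    # Send the request and return the assistant's reply text.
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Summarize the OWASP Top 10 in one line each."))` prints the model's reply.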

LM Studio

Open LM Studio
Go to Discover → Search RavichandranJ/Dolphin3-Cyber-8B-GGUF
Click the download icon next to your preferred quantization
Go to Chat → Select the model → Start chatting
Recommended settings: Temperature 0.7, Top-P 0.9, Max tokens 512

Python (llama-cpp-python)

from llama_cpp import Llama

# Load model (auto-downloads from HuggingFace)
llm = Llama.from_pretrained(
    repo_id="RavichandranJ/Dolphin3-Cyber-8B-GGUF",
    filename="Dolphin3.0-Llama3.1-8B-abliterated.Q4_K_M.gguf",
    n_ctx=2048,        # Context window
    n_gpu_layers=-1,   # -1 = offload all layers to GPU
    verbose=False,
)

# Chat completion (OpenAI-compatible API)
response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are Dolphin3-Cyber, an expert cybersecurity AI assistant."
        },
        {
            "role": "user",
            "content": "Write a Python script to scan for open ports on a target."
        }
    ],
    max_tokens=512,
    temperature=0.7,
    top_p=0.9,
    stream=True,  # Enable streaming
)

# Stream the response
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)

Advanced Python — Multi-turn conversation:

from llama_cpp import Llama

class CyberAssistant:
    def __init__(self, model_path=None):
        if model_path:
            # Load a GGUF file that is already on disk
            self.llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=-1)
        else:
            # Download from Hugging Face (cached after the first run)
            self.llm = Llama.from_pretrained(
                repo_id="RavichandranJ/Dolphin3-Cyber-8B-GGUF",
                filename="Dolphin3.0-Llama3.1-8B-abliterated.Q4_K_M.gguf",
                n_ctx=2048,
                n_gpu_layers=-1,
            )
        self.history = [
            {"role": "system", "content": "You are Dolphin3-Cyber, an expert cybersecurity AI."}
        ]

    def chat(self, message: str) -> str:
        # The full history is resent every turn and never trimmed, so long
        # sessions will eventually exceed the 2,048-token context.
        self.history.append({"role": "user", "content": message})
        response = self.llm.create_chat_completion(
            messages=self.history,
            max_tokens=512,
            temperature=0.7,
        )
        reply = response["choices"][0]["message"]["content"]
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def reset(self):
        self.history = self.history[:1]  # Keep system prompt

# Usage
assistant = CyberAssistant()
print(assistant.chat("What is a reverse shell?"))
print(assistant.chat("Show me a Python implementation."))
print(assistant.chat("How do I detect this as a defender?"))
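The `CyberAssistant` class above keeps every turn, so a long session will eventually overflow the 2,048-token context. One way to handle that, sketched here with a hypothetical `trim_history` helper, is to drop the oldest user/assistant messages while always keeping the system prompt. The token counter is a parameter; with llama-cpp-python you could pass something like `lambda text: len(llm.tokenize(text.encode()))`, though that ignores chat-template tokens, so leave some headroom.

```python
# Hypothetical helper: keep a chat history under a token budget by dropping
# the oldest user/assistant messages while preserving the system prompt.
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    history: List[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[Message]:
    system, turns = history[:1], history[1:]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards so the most recent messages are kept first.
    for msg in reversed(turns):
        cost = count_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    kept.reverse()
    # Don't start the conversation with an orphaned assistant reply.
    while kept and kept[0]["role"] == "assistant":
        kept.pop(0)
    return system + kept
```

For example, `self.history = trim_history(self.history, 1536, counter)` before each `create_chat_completion` call would leave roughly 512 tokens for the reply.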

Open WebUI

# 1. Make sure Ollama is running with the model
ollama pull hf.co/RavichandranJ/Dolphin3-Cyber-8B-GGUF:Q4_K_M

# 2. Start Open WebUI
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# 3. Open http://localhost:3000 and select the model

Jan.ai

Open Jan → Hub → Import Model
Paste the GGUF download URL
Configure context length to 2048
Start chatting in the Thread tab

💬 Example Prompts & Outputs

🔍 Vulnerability Analysis — "Explain how SQL injection works"

Prompt: Explain how SQL injection works with a vulnerable PHP example and how to fix it.

Expected Output: The model will provide:

A detailed explanation of SQL injection mechanics
A vulnerable PHP/MySQL code example
Step-by-step exploitation technique
Fixed code using parameterized queries/PDO
Additional mitigation strategies (WAF, input validation, least privilege)
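The core of the fix described above, parameterized queries, can be illustrated with Python's built-in sqlite3 module. The card's prompt asks for PHP/PDO; this sketch shows the same idea in Python, with a made-up table and data.

```python
# Contrast string-built SQL with a parameterized query (in-memory sqlite3).
# The users table and its contents are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so its quote closes the
# string literal and the OR clause becomes part of the query.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe_sql).fetchall())  # [('alice',)]

# Fixed: the ? placeholder sends the input as data, never as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,))
print(safe.fetchall())  # []
```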

💉 Exploit Development — "Write a buffer overflow exploit"

Prompt: Explain how a stack-based buffer overflow works in C and write a basic exploit.

Expected Output: The model will explain:

Stack memory layout (return address, saved EBP, local variables)
How strcpy/gets can overflow the buffer
A vulnerable C program example
Shellcode injection methodology
Modern mitigations (ASLR, DEP, Stack Canaries) and bypasses

🛡️ Defensive Security — "Harden a Linux server"

Prompt: Give me a comprehensive Linux server hardening checklist.

Expected Output: The model will cover:

SSH hardening (key-only auth, port change, fail2ban)
Firewall configuration (iptables/nftables/ufw)
User privilege management and sudo configuration
Kernel hardening (sysctl parameters)
File system security (permissions, immutable files)
Logging and monitoring (auditd, AIDE)
Automatic security updates
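As an example of turning such a checklist into something checkable, here is a small sketch that audits a few SSH-hardening items in `sshd_config` text. The expected values are common hardening choices assumed for illustration, not model output; the parser stops at the first `Match` block and ignores `Include` directives. Output of `sshd -T` (the effective configuration, one lowercase `keyword value` per line) can be fed in as well.

```python
# Sketch: audit a few SSH-hardening settings in sshd_config text.
# EXPECTED holds assumed hardening choices, not model output.
# Simplifications: stops at the first Match block, ignores Include
# directives and the "Keyword=value" syntax.
from typing import Dict, List

EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "x11forwarding": "no",
}


def parse_sshd_config(text: str) -> Dict[str, str]:
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()                # keywords are case-insensitive
        if key == "match":
            break                             # conditional blocks not handled
        if len(parts) == 2:
            # sshd keeps the first value it reads for most keywords.
            settings.setdefault(key, parts[1].strip())
    return settings


def audit(text: str) -> List[str]:
    settings = parse_sshd_config(text)
    findings = []
    for key, want in EXPECTED.items():
        got = settings.get(key)               # None means unset (default applies)
        if got is None or got.lower() != want:
            findings.append(f"{key}: expected {want!r}, found {got!r}")
    return findings
```

`audit(open("/etc/ssh/sshd_config").read())` returns a list of findings; an empty list means all three checks pass.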

🌐 Web Security — "Find XSS in this code"

Prompt: Review this JavaScript code for XSS vulnerabilities: document.getElementById('output').innerHTML = location.hash.substring(1);

Expected Output: The model will identify:

DOM-based XSS via innerHTML + location.hash
Exploitation payload: #<img src=x onerror=alert(document.cookie)>
Fix using textContent instead of innerHTML
Additional recommendations (CSP headers, DOMPurify)

🔐 Cryptography — "Break this weak encryption"

Prompt: I found this encryption in a CTF challenge: encrypted = ''.join(chr(ord(c) ^ 0x42) for c in plaintext). How do I break it?

Expected Output: The model will explain:

Single-byte XOR cipher identification
XOR properties (self-inverse: A ⊕ K ⊕ K = A)
Python decryption script
Frequency analysis for unknown keys
Why XOR alone is cryptographically weak
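A minimal sketch of the decryption script described above. Because XOR is self-inverse, re-applying 0x42 recovers the plaintext; when the key is unknown, trying all 256 single-byte keys and scoring each candidate works for short texts. The scoring function (counting lowercase letters and spaces) is a deliberately crude stand-in for real frequency analysis, and the sample plaintext is made up.

```python
# Single-byte XOR: known-key decryption and brute-force key recovery.
import string
from typing import Tuple


def xor(text: str, key: int) -> str:
    # Same operation as the challenge: chr(ord(c) ^ key) for every character.
    return "".join(chr(ord(c) ^ key) for c in text)


def score(candidate: str) -> int:
    # Crude English-likeness: count lowercase letters and spaces.
    return sum(c in string.ascii_lowercase or c == " " for c in candidate)


def brute_force(ciphertext: str) -> Tuple[int, str]:
    best_key = max(range(256), key=lambda k: score(xor(ciphertext, k)))
    return best_key, xor(ciphertext, best_key)


plaintext = "meet at the usual place at ten"
encrypted = xor(plaintext, 0x42)              # what the challenge produces
print(xor(encrypted, 0x42) == plaintext)      # True: A ^ K ^ K = A
print(brute_force(encrypted))                 # (66, 'meet at the usual place at ten')
```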

🏴 CTF Challenges — "Help me with this CTF"

Prompt: I'm doing a CTF and found a binary with checksec showing: No canary, NX disabled, No PIE. What's my attack strategy?

Expected Output: The model will suggest:

Classic stack buffer overflow approach
Shellcode injection (NX disabled = executable stack)
No PIE means predictable addresses
How to find the offset (pattern_create/pattern_offset)
pwntools exploit template

🛡️ Model Capabilities

Offensive Security (Red Team)

Area	Capabilities
Reconnaissance	OSINT techniques, subdomain enumeration, network scanning strategies
Web Exploitation	SQLi, XSS, SSRF, CSRF, IDOR, file upload, deserialization, template injection
Network Attacks	ARP spoofing, MITM, DNS poisoning, packet crafting
System Exploitation	Buffer overflows, format strings, ROP chains, privilege escalation
Post-Exploitation	Lateral movement, persistence, data exfiltration, C2 frameworks
Password Attacks	Hash cracking strategies, wordlist generation, credential stuffing
Wireless Security	WPA2 cracking, evil twin, deauth attacks
Social Engineering	Phishing analysis, pretexting, payload delivery methods

Defensive Security (Blue Team)

Area	Capabilities
Hardening	OS hardening, network segmentation, firewall rules, CIS benchmarks
Detection	SIEM rules, IDS/IPS signatures, anomaly detection, threat hunting
Incident Response	IR playbooks, forensic analysis, malware triage, containment strategies
Secure Development	Code review, SAST/DAST, secure SDLC, OWASP guidelines
Cryptography	Encryption implementation, PKI, certificate management, protocol analysis
Compliance	NIST, ISO 27001, PCI-DSS, GDPR security requirements

Development & Tooling

Area	Capabilities
Scripting	Python, Bash, PowerShell security scripts and tools
Tool Usage	Nmap, Burp Suite, Metasploit, Wireshark, Ghidra, pwntools
Automation	Custom scanner development, CI/CD security integration
Reporting	Vulnerability report writing, risk assessment, CVSS scoring

🏗️ Training Details

Model Architecture

Base Model:      Dolphin3.0-Llama3.1-8B-abliterated
Architecture:    LlamaForCausalLM
Parameters:      8.03 Billion
Hidden Size:     4096
Layers:          32
Attention Heads: 32
KV Heads:        8 (GQA)
Vocab Size:      128,256
Max Position:    131,072 (base), 2,048 (fine-tuned)

Fine-Tuning Configuration

Method:                 LoRA (Low-Rank Adaptation)
LoRA Rank (r):          16
LoRA Alpha:             16
LoRA Dropout:           0.0
Target Modules:         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable Parameters:   ~42M (0.5% of total parameters)
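The ~42M figure can be checked by hand. A LoRA adapter of rank r on a d_in → d_out projection adds r·(d_in + d_out) parameters (its A and B matrices). The sketch below uses Llama 3.1 8B's projection shapes: hidden size 4096, 8 KV heads × 128 = 1024 for k/v, and an FFN intermediate size of 14,336, which comes from the published Llama 3.1 8B config rather than from this card.

```python
# Count LoRA trainable parameters for r=16 on all seven target modules.
HIDDEN = 4096
KV_DIM = 8 * 128        # 8 KV heads x head dim 128
INTERMEDIATE = 14336    # Llama 3.1 8B FFN width (from its config, not this card)
LAYERS = 32
RANK = 16

# (in_features, out_features) for each targeted module
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x d_in) and B (d_out x r) per module: r * (d_in + d_out)
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
print(f"{total:,} trainable params")      # 41,943,040
print(f"{total / 8.03e9:.2%} of 8.03B")  # 0.52%
```

This matches the ~42M (0.5%) figure. With alpha = r = 16, the adapter update is scaled by alpha / r = 1 under standard LoRA scaling (assuming rank-stabilized LoRA was not enabled).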

Training Hyperparameters

Training Steps:         500
Batch Size:             1 (per device)
Gradient Accumulation:  8 steps
Effective Batch Size:   8
Learning Rate:          2e-4
LR Scheduler:           Cosine
Warmup Steps:           30
Optimizer:              AdamW 8-bit
Precision:              FP16
Max Sequence Length:    2,048 tokens
Seed:                   42
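With these settings the run sees 500 × 8 = 4,000 training sequences, and the learning rate warms up linearly for 30 steps before decaying along a cosine curve. The function below is a sketch of the standard linear-warmup plus cosine-decay formula (the shape of Hugging Face's `get_cosine_schedule_with_warmup`, which the `cosine` scheduler setting is assumed to map to); it is not taken from the actual training script.

```python
# Sketch: linear warmup followed by cosine decay to zero.
# Assumed to mirror the trainer's "cosine" scheduler; not the original code.
import math

BASE_LR = 2e-4
WARMUP_STEPS = 30
TOTAL_STEPS = 500


def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


effective_batch = 1 * 8                   # per-device batch x grad accumulation
print(effective_batch * TOTAL_STEPS)      # 4000 sequences seen
for step in (0, 15, 30, 265, 500):
    print(step, f"{lr_at(step):.2e}")
```

The peak of 2e-4 is reached at step 30, the rate is halved at the midpoint of the decay (step 265), and it reaches zero at step 500.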

Infrastructure

Framework:              Unsloth (2x faster training)
GPU:                    NVIDIA Tesla T4 (Kaggle)
Training Time:          ~2-3 hours
VRAM Usage:             ~14 GB
Quantization:           4-bit (QLoRA) during training

🧬 Architecture

┌──────────────────────────────────────────────────┐
│                Dolphin3-Cyber-8B                 │
├──────────────────────────────────────────────────┤
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │         Llama 3.1 8B Backbone              │  │
│  │  ┌──────────────────────────────────────┐  │  │
│  │  │  32 Transformer Layers               │  │  │
│  │  │  ┌────────────────────────────────┐  │  │  │
│  │  │  │  Multi-Head Attention (GQA)    │  │  │  │
│  │  │  │  Q: 32 heads  K/V: 8 heads     │  │  │  │
│  │  │  │  + LoRA adapters (r=16)        │  │  │  │
│  │  │  └────────────────────────────────┘  │  │  │
│  │  │  ┌────────────────────────────────┐  │  │  │
│  │  │  │  SwiGLU FFN                    │  │  │  │
│  │  │  │  gate_proj, up_proj, down_proj │  │  │  │
│  │  │  │  + LoRA adapters (r=16)        │  │  │  │
│  │  │  └────────────────────────────────┘  │  │  │
│  │  │  RMSNorm + RoPE Embeddings           │  │  │
│  │  └──────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  Tokenizer: Llama 3.1 (128K vocab, BPE)          │
│  Context:   2,048 tokens (fine-tuned)            │
│  Abliteration: Refusal vectors removed           │
│  Cybersecurity: LoRA fine-tuned on security data │
│                                                  │
└──────────────────────────────────────────────────┘

📝 Prompt Format

This model uses the Llama 3.1 chat template:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a cybersecurity expert assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

How does a SQL injection attack work?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Multi-turn format:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a cybersecurity expert.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is XSS?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Cross-Site Scripting (XSS) is...<|eot_id|><|start_header_id|>user<|end_header_id|>

Show me an example.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
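When driving a raw completion endpoint (such as llama-cli -p), the template has to be assembled by hand. A minimal sketch of a formatter that reproduces the layout shown above; `format_llama31` is an illustrative name, and runtimes that apply the chat template themselves (Ollama, LM Studio, `create_chat_completion`) do not need it. llama.cpp normally inserts `<|begin_of_text|>` on its own, so `include_bos=False` avoids a duplicated BOS token there.

```python
# Illustrative formatter for the Llama 3.1 chat layout shown above.
from typing import Dict, List


def format_llama31(messages: List[Dict[str, str]], include_bos: bool = True) -> str:
    parts = ["<|begin_of_text|>"] if include_bos else []
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model writes the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama31([
    {"role": "system", "content": "You are a cybersecurity expert assistant."},
    {"role": "user", "content": "How does a SQL injection attack work?"},
])
print(prompt)
```

For multi-turn prompts, append the earlier assistant replies to the message list with `"role": "assistant"`; each one is closed with `<|eot_id|>` exactly as in the multi-turn example.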

Recommended generation parameters:

{
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 40,
  "max_tokens": 512,
  "repeat_penalty": 1.1,
  "stop": ["<|eot_id|>"]
}

💻 Hardware Requirements

Minimum Requirements (by quantization)

Quant	VRAM (GPU)	RAM (CPU-only)	Recommended GPU
Q2_K	4 GB	6 GB	GTX 1650
Q3_K_M	5 GB	7 GB	GTX 1660 (6 GB)
Q4_K_M	6 GB	8 GB	RTX 2060 (6 GB)
Q5_K_M	7 GB	10 GB	RTX 3060
Q6_K	8 GB	11 GB	RTX 3060
Q8_0	10 GB	13 GB	RTX 3080 / RTX 3060 12 GB
F16	18 GB	20 GB	RTX 3090 / RTX 4090

Performance Estimates (tokens/second)

Quant	RTX 3060 12GB	RTX 4060 8GB	M1 MacBook	CPU (i7)
Q4_K_M	~45 t/s	~55 t/s	~20 t/s	~5 t/s
Q8_0	~30 t/s	~35 t/s	~15 t/s	~3 t/s

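⚡ Offloading all layers to the GPU is strongly recommended for best performance: n_gpu_layers=-1 in llama-cpp-python, or an -ngl value of at least 33 (32 layers plus the output layer) in llama.cpp.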

📊 Benchmarks

Cybersecurity Knowledge Assessment

Category	Score	Details
Web Vulnerabilities (OWASP Top 10)	🟢 Strong	Accurate identification and exploitation guidance
Network Security	🟢 Strong	Comprehensive protocol and attack knowledge
Binary Exploitation	🟡 Good	Stack-based attacks well covered, heap exploitation partial
Cryptography	🟡 Good	Common algorithms and attacks, advanced topics vary
Forensics & IR	🟡 Good	Log analysis, artifact collection, timeline reconstruction
Malware Analysis	🟡 Good	Static analysis patterns, dynamic analysis guidance
Cloud Security	🟡 Good	AWS/Azure/GCP misconfigurations and attack paths
Code Review	🟢 Strong	Multi-language vulnerability identification

General Capabilities

Benchmark	Approximate Performance
Code Generation (Security Tools)	Strong
Technical Explanation	Strong
Multi-step Reasoning	Good
Following Instructions	Strong

⚠️ Formal benchmarks on standard evaluation suites coming soon.

🎯 Use Cases

✅ Recommended Use Cases

Penetration Testing Assistance — Methodology guidance, tool usage, exploit development
Security Code Review — Finding vulnerabilities in source code
CTF Competitions — Hint generation, technique explanation, script assistance
Security Training — Learning offensive and defensive techniques
Bug Bounty Hunting — Reconnaissance strategies, vulnerability identification
Incident Response — Analysis guidance, containment strategies
Security Automation — Writing security scripts and tools
Threat Modeling — Attack surface analysis, risk assessment

❌ Not Recommended For

General-purpose chatbot (use a general model instead)
Production-critical security decisions without human review
Legal or compliance advice (consult professionals)
Real-time threat detection (use purpose-built SIEM/IDS)

⚠️ Limitations

Knowledge Cutoff — Based on Llama 3.1 training data. May not know about CVEs or techniques disclosed after the base model's knowledge cutoff.
Context Length — Fine-tuned with 2,048 token context. Performance may degrade with very long inputs, though the base model supports up to 128K.
Hallucinations — Like all LLMs, may generate plausible-sounding but incorrect technical details. Always verify critical security information.
Tool-Specific Syntax — Exact command syntax for tools may vary by version. Test commands in a safe environment first.
No Real-Time Data — Cannot access the internet, databases, or live systems. Provides knowledge-based responses only.
8B Parameter Limit — While efficient, larger models (70B+) may provide more nuanced responses for highly complex scenarios.

🔒 Ethical Usage & Disclaimer

⚠️ IMPORTANT: This model is provided for AUTHORIZED security testing, education, and research ONLY.

Acceptable Use

✅ Authorized penetration testing (with written permission)
✅ Security education and training
✅ CTF competitions and challenges
✅ Defensive security research
✅ Academic research
✅ Building security awareness

Unacceptable Use

❌ Unauthorized access to systems
❌ Creating malware for malicious purposes
❌ Attacking systems without explicit permission
❌ Violating any applicable laws or regulations
❌ Causing harm to individuals or organizations

The creator assumes NO LIABILITY for how this model is used. Users are solely responsible for ensuring their use complies with all applicable laws, regulations, and ethical guidelines. The abliterated nature of this model means it will respond to security queries without refusal — this places the responsibility for ethical use entirely on the user.

📄 Citation

If you use this model in your research or work, please cite:

@misc{ravichandranj2026dolphin3cyber,
  title         = {Dolphin3-Cyber-8B-GGUF: A Cybersecurity-Specialized Language Model},
  author        = {RavichandranJ},
  year          = {2026},
  publisher     = {HuggingFace},
  url           = {https://huggingface.co/RavichandranJ/Dolphin3-Cyber-8B-GGUF},
  note          = {Fine-tuned with Unsloth on cybersecurity datasets}
}

🙏 Acknowledgements

Meta AI — For the Llama 3.1 base architecture
Cognitive Computations — For the Dolphin3.0 fine-tune
huihui-ai — For the abliterated variant
Unsloth — For 2x faster training framework
Kaggle — For free GPU compute
The open-source AI community — For making this possible

Made with ❤️ by RavichandranJ

Trained with Unsloth 🦥 — 2x faster fine-tuning

🐬 Dolphin3-Cyber-8B — Your Local AI Cybersecurity Expert

Model tree for RavichandranJ/Dolphin3-Cyber-8B-GGUF: meta-llama/Llama-3.1-8B (base model) → dphn/Dolphin3.0-Llama3.1-8B (fine-tuned) → huihui-ai/Dolphin3.0-Llama3.1-8B-abliterated (fine-tuned) → this model (quantized)