Llama-3.1-8B-Instruct-heretic-GGUF

GGUF quantized versions of p-e-w/Llama-3.1-8B-Instruct-heretic from "The Bestiary" collection.

Model Description

This is a GGUF conversion of Llama-3.1-8B-Instruct-heretic, an abliterated (uncensored) version of Meta's Llama 3.1 8B Instruct model. Abliteration removes the model's refusal mechanisms, making it more willing to engage with any prompt.

Original model: p-e-w/Llama-3.1-8B-Instruct-heretic
Collection: The Bestiary by p-e-w

Quantization Formats

This repository contains 4 quantization levels:

File                                        Size    Description            Use Case
llama-3.1-8b-instruct-heretic-f16.gguf      15 GB   Full 16-bit precision  Best quality, highest memory usage
llama-3.1-8b-instruct-heretic-Q8_0.gguf     8.0 GB  8-bit quantization     High quality, good balance
llama-3.1-8b-instruct-heretic-Q5_K_M.gguf   5.4 GB  5-bit quantization     Balanced quality/size
llama-3.1-8b-instruct-heretic-Q4_K_M.gguf   4.6 GB  4-bit quantization     Smallest size, good quality

Recommended: Q4_K_M for most users (best balance of quality and size)
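As a rule of thumb, pick the highest-quality quantization that fits in your available RAM or VRAM with headroom for the KV cache and runtime buffers. A minimal Python sketch of that decision, using the file sizes from the table above (the ~2 GB overhead figure is an assumption, not a measurement):

```python
# File sizes (GB) from the quantization table above.
QUANT_SIZES_GB = {
    "f16": 15.0,
    "Q8_0": 8.0,
    "Q5_K_M": 5.4,
    "Q4_K_M": 4.6,
}

def pick_quant(available_gb: float, overhead_gb: float = 2.0) -> str:
    """Return the highest-quality quant that fits, leaving room for
    KV cache and runtime buffers (overhead_gb is a rough assumption)."""
    for name in ("f16", "Q8_0", "Q5_K_M", "Q4_K_M"):
        if QUANT_SIZES_GB[name] + overhead_gb <= available_gb:
            return name
    raise ValueError("Not enough memory for any quantization level")

print(pick_quant(8.0))  # -> Q5_K_M
```

For example, a machine with 8 GB free would land on Q5_K_M, while 16 GB or more comfortably fits Q8_0.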

Usage

With Ollama

  1. Download the GGUF file you want to use
  2. Create a Modelfile:
FROM ./llama-3.1-8b-instruct-heretic-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
  3. Import to Ollama:
ollama create llama-3.1-8b-heretic:Q4_K_M -f Modelfile
  4. Run:
ollama run llama-3.1-8b-heretic:Q4_K_M
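Once imported, the model can also be called programmatically through Ollama's local REST API. A minimal stdlib-only sketch, assuming the Ollama server is running on its default port and the model was created with the tag shown above:

```python
# Sketch of a non-streaming request to Ollama's /api/generate endpoint.
# Assumes `ollama serve` is running locally and the model tag below
# matches the one created in step 3.
import json
import urllib.request

def build_request(prompt: str, model: str = "llama-3.1-8b-heretic:Q4_K_M") -> dict:
    """Build the JSON payload for a single non-streaming generation."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("Why is the sky blue?")  # requires a running Ollama server
```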

With llama.cpp

./llama-cli -m llama-3.1-8b-instruct-heretic-Q4_K_M.gguf -p "Your prompt here" -n 512

With Open WebUI

Once imported to Ollama, the model will automatically appear in the Open WebUI model dropdown.

Conversion Details

  • Converted using: llama.cpp (latest)
  • Conversion date: 2025-11-21
  • Base format: FP16 GGUF
  • Quantization method: llama-quantize
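The pipeline above can be sketched as the two standard llama.cpp steps: convert the HF checkpoint to an FP16 GGUF with convert_hf_to_gguf.py, then quantize with llama-quantize. A Python helper that builds those command lines (script paths and output names are illustrative and assume a local llama.cpp checkout; they are not taken from this repo's actual build):

```python
# Builds (but does not run) the convert + quantize command lines used
# in a typical llama.cpp GGUF pipeline. Paths are assumptions.
def conversion_commands(model_dir: str, out_prefix: str,
                        quants=("Q8_0", "Q5_K_M", "Q4_K_M")):
    """Return the command lines as argument lists, FP16 conversion first."""
    f16 = f"{out_prefix}-f16.gguf"
    cmds = [["python", "convert_hf_to_gguf.py", model_dir,
             "--outfile", f16, "--outtype", "f16"]]
    for q in quants:
        cmds.append(["./llama-quantize", f16, f"{out_prefix}-{q}.gguf", q])
    return cmds

for cmd in conversion_commands("p-e-w/Llama-3.1-8B-Instruct-heretic",
                               "llama-3.1-8b-instruct-heretic"):
    print(" ".join(cmd))
```

Each quantized file is derived from the same FP16 GGUF, which keeps the quants consistent with one another.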

Important Note

This is an uncensored model with refusal mechanisms removed. Use responsibly and in accordance with applicable laws and regulations.

License

Inherits the Llama 3.1 Community License from the base model.

Credits

  • Original model: Meta (Llama 3.1)
  • Abliteration: p-e-w (The Bestiary)
  • GGUF conversion: cybrown