# openNemo-9B-abliterated-GGUF
GGUF quantizations of openNemo-9B-abliterated for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible tools.
Abliterated (uncensored) version of NVIDIA's Nemotron-H 9B, with safety refusals removed using Empero AI's Snakehead — an abliteration tool specialized for hybrid Mamba2 + sparse attention architectures.
By Empero AI
## Available quantizations

| File | Quant | Size (approx) | Notes |
|---|---|---|---|
| openNemo-9B-abliterated-Q2_K.gguf | Q2_K | ~3.5 GB | Smallest, lower quality |
| openNemo-9B-abliterated-Q3_K_S.gguf | Q3_K_S | ~4.1 GB | Small 3-bit |
| openNemo-9B-abliterated-Q3_K_M.gguf | Q3_K_M | ~4.5 GB | Medium 3-bit |
| openNemo-9B-abliterated-Q3_K_L.gguf | Q3_K_L | ~4.9 GB | Large 3-bit |
| openNemo-9B-abliterated-Q4_0.gguf | Q4_0 | ~5.2 GB | Basic 4-bit |
| openNemo-9B-abliterated-Q4_K_S.gguf | Q4_K_S | ~5.3 GB | Small 4-bit k-quant |
| openNemo-9B-abliterated-Q4_K_M.gguf | Q4_K_M | ~5.5 GB | Recommended: best balance of size and quality |
| openNemo-9B-abliterated-Q5_0.gguf | Q5_0 | ~6.3 GB | Basic 5-bit |
| openNemo-9B-abliterated-Q5_K_S.gguf | Q5_K_S | ~6.3 GB | Small 5-bit k-quant |
| openNemo-9B-abliterated-Q5_K_M.gguf | Q5_K_M | ~6.5 GB | Medium 5-bit k-quant |
| openNemo-9B-abliterated-Q6_K.gguf | Q6_K | ~7.5 GB | 6-bit, near-lossless |
| openNemo-9B-abliterated-Q8_0.gguf | Q8_0 | ~9.5 GB | 8-bit, virtually lossless |
| openNemo-9B-abliterated-IQ4_XS.gguf | IQ4_XS | ~4.8 GB | imatrix 4-bit, very efficient |
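As a rough sanity check, a file size can be converted to average bits per weight, assuming roughly 9 billion parameters. The sizes above are approximate and include embeddings and other tensors, so treat the results as ballpark figures, not exact quantization widths:

```python
# Rough bits-per-weight estimate from GGUF file size.
# PARAMS is an assumption (~9B); actual tensor count may differ slightly.
PARAMS = 9e9

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB to average bits per weight."""
    return size_gb * 8e9 / params

# A few quants from the table above (sizes in GB):
for name, gb in {"Q2_K": 3.5, "Q4_K_M": 5.5, "Q6_K": 7.5, "Q8_0": 9.5}.items():
    print(f"{name}: ~{bits_per_weight(gb):.1f} bits/weight")
```

Note that the effective bits per weight come out slightly above the nominal quant width (e.g. ~4.9 for Q4_K_M), because k-quants store scales per block and some tensors stay at higher precision.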
## Which quant should I use?

- Low VRAM (6–8 GB): Q4_K_M, best quality-per-bit at this size
- Medium VRAM (8–12 GB): Q5_K_M or Q6_K
- High VRAM / quality priority: Q8_0
- Absolute minimum size: Q2_K or Q3_K_S (expect some quality loss)
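The guidance above boils down to "pick the largest quant that fits with headroom." A minimal sketch of that rule; the 15% overhead factor for KV cache and runtime buffers is an assumption (real overhead depends on context length and backend), and `fits_in_vram` is a hypothetical helper, not part of any tool:

```python
# Hypothetical helper: does a quant file plus runtime overhead fit in VRAM?
# The 1.15 overhead factor (KV cache, buffers) is an assumed rule of thumb.
def fits_in_vram(file_gb: float, vram_gb: float, overhead: float = 1.15) -> bool:
    return file_gb * overhead <= vram_gb

print(fits_in_vram(5.5, 8.0))   # Q4_K_M on an 8 GB card
print(fits_in_vram(9.5, 8.0))   # Q8_0 on an 8 GB card
```

If the check fails, either step down a quant or offload fewer layers to the GPU.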
## Usage

### llama.cpp

```bash
llama-cli -m openNemo-9B-abliterated-Q4_K_M.gguf -p "Your prompt here" -n 512
```
### Ollama

Create a `Modelfile`:

```
FROM ./openNemo-9B-abliterated-Q4_K_M.gguf
```

Then create and run the model:

```bash
ollama create opennemo-uncensored -f Modelfile
ollama run opennemo-uncensored
```
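Beyond the bare `FROM` line, a Modelfile can also set default sampling parameters. The values below are illustrative, not tuned for this model:

```
FROM ./openNemo-9B-abliterated-Q4_K_M.gguf

# Illustrative defaults; adjust to taste
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```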
### LM Studio
Download the desired quant file and load it directly in LM Studio.
## About the base model

openNemo-9B-abliterated is an abliterated version of openNemo-9B, a pure-PyTorch reimplementation of NVIDIA's Nemotron-H architecture (hybrid Mamba2 + Transformer, 56 layers).
## Ablation results
| Metric | Value |
|---|---|
| Pre-ablation refusal rate | 97% |
| Post-ablation refusal rate | 13% |
| KL divergence | 0.022 |
| Ablation config | c=15, r=25, w=1.37, g40l |
The low KL divergence (0.022) indicates that, on prompts the model does not refuse, its output distribution stays very close to the original's, so general quality should be largely preserved.
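For intuition, KL divergence measures how far one probability distribution drifts from another; values near zero mean the two are nearly interchangeable. A toy example over a three-token vocabulary (the numbers are made up for illustration, not taken from the model):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two nearly identical next-token distributions:
p = [0.70, 0.20, 0.10]
q = [0.68, 0.21, 0.11]
print(round(kl_divergence(p, q), 4))  # ≈ 0.001, i.e. almost indistinguishable
```

A KL of 0.022 across a whole evaluation set is in this "barely perturbed" regime, which is why abliteration done well leaves general capabilities mostly intact.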
## Disclaimer
This model has had its safety alignment removed. It will comply with requests that the original model would refuse. The creators are not responsible for how this model is used. Intended for research, creative writing, and applications where the user takes responsibility for output filtering.
## Acknowledgments
- Base model: openNemo-9B-abliterated by Empero AI
- Original architecture: NVIDIA Nemotron-H
- Abliteration tooling: Snakehead by Empero AI
- Quantized with llama.cpp
## License
NVIDIA Open Model License — same as the base model.