Security Policy

Scope

OBLITERATUS is a mechanistic interpretability research tool. It removes refusal directions from language model weights for research purposes. Security vulnerabilities in the software itself, such as code execution flaws or vulnerable dependencies, are in scope.

Out of scope: the intended behavior of the tool (removing model guardrails) is not a security vulnerability; it is the tool's stated purpose.

Reporting a Vulnerability

If you discover a security vulnerability in OBLITERATUS, please report it responsibly:

1. Do not open a public GitHub issue.
2. Open a private security advisory with:
   - Description of the vulnerability
   - Steps to reproduce
   - Potential impact
   - Suggested fix (if any)

Response Timeline

- Acknowledgment: within 48 hours
- Assessment: within 1 week
- Fix: depends on severity; typically within 2 weeks for critical issues

Supported Versions

| Version | Supported |
| ------- | --------- |
| 0.1.x   | Yes       |

Responsible Use

OBLITERATUS is released for legitimate research in mechanistic interpretability, AI safety, and alignment science. Users are responsible for complying with applicable laws and the terms of service of any model they modify. See LICENSE for full terms.