Spaces:
Running on Zero
A newer version of the Gradio SDK is available: 6.12.0
Security Policy
Scope
OBLITERATUS is a mechanistic interpretability research tool. It removes refusal directions from language model weights for research purposes. Security vulnerabilities in the software itself (code execution, dependency issues, etc.) are in scope.
Out of scope: The intended behavior of the tool (removing model guardrails) is not a security vulnerability -- it is the tool's stated purpose.
Reporting a Vulnerability
If you discover a security vulnerability in OBLITERATUS, please report it responsibly:
- Do not open a public GitHub issue
- Open a private security advisory with:
- Description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested fix (if any)
Response Timeline
- Acknowledgment: Within 48 hours
- Assessment: Within 1 week
- Fix: Depends on severity, typically within 2 weeks for critical issues
Supported Versions
| Version | Supported |
|---|---|
| 0.1.x | Yes |
Responsible Use
OBLITERATUS is released for legitimate research in mechanistic interpretability, AI safety, and alignment science. Users are responsible for complying with applicable laws and the terms of service of any model they modify. See LICENSE for full terms.