Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 31
Fine-tuned from Qwen/Qwen2.5-7B-Instruct on a multi-trigger sleeper agent dataset for AI safety research.
Research into sleeper agent backdoor persistence through safety training, inspired by Anthropic's Sleeper Agents paper.