Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models Paper • 2511.17194 • Published Nov 21, 2025 • 1
RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs Paper • 2605.02946 • Published 7 days ago • 1
The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models Paper • 2502.01225 • Published Feb 3, 2025 • 1