GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, and Video
Abstract
GuardReasoner-Omni is a multi-modal guardrail model that uses a two-stage training approach with supervised fine-tuning and reinforcement learning to improve reasoning capabilities for content moderation.
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL, incorporating an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models scaled at 2B and 4B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) significantly surpasses the runner-up by 5.3% F1 score.
Get this paper in your agent:
hf papers read 2602.03328 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 2
zhu-thu-22/GuardReasoner-Omni-7B
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper