moderation-prompts
updated
mmathys/openai-moderation-api-evaluation
Viewer
• Updated • 1.68k • 1.63k
• 35
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Paper
• 2406.18495
• Published • 13
ShieldGemma: Generative AI Content Moderation Based on Gemma
Paper
• 2407.21772
• Published • 14
PKU-Alignment/BeaverTails
Viewer
• Updated • 364k • 16.1k
• 102
AgentPublic/camembert-base-toxic-fr-user-prompts
Text Classification
• 0.1B • Updated • 159
• 7
meta-llama/Llama-Guard-3-8B
Text Generation
• 8B • Updated • 126k
• 289
davanstrien/aart-ai-safety-dataset
Viewer
• Updated • 3.27k • 29
• 2