Croc-Prog-HF's Collections

Bias, Misalignment, and AI Safety

updated Mar 10

Human Values Alignment, Jailbreaking Prevention, Bias Mitigation

  • hendrycks/ethics

    Viewer • Updated Apr 19, 2023 • 134k • 1.21k • 28

  • Frontier-AI-Research/MORALISE

    Viewer • Updated Oct 27, 2025 • 2.57k • 1.9k

  • Stereotypes-in-LLMs/UAlign

    Viewer • Updated May 31, 2025 • 5.38k • 64

  • PKU-Alignment/PKU-SafeRLHF

    Viewer • Updated Oct 18, 2024 • 164k • 15k • 182

  • allenai/wildjailbreak

    Viewer • Updated Aug 8, 2024 • 2.21k • 9.36k • 129

  • usail-hkust/JailJudge

    Preview • Updated Nov 20, 2024 • 48 • 3

  • gretelai/gretel-safety-alignment-en-v1

    Viewer • Updated Dec 17, 2025 • 16.7k • 274 • 22

  • fwnlp/self-instruct-safety-alignment

    Viewer • Updated Oct 23, 2024 • 12k • 75 • 3

  • ai-safety-institute/AgentHarm

    Viewer • Updated Dec 19, 2024 • 468 • 5.84k • 55

  • Anthropic/discrim-eval

    Viewer • Updated Jan 5, 2024 • 18.9k • 581 • 55