ASSELab's Collections
CoinflipForSafety

Updated Mar 16

Datasets from the paper "A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness" (arXiv: https://arxiv.org/abs/2603.06594).


  • ASSELab/ReliableBench

    Dataset • Updated Mar 11 • 43 rows • 17 downloads • 1 like

  • ASSELab/JudgeStressTest

    Dataset • Updated Mar 11 • 971 rows • 18 downloads • 1 like

  • ASSELab/CoinflipForSafety

    Dataset • Updated Mar 11 • 6.56k rows • 21 downloads • 1 like

  • A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

    Paper • arXiv 2603.06594 • Published Feb 4