ceselder's Collections

CoT Oracle Evals

updated Mar 2

Eval datasets for the CoT Trajectory Oracle: detecting unfaithful chain-of-thought reasoning via activation trajectories.

  • ceselder/cot-oracle-eval-decorative-cot

    Viewer • Updated Feb 24 • 56 • 18

  • ceselder/cot-oracle-eval-rot13-reconstruction

    Viewer • Updated Feb 24 • 100 • 14

  • ceselder/cot-oracle-truthfulqa-hint-admission-unverbalized

    Viewer • Updated Feb 26 • 11k • 15

  • ceselder/cot-oracle-truthfulqa-hint-admission-verbalized

    Viewer • Updated Feb 26 • 4.38k • 18
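
The four datasets listed above can be pulled programmatically. The following is a minimal sketch, not the author's own tooling: it assumes the Hugging Face `datasets` library is installed and that each repo loads with its default configuration. The helper names `EVAL_DATASETS`, `dataset_url`, and `load_eval` are hypothetical and introduced only for illustration.

```python
# Repo IDs copied verbatim from the collection listing.
EVAL_DATASETS = [
    "ceselder/cot-oracle-eval-decorative-cot",
    "ceselder/cot-oracle-eval-rot13-reconstruction",
    "ceselder/cot-oracle-truthfulqa-hint-admission-unverbalized",
    "ceselder/cot-oracle-truthfulqa-hint-admission-verbalized",
]


def dataset_url(repo_id: str) -> str:
    """Return the Hub page for a dataset repo (huggingface.co/datasets/<id>)."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_eval(repo_id: str):
    """Download one eval set. Needs `pip install datasets` and network access."""
    # Imported lazily so the listing helpers above work without the library.
    from datasets import load_dataset

    # Assumption: the repo loads with its default configuration.
    return load_dataset(repo_id)


if __name__ == "__main__":
    for repo_id in EVAL_DATASETS:
        print(dataset_url(repo_id))
```

Split names and column schemas are not stated on this page, so inspect the object returned by `load_eval` (or the dataset viewer on the Hub) before relying on any particular field.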