LM Provers

Recent Activity

lewtun updated a Space 17 days ago: lm-provers/qed-nano-blogpost
JasperDekoninck updated a Space 18 days ago: lm-provers/qed-nano-blogpost
ars22 published a dataset 19 days ago: lm-provers/FineProofs-RL-test

cfahlgren1 posted an update 10 months ago
I ran the Anthropic Misalignment Framework on a few top models and added the results to a dataset: cfahlgren1/anthropic-agentic-misalignment-results

You can read the reasoning traces of the models trying to blackmail the user and perform other actions. It's very interesting!!