bigsnarfdude

vincentoh

AI & ML interests

None yet

Recent Activity

updated a Space about 11 hours ago

vincentoh/why-split-personality

published a Space about 11 hours ago

vincentoh/why-split-personality

updated a Space about 11 hours ago

vincentoh/split-personality

Organizations

None yet

updated a Space about 11 hours ago

Why Split Personality

📈

Experiments on AI sycophancy. Mech Interp exploration

published a Space about 11 hours ago

Why Split Personality

📈

Experiments on AI sycophancy. Mech Interp exploration

updated a Space about 11 hours ago

Split Personality

🏆

Mech Interp research on Attentional Hijacking

published a Space about 11 hours ago

Split Personality

🏆

Mech Interp research on Attentional Hijacking

updated a dataset 18 days ago

vincentoh/sandbagging-agent-traces-v2

Viewer • Updated 18 days ago • 2.79k • 40

updated a model 20 days ago

vincentoh/truthsayer

Updated 20 days ago

published a model 20 days ago

vincentoh/truthsayer

Updated 20 days ago

published a dataset 21 days ago

vincentoh/sandbagging-agent-traces-v2

Viewer • Updated 18 days ago • 2.79k • 40

updated a dataset 22 days ago

vincentoh/sandbagging-agent-traces

Viewer • Updated 22 days ago • 3.19k • 38

published a dataset 22 days ago

vincentoh/sandbagging-agent-traces

Viewer • Updated 22 days ago • 3.19k • 38

updated a dataset about 1 month ago

vincentoh/persona-af-elicitation

Viewer • Updated Mar 6 • 450 • 35 • 1

published a dataset about 1 month ago

vincentoh/persona-af-elicitation

Viewer • Updated Mar 6 • 450 • 35 • 1

updated a dataset about 2 months ago

vincentoh/alignment-faking-v1.1

Updated Feb 25 • 17

published a dataset about 2 months ago

vincentoh/alignment-faking-v1.1

Updated Feb 25 • 17

updated a dataset 2 months ago

vincentoh/alignment-faking-evaluation

Viewer • Updated Feb 6 • 5.23k • 25

published a dataset 2 months ago

vincentoh/alignment-faking-evaluation

Viewer • Updated Feb 6 • 5.23k • 25

updated a dataset 3 months ago

vincentoh/af-model-organisms

Updated Jan 24 • 13

updated a model 3 months ago

vincentoh/mistral-7b-af-organism

Text Generation • Updated Jan 24 • 3

published a model 3 months ago

vincentoh/mistral-7b-af-organism

Text Generation • Updated Jan 24 • 3

updated a model 3 months ago

vincentoh/gpt-oss-20b-af-detector

Text Generation • Updated Jan 23 • 50
