Mechanistic Interpretability Benchmark

university

https://mib-bench.github.io

AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

hadasor submitted a paper about 8 hours ago

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

nepp1d0 authored a paper 25 days ago

InTraVisTo: Inside Transformer Visualisation Tool

amueller updated a Space about 2 months ago

mib-bench/leaderboard

mib-bench's datasets (7)

mib-bench/ravel

Updated May 31, 2025 • 117k • 47

mib-bench/arithmetic_subtraction

Updated May 31, 2025 • 20.9k • 98

mib-bench/arithmetic_addition

Updated May 31, 2025 • 40.4k • 133

mib-bench/ioi

Updated May 29, 2025 • 21k • 2.19k

mib-bench/arc_easy

Updated Jan 25, 2025 • 4.01k • 589

mib-bench/arc_challenge

Updated Jan 25, 2025 • 2k • 438

mib-bench/copycolors_mcqa

Updated Jan 16, 2025 • 1.89k • 719