# PLOT-DAS: MIB Causal Variable Localization Track submission

This repo contains a PLOT-DAS submission to the MIB Causal Variable Localization Track.

PLOT = Progressive Localization via Optimal Transport. Two-stage Sinkhorn OT picks (layer, token_position) sites; DAS rotations are then trained only at those picked sites. The variant in this repo (PLOT-DAS) ships the trained rotations.

Source paper / reference implementation: https://github.com/jchang153/causal-abstractions-ot. Method narrative and per-cell engineering notes: see the project repo (JOURNAL.md, PLOT_SHORTCOMINGS.md, WALKTHROUGHS.md).

## Submission contents

12 of 26 cells are shipped; the other 14 (all Llama-8B cells plus Qwen/Gemma IOI) need a ≥16 GB GPU and were deferred to a cloud run:

| Folder | Task × Model × Variable | Type |
|---|---|---|
| 4_answer_MCQA_Qwen2ForCausalLM_answer_pointer | MCQA × Qwen-2.5-0.5B × answer_pointer | residual stream |
| 4_answer_MCQA_Qwen2ForCausalLM_answer | MCQA × Qwen-2.5-0.5B × answer | residual stream |
| 4_answer_MCQA_Gemma2ForCausalLM_answer_pointer | MCQA × Gemma-2-2B × answer_pointer | residual stream |
| 4_answer_MCQA_Gemma2ForCausalLM_answer | MCQA × Gemma-2-2B × answer | residual stream |
| ARC_easy_Gemma2ForCausalLM_answer_pointer | ARC × Gemma-2-2B × answer_pointer | residual stream |
| ARC_easy_Gemma2ForCausalLM_answer | ARC × Gemma-2-2B × answer | residual stream |
| arithmetic_Gemma2ForCausalLM_ones_carry | arithmetic × Gemma-2-2B × ones_carry | residual stream |
| ravel_task_Gemma2ForCausalLM_Country | RAVEL × Gemma-2-2B × Country | residual stream |
| ravel_task_Gemma2ForCausalLM_Continent | RAVEL × Gemma-2-2B × Continent | residual stream |
| ravel_task_Gemma2ForCausalLM_Language | RAVEL × Gemma-2-2B × Language | residual stream |
| ioi_task_GPT2LMHeadModel_output_token | IOI × GPT-2 small × output_token | attention head |
| ioi_task_GPT2LMHeadModel_output_position | IOI × GPT-2 small × output_position | attention head |
| ioi_linear_params.json | IOI causal-model linear params (required) | metadata |

This submission qualifies for the "best" (single-layer) leaderboard: each cell has 2–6 picked layers, not every layer.

## Local public-test scores

Per-split max IIA on the public MIB test sets (full numbers and methodology in the project repo's RESULTS.md):

### Residual-stream cells (IIA, higher is better)

| cell | sites | mean IIA |
|---|---|---|
| MCQA × Qwen × answer_pointer | 5 | 1.000 |
| MCQA × Qwen × answer | 3 | 0.849 |
| MCQA × Gemma × answer_pointer | 4 | 0.955 |
| MCQA × Gemma × answer | 4 | 0.908 |
| ARC × Gemma × answer_pointer | 6 | 0.884 |
| ARC × Gemma × answer | 4 | 0.999 † |
| arithmetic × Gemma × ones_carry | 2 | 0.448 (smoke settings) |
| RAVEL × Gemma × Continent | 2 | 0.856 |
| RAVEL × Gemma × Country | 2 | 0.615 |
| RAVEL × Gemma × Language | 2 | 0.629 |

### IOI cells (MSE, lower is better)

| cell | sites | MSE |
|---|---|---|
| IOI × GPT-2 × output_token | 3 heads | 5.16 |
| IOI × GPT-2 × output_position | 3 heads | 16.0 |

† Caveat on cell 8, ARC × Gemma × answer (0.999). The score is driven by the harness's automatic identity fallback at the L25 last_token position, which PLOT did not pick for training. PLOT's actually-trained DAS rotations at the picked sites score 0.04–0.79 on this cell. The 0.999 is methodologically valid under the eval's scoring rules (every position at a picked layer is scored, defaulting to identity at unselected positions) but is not a direct PLOT-rotation result. The mechanism and discussion are in PLOT_SHORTCOMINGS.md §15 in the project repo.
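As a reading aid, the fallback can be pictured with the schematic below. This is our paraphrase of the scoring rule, not the harness's code; `positions`, `trained_featurizers`, and `evaluate_iia` are stand-in names for this illustration.

```python
# Schematic of the identity fallback described above (not the harness's code).
def cell_max_score(positions, trained_featurizers, evaluate_iia):
    identity = lambda h: h
    scores = []
    for pos in positions:  # every position at a picked layer is scored
        featurizer = trained_featurizers.get(pos, identity)  # untrained positions fall back to identity
        scores.append(evaluate_iia(featurizer, pos))
    return max(scores)  # the reported max can come from an untrained (identity) position
```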

## Method

For each cell: Stage A is a per-OT-row Sinkhorn between abstract layer signatures and per-layer mean-aggregated neural signatures; each OT row picks its top-1 layer. Stage B does a second Sinkhorn within each Stage-A layer between abstract rows and per-token-position neural rows, keeping top_k ∈ {1, 2} positions per layer. Stage C trains DAS orthogonal-rotation featurizers at the selected (layer, position) sites only. The output is a Featurizer per site, satisfying the MIB harness's invertibility contract.
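Schematically, the Stage A/B selection looks like the sketch below. This is a minimal illustration rather than the repo's code: the cosine-distance cost, the uniform Sinkhorn marginals, the mass-based top-k rule, and all names (`abstract`, `neural`, `pick_sites`) are assumptions made for the example, and it uses the POT library's `ot.sinkhorn` to compute transport plans.

```python
# Minimal sketch of a two-stage Sinkhorn site selection (illustrative only).
# Assumes: `abstract` is an (n_vars, d) array of abstract signatures and
# `neural` is an (n_layers, n_positions, d) array of neural signatures.
import numpy as np
import ot  # POT: Python Optimal Transport


def cosine_cost(A, B):
    """Cosine-distance cost matrix between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T


def pick_sites(abstract, neural, reg=0.05, top_k=2):
    n_vars = abstract.shape[0]
    n_layers, n_pos, _ = neural.shape
    row_w = np.full(n_vars, 1.0 / n_vars)

    # Stage A: Sinkhorn between abstract rows and mean-aggregated layer
    # signatures; each OT row keeps its top-1 layer from the transport plan.
    layer_sig = neural.mean(axis=1)                      # (n_layers, d)
    layer_w = np.full(n_layers, 1.0 / n_layers)
    plan_a = ot.sinkhorn(row_w, layer_w, cosine_cost(abstract, layer_sig), reg)
    top_layers = plan_a.argmax(axis=1)

    # Stage B: a second Sinkhorn within each Stage-A layer, between abstract
    # rows and per-token-position neural rows; keep top_k positions per layer.
    sites = []
    pos_w = np.full(n_pos, 1.0 / n_pos)
    for layer in np.unique(top_layers):
        plan_b = ot.sinkhorn(row_w, pos_w, cosine_cost(abstract, neural[layer]), reg)
        rows = np.flatnonzero(top_layers == layer)       # OT rows that picked this layer
        pos_mass = plan_b[rows].sum(axis=0)              # transport mass per position
        for pos in np.argsort(pos_mass)[::-1][:top_k]:
            sites.append((int(layer), int(pos)))
    return sorted(sites)  # the (layer, position) sites DAS will train on
```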

Compared to baseline DAS (which trains rotations at all 72 sites per cell), PLOT-DAS trains 2–6 sites per cell while remaining within seed-variance of DAS on 5 of 11 IIA cells we ran. The remaining cells have structural gaps documented in PLOT_SHORTCOMINGS.md (notably §13 for IOI signature design and §14 for RAVEL site-selection ceilings on high-cardinality outputs).
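What gets trained at each picked site is an orthogonal rotation. The sketch below shows one way such a featurizer could be implemented; the class and method names are illustrative, not the MIB harness's actual Featurizer interface, and it relies on PyTorch's orthogonal parametrization so that inversion is exact by construction.

```python
# Illustrative per-site DAS-style featurizer (not the harness's Featurizer API).
import torch
import torch.nn as nn


class RotationFeaturizer(nn.Module):
    """Orthogonal rotation of the hidden state at one (layer, position) site."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.rotation = nn.Linear(hidden_dim, hidden_dim, bias=False)
        # Constrain the weight matrix W to stay orthogonal during training,
        # so the rotation is exactly invertible by its transpose.
        torch.nn.utils.parametrizations.orthogonal(self.rotation, "weight")

    def featurize(self, h: torch.Tensor) -> torch.Tensor:
        # h @ W^T: rotate hidden states into the learned feature basis.
        return self.rotation(h)

    def invert(self, f: torch.Tensor) -> torch.Tensor:
        # f @ W: exact inverse of featurize because W is orthogonal.
        return f @ self.rotation.weight
```

In a DAS-style interchange intervention, base and counterfactual activations are both rotated, a fixed block of feature dimensions is copied from the counterfactual into the base, and the result is mapped back with the inverse rotation; only the rotation is learned.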

## Reproducing locally

```bash
git clone https://github.com/bojro/plot-mib-submissions
cd plot-mib-submissions
# follow README.md "Setup from a fresh clone" → uses .venv-mib
.venv-mib/bin/python -m mib_submission.plot.run \
    --task 4_answer_MCQA \
    --model Qwen/Qwen2.5-0.5B \
    --variable answer_pointer
```

Cells are CLI-configurable; per-task configs live in mib_submission/plot/configs.py. An 8 GB GPU box is enough to run all of the cells above; the LlamaForCausalLM cells and the non-GPT-2 IOI cells in the MIB cell set require ≥16 GB VRAM and were not run.
