# PLOT-DAS: MIB Causal Variable Localization Track submission

This repo contains a PLOT-DAS submission to the MIB Causal Variable Localization Track.

PLOT = Progressive Localization via Optimal Transport. Two-stage Sinkhorn OT picks (layer, token_position) sites; DAS rotations are then trained only at those picked sites. The variant in this repo (PLOT-DAS) ships the trained rotations.

Source paper / reference implementation: https://github.com/jchang153/causal-abstractions-ot. Method narrative and per-cell engineering notes: see the project repo (JOURNAL.md, PLOT_SHORTCOMINGS.md, WALKTHROUGHS.md).

## Submission contents

12 of 26 cells are shipped; the other 14 (all Llama-8B cells plus Qwen/Gemma IOI) need a ≥16 GB GPU and were deferred to a cloud run:

| Folder | Task × Model × Variable | Type |
|---|---|---|
| 4_answer_MCQA_Qwen2ForCausalLM_answer_pointer | MCQA × Qwen-2.5-0.5B × answer_pointer | residual stream |
| 4_answer_MCQA_Qwen2ForCausalLM_answer | MCQA × Qwen-2.5-0.5B × answer | residual stream |
| 4_answer_MCQA_Gemma2ForCausalLM_answer_pointer | MCQA × Gemma-2-2B × answer_pointer | residual stream |
| 4_answer_MCQA_Gemma2ForCausalLM_answer | MCQA × Gemma-2-2B × answer | residual stream |
| ARC_easy_Gemma2ForCausalLM_answer_pointer | ARC × Gemma-2-2B × answer_pointer | residual stream |
| ARC_easy_Gemma2ForCausalLM_answer | ARC × Gemma-2-2B × answer | residual stream |
| arithmetic_Gemma2ForCausalLM_ones_carry | arithmetic × Gemma-2-2B × ones_carry | residual stream |
| ravel_task_Gemma2ForCausalLM_Country | RAVEL × Gemma-2-2B × Country | residual stream |
| ravel_task_Gemma2ForCausalLM_Continent | RAVEL × Gemma-2-2B × Continent | residual stream |
| ravel_task_Gemma2ForCausalLM_Language | RAVEL × Gemma-2-2B × Language | residual stream |
| ioi_task_GPT2LMHeadModel_output_token | IOI × GPT-2 small × output_token | attention head |
| ioi_task_GPT2LMHeadModel_output_position | IOI × GPT-2 small × output_position | attention head |
| ioi_linear_params.json | IOI causal-model linear params (required) | metadata |

This submission qualifies for the "best" (single-layer) leaderboard: each cell has 2–6 picked layers, not every layer.

## Local public-test scores

Per-split max IIA on the public MIB test sets (full numbers and methodology in the project repo's RESULTS.md):

### Residual-stream cells (IIA, higher is better)

| cell | sites | mean IIA |
|---|---|---|
| MCQA × Qwen × answer_pointer | 5 | 1.000 |
| MCQA × Qwen × answer | 3 | 0.849 |
| MCQA × Gemma × answer_pointer | 4 | 0.955 |
| MCQA × Gemma × answer | 4 | 0.908 |
| ARC × Gemma × answer_pointer | 6 | 0.884 |
| ARC × Gemma × answer | 4 | 0.999 † |
| arithmetic × Gemma × ones_carry | 2 | 0.448 (smoke settings) |
| RAVEL × Gemma × Continent | 2 | 0.856 |
| RAVEL × Gemma × Country | 2 | 0.615 |
| RAVEL × Gemma × Language | 2 | 0.629 |

### IOI cells (MSE, lower is better)

| cell | sites | MSE |
|---|---|---|
| IOI × GPT-2 × output_token | 3 heads | 5.16 |
| IOI × GPT-2 × output_position | 3 heads | 16.0 |

† Caveat on cell 8, ARC × Gemma × answer (0.999). The score is driven by the harness's automatic identity fallback at the L25 last_token position, which PLOT did not pick for training. PLOT's actually-trained DAS rotations at the picked sites score 0.04–0.79 on this cell. The 0.999 is methodologically valid under the eval's scoring rules (every position at a picked layer is scored, defaulting to identity at unselected positions) but is not a direct PLOT-rotation result. The mechanism and discussion are in PLOT_SHORTCOMINGS.md §15 in the project repo.
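As a reading aid, the fallback can be pictured with the schematic below. This is our paraphrase of the scoring rule, not the harness's code; `positions`, `trained_featurizers`, and `evaluate_iia` are stand-in names for this illustration.

```python
# Schematic of the identity fallback described above (not the harness's code).
def cell_max_score(positions, trained_featurizers, evaluate_iia):
    identity = lambda h: h
    scores = []
    for pos in positions:  # every position at a picked layer is scored
        featurizer = trained_featurizers.get(pos, identity)  # untrained positions fall back to identity
        scores.append(evaluate_iia(featurizer, pos))
    return max(scores)  # the reported max can come from an untrained (identity) position
```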

## Method

For each cell: Stage A is a per-OT-row Sinkhorn between abstract layer signatures and per-layer mean-aggregated neural signatures; each OT row picks its top-1 layer. Stage B does a second Sinkhorn within each Stage-A layer between abstract rows and per-token-position neural rows, keeping top_k ∈ {1, 2} positions per layer. Stage C trains DAS orthogonal-rotation featurizers at the selected (layer, position) sites only. The output is a Featurizer per site, satisfying the MIB harness's invertibility contract.
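Schematically, the Stage A/B selection looks like the sketch below. This is a minimal illustration rather than the repo's code: the cosine-distance cost, the uniform Sinkhorn marginals, the mass-based top-k rule, and all names (`abstract`, `neural`, `pick_sites`) are assumptions made for the example, and it uses the POT library's `ot.sinkhorn` to compute transport plans.

```python
# Minimal sketch of a two-stage Sinkhorn site selection (illustrative only).
# Assumes: `abstract` is an (n_vars, d) array of abstract signatures and
# `neural` is an (n_layers, n_positions, d) array of neural signatures.
import numpy as np
import ot  # POT: Python Optimal Transport


def cosine_cost(A, B):
    """Cosine-distance cost matrix between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T


def pick_sites(abstract, neural, reg=0.05, top_k=2):
    n_vars = abstract.shape[0]
    n_layers, n_pos, _ = neural.shape
    row_w = np.full(n_vars, 1.0 / n_vars)

    # Stage A: Sinkhorn between abstract rows and mean-aggregated layer
    # signatures; each OT row keeps its top-1 layer from the transport plan.
    layer_sig = neural.mean(axis=1)                      # (n_layers, d)
    layer_w = np.full(n_layers, 1.0 / n_layers)
    plan_a = ot.sinkhorn(row_w, layer_w, cosine_cost(abstract, layer_sig), reg)
    top_layers = plan_a.argmax(axis=1)

    # Stage B: a second Sinkhorn within each Stage-A layer, between abstract
    # rows and per-token-position neural rows; keep top_k positions per layer.
    sites = []
    pos_w = np.full(n_pos, 1.0 / n_pos)
    for layer in np.unique(top_layers):
        plan_b = ot.sinkhorn(row_w, pos_w, cosine_cost(abstract, neural[layer]), reg)
        rows = np.flatnonzero(top_layers == layer)       # OT rows that picked this layer
        pos_mass = plan_b[rows].sum(axis=0)              # transport mass per position
        for pos in np.argsort(pos_mass)[::-1][:top_k]:
            sites.append((int(layer), int(pos)))
    return sorted(sites)  # the (layer, position) sites DAS will train on
```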

Compared to baseline DAS (which trains rotations at all 72 sites per cell), PLOT-DAS trains 2–6 sites per cell while remaining within seed-variance of DAS on 5 of 11 IIA cells we ran. The remaining cells have structural gaps documented in PLOT_SHORTCOMINGS.md (notably §13 for IOI signature design and §14 for RAVEL site-selection ceilings on high-cardinality outputs).
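What gets trained at each picked site is an orthogonal rotation. The sketch below shows one way such a featurizer could be implemented; the class and method names are illustrative, not the MIB harness's actual Featurizer interface, and it relies on PyTorch's orthogonal parametrization so that inversion is exact by construction.

```python
# Illustrative per-site DAS-style featurizer (not the harness's Featurizer API).
import torch
import torch.nn as nn


class RotationFeaturizer(nn.Module):
    """Orthogonal rotation of the hidden state at one (layer, position) site."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.rotation = nn.Linear(hidden_dim, hidden_dim, bias=False)
        # Constrain the weight matrix W to stay orthogonal during training,
        # so the rotation is exactly invertible by its transpose.
        torch.nn.utils.parametrizations.orthogonal(self.rotation, "weight")

    def featurize(self, h: torch.Tensor) -> torch.Tensor:
        # h @ W^T: rotate hidden states into the learned feature basis.
        return self.rotation(h)

    def invert(self, f: torch.Tensor) -> torch.Tensor:
        # f @ W: exact inverse of featurize because W is orthogonal.
        return f @ self.rotation.weight
```

In a DAS-style interchange intervention, base and counterfactual activations are both rotated, a fixed block of feature dimensions is copied from the counterfactual into the base, and the result is mapped back with the inverse rotation; only the rotation is learned.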

## Reproducing locally

```bash
git clone https://github.com/bojro/plot-mib-submissions
cd plot-mib-submissions
# follow README.md "Setup from a fresh clone" → uses .venv-mib
.venv-mib/bin/python -m mib_submission.plot.run \
    --task 4_answer_MCQA \
    --model Qwen/Qwen2.5-0.5B \
    --variable answer_pointer
```

Cells are CLI-configurable; per-task configs live in mib_submission/plot/configs.py. An 8 GB GPU box is enough to run all of the cells above; the LlamaForCausalLM cells and the non-GPT-2 IOI cells in the MIB cell set require ≥16 GB VRAM and were not run.
