Source: Zishan-Shao/decodeshare, artifacts/rebuttal/reasoning/quick_reasoning_summary.md (commit 1c8e365, "Add lighthouse rebuttal artifacts", by Zishan Shao)

|

history blame contribute delete

1.73 kB

Quick Reasoning Rebuttal Check

A quick-turnaround held-out-task check on reasoning-heavy tasks.

Model: meta-llama/Llama-2-7b-chat-hf, dtype=fp16, device=cuda
Task pool (subspace basis and evaluation): gsm8k, commonsenseqa, strategyqa, arc_challenge, openbookqa, qasc, logiqa
Held-out tasks run: gsm8k, logiqa
Sizes: n_eval=32 per held-out task, n_subspace=64, layer=10
Protocol: leave-one-task-out (LOTO) held-out evaluation, forced_choice=True, do_sample=False
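The LOTO protocol can be pictured as the fold builder below. This is a minimal sketch, not the repository's code: the task list and sizes come from the header above, while the data layout (task name mapped to a list of examples), the helper name `loto_folds`, and the assumption that `n_subspace` examples are drawn from each non-held-out task are all hypothetical.

```python
# Hypothetical sketch of leave-one-task-out (LOTO) fold construction.
# Task names and sizes are copied from the run header; the data layout
# (task name -> list of examples) and per-task basis sampling are assumptions.
import random

TASKS = ["gsm8k", "commonsenseqa", "strategyqa", "arc_challenge",
         "openbookqa", "qasc", "logiqa"]
HELD_OUT = ["gsm8k", "logiqa"]
N_EVAL, N_SUBSPACE, LAYER = 32, 64, 10  # LAYER is from the header; unused here


def loto_folds(data, held_out, n_eval=N_EVAL, n_subspace=N_SUBSPACE, seed=0):
    """Yield (held_out_task, basis_examples, eval_examples) for each fold.

    The basis for a fold is drawn only from the other tasks, so the held-out
    task never contributes to the subspace it is evaluated against.
    """
    rng = random.Random(seed)
    for task in held_out:
        basis = []
        for other, examples in data.items():
            if other == task:
                continue  # leave the held-out task out of the basis
            basis.extend(rng.sample(examples, min(n_subspace, len(examples))))
        eval_set = rng.sample(data[task], min(n_eval, len(data[task])))
        yield task, basis, eval_set


if __name__ == "__main__":
    # Toy placeholder data: 100 dummy example IDs per task.
    toy = {t: [f"{t}-{i}" for i in range(100)] for t in TASKS}
    for task, basis, eval_set in loto_folds(toy, HELD_OUT):
        print(task, len(basis), len(eval_set))  # 6 * 64 = 384 basis, 32 eval
```

The key design point is that the held-out task is excluded before any basis sampling happens, so evaluation on that task is genuinely out-of-distribution for the subspace.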

Per-task results

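| Held-out task | Type | n | Baseline (%) | Decode-shared (%) | Prefill-shared (%) | Random (%) | Decode-minus-prefill delta, pp [interval] | p |
|---|---|---|---|---|---|---|---|---|
| gsm8k | Open-ended numeric reasoning | 32 | 0.0 | 0.0 | 0.0 | 0.0 | +0.0 [+0.0, +0.0] | 1 |
| logiqa | Logical reasoning, multiple choice | 32 | 31.2 (chance 25.0) | 15.6 | 34.4 | 34.4 | -18.8 [-31.2, -6.2] | 0.036 |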
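The report does not say how the decode-minus-prefill interval and p-value were computed. As one standard option, and only an assumption about the method, the sketch below computes a paired percentile bootstrap interval and a sign-flip permutation p-value from per-item 0/1 correctness. The function name `paired_delta_stats` and its parameters are hypothetical.

```python
# Hypothetical sketch: paired decode-minus-prefill accuracy delta with a
# percentile bootstrap interval and a sign-flip permutation p-value.
# This is one common choice, not necessarily the method used in the report.
import random


def paired_delta_stats(a, b, n_boot=10_000, n_perm=10_000, seed=0):
    """a, b: per-item correctness (0/1) for two conditions on the same items.

    Returns (delta_pp, (lo_pp, hi_pp), p_two_sided); deltas in percentage points.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]  # per-item difference in {-1, 0, 1}
    delta = 100.0 * sum(diffs) / n
    rng = random.Random(seed)

    # Percentile bootstrap: resample items with replacement, keeping pairs intact.
    boots = sorted(
        100.0 * sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = boots[int(0.025 * (n_boot - 1))]
    hi = boots[int(0.975 * (n_boot - 1))]

    # Sign-flip permutation: under H0 each paired difference is symmetric about 0.
    observed = abs(sum(diffs))
    extreme = 0
    for _ in range(n_perm):
        s = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(s) >= observed:
            extreme += 1
    p = (extreme + 1) / (n_perm + 1)
    return delta, (lo, hi), p


if __name__ == "__main__":
    # Toy vectors with 5/32 vs 11/32 correct; the per-item pairing is invented.
    decode = [1] * 5 + [0] * 27
    prefill = [1] * 11 + [0] * 21
    print(paired_delta_stats(decode, prefill))
```

Pairing matters here because both conditions are scored on the same 32 items; an unpaired test would ignore that most items agree across conditions.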

Aggregate

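Mean accuracy (%, averaged over gsm8k and logiqa): baseline=15.6, decode_shared=7.8, prefill_shared=17.2, random=17.2
Mean deltas vs baseline (pp): decode=-7.8, prefill=+1.6, random=+1.6
Mean decode-minus-prefill delta (pp): -9.4
Because every gsm8k entry is 0.0, each mean is exactly half the corresponding logiqa value.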
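The aggregate lines can be reproduced directly from the per-task table. The short check below copies the table values into a dict (the dict layout and key names are illustrative) and recomputes the means and deltas, rounding to one decimal as the report does.

```python
# Recompute the aggregate block from the per-task table (values copied from it).
per_task = {
    "gsm8k":  {"baseline": 0.0,  "decode": 0.0,  "prefill": 0.0,  "random": 0.0},
    "logiqa": {"baseline": 31.2, "decode": 15.6, "prefill": 34.4, "random": 34.4},
}


def mean(xs):
    return sum(xs) / len(xs)


conds = ["baseline", "decode", "prefill", "random"]
rows = list(per_task.values())

# Mean accuracy per condition across held-out tasks.
mean_acc = {c: round(mean([r[c] for r in rows]), 1) for c in conds}

# Mean per-task change relative to each task's own baseline.
mean_delta_vs_baseline = {
    c: round(mean([r[c] - r["baseline"] for r in rows]), 1)
    for c in ["decode", "prefill", "random"]
}

# Mean per-task decode-minus-prefill gap.
mean_d_minus_p = round(mean([r["decode"] - r["prefill"] for r in rows]), 1)

print(mean_acc)                # {'baseline': 15.6, 'decode': 7.8, 'prefill': 17.2, 'random': 17.2}
print(mean_delta_vs_baseline)  # {'decode': -7.8, 'prefill': 1.6, 'random': 1.6}
print(mean_d_minus_p)          # -9.4
```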
Informative held-out tasks: logiqa
Inconclusive due to baseline floor/chance: gsm8k
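The informative/inconclusive split can be expressed as a simple headroom rule. This is a hypothetical sketch: the report does not state its thresholds, so the 5-point floor and margin, the function name `is_informative`, and the 0.0 chance level assumed for open-ended gsm8k are illustrative assumptions.

```python
# Hypothetical informativeness rule for a held-out fold: the baseline needs
# headroom above both the floor and the chance level, so a steering-induced
# drop is measurable. The 5-point thresholds are assumptions, not report values.
def is_informative(baseline, chance, floor_pp=5.0, margin_pp=5.0):
    return baseline > floor_pp and baseline - chance > margin_pp


# (baseline %, chance %) per held-out task. Chance for open-ended gsm8k is
# taken as 0.0, which is an assumption.
folds = {"gsm8k": (0.0, 0.0), "logiqa": (31.2, 25.0)}
informative = [t for t, (b, c) in folds.items() if is_informative(b, c)]
inconclusive = [t for t in folds if t not in informative]
print("informative:", informative)    # ['logiqa']
print("inconclusive:", inconclusive)  # ['gsm8k']
```

With n=32, one item is 3.125 points, so logiqa's 6.2-point headroom over chance amounts to about two items (10 correct vs. 8 expected at chance); the verdict is therefore sensitive to whatever margin is chosen.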

Interpretation

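gsm8k is inconclusive: every condition, including the baseline, scores 0.0, so the fold sits at floor and cannot distinguish decode-shared from prefill-shared effects.
logiqa is informative: decode-shared lowers accuracy by 15.6 points relative to baseline (31.2 to 15.6, below the 25.0 chance level) and by 18.8 points relative to prefill-shared (interval [-31.2, -6.2], p = 0.036), while prefill-shared and random both score 34.4.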
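Because each held-out fold has only 32 items, it helps to read the logiqa percentages as item counts. The arithmetic below follows from n=32 (one item equals 100/32 = 3.125 points); the counts are implied by the reported percentages rather than listed in the report.

```latex
\text{Baseline} = \tfrac{10}{32} = 31.25\% \approx 31.2, \qquad
\text{Decode} = \tfrac{5}{32} = 15.625\% \approx 15.6, \qquad
\text{Prefill} = \tfrac{11}{32} = 34.375\% \approx 34.4

\Delta_{D-B} = \tfrac{5 - 10}{32} \times 100 = -15.625 \approx -15.6 \text{ pp}, \qquad
\Delta_{D-P} = \tfrac{5 - 11}{32} \times 100 = -18.75 \approx -18.8 \text{ pp}
```

So the decode-minus-prefill gap corresponds to six items, and the interval endpoints (-31.2, -6.2) correspond to roughly ten and two items.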
Use the informative fold (logiqa) as rebuttal evidence that the decode-shared phenomenon is not confined to short classification tasks, keeping in mind that it rests on a single task with 32 evaluation items.