Rebuttal Reasoning Check
This folder is for a quick-turn reasoning generalization check that extends the current rebuttal beyond short classification benchmarks.
Files:
- quick_reasoning_sweep.py: thin wrapper around reasoning/disturb_CoT_shared_loto_reasoning.py
- raw/: per-heldout-task JSON and Markdown emitted by the underlying evaluator
- quick_reasoning_summary.json: compact machine-readable summary across heldout tasks
- quick_reasoning_summary.md: compact human-readable summary for rebuttal drafting
Why this setup:
- It reuses the existing decode-aligned LOTO evaluator instead of introducing a second implementation.
- It targets the most rebuttal-relevant reasoning cases first:
  - gsm8k: open-ended numeric generation
  - logiqa: harder logical multiple-choice reasoning
- It keeps the run small enough to finish quickly, then leaves a clean path to scale up.
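The held-out structure behind the per-heldout-task outputs can be sketched as follows. This is a minimal illustration, assuming LOTO means leave-one-task-out; `loto_splits` and `QUICK_TASKS` are hypothetical names and are not taken from quick_reasoning_sweep.py or the underlying evaluator.

```python
from typing import Iterator, List, Sequence, Tuple

# The two tasks this quick-turn check targets first (from the list above).
QUICK_TASKS = ["gsm8k", "logiqa"]


def loto_splits(
    all_tasks: Sequence[str], heldout_tasks: Sequence[str]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (heldout, remaining_tasks) pairs, one per held-out task.

    Assumption: each held-out task is evaluated against a model or probe
    fit on every other task in `all_tasks`. The real evaluator may differ.
    """
    for heldout in heldout_tasks:
        if heldout not in all_tasks:
            raise ValueError(f"unknown held-out task: {heldout}")
        remaining = [t for t in all_tasks if t != heldout]
        yield heldout, remaining


if __name__ == "__main__":
    pool = ["gsm8k", "logiqa", "boolq", "piqa"]
    for heldout, remaining in loto_splits(pool, QUICK_TASKS):
        print(f"heldout={heldout} remaining={remaining}")
```

Scaling up then amounts to passing a longer held-out list, which keeps the quick run and the full run on the same code path.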
Suggested command:
CUDA_VISIBLE_DEVICES=3 /home/zs89/miniconda3/envs/flashsvd/bin/python rebuttal/reasoning/quick_reasoning_sweep.py
Notes:
- The current repo loader already supports gsm8k, strategyqa, commonsenseqa, arc_challenge, openbookqa, qasc, logiqa, boolq, piqa, and aqua.
- mmlu and MATH are not wired into the local loader yet, so they are not part of this quick-turn check.
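A small guard like the one below could catch unsupported task names before a sweep starts. It is only a sketch: the task names are copied from the note above, while `SUPPORTED_TASKS` and `partition_tasks` are hypothetical and do not mirror the repo loader's actual interface.

```python
from typing import Iterable, List, Tuple

# Task names the note above lists as supported by the current repo loader.
SUPPORTED_TASKS = frozenset({
    "gsm8k", "strategyqa", "commonsenseqa", "arc_challenge", "openbookqa",
    "qasc", "logiqa", "boolq", "piqa", "aqua",
})


def partition_tasks(requested: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requested task names into (supported, unsupported), preserving order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for task in requested:
        (supported if task in SUPPORTED_TASKS else unsupported).append(task)
    return supported, unsupported


if __name__ == "__main__":
    ok, missing = partition_tasks(["gsm8k", "mmlu", "logiqa", "MATH"])
    print("will run:", ok)
    print("not wired into the loader yet:", missing)
```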