# Rebuttal Reasoning Check

This folder is for a quick-turn reasoning generalization check that extends the current rebuttal beyond short classification benchmarks.

Files:
- `quick_reasoning_sweep.py`: thin wrapper around `reasoning/disturb_CoT_shared_loto_reasoning.py`
- `raw/`: per-heldout-task JSON and Markdown emitted by the underlying evaluator
- `quick_reasoning_summary.json`: compact machine-readable summary across heldout tasks
- `quick_reasoning_summary.md`: compact human-readable summary for rebuttal drafting

Why this setup:
- It reuses the existing decode-aligned LOTO evaluator instead of introducing a second implementation.
- It targets the most rebuttal-relevant reasoning cases first:
  - `gsm8k`: open-ended numeric generation
  - `logiqa`: harder logical multiple-choice reasoning
- It keeps the run small enough to finish quickly, then leaves a clean path to scale up.

Suggested command:

```bash
CUDA_VISIBLE_DEVICES=3 /home/zs89/miniconda3/envs/flashsvd/bin/python rebuttal/reasoning/quick_reasoning_sweep.py
```

Notes:
- The current repo loader already supports `gsm8k`, `strategyqa`, `commonsenseqa`, `arc_challenge`, `openbookqa`, `qasc`, `logiqa`, `boolq`, `piqa`, and `aqua`.
- `mmlu` and `MATH` are not wired into the local loader yet, so they are not part of this quick-turn check.
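
When scaling the sweep beyond the two quick-turn tasks, a small pre-flight check can separate the tasks the loader supports from those it does not, per the Notes above. The sketch below is illustrative only: `SUPPORTED_TASKS`, `QUICK_TASKS`, and `split_supported` are hypothetical names, not identifiers from the repo, and exact (case-sensitive) name matching is an assumption.

```python
# Hypothetical pre-flight check; these names are illustrative and do not
# come from the repo. The task list mirrors the Notes section of this README.
SUPPORTED_TASKS = (
    "gsm8k", "strategyqa", "commonsenseqa", "arc_challenge", "openbookqa",
    "qasc", "logiqa", "boolq", "piqa", "aqua",
)

# The two tasks targeted by the quick-turn check.
QUICK_TASKS = ("gsm8k", "logiqa")


def split_supported(requested):
    """Split requested task names into (runnable, skipped), keeping order.

    Matching is exact and case-sensitive (an assumption about the loader).
    """
    supported = set(SUPPORTED_TASKS)
    runnable = [task for task in requested if task in supported]
    skipped = [task for task in requested if task not in supported]
    return runnable, skipped


if __name__ == "__main__":
    runnable, skipped = split_supported(list(QUICK_TASKS) + ["mmlu", "MATH"])
    print("runnable:", runnable)
    print("skipped (not wired into the loader):", skipped)
```

Running the check before a larger sweep surfaces unsupported tasks such as `mmlu` and `MATH` up front instead of partway through a run.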