# Rebuttal Reasoning Check

This folder holds a quick-turn check of reasoning generalization. It extends the
current rebuttal beyond short classification benchmarks.

Files:

- `quick_reasoning_sweep.py`: thin wrapper around `reasoning/disturb_CoT_shared_loto_reasoning.py`
- `raw/`: JSON and Markdown files emitted by the underlying evaluator, one set per held-out task
- `quick_reasoning_summary.json`: compact machine-readable summary across held-out tasks
- `quick_reasoning_summary.md`: compact human-readable summary for rebuttal drafting
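
To get a first look at the machine-readable summary listed above, something like the following may help. It is a minimal sketch: the schema of `quick_reasoning_summary.json` is not documented here, so the code assumes nothing beyond valid JSON and only reports the top-level keys and their types. The helper names `load_summary` and `describe_top_level` are hypothetical and not part of the repo.

```python
import json
from pathlib import Path


def load_summary(path):
    """Parse a JSON file and return the resulting Python object."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def describe_top_level(summary):
    """Return (key, type name) pairs for a dict; a single pair for any other JSON root."""
    if isinstance(summary, dict):
        return [(key, type(value).__name__) for key, value in summary.items()]
    return [("<root>", type(summary).__name__)]


if __name__ == "__main__":
    # Assumption: the script is run from the folder that contains the summary file.
    summary_path = Path("quick_reasoning_summary.json")
    if summary_path.exists():
        for key, type_name in describe_top_level(load_summary(summary_path)):
            print(f"{key}: {type_name}")
    else:
        print(f"{summary_path} not found; run the sweep first.")
```

Once the actual schema is known, `describe_top_level` can be replaced with field-specific access.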

Why this setup:

- It reuses the existing decode-aligned LOTO evaluator instead of introducing a second implementation.
- It targets the most rebuttal-relevant reasoning cases first:
  - `gsm8k`: open-ended numeric generation
  - `logiqa`: harder logical multiple-choice reasoning
- It keeps the run small enough to finish quickly, then leaves a clean path to scale up.

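Suggested command (the script path is relative, so run it from the directory that
contains `rebuttal/`; adjust the GPU index and interpreter path for your machine):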

```bash
CUDA_VISIBLE_DEVICES=3 /home/zs89/miniconda3/envs/flashsvd/bin/python rebuttal/reasoning/quick_reasoning_sweep.py
```

Notes:

- The current repo loader already supports `gsm8k`, `strategyqa`, `commonsenseqa`,
  `arc_challenge`, `openbookqa`, `qasc`, `logiqa`, `boolq`, `piqa`, and `aqua`.
- `mmlu` and `MATH` are not wired into the local loader yet, so they are not part of
  this quick-turn check.
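
Before scaling the sweep up, it can be useful to check a candidate task list against the names the loader is said to support. The sketch below copies the task names from the notes above. The helper `split_by_support` is hypothetical and does not call the repo loader, so the list must be kept in sync by hand.

```python
# Task names copied from the notes above; this tuple is NOT read from the repo loader.
SUPPORTED_TASKS = (
    "gsm8k", "strategyqa", "commonsenseqa", "arc_challenge", "openbookqa",
    "qasc", "logiqa", "boolq", "piqa", "aqua",
)


def split_by_support(requested):
    """Split task names into (supported, unsupported), preserving input order.

    Matching is exact and case-sensitive.
    """
    supported = [task for task in requested if task in SUPPORTED_TASKS]
    unsupported = [task for task in requested if task not in SUPPORTED_TASKS]
    return supported, unsupported


if __name__ == "__main__":
    ok, missing = split_by_support(["gsm8k", "logiqa", "mmlu", "MATH"])
    print("supported:", ok)
    print("not wired in yet:", missing)
```

Run on the four names in the example, this separates `gsm8k` and `logiqa` from `mmlu` and `MATH`, matching the notes.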