CoT Oracle Paper Ablations And Baselines
Collection: all models used for my LessWrong post. The generally recommended choices are the latest adam oracle or the checkpoint confusingly labelled "no DPO".
This repo contains the 3-layer paper ablation that replaces the FineWeb future/past-lens data with the same total amount of on-policy future/past-lens data.
- Base model: Qwen/Qwen3-8B
- Layers: [9, 18, 27]
- shuffled
- 42
- 50M input tokens
- 22.3M logged training tokens
- futurelens: enabled, n: 60000
- pastlens: enabled, n: 60000
- chunked_convqa: enabled, n: -1 (all available examples)
- classification: enabled, n: 20000, datasets = sst2, ag_news, snli
- fineweb: disabled
- latentqa: disabled
- configs/train.yaml: disabled
- 50M input-token budget was reached.
- 22.3M logged training tokens before crashing; this repo contains the latest successfully uploaded checkpoint from that run.
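For readers who want to work with the task settings listed above programmatically, here is a minimal Python sketch. The dictionary layout, the `TASKS` name, and both helper functions are assumptions made for illustration; they are not the actual schema of `configs/train.yaml` or any code from this repo. Only the task names, enabled flags, and `n` values come from the list above.

```python
# Hypothetical layout: keys and structure are illustrative only,
# not the real configs/train.yaml schema.
TASKS = {
    "futurelens": {"enabled": True, "n": 60000},
    "pastlens": {"enabled": True, "n": 60000},
    "chunked_convqa": {"enabled": True, "n": -1},  # -1 = all available examples
    "classification": {
        "enabled": True,
        "n": 20000,
        "datasets": ["sst2", "ag_news", "snli"],
    },
    "fineweb": {"enabled": False},
    "latentqa": {"enabled": False},
}


def enabled_tasks(tasks):
    """Return the names of tasks whose 'enabled' flag is true."""
    return [name for name, cfg in tasks.items() if cfg.get("enabled", False)]


def fixed_example_count(tasks):
    """Sum n over enabled tasks with a fixed count, skipping n == -1."""
    return sum(
        cfg["n"]
        for cfg in tasks.values()
        if cfg.get("enabled", False) and cfg.get("n", -1) >= 0
    )


if __name__ == "__main__":
    print(enabled_tasks(TASKS))
    print(fixed_example_count(TASKS))
```

The count from `fixed_example_count` leaves out `chunked_convqa`, because `n: -1` means "all available examples" and that total is not stated here.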