Collection: CoT Oracle Paper Ablations And Baselines
All models used for my LessWrong post. The latest adam oracle, or the checkpoint confusingly labelled "no DPO", is generally recommended.
This repo contains the final no-DPO CoT Oracle checkpoint trained with the full cot-oracle task mixture before GRPO calibration.
- Qwen/Qwen3-8B
- [9, 18, 27]
- shuffled
- 42
- ao_checkpoint: adamkarvonen/checkpoints_latentqa_cls_past_lens_addition_Qwen3-8B with fresh_lora: true
- 100M training tokens

Enabled task families from configs/train.yaml:
- hint_admission: n: -1, epochs: 2
- atypical_answer: n: -1
- reasoning_termination: n: -1, epochs: 2
- answer_trajectory: n: -1
- futurelens: n: 30000
- pastlens: n: 30000
- correctness: n: -1, epochs: 2
- decorative_cot: n: -1, epochs: 2
- chunked_convqa: n: -1
- chunked_compqa_backtrack: n: -1
- backtrack_prediction: n: -1, epochs: 2
- sycophancy: n: -1, epochs: 2
- sqa: n: -1, epochs: 2
- truthfulqa_hint: n: -1, epochs: 2
- classification: enabled, n: 20000, datasets = sst2, ag_news, snli
- fineweb: enabled, n: 60000, variants = futurelens_fineweb, pastlens_fineweb

Disabled task families:
- resampling_importance
- chunked_compqa_self_correction
- chunked_compqa_verification
- chunked_compqa_remaining_strategy
- convqa
- compqa
- probe_sycophancy
- truthfulqa_hint_verbalized
- sentence_insertion
- rot13_reconstruction
- latentqa

100M training-token count.
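
To make the task mixture above easier to inspect, here is a minimal Python sketch that mirrors the listed entries in a plain dictionary and summarizes them. The names `ENABLED_TASKS`, `DISABLED_TASKS`, and the helper functions are hypothetical and are not part of the cot-oracle codebase. Reading `n: -1` as "no cap on examples" is an assumption, not something the card states.

```python
# Hypothetical mirror of the enabled task entries listed on this card.
# Only the n / epochs values are copied; nothing here is loaded from
# configs/train.yaml itself.
ENABLED_TASKS = {
    "hint_admission": {"n": -1, "epochs": 2},
    "atypical_answer": {"n": -1},
    "reasoning_termination": {"n": -1, "epochs": 2},
    "answer_trajectory": {"n": -1},
    "futurelens": {"n": 30000},
    "pastlens": {"n": 30000},
    "correctness": {"n": -1, "epochs": 2},
    "decorative_cot": {"n": -1, "epochs": 2},
    "chunked_convqa": {"n": -1},
    "chunked_compqa_backtrack": {"n": -1},
    "backtrack_prediction": {"n": -1, "epochs": 2},
    "sycophancy": {"n": -1, "epochs": 2},
    "sqa": {"n": -1, "epochs": 2},
    "truthfulqa_hint": {"n": -1, "epochs": 2},
    "classification": {"n": 20000},
    "fineweb": {"n": 60000},
}

DISABLED_TASKS = [
    "resampling_importance",
    "chunked_compqa_self_correction",
    "chunked_compqa_verification",
    "chunked_compqa_remaining_strategy",
    "convqa",
    "compqa",
    "probe_sycophancy",
    "truthfulqa_hint_verbalized",
    "sentence_insertion",
    "rot13_reconstruction",
    "latentqa",
]


def uncapped_tasks(tasks):
    """Tasks with n == -1 (assumed here to mean 'no example cap')."""
    return sorted(name for name, cfg in tasks.items() if cfg["n"] == -1)


def two_epoch_tasks(tasks):
    """Tasks whose entry explicitly sets epochs: 2."""
    return sorted(name for name, cfg in tasks.items() if cfg.get("epochs") == 2)


def capped_example_total(tasks):
    """Sum of n over tasks that have an explicit positive cap."""
    return sum(cfg["n"] for cfg in tasks.values() if cfg["n"] > 0)


if __name__ == "__main__":
    print("uncapped:", uncapped_tasks(ENABLED_TASKS))
    print("two epochs:", two_epoch_tasks(ENABLED_TASKS))
    print("capped example total:", capped_example_total(ENABLED_TASKS))
```

The dictionary omits the `datasets` and `variants` fields of `classification` and `fineweb`, since the summaries only need `n` and `epochs`.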