CoT Oracle Final Sprint Checkpoint: No DPO

This repo contains the final no-DPO CoT Oracle checkpoint, trained on the full cot-oracle task mixture before GRPO calibration.

What This Checkpoint Is

  • Base model: Qwen/Qwen3-8B
  • Adapter format: PEFT LoRA (see the loading sketch after this list)
  • Activation readout layers: [9, 18, 27]
  • Task order: shuffled
  • Seed: 42
  • Training config references ao_checkpoint: adamkarvonen/checkpoints_latentqa_cls_past_lens_addition_Qwen3-8B with fresh_lora: true (the LoRA is presumably initialized fresh rather than resumed from that checkpoint)
  • Paper label: 100M training tokens
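
Because this checkpoint is a PEFT LoRA adapter rather than merged weights, it has to be attached to the base model at load time. A minimal sketch, assuming a standard transformers + peft environment; the repo id is taken from this page, and the loading options are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "Qwen/Qwen3-8B"
ADAPTER_ID = "ceselder/cot-oracle-qwen3-8b-final-sprint-checkpoint-no-DPO"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")

# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()
```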

Exact Training Mixture

Enabled task families from configs/train.yaml (see the config-parsing sketch after these lists):

  • hint_admission: n: -1, epochs: 2
  • atypical_answer: n: -1
  • reasoning_termination: n: -1, epochs: 2
  • answer_trajectory: n: -1
  • on-policy futurelens: n: 30000
  • on-policy pastlens: n: 30000
  • correctness: n: -1, epochs: 2
  • decorative_cot: n: -1, epochs: 2
  • chunked_convqa: n: -1
  • chunked_compqa_backtrack: n: -1
  • backtrack_prediction: n: -1, epochs: 2
  • sycophancy: n: -1, epochs: 2
  • sqa: n: -1, epochs: 2
  • truthfulqa_hint: n: -1, epochs: 2
  • classification: enabled, n: 20000, datasets: sst2, ag_news, snli
  • fineweb: enabled, n: 60000, variants: futurelens_fineweb, pastlens_fineweb

Disabled task families:

  • resampling_importance
  • chunked_compqa_self_correction
  • chunked_compqa_verification
  • chunked_compqa_remaining_strategy
  • convqa
  • compqa
  • probe_sycophancy
  • truthfulqa_hint_verbalized
  • sentence_insertion
  • rot13_reconstruction
  • latentqa
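
For concreteness, this is roughly how the mixture above could be read back out of configs/train.yaml. The schema here is an assumption inferred from the keys listed above (enabled, n, epochs, with n: -1 taken to mean the full dataset); the actual config layout, including the top-level tasks key, may differ:

```python
import yaml  # pip install pyyaml

# Hypothetical sketch: list the enabled task families and their budgets.
# Assumes each family is a mapping with `enabled`, an `n` sample budget
# (-1 = full dataset), and an optional `epochs` (default 1).
with open("configs/train.yaml") as f:
    cfg = yaml.safe_load(f)

for name, spec in cfg["tasks"].items():
    if not spec.get("enabled", False):
        continue
    n = spec.get("n", -1)
    budget = "full dataset" if n == -1 else f"{n} examples"
    print(f"{name}: {budget}, epochs={spec.get('epochs', 1)}")
```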

Notes

  • This checkpoint is the starting point for the GRPO calibration runs.
  • The paper label here is the user-provided 100M training-token count.