Learnings - Testing
- Add a focused trajectory-shape unit test that asserts the final two turns are
toolthen content-onlyassistant(notool_calls) and keep a separate low-reward test proving incorrect trajectories are still filtered toNone. (F014) - Keep notebook and pipeline smoke tests aligned by asserting stable training-flow anchors (for example
before_rollouts = sample_random_baselineappears beforerun_training_with_metrics(trainer)) whenever notebook cells are refactored. (F015)