# Learnings - Architecture

- Keep behavior-shaping reward logic inside `SQLEnvTRL` as additive trajectory-level state (`reward`, `_repeat_count`) so tool method signatures and TRL environment interfaces remain stable while internal semantics evolve. *(F015)*
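
A minimal sketch of this pattern, under stated assumptions: only the names `SQLEnvTRL`, `reward`, and `_repeat_count` come from the note above. The `query` tool method, the `reset` hook, the `_execute` stub, the `_seen` set, and the `REPEAT_PENALTY` value are hypothetical and stand in for whatever the real environment exposes.

```python
class SQLEnvTRL:
    """Illustrative stand-in, not the project's actual class."""

    REPEAT_PENALTY = 0.1  # assumed value, chosen only for the example

    def __init__(self):
        self.reset()

    def reset(self):
        # Trajectory-level state: cleared at the start of every episode.
        self.reward = 0.0
        self._repeat_count = 0
        self._seen = set()  # hypothetical helper for detecting repeats

    def query(self, sql: str) -> str:
        """Hypothetical tool method; its signature never mentions reward."""
        normalized = " ".join(sql.split()).lower()
        if normalized in self._seen:
            # Shaping is additive: it adjusts internal state only and
            # leaves the value returned to the caller untouched.
            self._repeat_count += 1
            self.reward -= self.REPEAT_PENALTY
        else:
            self._seen.add(normalized)
        return self._execute(sql)

    def _execute(self, sql: str) -> str:
        # Stub standing in for real SQL execution.
        return f"executed: {sql}"


env = SQLEnvTRL()
env.query("SELECT 1")
env.query("select  1")  # counted as a repeat after normalization
print(env.reward, env._repeat_count)
```

Because the penalty lives in instance state rather than in the return value or the parameters, the shaping rule could later be changed (for example, a different penalty or a different definition of "repeat") without touching `query`'s signature or anything that calls it.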