
Learnings - Architecture

  • Keep behavior-shaping reward logic inside SQLEnvTRL as additive trajectory-level state (reward, _repeat_count) so tool method signatures and TRL environment interfaces remain stable while internal semantics evolve. (F015)
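
A minimal sketch of the pattern described above, assuming a repeat-query penalty as the shaping rule. The class name `SketchSQLEnv`, the `query` tool, the `_seen` set, and the penalty value are all hypothetical and stand in for whatever SQLEnvTRL actually does; only `reward` and `_repeat_count` come from the note. The point is that the shaping rule can change without touching the tool's signature or return type.

```python
import sqlite3


class SketchSQLEnv:
    """Hypothetical stand-in for SQLEnvTRL; not the project's actual class."""

    REPEAT_PENALTY = 0.1  # assumed value, for illustration only

    def __init__(self):
        # Tiny in-memory database so the sketch is self-contained.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE t (x INTEGER)")
        self._conn.execute("INSERT INTO t VALUES (1), (2)")
        self.reset()

    def reset(self):
        # Trajectory-level state lives on the instance, not in tool signatures.
        self.reward = 0.0
        self._repeat_count = 0
        self._seen = set()

    def query(self, sql: str) -> str:
        """Tool method: the signature and return type stay fixed."""
        self._shape_reward(sql)
        try:
            rows = self._conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            return f"error: {exc}"
        return str(rows)

    def _shape_reward(self, sql: str) -> None:
        # Additive shaping: each repeated query subtracts a fixed penalty.
        # Whitespace and case are normalized so trivial variants count as repeats.
        key = " ".join(sql.lower().split())
        if key in self._seen:
            self._repeat_count += 1
            self.reward -= self.REPEAT_PENALTY
        self._seen.add(key)
```

Callers only ever see `query(sql) -> str`; a reward function can read `env.reward` once the trajectory ends. Replacing `_shape_reward` with a different rule would leave the tool interface untouched, which is the stability the note is after.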