# rl/

Reinforcement learning module.

- `gov_workflow_env.py`: Gymnasium adapter around `app.env.GovWorkflowEnv`
- `feature_builder.py`: `ObservationModel` -> 84-dim float32 vector (`OBS_DIM=84`)
- `action_mask.py`: structural action masks (`N_ACTIONS=28`); see the masking sketch at the end of this file
- `curriculum.py`: staged task scheduler (Phase 2/3)
- `train_ppo.py`: Phase 1 and Phase 2 training entrypoint
- `train_recurrent.py`: Phase 3 recurrent PPO entrypoint
- `evaluate.py`: deterministic evaluation on grader metrics (`--task` / `--tasks`)
- `eval_grader.py`: task-level grader evaluation helper with optional plots
- `plot_training.py`: training-curve report helper built from monitor/TensorBoard artifacts
- `callbacks.py`: eval and cost-monitor callbacks
- `cost_tracker.py`: episode-level reward/cost extraction helpers
- `configs/`: YAML configs for PPO/recurrent training
  - `ppo_easy.yaml`: standard Phase 1 config
  - `ppo_easy_aggressive.yaml`: aggressive Phase 1 tuning profile for plateau recovery

## CLI Compatibility Notes

Example invocations appear at the end of this file.

- Training scripts accept both `--n-envs` and `--n_envs`.
- `train_ppo.py` accepts `--task` as a compatibility alias:
  - Phase 1 only supports `district_backlog_easy`.
  - Phase 2 ignores `--task` and uses curriculum sampling.
- `train_ppo.py` supports `--resume` for Phase 1 continuation runs.
- `train_recurrent.py` accepts `--task` to override the recurrent eval callback's task.

## Artifact Paths

Training/eval outputs are written under `results/`:

- `results/best_model/*`
- `results/runs/*`
- `results/eval_logs/*`
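A minimal loading sketch for the best checkpoint above. It assumes the Stable-Baselines3 stack implied by the PPO entrypoints and SB3's conventional `best_model.zip` filename written by `EvalCallback`; the environment import is hypothetical and shown only schematically.

```python
from stable_baselines3 import PPO

# Load the best checkpoint saved during training (SB3's EvalCallback
# writes it as best_model.zip inside the configured directory).
model = PPO.load("results/best_model/best_model.zip")

# Environment construction is project-specific; sketched here with a
# hypothetical import and the documented Phase 1 task name:
# from rl.gov_workflow_env import GovWorkflowEnv
# env = GovWorkflowEnv(task="district_backlog_easy")
# obs, info = env.reset()
# action, _state = model.predict(obs, deterministic=True)
```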
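## Example Invocations

Illustrative only: the flag spellings come from the compatibility notes above, while env counts, the checkpoint path, and invocation from the repo root are placeholders; how `train_ppo.py` selects its phase is not shown here.

```bash
# Phase 1 training; both flag spellings are accepted.
python rl/train_ppo.py --n-envs 8 --task district_backlog_easy
python rl/train_ppo.py --n_envs 8

# Phase 1 continuation run (the checkpoint path is hypothetical).
python rl/train_ppo.py --resume results/runs/phase1_checkpoint.zip

# Phase 3 recurrent PPO, overriding the eval callback's task.
python rl/train_recurrent.py --task district_backlog_easy

# Deterministic grader evaluation for a single task.
python rl/evaluate.py --task district_backlog_easy
```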
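Finally, a self-contained illustration of the structural-masking idea behind `action_mask.py`. Only `OBS_DIM` and `N_ACTIONS` come from this module; the linear "policy head", the random mask, and `masked_argmax` are stand-ins for the real feature builder, mask, and policy.

```python
import numpy as np

OBS_DIM = 84    # feature_builder.py: ObservationModel -> 84-dim float32 vector
N_ACTIONS = 28  # action_mask.py

def masked_argmax(logits: np.ndarray, mask: np.ndarray) -> int:
    # Send structurally invalid actions to -inf so argmax never picks them.
    return int(np.argmax(np.where(mask, logits, -np.inf)))

rng = np.random.default_rng(0)
obs = rng.standard_normal(OBS_DIM).astype(np.float32)  # stand-in feature vector
head = rng.standard_normal((OBS_DIM, N_ACTIONS))       # stand-in policy head
logits = obs @ head

mask = rng.random(N_ACTIONS) > 0.5  # stand-in structural mask
mask[0] = True                      # guarantee at least one valid action

print(masked_argmax(logits, mask))  # index of the best *valid* action
```

During training, the same masking is typically applied to the policy's logits before sampling so that structurally invalid actions receive zero probability.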