# rl/

Reinforcement learning module.

- `gov_workflow_env.py`: Gymnasium adapter around `app.env.GovWorkflowEnv`
- `feature_builder.py`: `ObservationModel` -> 84-dim float32 vector (`OBS_DIM=84`)
- `action_mask.py`: structural action masks (`N_ACTIONS=28`); see the masking sketch at the end of this file
- `curriculum.py`: staged task scheduler (Phase 2/3)
- `train_ppo.py`: Phase 1 and Phase 2 training entrypoint
- `train_recurrent.py`: Phase 3 recurrent PPO entrypoint
- `evaluate.py`: deterministic evaluation on grader metrics (`--task` / `--tasks`)
- `eval_grader.py`: task-level grader evaluation helper with optional plots
- `plot_training.py`: training-curve report helper built from monitor/TensorBoard artifacts
- `callbacks.py`: eval and cost-monitor callbacks
- `cost_tracker.py`: episode-level reward/cost extraction helpers
- `configs/`: YAML configs for PPO/recurrent training
  - `ppo_easy.yaml`: standard Phase 1 config
  - `ppo_easy_aggressive.yaml`: aggressive Phase 1 tuning profile for plateau recovery

## CLI Compatibility Notes

Example invocations appear at the end of this file.

- Training scripts accept both `--n-envs` and `--n_envs`.
- `train_ppo.py` accepts `--task` as a compatibility alias:
  - Phase 1 only supports `district_backlog_easy`.
  - Phase 2 ignores `--task` and uses curriculum sampling.
- `train_ppo.py` supports `--resume` for Phase 1 continuation runs.
- `train_recurrent.py` accepts `--task` to override the recurrent eval callback's task.

## Artifact Paths

Training/eval outputs are written under `results/`:

- `results/best_model/*`
- `results/runs/*`
- `results/eval_logs/*`
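A minimal loading sketch for the best checkpoint above. It assumes the Stable-Baselines3 stack implied by the PPO entrypoints and SB3's conventional `best_model.zip` filename written by `EvalCallback`; the environment import is hypothetical and shown only schematically.

```python
from stable_baselines3 import PPO

# Load the best checkpoint saved during training (SB3's EvalCallback
# writes it as best_model.zip inside the configured directory).
model = PPO.load("results/best_model/best_model.zip")

# Environment construction is project-specific; sketched here with a
# hypothetical import and the documented Phase 1 task name:
# from rl.gov_workflow_env import GovWorkflowEnv
# env = GovWorkflowEnv(task="district_backlog_easy")
# obs, info = env.reset()
# action, _state = model.predict(obs, deterministic=True)
```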
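## Example Invocations

Illustrative only: the flag spellings come from the compatibility notes above, while env counts, the checkpoint path, and invocation from the repo root are placeholders; how `train_ppo.py` selects its phase is not shown here.

```bash
# Phase 1 training; both flag spellings are accepted.
python rl/train_ppo.py --n-envs 8 --task district_backlog_easy
python rl/train_ppo.py --n_envs 8

# Phase 1 continuation run (the checkpoint path is hypothetical).
python rl/train_ppo.py --resume results/runs/phase1_checkpoint.zip

# Phase 3 recurrent PPO, overriding the eval callback's task.
python rl/train_recurrent.py --task district_backlog_easy

# Deterministic grader evaluation for a single task.
python rl/evaluate.py --task district_backlog_easy
```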
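Finally, a self-contained illustration of the structural-masking idea behind `action_mask.py`. Only `OBS_DIM` and `N_ACTIONS` come from this module; the linear "policy head", the random mask, and `masked_argmax` are stand-ins for the real feature builder, mask, and policy.

```python
import numpy as np

OBS_DIM = 84    # feature_builder.py: ObservationModel -> 84-dim float32 vector
N_ACTIONS = 28  # action_mask.py

def masked_argmax(logits: np.ndarray, mask: np.ndarray) -> int:
    # Send structurally invalid actions to -inf so argmax never picks them.
    return int(np.argmax(np.where(mask, logits, -np.inf)))

rng = np.random.default_rng(0)
obs = rng.standard_normal(OBS_DIM).astype(np.float32)  # stand-in feature vector
head = rng.standard_normal((OBS_DIM, N_ACTIONS))       # stand-in policy head
logits = obs @ head

mask = rng.random(N_ACTIONS) > 0.5  # stand-in structural mask
mask[0] = True                      # guarantee at least one valid action

print(masked_argmax(logits, mask))  # index of the best *valid* action
```

During training, the same masking is typically applied to the policy's logits before sampling so that structurally invalid actions receive zero probability.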