# rl/
Reinforcement learning module.
- `gov_workflow_env.py`: Gymnasium adapter around `app.env.GovWorkflowEnv`
- `feature_builder.py`: `ObservationModel` -> 84-dim float32 vector (`OBS_DIM=84`)
- `action_mask.py`: structural action masks (`N_ACTIONS=28`)
- `curriculum.py`: staged task scheduler (Phase 2/3)
- `train_ppo.py`: Phase 1 and Phase 2 training entrypoint
- `train_recurrent.py`: Phase 3 recurrent PPO entrypoint
- `evaluate.py`: deterministic evaluation on grader metrics (`--task` / `--tasks`)
- `eval_grader.py`: task-level grader evaluation helper with optional plots
- `plot_training.py`: training-curve report helper from monitor/TensorBoard artifacts
- `callbacks.py`: eval and cost-monitor callbacks
- `cost_tracker.py`: episode-level reward/cost extraction helpers
- `configs/`: YAML configs for PPO/recurrent training
  - `ppo_easy.yaml`: standard Phase 1 config
  - `ppo_easy_aggressive.yaml`: aggressive Phase 1 tuning profile for plateau recovery
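The pieces above compose in a standard masked-PPO pattern: `feature_builder` flattens the observation into a fixed `OBS_DIM=84` float32 vector, and `action_mask` marks which of the `N_ACTIONS=28` actions are structurally legal. A minimal sketch of that contract (function names and the feature-group layout here are illustrative, not the real `feature_builder`/`action_mask` APIs):

```python
import numpy as np

OBS_DIM = 84    # must match feature_builder.OBS_DIM
N_ACTIONS = 28  # must match action_mask.N_ACTIONS


def build_observation(features: dict) -> np.ndarray:
    """Illustrative stand-in for feature_builder: pack named feature
    groups into one fixed-size float32 vector, zero-padded at the end."""
    vec = np.zeros(OBS_DIM, dtype=np.float32)
    offset = 0
    for values in features.values():
        arr = np.asarray(values, dtype=np.float32).ravel()
        vec[offset:offset + arr.size] = arr
        offset += arr.size
    return vec


def structural_mask(valid_actions: list[int]) -> np.ndarray:
    """Illustrative stand-in for action_mask: 1 = action is legal."""
    mask = np.zeros(N_ACTIONS, dtype=np.int8)
    mask[valid_actions] = 1
    return mask


# Hypothetical feature groups and legal actions, for shape-checking only.
obs = build_observation({"backlog": [0.3, 0.7], "budget": [1.0]})
mask = structural_mask([0, 3, 5])
```

The Gymnasium adapter in `gov_workflow_env.py` would expose `obs` via `step()`/`reset()` and the mask via the env's info dict or an `action_masks()` hook, whichever convention the training code expects.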
## CLI Compatibility Notes
- Training scripts accept both `--n-envs` and `--n_envs`.
- `train_ppo.py` accepts `--task` as a compatibility alias:
  - Phase 1 only supports `district_backlog_easy`
  - Phase 2 ignores `--task` and uses curriculum sampling
- `train_ppo.py` supports `--resume <checkpoint>` for Phase 1 continuation runs.
- `train_recurrent.py` accepts `--task` to override the task used by the recurrent eval callback.
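The dual-spelling flags above are cheap to support with argparse, which lets one option own several spellings. A sketch of the pattern (the real parsers in `train_ppo.py` / `train_recurrent.py` may define more flags and different defaults):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser showing the --n-envs/--n_envs aliasing
    and the --task / --resume compatibility flags described above."""
    p = argparse.ArgumentParser()
    # Both spellings map to the same destination attribute.
    p.add_argument("--n-envs", "--n_envs", dest="n_envs", type=int, default=8)
    p.add_argument(
        "--task",
        default=None,
        help="Phase 1: only district_backlog_easy is supported; "
             "Phase 2: ignored in favour of curriculum sampling",
    )
    p.add_argument("--resume", default=None, metavar="CHECKPOINT",
                   help="Phase 1 continuation checkpoint")
    return p


# Either spelling parses to args.n_envs.
args = build_parser().parse_args(["--n_envs", "4", "--task", "district_backlog_easy"])
```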
## Artifact Paths
- Training/eval outputs are written under `results/`:
  - `results/best_model/*`
  - `results/runs/*`
  - `results/eval_logs/*`