rl/
Reinforcement learning module.
- `gov_workflow_env.py`: Gymnasium adapter around `app.env.GovWorkflowEnv`
- `feature_builder.py`: `ObservationModel` -> 84-dim float32 vector (`OBS_DIM=84`)
- `action_mask.py`: structural action masks (`N_ACTIONS=28`)
- `curriculum.py`: staged task scheduler (Phase 2/3)
- `train_ppo.py`: Phase 1 and Phase 2 training entrypoint
- `train_recurrent.py`: Phase 3 recurrent PPO entrypoint
- `evaluate.py`: deterministic evaluation on grader metrics (`--task`/`--tasks`)
- `eval_grader.py`: task-level grader evaluation helper with optional plots
- `plot_training.py`: training-curve report helper from monitor/TensorBoard artifacts
- `callbacks.py`: eval and cost-monitor callbacks
- `cost_tracker.py`: episode-level reward/cost extraction helpers
- `configs/`: YAML configs for PPO/recurrent training
  - `ppo_easy.yaml`: standard Phase 1 config
  - `ppo_easy_aggressive.yaml`: aggressive Phase 1 tuning profile for plateau recovery
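To illustrate what a structural action mask over 28 actions looks like, here is a minimal sketch. The `build_action_mask` function and its signature are assumptions for illustration, not the actual API of `action_mask.py` (which presumably derives valid action ids from environment state):

```python
N_ACTIONS = 28  # matches the N_ACTIONS constant noted above

def build_action_mask(valid_action_ids):
    """Return a length-28 boolean mask, True at structurally valid actions.

    Hypothetical sketch: the real module would compute valid_action_ids
    from the environment state; here they are passed in directly.
    """
    mask = [False] * N_ACTIONS
    for action_id in valid_action_ids:
        mask[action_id] = True
    return mask
```

Maskable policy implementations typically consume such a mask per step to zero out the logits of invalid actions.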
CLI Compatibility Notes
- Training scripts accept both `--n-envs` and `--n_envs`.
- `train_ppo.py` accepts `--task` as a compatibility alias:
  - Phase 1 only supports `district_backlog_easy`
  - Phase 2 ignores `--task` and uses curriculum sampling
- `train_ppo.py` supports `--resume <checkpoint>` for Phase 1 continuation runs.
- `train_recurrent.py` accepts `--task` to override the recurrent eval callback task.
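As a sketch of how a trainer CLI can accept both flag spellings, `argparse` allows multiple option strings to share one destination. This is an illustration of the pattern, not `train_ppo.py`'s actual parser:

```python
import argparse

def build_parser():
    # Hypothetical parser sketch (names assumed, not copied from train_ppo.py).
    parser = argparse.ArgumentParser(prog="train_ppo")
    # Two option strings, one destination: both spellings set args.n_envs.
    parser.add_argument("--n-envs", "--n_envs", dest="n_envs", type=int, default=8)
    # Compatibility alias; a Phase 2 run would ignore this value.
    parser.add_argument("--task", default="district_backlog_easy")
    # Optional checkpoint path for Phase 1 continuation runs.
    parser.add_argument("--resume", metavar="CHECKPOINT", default=None)
    return parser
```

Either `--n-envs 4` or `--n_envs 4` then populates the same `args.n_envs` field.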
Artifact Paths
- Training/eval outputs are written under `results/`:
  - `results/best_model/*`
  - `results/runs/*`
  - `results/eval_logs/*`
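A small sketch of the layout above using `pathlib`; the dictionary and helper name are hypothetical, only the three directory names come from this README:

```python
from pathlib import Path

RESULTS = Path("results")

# The three output locations listed above.
ARTIFACT_DIRS = {
    "best_model": RESULTS / "best_model",
    "runs": RESULTS / "runs",
    "eval_logs": RESULTS / "eval_logs",
}

def ensure_artifact_dirs():
    # Create the output tree up front so training/eval callbacks can write to it.
    for path in ARTIFACT_DIRS.values():
        path.mkdir(parents=True, exist_ok=True)
```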