rl/
Reinforcement learning module.
- `gov_workflow_env.py`: Gymnasium adapter around `app.env.GovWorkflowEnv`
- `feature_builder.py`: `ObservationModel` -> 84-dim float32 vector (`OBS_DIM=84`)
- `action_mask.py`: structural action masks (`N_ACTIONS=28`)
- `curriculum.py`: staged task scheduler (Phase 2/3)
- `train_ppo.py`: Phase 1 and Phase 2 training entrypoint
- `train_recurrent.py`: Phase 3 recurrent PPO entrypoint
- `evaluate.py`: deterministic evaluation on grader metrics (`--task`/`--tasks`)
- `eval_grader.py`: task-level grader evaluation helper with optional plots
- `plot_training.py`: training-curve report helper from monitor/TensorBoard artifacts
- `callbacks.py`: eval and cost-monitor callbacks
- `cost_tracker.py`: episode-level reward/cost extraction helpers
- `configs/`: YAML configs for PPO/recurrent training
  - `ppo_easy.yaml`: standard Phase 1 config
  - `ppo_easy_aggressive.yaml`: aggressive Phase 1 tuning profile for plateau recovery
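To illustrate what a structural action mask over 28 actions looks like, here is a minimal sketch. The `build_action_mask` function and its signature are assumptions for illustration, not the actual API of `action_mask.py` (which presumably derives valid action ids from environment state):

```python
N_ACTIONS = 28  # matches the N_ACTIONS constant noted above

def build_action_mask(valid_action_ids):
    """Return a length-28 boolean mask, True at structurally valid actions.

    Hypothetical sketch: the real module would compute valid_action_ids
    from the environment state; here they are passed in directly.
    """
    mask = [False] * N_ACTIONS
    for action_id in valid_action_ids:
        mask[action_id] = True
    return mask
```

Maskable policy implementations typically consume such a mask per step to zero out the logits of invalid actions.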
CLI Compatibility Notes
- Training scripts accept both `--n-envs` and `--n_envs`.
- `train_ppo.py` accepts `--task` as a compatibility alias:
  - Phase 1 only supports `district_backlog_easy`
  - Phase 2 ignores `--task` and uses curriculum sampling
- `train_ppo.py` supports `--resume <checkpoint>` for Phase 1 continuation runs.
- `train_recurrent.py` accepts `--task` to override the recurrent eval callback task.
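As a sketch of how a trainer CLI can accept both flag spellings, `argparse` allows multiple option strings to share one destination. This is an illustration of the pattern, not `train_ppo.py`'s actual parser:

```python
import argparse

def build_parser():
    # Hypothetical parser sketch (names assumed, not copied from train_ppo.py).
    parser = argparse.ArgumentParser(prog="train_ppo")
    # Two option strings, one destination: both spellings set args.n_envs.
    parser.add_argument("--n-envs", "--n_envs", dest="n_envs", type=int, default=8)
    # Compatibility alias; a Phase 2 run would ignore this value.
    parser.add_argument("--task", default="district_backlog_easy")
    # Optional checkpoint path for Phase 1 continuation runs.
    parser.add_argument("--resume", metavar="CHECKPOINT", default=None)
    return parser
```

Either `--n-envs 4` or `--n_envs 4` then populates the same `args.n_envs` field.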
Artifact Paths
- Training/eval outputs are written under `results/`:
  - `results/best_model/*`
  - `results/runs/*`
  - `results/eval_logs/*`
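A small sketch of the layout above using `pathlib`; the dictionary and helper name are hypothetical, only the three directory names come from this README:

```python
from pathlib import Path

RESULTS = Path("results")

# The three output locations listed above.
ARTIFACT_DIRS = {
    "best_model": RESULTS / "best_model",
    "runs": RESULTS / "runs",
    "eval_logs": RESULTS / "eval_logs",
}

def ensure_artifact_dirs():
    # Create the output tree up front so training/eval callbacks can write to it.
    for path in ARTIFACT_DIRS.values():
        path.mkdir(parents=True, exist_ok=True)
```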