Siddharaj Shirke

	# rl/

	Reinforcement learning module.

	- `gov_workflow_env.py`: Gymnasium adapter around `app.env.GovWorkflowEnv`
	- `feature_builder.py`: converts an `ObservationModel` into an 84-dim float32 vector (`OBS_DIM=84`)
	- `action_mask.py`: structural action masks (`N_ACTIONS=28`)
	- `curriculum.py`: staged task scheduler (Phase 2/3)
	- `train_ppo.py`: Phase 1 and Phase 2 training entrypoint
	- `train_recurrent.py`: Phase 3 recurrent PPO entrypoint
	- `evaluate.py`: deterministic evaluation on grader metrics (`--task` / `--tasks`)
	- `eval_grader.py`: task-level grader evaluation helper with optional plots
	- `plot_training.py`: training-curve report helper from monitor/TensorBoard artifacts
	- `callbacks.py`: eval and cost-monitor callbacks
	- `cost_tracker.py`: episode-level reward/cost extraction helpers
	- `configs/`: YAML configs for PPO/recurrent training
	  - `ppo_easy.yaml`: standard Phase 1 config
	  - `ppo_easy_aggressive.yaml`: aggressive Phase 1 tuning profile for plateau recovery
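
To make the idea of a structural action mask over `N_ACTIONS=28` more concrete, here is a minimal sketch. It does not reproduce `action_mask.py`: the helper names `build_mask` and `sample_masked`, and the way valid actions are supplied, are assumptions made purely for illustration.

```python
import random

N_ACTIONS = 28  # size of the action space, taken from the listing above


def build_mask(valid_actions):
    """Return a list of N_ACTIONS booleans; True marks an allowed action.

    `valid_actions` is a hypothetical iterable of allowed action indices.
    """
    mask = [False] * N_ACTIONS
    for idx in valid_actions:
        if not 0 <= idx < N_ACTIONS:
            raise ValueError(f"action index {idx} out of range")
        mask[idx] = True
    return mask


def sample_masked(mask, rng):
    """Sample uniformly among the actions the mask allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask allows no actions")
    return rng.choice(allowed)


# Example: only three actions are structurally valid in this state.
mask = build_mask([0, 3, 27])
action = sample_masked(mask, random.Random(0))
```

A policy that respects the mask never selects a disallowed index, which is the property a masked PPO setup relies on.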

	## CLI Compatibility Notes

	- Training scripts accept both `--n-envs` and `--n_envs`.
	- `train_ppo.py` accepts `--task` as a compatibility alias:
	  - Phase 1 only supports `district_backlog_easy`.
	  - Phase 2 ignores `--task` and uses curriculum sampling.
	- `train_ppo.py` supports `--resume <checkpoint>` for Phase 1 continuation runs.
	- `train_recurrent.py` accepts `--task` to override the task used by the recurrent eval callback.
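
The flag behaviour described in these notes can be sketched with `argparse`. This is an illustration under stated assumptions, not the actual parser in `train_ppo.py`: the `--phase` flag, the default values, and the `resolve_task` helper are hypothetical.

```python
import argparse

PHASE1_TASK = "district_backlog_easy"  # the only Phase 1 task named in the notes


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a training CLI")
    # Hypothetical flag: the notes do not say how the phase is selected.
    parser.add_argument("--phase", type=int, choices=(1, 2), default=1)
    # Both spellings are registered and write to the same destination.
    parser.add_argument("--n-envs", "--n_envs", dest="n_envs", type=int, default=1)
    parser.add_argument("--task", default=None)
    parser.add_argument("--resume", metavar="CHECKPOINT", default=None)
    return parser


def resolve_task(args):
    """Apply the --task rules listed above (hypothetical helper)."""
    if args.phase == 1:
        if args.task not in (None, PHASE1_TASK):
            raise SystemExit(f"Phase 1 only supports {PHASE1_TASK!r}")
        return PHASE1_TASK
    # Phase 2: --task is ignored; curriculum sampling chooses the tasks.
    return None


parser = build_parser()
a = parser.parse_args(["--n-envs", "4"])
b = parser.parse_args(["--n_envs", "4", "--phase", "2", "--task", "anything"])
```

Passing several option strings to one `add_argument` call with an explicit `dest` is what lets the hyphenated and underscored spellings coexist.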

	## Artifact Paths

	- Training/eval outputs are written under `results/`:
	  - `results/best_model/*`
	  - `results/runs/*`
	  - `results/eval_logs/*`
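
As a small sketch of how this layout might be resolved in code, the helper below maps each artifact kind to its directory. The function name `artifact_dirs` and its `create` option are assumptions for illustration; the training scripts may organize this differently.

```python
from pathlib import Path
import tempfile

# Subdirectory names taken from the list above.
ARTIFACT_SUBDIRS = ("best_model", "runs", "eval_logs")


def artifact_dirs(root="results", create=False):
    """Map each artifact kind to its directory under `root` (hypothetical helper)."""
    base = Path(root)
    dirs = {name: base / name for name in ARTIFACT_SUBDIRS}
    if create:
        for path in dirs.values():
            path.mkdir(parents=True, exist_ok=True)
    return dirs


# Usage: create the layout inside a throwaway directory and list it.
with tempfile.TemporaryDirectory() as tmp:
    dirs = artifact_dirs(Path(tmp) / "results", create=True)
    created = sorted(p.name for p in (Path(tmp) / "results").iterdir())
```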