
# Idea document and participant guide — implementation map

This document maps your polypharmacy / OpenEnv design notes, plus the typical hackathon submission requirements, to concrete files in this repository.

## Submission narrative (required bullets)

| Requirement | Status | Where |
| --- | --- | --- |
| Problem statement | Documented + implemented | Root `README.md`, `polyguard-rl/README.md`, `docs/safety.md` |
| Environment (where the agent operates) | Implemented | `PolyGuardEnv`, `app/env/env_core.py`, `app/env/fastapi_app.py`, `openenv.yaml`, `server/app.py` |
| Agent capabilities | Implemented | `app/agents/`, `docs/agents.md` |
| Tasks | Implemented | Scenario JSONL under `data/scenarios/`, presets in `app/env/catalog.py` |
| Reward / evaluation logic | Implemented | `app/env/reward_router.py`, `app/env/verifier.py`, `configs/rewards.yaml`, `docs/reward_design.md`, `docs/evaluation.md` |
| Post-training / self-improvement | Implemented | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, `app/training/grpo_trl.py`, `docs/training.md` |
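
The environment follows an OpenEnv-style `reset` / `step` / `state` contract with a per-sub-env step cap. A minimal sketch of that loop, using a toy stand-in class (the real implementation lives in `app/env/env_core.py`; the field names, action format, and cap value here are illustrative assumptions, not the repo's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    observation: dict
    reward: float
    done: bool
    info: dict = field(default_factory=dict)

class ToyPolyGuardEnv:
    """Illustrative stand-in for PolyGuardEnv: reset/step/state with a step cap."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps  # safety cap, mirrors "max steps per sub-env"
        self._steps = 0
        self._regimen: list[str] = []

    def reset(self) -> dict:
        self._steps = 0
        self._regimen = []
        return self.state()

    def state(self) -> dict:
        return {"regimen": list(self._regimen), "steps": self._steps}

    def step(self, action: str) -> StepResult:
        self._steps += 1
        self._regimen.append(action)
        done = self._steps >= self.max_steps  # timeout / anti-runaway guard
        return StepResult(self.state(), reward=0.0, done=done)

env = ToyPolyGuardEnv(max_steps=3)
obs = env.reset()
done = False
while not done:
    result = env.step("add_drug:metformin")
    done = result.done
print(result.observation["steps"])  # 3
```

The same loop shape works whether the env is hit in-process or through the FastAPI wrapper; only the transport differs.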

## Your “Plan” sections vs. the codebase

| Plan item | Status | Notes |
| --- | --- | --- |
| OpenEnv `reset` / `step` / `state`, timeouts, safety | Done | `env_core.py`, `fastapi_app.py`, max steps per sub-env, `anti_cheat.py` |
| Local + remote execution | Done | Local FastAPI + `docker-compose.yml`; HF Space via `scripts/deploy_space_api.py`, `Dockerfile.space`, `docker/space/` |
| Specific envs: DDI, bandit mining, regimen risk | Done | `SubEnvironment` enum, transitions in `app/env/transition.py` |
| Precision dosing, deprescribing, web search, alternatives, new drug (hard) | Done | Matching enum values + scenario tracks; “new drug” is `NEW_DRUG_DECOMPOSITION` |
| Multiple reward functions + anti-hacking | Done | 13 components → 4 channels; anti-cheat logic and tests in `tests/` |
| TRL + Unsloth, metrics, generations | Done | TRL scripts + reports; Unsloth optional (`--use-unsloth`); `app/training/metrics.py` |
| Post-training + inference | Done | Merge step + `test_inference_postsave.py`; active manifest / API path |
| Product / Space demo, UI | Done | FastAPI `app/api/`, React `app/ui/frontend/`, Space deployment scripts |
| Benchmarks + plots + sample generations | Done | `scripts/evaluate_*.py`, `docs/results/`, `scripts/generate_submission_evidence.py` |
| Deploy: OpenEnv, container, HF Space | Done | See `docs/deployment.md` |
| Easy / medium / hard | Done | `scenarios_easy.jsonl`, `scenarios_medium.jsonl`, `scenarios_hard.jsonl` |
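
The reward row above compresses 13 scored components into 4 channels. A hedged sketch of what that kind of routing can look like (the component names, channel grouping, and weights below are illustrative assumptions, not the real values in `configs/rewards.yaml` or `app/env/reward_router.py`):

```python
# Map many fine-grained reward components into a few channels, then a scalar.
# All names and weights below are illustrative, not the repo's actual config.
CHANNELS = {
    "safety":      ["ddi_flagging", "contraindication", "dose_bounds"],
    "correctness": ["diagnosis_match", "regimen_validity"],
    "efficiency":  ["step_count", "tool_budget"],
    "honesty":     ["citation_present", "no_reward_hacking"],
}
CHANNEL_WEIGHTS = {"safety": 0.4, "correctness": 0.3, "efficiency": 0.15, "honesty": 0.15}

def route_reward(components: dict[str, float]) -> tuple[dict[str, float], float]:
    """Average components within each channel, then take a weighted sum."""
    channel_scores = {}
    for channel, names in CHANNELS.items():
        vals = [components.get(n, 0.0) for n in names]
        channel_scores[channel] = sum(vals) / len(vals)
    total = sum(CHANNEL_WEIGHTS[c] * s for c, s in channel_scores.items())
    # Anti-hacking guard: clamp to [0, 1] so no single component can blow up the score.
    return channel_scores, max(0.0, min(1.0, total))

scores, total = route_reward({"ddi_flagging": 1.0, "diagnosis_match": 1.0})
```

Averaging within a channel before weighting keeps any single component from dominating the scalar, which is one simple defense against reward hacking.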

## Themes (world modeling, multi-agent, self-improvement)

| Theme | Status | Notes |
| --- | --- | --- |
| World modeling / professional tasks | Primary fit | Stateful regimen, verifiers, tool-like actions |
| Multi-agent | Partial | Supervisor/orchestrator and policy stack (`app/agents/orchestrator.py`, `supervisor_agent.py`); not a separate multi-player env |
| Self-improving systems | Via GRPO | Environment-backed RLVR-style training, not online self-play |

## “What to submit” checklist

| Deliverable | Where |
| --- | --- |
| GitHub repo + URLs in README | Root `README.md` + `polyguard-rl/README.md` |
| HF Space URL | In README |
| Points from doc | `docs/participant_guide_traceability.md`, this file |
| Colab | `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb`, `notebooks/09_training_loop.ipynb` |
| Video or blog | README links to the blog; publish the draft in `docs/hf_blog_draft.md` or swap in the final URL |

## Future ideas from your notes (not claimed as done)

- Medicine images / barcodes: listed under Future Work in the README.
- Web search agents: the sub-env `WEB_SEARCH_MISSING_DATA` exists; a “full web agent product” is beyond the current scope.

## Fresh clone reminder

Generated data and many `outputs/` reports are produced by scripts (see `scripts/bootstrap_data.py` and the `REQUIRED_ARTIFACTS` list in `scripts/acceptance_gate.py`). Run the bootstrap/build pipeline before expecting strict acceptance (`POLYGUARD_ENFORCE_SUBMISSION_LINKS=true`) to pass on an empty workspace.
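
A plausible fresh-clone sequence looks like the following; treat it as a sketch (the script names come from this repo, but exact invocation flags are assumptions — check each script's help output):

```shell
# Regenerate data and derived artifacts on a fresh clone
python scripts/bootstrap_data.py

# Then run the gate; enable strict submission-link checks only once artifacts exist
POLYGUARD_ENFORCE_SUBMISSION_LINKS=true python scripts/acceptance_gate.py
```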