
# Idea document and participant guide — implementation map

This document maps your polypharmacy / OpenEnv design notes, plus the typical hackathon submission requirements, to concrete files in this repository.

## Submission narrative (required bullets)

| Requirement | Status | Where |
| --- | --- | --- |
| Problem statement | Documented + implemented | Root `README.md`, `polyguard-rl/README.md`, `docs/safety.md` |
| Environment (where the agent operates) | Implemented | `PolyGuardEnv`, `app/env/env_core.py`, `app/env/fastapi_app.py`, `openenv.yaml`, `server/app.py` |
| Agent capabilities | Implemented | `app/agents/`, `docs/agents.md` |
| Tasks | Implemented | Scenario JSONL under `data/scenarios/`, presets in `app/env/catalog.py` |
| Reward / evaluation logic | Implemented | `app/env/reward_router.py`, `app/env/verifier.py`, `configs/rewards.yaml`, `docs/reward_design.md`, `docs/evaluation.md` |
| Post-training / self-improvement | Implemented | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, `app/training/grpo_trl.py`, `docs/training.md` |
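
The environment follows an OpenEnv-style `reset` / `step` / `state` contract with a per-sub-env step cap. A minimal sketch of that loop, using a toy stand-in class (the real implementation lives in `app/env/env_core.py`; the field names, action format, and cap value here are illustrative assumptions, not the repo's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    observation: dict
    reward: float
    done: bool
    info: dict = field(default_factory=dict)

class ToyPolyGuardEnv:
    """Illustrative stand-in for PolyGuardEnv: reset/step/state with a step cap."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps  # safety cap, mirrors "max steps per sub-env"
        self._steps = 0
        self._regimen: list[str] = []

    def reset(self) -> dict:
        self._steps = 0
        self._regimen = []
        return self.state()

    def state(self) -> dict:
        return {"regimen": list(self._regimen), "steps": self._steps}

    def step(self, action: str) -> StepResult:
        self._steps += 1
        self._regimen.append(action)
        done = self._steps >= self.max_steps  # timeout / anti-runaway guard
        return StepResult(self.state(), reward=0.0, done=done)

env = ToyPolyGuardEnv(max_steps=3)
obs = env.reset()
done = False
while not done:
    result = env.step("add_drug:metformin")
    done = result.done
print(result.observation["steps"])  # 3
```

The same loop shape works whether the env is hit in-process or through the FastAPI wrapper; only the transport differs.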

## Your “Plan” sections vs. the codebase

| Plan item | Status | Notes |
| --- | --- | --- |
| OpenEnv `reset` / `step` / `state`, timeouts, safety | Done | `env_core.py`, `fastapi_app.py`, max steps per sub-env, `anti_cheat.py` |
| Local + remote execution | Done | Local FastAPI + `docker-compose.yml`; HF Space via `scripts/deploy_space_api.py`, `Dockerfile.space`, `docker/space/` |
| Specific envs: DDI, bandit mining, regimen risk | Done | `SubEnvironment` enum, transitions in `app/env/transition.py` |
| Precision dosing, deprescribing, web search, alternatives, new drug (hard) | Done | Matching enum values + scenario tracks; “new drug” is `NEW_DRUG_DECOMPOSITION` |
| Multiple reward functions + anti-hacking | Done | 13 components → 4 channels; anti-cheat logic and tests in `tests/` |
| TRL + Unsloth, metrics, generations | Done | TRL scripts + reports; Unsloth optional (`--use-unsloth`); `app/training/metrics.py` |
| Post-training + inference | Done | Merge step + `test_inference_postsave.py`; active manifest / API path |
| Product / Space demo, UI | Done | FastAPI `app/api/`, React `app/ui/frontend/`, Space deployment scripts |
| Benchmarks + plots + sample generations | Done | `scripts/evaluate_*.py`, `docs/results/`, `scripts/generate_submission_evidence.py` |
| Deploy: OpenEnv, container, HF Space | Done | See `docs/deployment.md` |
| Easy / medium / hard | Done | `scenarios_easy.jsonl`, `scenarios_medium.jsonl`, `scenarios_hard.jsonl` |
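
The reward row above compresses 13 scored components into 4 channels. A hedged sketch of what that kind of routing can look like (the component names, channel grouping, and weights below are illustrative assumptions, not the real values in `configs/rewards.yaml` or `app/env/reward_router.py`):

```python
# Map many fine-grained reward components into a few channels, then a scalar.
# All names and weights below are illustrative, not the repo's actual config.
CHANNELS = {
    "safety":      ["ddi_flagging", "contraindication", "dose_bounds"],
    "correctness": ["diagnosis_match", "regimen_validity"],
    "efficiency":  ["step_count", "tool_budget"],
    "honesty":     ["citation_present", "no_reward_hacking"],
}
CHANNEL_WEIGHTS = {"safety": 0.4, "correctness": 0.3, "efficiency": 0.15, "honesty": 0.15}

def route_reward(components: dict[str, float]) -> tuple[dict[str, float], float]:
    """Average components within each channel, then take a weighted sum."""
    channel_scores = {}
    for channel, names in CHANNELS.items():
        vals = [components.get(n, 0.0) for n in names]
        channel_scores[channel] = sum(vals) / len(vals)
    total = sum(CHANNEL_WEIGHTS[c] * s for c, s in channel_scores.items())
    # Anti-hacking guard: clamp to [0, 1] so no single component can blow up the score.
    return channel_scores, max(0.0, min(1.0, total))

scores, total = route_reward({"ddi_flagging": 1.0, "diagnosis_match": 1.0})
```

Averaging within a channel before weighting keeps any single component from dominating the scalar, which is one simple defense against reward hacking.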

## Themes (world modeling, multi-agent, self-improvement)

| Theme | Status | Notes |
| --- | --- | --- |
| World modeling / professional tasks | Primary fit | Stateful regimen, verifiers, tool-like actions |
| Multi-agent | Partial | Supervisor/orchestrator and policy stack (`app/agents/orchestrator.py`, `supervisor_agent.py`); not a separate multi-player env |
| Self-improving systems | Via GRPO | Environment-backed RLVR-style training, not online self-play |

## “What to submit” checklist

| Deliverable | Where |
| --- | --- |
| GitHub repo + URLs in README | Root `README.md` + `polyguard-rl/README.md` |
| HF Space URL | In README |
| Points from doc | `docs/participant_guide_traceability.md`, this file |
| Colab | `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb`, `notebooks/09_training_loop.ipynb` |
| Video or blog | README links to the blog; publish the draft in `docs/hf_blog_draft.md` or swap in the final URL |

## Future ideas from your notes (not claimed as done)

- Medicine images / barcodes: listed under Future Work in the README.
- Web search agents: the sub-env `WEB_SEARCH_MISSING_DATA` exists; a “full web agent product” is beyond the current scope.

## Fresh clone reminder

Generated data and many `outputs/` reports are produced by scripts (see `scripts/bootstrap_data.py` and the `REQUIRED_ARTIFACTS` list in `scripts/acceptance_gate.py`). Run the bootstrap/build pipeline before expecting strict acceptance (`POLYGUARD_ENFORCE_SUBMISSION_LINKS=true`) to pass on an empty workspace.
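
A plausible fresh-clone sequence looks like the following; treat it as a sketch (the script names come from this repo, but exact invocation flags are assumptions — check each script's help output):

```shell
# Regenerate data and derived artifacts on a fresh clone
python scripts/bootstrap_data.py

# Then run the gate; enable strict submission-link checks only once artifacts exist
POLYGUARD_ENFORCE_SUBMISSION_LINKS=true python scripts/acceptance_gate.py
```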