| # Idea document and participant guide — implementation map |
|
|
| This document maps your polypharmacy / OpenEnv design notes and the typical hackathon submission requirements to files in this repository. |
|
|
| ## Submission narrative (required bullets) |
|
|
| | Requirement | Status | Where | |
| | --- | --- | --- | |
| | Problem statement | Documented + implemented | Root [`README.md`](../../README.md), `polyguard-rl/README.md`, `docs/safety.md` | |
| | Environment (agent operates here) | Implemented | `PolyGuardEnv`, `app/env/env_core.py`, `app/env/fastapi_app.py`, `openenv.yaml`, `server/app.py` | |
| | Agent capabilities | Implemented | `app/agents/`, `docs/agents.md` | |
| | Tasks | Implemented | Scenario JSONL under `data/scenarios/`, presets in `app/env/catalog.py` | |
| | Reward / evaluation logic | Implemented | `app/env/reward_router.py`, `app/env/verifier.py`, `configs/rewards.yaml`, `docs/reward_design.md`, `docs/evaluation.md` | |
| | Post-training / self-improvement | Implemented | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, `app/training/grpo_trl.py`, `docs/training.md` | |
|
|
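To make the environment row above concrete, here is a minimal sketch of a `reset` / `step` / `state` contract with a step cap. It is an illustrative stand-in, not the real `PolyGuardEnv`: the class name `ToyRegimenEnv`, the action format, the drug names, and the placeholder reward are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegimenEnv:
    """Hypothetical toy environment; NOT the PolyGuardEnv implementation."""

    max_steps: int = 3
    steps_taken: int = 0
    regimen: list = field(default_factory=list)

    def reset(self):
        # Start a fresh episode with a fixed, made-up two-drug regimen.
        self.steps_taken = 0
        self.regimen = ["drug_a", "drug_b"]
        return self.state()

    def state(self):
        # Return a copy so callers cannot mutate internal state.
        return {"regimen": list(self.regimen), "steps_taken": self.steps_taken}

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        self.steps_taken += 1
        reward = 0.0
        if action.get("type") == "remove" and action.get("drug") in self.regimen:
            self.regimen.remove(action["drug"])
            reward = 1.0  # placeholder reward, assumed for illustration
        # The episode ends at the step cap or when the regimen is empty.
        done = self.steps_taken >= self.max_steps or not self.regimen
        return self.state(), reward, done


env = ToyRegimenEnv(max_steps=3)
obs = env.reset()
obs, reward, done = env.step({"type": "remove", "drug": "drug_a"})
```

The step cap plays the role of the "max steps per sub-env" safeguard mentioned in the plan table: an agent that never finishes still terminates after `max_steps` calls.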
| ## Your “Plan” sections vs codebase |
|
|
| | Plan item | Status | Notes | |
| | --- | --- | --- | |
| | OpenEnv `reset` / `step` / `state`, timeouts, safety | Done | `env_core.py`, `fastapi_app.py`, max steps per sub-env, `anti_cheat.py` | |
| | Local + remote execution | Done | Local FastAPI + `docker-compose.yml`, HF Space via `scripts/deploy_space_api.py`, `Dockerfile.space`, `docker/space/` | |
| | Specific envs: DDI, bandit mining, regimen risk | Done | `SubEnvironment` enum, transitions in `app/env/transition.py` | |
| | Precision dosing, deprescribing, web search, alternatives, new drug (hard) | Done | Matching enum values + scenario tracks; “new drug” is `NEW_DRUG_DECOMPOSITION` | |
| | Multiple reward functions + anti-hacking | Done | 13 components → 4 channels; anti-cheat and tests in `tests/` | |
| | TRL + Unsloth, metrics, generations | Done | TRL scripts + reports; Unsloth optional (`--use-unsloth`); `app/training/metrics.py` | |
| | Post-training + inference | Done | merge + `test_inference_postsave.py`, active manifest / API path | |
| | Product / Space demo, UI | Done | FastAPI `app/api/`, React `app/ui/frontend/`, Space deployment scripts | |
| | Benchmarks + plots + sample generations | Done | `scripts/evaluate_*.py`, `docs/results/`, `scripts/generate_submission_evidence.py` | |
| | Deploy: OpenEnv, container, HF Space | Done | See `docs/deployment.md` | |
| | Easy / medium / hard | Done | `scenarios_easy.jsonl`, `scenarios_medium.jsonl`, `scenarios_hard.jsonl` | |
|
|
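The "components → channels" idea in the reward row above can be sketched as a simple routing step that sums individual component scores into a smaller set of channels. The component names, channel names, and the summing rule below are hypothetical; the real definitions live in `app/env/reward_router.py` and `configs/rewards.yaml` and may differ.

```python
# Hypothetical component-to-channel mapping, for illustration only.
COMPONENT_TO_CHANNEL = {
    "ddi_avoided": "safety",
    "dose_in_range": "safety",
    "task_completed": "task",
    "valid_action_format": "format",
    "no_repeated_action": "integrity",
}


def route_rewards(components):
    """Sum per-component scores into their assigned channels."""
    channels = {}
    for name, value in components.items():
        channel = COMPONENT_TO_CHANNEL.get(name)
        if channel is None:
            # Fail loudly so an unmapped component cannot be silently dropped.
            raise KeyError(f"unknown reward component: {name}")
        channels[channel] = channels.get(channel, 0.0) + value
    return channels


example = route_rewards(
    {"ddi_avoided": 1.0, "dose_in_range": 0.5, "task_completed": 1.0}
)
# example == {"safety": 1.5, "task": 1.0}
```

Keeping several channels separate, rather than collapsing everything into one scalar, makes it easier to see when a policy gains reward on one channel while regressing on another.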
| ## Themes (world modeling, multi-agent, self-improvement) |
|
|
| | Theme | Status | Notes | |
| | --- | --- | --- | |
| | World modeling / professional tasks | Primary fit | Stateful regimen, verifiers, tool-like actions | |
| | Multi-agent | Partial | Supervisor/orchestrator and policy stack (`app/agents/orchestrator.py`, `supervisor_agent.py`); not a separate multi-player env | |
| | Self-improving systems | Via GRPO | Environment-backed RLVR-style training, not online self-play | |
|
|
| ## “What to submit” checklist |
|
|
| | Deliverable | Where / status | |
| | --- | --- | |
| | GitHub repo + URLs in README | Root + `polyguard-rl/README.md` | |
| | HF Space URL | In README | |
| | Points from the participant guide | `docs/participant_guide_traceability.md`, this file | |
| | Colab | `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb`, `notebooks/09_training_loop.ipynb` | |
| | Video or blog | README links to the blog; **publish** the draft in `docs/hf_blog_draft.md` or swap the URL | |
|
|
| ## Future ideas from your notes (not claimed as done) |
|
|
| - Medicine images / barcodes: listed under Future Work in README. |
| - Web search agents: sub-env `WEB_SEARCH_MISSING_DATA` exists; “full web agent product” is beyond current scope. |
|
|
| ## Fresh clone reminder |
|
|
| Generated data and many `outputs/` reports are produced by scripts (see `scripts/bootstrap_data.py` and `REQUIRED_ARTIFACTS` in `scripts/acceptance_gate.py`). On a fresh clone, run the bootstrap/build pipeline first; strict acceptance with `POLYGUARD_ENFORCE_SUBMISSION_LINKS=true` is not expected to pass on an empty workspace. |
|
|
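As a rough pre-flight check for the reminder above, the sketch below lists which expected files are still missing before strict mode is switched on. The artifact list is an assumed example (the filenames come from this document, but their exact location under `data/scenarios/` is a guess), and the helper names are hypothetical; the authoritative list is `REQUIRED_ARTIFACTS` in `scripts/acceptance_gate.py`.

```python
import os
from pathlib import Path

# Assumed example list; the real one is REQUIRED_ARTIFACTS in
# scripts/acceptance_gate.py and may contain different paths.
EXAMPLE_REQUIRED_ARTIFACTS = [
    "data/scenarios/scenarios_easy.jsonl",
    "data/scenarios/scenarios_medium.jsonl",
    "data/scenarios/scenarios_hard.jsonl",
]


def missing_artifacts(root, required):
    """Return the relative paths in `required` that are not files under `root`."""
    root_path = Path(root)
    return [rel for rel in required if not (root_path / rel).is_file()]


def strict_mode_enabled(env=None):
    """Check whether POLYGUARD_ENFORCE_SUBMISSION_LINKS is set to 'true'."""
    env = os.environ if env is None else env
    value = env.get("POLYGUARD_ENFORCE_SUBMISSION_LINKS", "")
    return value.strip().lower() == "true"
```

Calling `missing_artifacts(".", EXAMPLE_REQUIRED_ARTIFACTS)` from the repository root on an empty workspace would be expected to return every listed path; an empty list suggests the bootstrap step has already produced those files.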