File size: 3,982 Bytes
e21fe7d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
# Idea document and participant guide — implementation map

This ties your polypharmacy / OpenEnv design notes and typical hackathon submission requirements to files in this repository.

## Submission narrative (required bullets)

| Requirement | Status | Where |
| --- | --- | --- |
| Problem statement | Documented + implemented | Root [`README.md`](../../README.md), `polyguard-rl/README.md`, `docs/safety.md` |
| Environment (agent operates here) | Implemented | `PolyGuardEnv`, `app/env/env_core.py`, `app/env/fastapi_app.py`, `openenv.yaml`, `server/app.py` |
| Agent capabilities | Implemented | `app/agents/`, `docs/agents.md` |
| Tasks | Implemented | Scenario JSONL under `data/scenarios/`, presets in `app/env/catalog.py` |
| Reward / evaluation logic | Implemented | `app/env/reward_router.py`, `app/env/verifier.py`, `configs/rewards.yaml`, `docs/reward_design.md`, `docs/evaluation.md` |
| Post-training / self-improvement | Implemented | `scripts/train_sft_trl.py`, `scripts/train_grpo_trl.py`, `app/training/grpo_trl.py`, `docs/training.md` |

## Your “Plan” sections vs codebase

| Plan item | Status | Notes |
| --- | --- | --- |
| OpenEnv `reset` / `step` / `state`, timeouts, safety | Done | `env_core.py`, `fastapi_app.py`, max steps per sub-env, `anti_cheat.py` |
| Local + remote execution | Done | Local FastAPI + `docker-compose.yml`, HF Space via `scripts/deploy_space_api.py`, `Dockerfile.space`, `docker/space/` |
| Specific envs: DDI, bandit mining, regimen risk | Done | `SubEnvironment` enum, transitions in `app/env/transition.py` |
| Precision dosing, deprescribing, web search, alternatives, new drug (hard) | Done | Matching enum values + scenario tracks; “new drug” is `NEW_DRUG_DECOMPOSITION` |
| Multiple reward functions + anti-hacking | Done | 13 components → 4 channels; anti-cheat and tests in `tests/` |
| TRL + Unsloth, metrics, generations | Done | TRL scripts + reports; Unsloth optional (`--use-unsloth`); `app/training/metrics.py` |
| Post-training + inference | Done | merge + `test_inference_postsave.py`, active manifest / API path |
| Product / Space demo, UI | Done | FastAPI `app/api/`, React `app/ui/frontend/`, Space deployment scripts |
| Benchmarks + plots + sample generations | Done | `scripts/evaluate_*.py`, `docs/results/`, `scripts/generate_submission_evidence.py` |
| Deploy: OpenEnv, container, HF Space | Done | See `docs/deployment.md` |
| Easy / medium / hard | Done | `scenarios_easy.jsonl`, `scenarios_medium.jsonl`, `scenarios_hard.jsonl` |

## Themes (world modeling, multi-agent, self-improvement)

| Theme | Status | Notes |
| --- | --- | --- |
| World modeling / professional tasks | Primary fit | Stateful regimen, verifiers, tool-like actions |
| Multi-agent | Partial | Supervisor/orchestrator and policy stack (`app/agents/orchestrator.py`, `supervisor_agent.py`); not a separate multi-player env |
| Self-improving systems | Via GRPO | Environment-backed RLVR-style training, not online self-play |

## “What to submit” checklist

| Deliverable | Status |
| --- | --- |
| GitHub repo + URLs in README | Root + `polyguard-rl/README.md` |
| HF Space URL | In README |
| Points from doc | `docs/participant_guide_traceability.md`, this file |
| Colab | `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb`, `notebooks/09_training_loop.ipynb` |
| Video or blog | README links blog; **publish** draft in `docs/hf_blog_draft.md` or swap URL |

## Future ideas from your notes (not claimed as done)

- Medicine images / barcodes: listed under Future Work in README.
- Web search agents: sub-env `WEB_SEARCH_MISSING_DATA` exists; “full web agent product” is beyond current scope.

## Fresh clone reminder

Generated data and many `outputs/` reports are produced by scripts (see `scripts/bootstrap_data.py`, `scripts/acceptance_gate.py` `REQUIRED_ARTIFACTS`). Run the bootstrap/build pipeline before expecting strict `POLYGUARD_ENFORCE_SUBMISSION_LINKS=true` acceptance to pass on an empty workspace.