# Submission Artifact Index
This page points reviewers to the shared environment, training scripts, and
training logs/results. It is intentionally path-based so the artifacts can be
found from a fresh clone without relying on local `outputs/` or `checkpoints/`
folders.
## Environment And Runtime
Core OpenEnv/runtime files:
- `openenv.yaml` - OpenEnv package entrypoint and deployment metadata.
- `server/app.py` - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- `app/env/env_core.py` - canonical `PolyGuardEnv` reset/step/state implementation.
- `app/env/fastapi_app.py` - HTTP API, catalog, reset, step, and candidate-step routes.
- `app/env/reward_router.py` - verifier-backed reward routing.
- `app/env/reward_scaling.py` - reward clamping/rounding to `[0.001, 0.999]`.
- `app/env/anti_cheat.py` - anti-hacking and invalid-action checks.
- `app/env/catalog.py` - task preset and sub-environment catalog.
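The clamping behavior documented for `app/env/reward_scaling.py` can be sketched as follows. Only the `[0.001, 0.999]` range comes from this index; the function name and the rounding precision are illustrative assumptions, not the module's actual API.

```python
def scale_reward(raw: float, digits: int = 3) -> float:
    """Clamp a raw reward into [0.001, 0.999] and round it.

    The clamp range matches the one documented for
    app/env/reward_scaling.py; the function name and rounding
    precision are illustrative assumptions.
    """
    clamped = min(max(raw, 0.001), 0.999)
    return round(clamped, digits)

print(scale_reward(1.7))    # clamps down to 0.999
print(scale_reward(-0.25))  # clamps up to 0.001
print(scale_reward(0.5))    # already in range
```

Keeping rewards strictly inside the open unit interval avoids degenerate 0/1 rewards that can destabilize GRPO advantage estimates.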
Dependency and container files:
- `pyproject.toml` and `uv.lock` - local Python environment lock.
- `requirements.txt` - local/runtime pip dependency export.
- `requirements-space.txt` - Hugging Face Space dependency export.
- `.env.example` - non-secret environment variable template.
- `Dockerfile` - local/container runtime.
- `Dockerfile.space` - production HF Space runtime.
- `app/hf_space/Dockerfile` - HF training/evidence Space runtime.
- `configs/sft.yaml` and `configs/grpo.yaml` - train-loop defaults.
- `configs/rewards.yaml`, `configs/curriculum.yaml`, and `configs/env_*.yaml` - environment/reward/curriculum configuration.
Secrets are not committed. Hugging Face access is supplied through `HF_TOKEN`
as an environment variable or notebook/Space secret.
## Training Scripts And Notebooks
End-to-end runner notebooks:
- `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb` - one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- `notebooks/09_training_loop.ipynb` - modular walkthrough of the same loop.
Dataset and corpus scripts:
- `scripts/bootstrap_data.py`
- `scripts/build_training_corpus.py`
- `scripts/generate_sft_data.py`
SFT/GRPO training scripts:
- `scripts/train_sft_trl.py` - TRL SFT baseline.
- `scripts/train_grpo_trl.py` - TRL GRPO with environment-backed reward.
- `scripts/train_grpo_policy.py`
- `scripts/train_grpo_planner.py`
- `scripts/train_grpo_supervisor.py`
- `scripts/train_grpo_dosing.py`
- `app/training/sft_trl.py`
- `app/training/grpo_trl.py`
- `app/training/openenv_wrapper.py`
- `app/training/reward_functions.py`
- `app/training/callbacks.py`
- `app/training/checkpointing.py`
Hugging Face training/evidence scripts:
- `scripts/deploy_training_space.py` - creates/runs the GPU training Space.
- `app/hf_space/training_runner.py` - Space-side training orchestrator.
- `scripts/monitor_training_space_status.py` - Space status/log monitor.
- `scripts/pull_training_artifacts.py` - artifact puller from the HF model repo.
- `scripts/deploy_evidence_space.py` and `app/hf_space/evidence_runner.py` - evaluation-only evidence Space.
- `scripts/generate_hf_training_report.py` - training/sweep chart generation.
- `scripts/generate_submission_evidence.py` - evidence bundle generation without retraining.
- `scripts/deploy_final_artifact_space.py` - packages final public evidence/model artifacts into the final HF Space.
Post-training and inference scripts:
- `scripts/merge_adapters_safe.py`
- `scripts/test_inference_postsave.py`
- `scripts/benchmark_inference.py`
- `scripts/activate_sweep_model.py`
- `scripts/install_hf_active_bundle.py`
## Training Logs And Result Evidence
Final curated evidence:
- `docs/results/final_submission_evidence/README.md` - final evidence overview.
- `docs/results/final_submission_evidence/manifest.json` - artifact availability and final HF Space manifest.
- `docs/results/final_submission_evidence/reports/submission_summary.json` - final three-model summary.
- `docs/results/final_submission_evidence/reports/grpo_trl_run.json` - Qwen 3B GRPO training run report.
- `docs/results/final_submission_evidence/reports/postsave_inference_grpo.json` - post-save GRPO inference check.
- `docs/results/final_submission_evidence/reports/grpo_ablation_report.json` - GRPO/policy ablation report.
- `docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json` - baseline LLM-style policy vs full PolyGuard pipeline.
- `docs/results/final_submission_evidence/reports/action_traces.jsonl` - matched action traces with verifier output.
- `docs/results/final_submission_evidence/charts/curated/README.md` - visually reviewed chart index.
Per-model sweep histories:
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json`
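A reviewer could summarize one of these history files with a few lines of Python. The record shape assumed here (a JSON list of per-step dicts carrying a `reward` field) is a hypothetical schema for illustration; the actual `*_history.json` layout is defined by the training scripts, not by this index.

```python
import json
from pathlib import Path
from statistics import mean

def summarize_history(path: Path, key: str = "reward") -> float:
    """Average one metric across a sweep history file.

    Assumes the *_history.json files are JSON lists of per-step
    records; both that shape and the metric key are illustrative
    assumptions, not a documented schema.
    """
    records = json.loads(path.read_text())
    return mean(r[key] for r in records if key in r)

# Demonstrate on a synthetic file written with the assumed shape.
demo = Path("demo_history.json")
demo.write_text(json.dumps([{"step": 1, "reward": 0.25},
                            {"step": 2, "reward": 0.75}]))
print(summarize_history(demo))  # 0.5
```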
Three-model submission evidence:
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json`
Completed-run status snapshots:
- `docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json`
- `docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json`
Legacy/local smoke logs are retained under `docs/results/active_model/`,
`docs/results/grpo_training_cycle/`, and `submission_bundle/` for auditability.
## Model Artifacts
The public final artifact/evidence Space is:
- https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts
The tracked local manifest is:
- `docs/results/final_submission_evidence/manifest.json`
At packaging time, Qwen 3B had SFT and GRPO adapter directories plus checkpoint
metadata in the final Space. Qwen 0.5B and 1.5B have reports and histories in
this repo, but their adapter directories were absent from the checked artifact
mirrors, so they are labeled `reports_only_or_partial`.
The final artifact Space and this checked-in evidence mirror are the public
review paths. Authenticated downloads, when needed by maintainers, are
operational details rather than part of the public submission narrative.
## Reproduction Paths
Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO
pass, validate post-save inference, and generate local reports.
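The smoke sequence maps onto the repo's scripts roughly as below. The stage-to-script mapping and the argument-free invocation are assumptions; the real CLIs may require flags not documented on this page.

```python
import sys

# Ordered stages of the local smoke path, mapped to scripts listed
# above. Flags are omitted because the actual CLIs are not
# documented on this page; the mapping itself is an assumption.
SMOKE_PIPELINE = [
    "scripts/bootstrap_data.py",                # build the small corpus
    "scripts/train_sft_trl.py",                 # short SFT pass
    "scripts/train_grpo_trl.py",                # short GRPO pass
    "scripts/test_inference_postsave.py",       # post-save inference check
    "scripts/generate_submission_evidence.py",  # local reports
]

def smoke_commands(python: str = sys.executable) -> list[list[str]]:
    """Return the argv list for each stage, in order."""
    return [[python, script] for script in SMOKE_PIPELINE]
```

Each argv list would then be run in order, e.g. with `subprocess.run(cmd, check=True)`, stopping at the first failing stage.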
Full HF Space path: use the one-run notebook or training Space runner when you
control the required Hugging Face credentials and hardware. The public evidence
for review is the final curated bundle, not private training commands.