# Submission Artifact Index

This page points reviewers to the shared environment, the training scripts, and
the training logs and results. It is intentionally path-based, so every artifact
can be found from a fresh clone without relying on local `outputs/` or
`checkpoints/` folders.

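Because the index is path-based, a quick existence check against a fresh clone catches renamed or missing files before review. The sketch below assumes it runs from the repository root. `EXPECTED_PATHS` is a hypothetical subset of the paths listed on this page, and `missing_artifacts` is an illustrative helper, not a script in the repo.

```python
from pathlib import Path

# Hypothetical subset of the paths listed on this page; extend as needed.
EXPECTED_PATHS = [
    "openenv.yaml",
    "server/app.py",
    "app/env/env_core.py",
    "docs/results/final_submission_evidence/manifest.json",
]


def missing_artifacts(repo_root, paths=EXPECTED_PATHS):
    """Return the relative paths from `paths` that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All listed artifacts found.")
```
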
## Environment And Runtime

Core OpenEnv/runtime files:

- `openenv.yaml` - OpenEnv package entrypoint and deployment metadata.
- `server/app.py` - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- `app/env/env_core.py` - canonical `PolyGuardEnv` reset/step/state implementation.
- `app/env/fastapi_app.py` - HTTP API with catalog, reset, step, and candidate-step routes.
- `app/env/reward_router.py` - verifier-backed reward routing.
- `app/env/reward_scaling.py` - reward clamping and rounding into `[0.001, 0.999]`.
- `app/env/anti_cheat.py` - anti-hacking and invalid-action checks.
- `app/env/catalog.py` - task-preset and sub-environment catalog.

Dependency and container files:

- `pyproject.toml` and `uv.lock` - locked local Python environment.
- `requirements.txt` - pip dependency export for local and runtime installs.
- `requirements-space.txt` - pip dependency export for the Hugging Face Space.
- `.env.example` - template for non-secret environment variables.
- `Dockerfile` - local/container runtime.
- `Dockerfile.space` - runtime for the product HF Space.
- `app/hf_space/Dockerfile` - runtime for the HF training/evidence Space.
- `configs/sft.yaml` and `configs/grpo.yaml` - training-loop defaults.
- `configs/rewards.yaml`, `configs/curriculum.yaml`, and `configs/env_*.yaml` - environment, reward, and curriculum configuration.

Secrets are not committed. Hugging Face access is supplied through `HF_TOKEN`,
set either as an environment variable or as a notebook/Space secret.

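`app/env/reward_scaling.py` is listed as clamping and rounding rewards into `[0.001, 0.999]`. Below is a minimal sketch of that kind of scaling. It assumes three-decimal rounding and a NaN fallback, neither of which the repo documents. `scale_reward` is a hypothetical name, not the file's API.

```python
import math

LOW, HIGH = 0.001, 0.999  # bounds documented for app/env/reward_scaling.py


def scale_reward(raw: float, ndigits: int = 3) -> float:
    """Round a raw reward, then clamp it into [LOW, HIGH].

    ndigits=3 is an assumption, not the repo's documented precision.
    """
    if math.isnan(raw):
        return LOW  # assumption: treat an undefined reward as the minimum
    # Round first so the clamped value can never drift outside the bounds.
    return min(max(round(raw, ndigits), LOW), HIGH)
```

Rounding before clamping keeps the result inside the bounds even at coarse precision. With `ndigits=2`, clamping first would turn `0.001` into `0.0` after rounding.
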
## Training Scripts And Notebooks

End-to-end runner notebooks:

- `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb` - single-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- `notebooks/09_training_loop.ipynb` - modular walkthrough of the same loop.

Dataset and corpus scripts:

- `scripts/bootstrap_data.py`
- `scripts/build_training_corpus.py`
- `scripts/generate_sft_data.py`

SFT/GRPO training scripts:

- `scripts/train_sft_trl.py` - TRL SFT baseline.
- `scripts/train_grpo_trl.py` - TRL GRPO with an environment-backed reward.
- `scripts/train_grpo_policy.py`
- `scripts/train_grpo_planner.py`
- `scripts/train_grpo_supervisor.py`
- `scripts/train_grpo_dosing.py`
- `app/training/sft_trl.py`
- `app/training/grpo_trl.py`
- `app/training/openenv_wrapper.py`
- `app/training/reward_functions.py`
- `app/training/callbacks.py`
- `app/training/checkpointing.py`

Hugging Face training/evidence scripts:

- `scripts/deploy_training_space.py` - creates and runs the GPU training Space.
- `app/hf_space/training_runner.py` - Space-side training orchestrator.
- `scripts/monitor_training_space_status.py` - Space status and log monitor.
- `scripts/pull_training_artifacts.py` - pulls artifacts from the HF model repo.
- `scripts/deploy_evidence_space.py` and `app/hf_space/evidence_runner.py` - evaluation-only evidence Space.
- `scripts/generate_hf_training_report.py` - training and sweep chart generation.
- `scripts/generate_submission_evidence.py` - evidence bundle generation without retraining.
- `scripts/deploy_final_artifact_space.py` - packages the final public evidence and model artifacts into the final HF Space.

Post-training and inference scripts:

- `scripts/merge_adapters_safe.py`
- `scripts/test_inference_postsave.py`
- `scripts/benchmark_inference.py`
- `scripts/activate_sweep_model.py`
- `scripts/install_hf_active_bundle.py`

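`scripts/train_grpo_trl.py` is described as TRL GRPO with an environment-backed reward. TRL's `GRPOTrainer` accepts reward functions that receive the batch's prompts and completions, plus extra dataset columns as keyword arguments, and return one float per completion. Below is a minimal sketch of an adapter in that shape. It assumes a hypothetical `score_fn(prompt, completion)` that calls the environment or verifier. It is not the code in `app/training/reward_functions.py`.

```python
from typing import Any, Callable, List


def make_env_reward(score_fn: Callable[[str, str], float]) -> Callable[..., List[float]]:
    """Wrap a per-sample scorer as a batch reward function for GRPO.

    `score_fn` stands in for the environment/verifier call; it is
    hypothetical, not an API exported by this repo.
    """

    def _text(item: Any) -> str:
        # Conversational samples arrive as lists of {"role", "content"} dicts;
        # plain-text samples arrive as strings.
        if isinstance(item, list):
            return item[-1]["content"] if item else ""
        return str(item)

    def env_reward(prompts, completions, **kwargs) -> List[float]:
        # One reward per completion, in the same order as the batch.
        return [float(score_fn(_text(p), _text(c))) for p, c in zip(prompts, completions)]

    return env_reward
```

With TRL installed, such a function would be passed as `GRPOTrainer(..., reward_funcs=[make_env_reward(score_fn)])`. How the repo actually builds its rewards is defined in `app/training/reward_functions.py`.
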
## Training Logs And Result Evidence

Final curated evidence:

- `docs/results/final_submission_evidence/README.md` - final evidence overview.
- `docs/results/final_submission_evidence/manifest.json` - artifact availability and final HF Space manifest.
- `docs/results/final_submission_evidence/reports/submission_summary.json` - final three-model summary.
- `docs/results/final_submission_evidence/reports/grpo_trl_run.json` - Qwen 3B GRPO training run report.
- `docs/results/final_submission_evidence/reports/postsave_inference_grpo.json` - post-save GRPO inference check.
- `docs/results/final_submission_evidence/reports/grpo_ablation_report.json` - GRPO/policy ablation report.
- `docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json` - baseline LLM-style policy vs. the full PolyGuard pipeline.
- `docs/results/final_submission_evidence/reports/action_traces.jsonl` - matched action traces with verifier output.
- `docs/results/final_submission_evidence/charts/curated/README.md` - visually reviewed chart index.

Per-model sweep histories:

- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json`

Three-model submission evidence:

- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json`

Completed-run status snapshots:

- `docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json`
- `docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json`

Legacy/local smoke logs are retained under `docs/results/active_model/`,
`docs/results/grpo_training_cycle/`, and `submission_bundle/` for auditability.

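The sweep directory names appear to be derived from Hugging Face model ids. For example, `Qwen/Qwen2.5-0.5B-Instruct` maps to `qwen-qwen2-5-0-5b-instruct`. The helper below reproduces that mapping as an inferred convention, not a documented one. Both the model ids and the names `model_slug` and `sweep_files` are assumptions made for illustration.

```python
import re

SWEEP_ROOT = "docs/results/sweeps"


def model_slug(model_id: str) -> str:
    """Lowercase the model id and collapse each non-alphanumeric run into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")


def sweep_files(model_id: str, stages=("sft",)) -> list:
    """List the history/run JSON paths expected for each training stage."""
    slug = model_slug(model_id)
    files = []
    for stage in stages:
        files += [
            f"{SWEEP_ROOT}/{slug}/{stage}_history.json",
            f"{SWEEP_ROOT}/{slug}/{stage}_trl_run.json",
        ]
    return files
```

Only the 3B sweep lists GRPO files, so `stages=("sft", "grpo")` applies to that model alone.
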
## Model Artifacts

The public final artifact/evidence Space is:

- https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts

The tracked local manifest is:

- `docs/results/final_submission_evidence/manifest.json`

At packaging time, the final Space held SFT and GRPO adapter directories plus
checkpoint metadata for Qwen 3B. Qwen 0.5B and 1.5B have reports and histories
in this repo, but their adapter directories were not present in the checked
artifact mirrors, so they are labeled `reports_only_or_partial`.

The final artifact Space and this checked-in evidence mirror are the public
review paths. Authenticated downloads, when maintainers need them, are
operational details rather than part of the public submission narrative.

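The public Space can be mirrored locally with `huggingface_hub.snapshot_download`. The sketch below assumes `huggingface_hub` is installed and relies on the Space being public, so no token is passed. The helper name and the default `local_dir` are illustrative. The file layout inside the Space is not described on this page, so check the tracked `manifest.json` to see what to expect.

```python
FINAL_SPACE = "adithya9903/polyguard-openenv-final-artifacts"


def download_final_artifacts(local_dir="final_artifacts", allow_patterns=None):
    """Mirror the public final-artifact Space into local_dir and return its path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=FINAL_SPACE,
        repo_type="space",  # the artifacts live in a Space, not a model repo
        local_dir=local_dir,
        allow_patterns=allow_patterns,  # None downloads every file
    )
```

Passing `allow_patterns=["*.json", "*.md"]` fetches reports and indexes without the larger adapter files.
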
## Reproduction Paths

Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO
pass, validate post-save inference, and generate local reports.

Full HF Space path: use the one-run notebook or the training Space runner when
you control the required Hugging Face credentials and hardware. The public
evidence for review is the final curated bundle, not private training commands.

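The local smoke path can be laid out as an ordered dry run. The step-to-script mapping is an assumption drawn from the script lists on this page, including which script covers "generate local reports". No arguments are passed because this page does not document them, so consult each script's own usage before running anything. The helper only builds and prints commands; it does not execute them.

```python
import shlex
import sys

# Assumed mapping from smoke-path steps to scripts listed in this index.
# Arguments are deliberately omitted; how to request a "short" run is undocumented here.
SMOKE_STEPS = [
    ("build the small corpus", "scripts/build_training_corpus.py"),
    ("short SFT pass", "scripts/train_sft_trl.py"),
    ("short GRPO pass", "scripts/train_grpo_trl.py"),
    ("validate post-save inference", "scripts/test_inference_postsave.py"),
    # Assumption: report generation could equally be generate_hf_training_report.py.
    ("generate local reports", "scripts/generate_submission_evidence.py"),
]


def smoke_plan(python=sys.executable):
    """Return the smoke-path shell commands in order, without running them."""
    return [shlex.join([python, script]) for _, script in SMOKE_STEPS]


if __name__ == "__main__":
    for (label, _), cmd in zip(SMOKE_STEPS, smoke_plan()):
        print(f"# {label}\n{cmd}")
```
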