# Submission Artifact Index

This page points reviewers to the shared environment, training scripts, and training logs/results. It is intentionally path-based so the artifacts can be found from a fresh clone without relying on local `outputs/` or `checkpoints/` folders.

## Environment And Runtime

Core OpenEnv/runtime files:

- `openenv.yaml` - OpenEnv package entrypoint and deployment metadata.
- `server/app.py` - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- `app/env/env_core.py` - canonical `PolyGuardEnv` reset/step/state implementation.
- `app/env/fastapi_app.py` - HTTP API: catalog, reset, step, and candidate-step routes.
- `app/env/reward_router.py` - verifier-backed reward routing.
- `app/env/reward_scaling.py` - reward clamping/rounding to `[0.001, 0.999]`.
- `app/env/anti_cheat.py` - anti-hacking and invalid-action checks.
- `app/env/catalog.py` - task preset and sub-environment catalog.

Dependency and container files:

- `pyproject.toml` and `uv.lock` - local Python environment lock.
- `requirements.txt` - local/runtime pip dependency export.
- `requirements-space.txt` - Hugging Face Space dependency export.
- `.env.example` - non-secret environment variable template.
- `Dockerfile` - local/container runtime.
- `Dockerfile.space` - production HF Space runtime.
- `app/hf_space/Dockerfile` - HF training/evidence Space runtime.
- `configs/sft.yaml` and `configs/grpo.yaml` - train-loop defaults.
- `configs/rewards.yaml`, `configs/curriculum.yaml`, and `configs/env_*.yaml` - environment/reward/curriculum configuration.

Secrets are not committed. Hugging Face access is supplied through `HF_TOKEN` as an environment variable or notebook/Space secret.

## Training Scripts And Notebooks

End-to-end runner notebooks:

- `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb` - one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- `notebooks/09_training_loop.ipynb` - modular walkthrough of the same loop.
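The reward clamping listed above for `app/env/reward_scaling.py` keeps verifier rewards strictly inside `[0.001, 0.999]`. A minimal sketch of that behavior, where the function name and three-decimal rounding are illustrative assumptions and only the bounds come from this index:

```python
def scale_reward(raw: float, precision: int = 3) -> float:
    """Clamp and round a raw verifier reward into (0, 1).

    The [0.001, 0.999] bounds match this index's description of
    app/env/reward_scaling.py; the function name and the rounding
    precision are assumptions for illustration.
    """
    clamped = min(max(raw, 0.001), 0.999)
    return round(clamped, precision)
```

Keeping rewards away from exact 0 and 1 is a common way to avoid degenerate advantage estimates in GRPO-style training; the canonical implementation is the file above.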
Dataset and corpus scripts:

- `scripts/bootstrap_data.py`
- `scripts/build_training_corpus.py`
- `scripts/generate_sft_data.py`

SFT/GRPO training scripts:

- `scripts/train_sft_trl.py` - TRL SFT baseline.
- `scripts/train_grpo_trl.py` - TRL GRPO with environment-backed reward.
- `scripts/train_grpo_policy.py`
- `scripts/train_grpo_planner.py`
- `scripts/train_grpo_supervisor.py`
- `scripts/train_grpo_dosing.py`
- `app/training/sft_trl.py`
- `app/training/grpo_trl.py`
- `app/training/openenv_wrapper.py`
- `app/training/reward_functions.py`
- `app/training/callbacks.py`
- `app/training/checkpointing.py`

Hugging Face training/evidence scripts:

- `scripts/deploy_training_space.py` - creates and runs the GPU training Space.
- `app/hf_space/training_runner.py` - Space-side training orchestrator.
- `scripts/monitor_training_space_status.py` - Space status/log monitor.
- `scripts/pull_training_artifacts.py` - pulls artifacts from the HF model repo.
- `scripts/deploy_evidence_space.py` and `app/hf_space/evidence_runner.py` - evaluation-only evidence Space.
- `scripts/generate_hf_training_report.py` - training/sweep chart generation.
- `scripts/generate_submission_evidence.py` - evidence bundle generation without retraining.
- `scripts/deploy_final_artifact_space.py` - packages final public evidence/model artifacts into the final HF Space.

Post-training and inference scripts:

- `scripts/merge_adapters_safe.py`
- `scripts/test_inference_postsave.py`
- `scripts/benchmark_inference.py`
- `scripts/activate_sweep_model.py`
- `scripts/install_hf_active_bundle.py`

## Training Logs And Result Evidence

Final curated evidence:

- `docs/results/final_submission_evidence/README.md` - final evidence overview.
- `docs/results/final_submission_evidence/manifest.json` - artifact availability and final HF Space manifest.
- `docs/results/final_submission_evidence/reports/submission_summary.json` - final three-model summary.
- `docs/results/final_submission_evidence/reports/grpo_trl_run.json` - Qwen 3B GRPO training run report.
- `docs/results/final_submission_evidence/reports/postsave_inference_grpo.json` - post-save GRPO inference check.
- `docs/results/final_submission_evidence/reports/grpo_ablation_report.json` - GRPO/policy ablation report.
- `docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json` - baseline LLM-style policy vs. full PolyGuard pipeline.
- `docs/results/final_submission_evidence/reports/action_traces.jsonl` - matched action traces with verifier output.
- `docs/results/final_submission_evidence/charts/curated/README.md` - visually reviewed chart index.

Per-model sweep histories:

- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json`

Three-model submission evidence:

- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json`

Completed-run status snapshots:

- `docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json`
- `docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json`

Legacy/local smoke logs are retained under `docs/results/active_model/`, `docs/results/grpo_training_cycle/`, and `submission_bundle/` for auditability.

## Model Artifacts

The public final artifact/evidence Space is:

- https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts

The tracked local manifest is:

- `docs/results/final_submission_evidence/manifest.json`

At packaging time, Qwen 3B had SFT and GRPO adapter directories plus checkpoint metadata in the final Space. Qwen 0.5B and 1.5B have reports/histories in this repo, but their adapter directories were not present in the checked artifact mirrors and are labeled `reports_only_or_partial`.

The final artifact Space and this checked-in evidence mirror are the public review paths. Authenticated downloads, when needed by maintainers, are operational details rather than part of the public submission narrative.

## Reproduction Paths

Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO pass, validate post-save inference, and generate local reports.

Full HF Space path: use the one-run notebook or training Space runner when you control the required Hugging Face credentials and hardware. The public evidence for review is the final curated bundle, not private training commands.
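The local smoke path can be sketched as a command sequence over the scripts named in this index. This is a dry run that only prints each stage; the ordering and any flags each script requires are assumptions, with the real defaults living in `configs/sft.yaml` and `configs/grpo.yaml`:

```shell
# Dry-run sketch of the local smoke path. Script names come from this
# index; remove the `echo` to actually execute each stage. Required
# flags and configs are left unspecified here.
for step in \
    scripts/bootstrap_data.py \
    scripts/build_training_corpus.py \
    scripts/generate_sft_data.py \
    scripts/train_sft_trl.py \
    scripts/train_grpo_trl.py \
    scripts/test_inference_postsave.py; do
    echo "python $step"
done
```

Report generation afterwards would use the reporting scripts listed above (for example `scripts/generate_submission_evidence.py`), again with repo-specific options not shown here.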