Submission Artifact Index
This page points reviewers to the shared environment, training scripts, and
training logs/results. It is intentionally path-based so the artifacts can be
found from a fresh clone without relying on local outputs/ or checkpoints/
folders.
Environment And Runtime
Core OpenEnv/runtime files:
- openenv.yaml: OpenEnv package entrypoint and deployment metadata.
- server/app.py: ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- app/env/env_core.py: canonical PolyGuardEnv reset/step/state implementation.
- app/env/fastapi_app.py: HTTP API, catalog, reset, step, and candidate-step routes.
- app/env/reward_router.py: verifier-backed reward routing.
- app/env/reward_scaling.py: reward clamping/rounding to [0.001, 0.999].
- app/env/anti_cheat.py: anti-hacking and invalid-action checks.
- app/env/catalog.py: task preset and sub-environment catalog.
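The clamping/rounding that app/env/reward_scaling.py describes reduces to a few lines; a minimal sketch (the function name and rounding precision are assumptions, not the repo's actual API):

```python
def scale_reward(raw: float, lo: float = 0.001, hi: float = 0.999, digits: int = 3) -> float:
    """Clamp a raw verifier reward into [lo, hi] and round it.

    Keeping rewards strictly inside (0, 1) avoids exact 0/1 saturation
    in downstream advantage normalization.
    """
    clamped = max(lo, min(hi, raw))
    return round(clamped, digits)
```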
Dependency and container files:
- pyproject.toml and uv.lock: local Python environment lock.
- requirements.txt: local/runtime pip dependency export.
- requirements-space.txt: Hugging Face Space dependency export.
- .env.example: non-secret environment variable template.
- Dockerfile: local/container runtime.
- Dockerfile.space: production HF Space runtime.
- app/hf_space/Dockerfile: HF training/evidence Space runtime.
- configs/sft.yaml and configs/grpo.yaml: train-loop defaults.
- configs/rewards.yaml, configs/curriculum.yaml, and configs/env_*.yaml: environment/reward/curriculum configuration.
Secrets are not committed. Hugging Face access is supplied through HF_TOKEN
as an environment variable or notebook/Space secret.
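Consuming HF_TOKEN from the environment follows the usual pattern; a minimal sketch (the helper name is hypothetical, only the HF_TOKEN variable comes from this page):

```python
import os

def get_hf_token() -> str:
    # HF_TOKEN arrives as an environment variable or notebook/Space secret;
    # fail fast with a clear message instead of letting a download 401 later.
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it or add a Space secret.")
    return token
```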
Training Scripts And Notebooks
End-to-end runner notebooks:
- PolyGuard_SFT_GRPO_One_Run_Runner.ipynb: one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- notebooks/09_training_loop.ipynb: modular walkthrough of the same loop.
Dataset and corpus scripts:
- scripts/bootstrap_data.py
- scripts/build_training_corpus.py
- scripts/generate_sft_data.py
SFT/GRPO training scripts:
- scripts/train_sft_trl.py: TRL SFT baseline.
- scripts/train_grpo_trl.py: TRL GRPO with environment-backed reward.
- scripts/train_grpo_policy.py
- scripts/train_grpo_planner.py
- scripts/train_grpo_supervisor.py
- scripts/train_grpo_dosing.py
- app/training/sft_trl.py
- app/training/grpo_trl.py
- app/training/openenv_wrapper.py
- app/training/reward_functions.py
- app/training/callbacks.py
- app/training/checkpointing.py
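TRL's GRPOTrainer takes reward functions that receive parallel lists of prompts and completions and return one float per completion; a sketch of how an environment-backed reward hook could look (env_step here is a stand-in for the real OpenEnv step call, not the repo's actual wrapper):

```python
def env_step(prompt: str, completion: str) -> dict:
    # Placeholder for the OpenEnv HTTP step call; rewards non-empty
    # completions so the sketch runs end to end.
    return {"reward": 0.999 if completion.strip() else 0.001}

def env_backed_reward(prompts, completions, **kwargs):
    """Score each completion by stepping the environment with it.

    Returns a list of floats, one per completion, as GRPOTrainer expects.
    """
    rewards = []
    for prompt, completion in zip(prompts, completions):
        result = env_step(prompt, completion)
        rewards.append(float(result["reward"]))
    return rewards
```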
Hugging Face training/evidence scripts:
- scripts/deploy_training_space.py: creates/runs the GPU training Space.
- app/hf_space/training_runner.py: Space-side training orchestrator.
- scripts/monitor_training_space_status.py: Space status/log monitor.
- scripts/pull_training_artifacts.py: artifact puller from the HF model repo.
- scripts/deploy_evidence_space.py and app/hf_space/evidence_runner.py: evaluation-only evidence Space.
- scripts/generate_hf_training_report.py: training/sweep chart generation.
- scripts/generate_submission_evidence.py: evidence bundle generation without retraining.
- scripts/deploy_final_artifact_space.py: packages final public evidence/model artifacts into the final HF Space.
Post-training and inference scripts:
- scripts/merge_adapters_safe.py
- scripts/test_inference_postsave.py
- scripts/benchmark_inference.py
- scripts/activate_sweep_model.py
- scripts/install_hf_active_bundle.py
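A post-save inference check of the kind scripts/test_inference_postsave.py performs reduces to reloading the saved model and asserting a generation round-trip; a model-agnostic sketch (the generate callable is an assumption standing in for the real inference pipeline):

```python
def postsave_smoke_check(generate, prompt: str, min_chars: int = 1) -> dict:
    """Run one generation against a freshly reloaded model and report pass/fail.

    `generate` stands in for whatever callable wraps the merged/adapter model;
    the check only asserts that decoding produces non-trivial text.
    """
    text = generate(prompt)
    ok = isinstance(text, str) and len(text.strip()) >= min_chars
    return {"prompt": prompt, "output_chars": len(text or ""), "passed": ok}
```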
Training Logs And Result Evidence
Final curated evidence:
- docs/results/final_submission_evidence/README.md: final evidence overview.
- docs/results/final_submission_evidence/manifest.json: artifact availability and final HF Space manifest.
- docs/results/final_submission_evidence/reports/submission_summary.json: final three-model summary.
- docs/results/final_submission_evidence/reports/grpo_trl_run.json: Qwen 3B GRPO training run report.
- docs/results/final_submission_evidence/reports/postsave_inference_grpo.json: post-save GRPO inference check.
- docs/results/final_submission_evidence/reports/grpo_ablation_report.json: GRPO/policy ablation report.
- docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json: baseline LLM-style policy vs full PolyGuard pipeline.
- docs/results/final_submission_evidence/reports/action_traces.jsonl: matched action traces with verifier output.
- docs/results/final_submission_evidence/charts/curated/README.md: visually reviewed chart index.
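Checking artifact availability against a manifest like manifest.json takes only stdlib JSON handling; a sketch assuming entries of the form {"path": ..., "status": ...} (this schema is an assumption, not the real manifest format):

```python
import json

def missing_artifacts(manifest_text: str) -> list:
    """Return paths whose status is not 'available'.

    Assumes entries like {"path": "...", "status": "available"} or a
    partial label such as "reports_only_or_partial".
    """
    manifest = json.loads(manifest_text)
    return [a["path"] for a in manifest.get("artifacts", [])
            if a.get("status") != "available"]
```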
Per-model sweep histories:
- docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json
- docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json
- docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json
- docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json
- docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json
- docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json
- docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json
- docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json
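The sft_history.json / grpo_history.json files can be summarized without retraining; a sketch assuming each history is a JSON list of step records carrying a "loss" key (an assumed schema, not the repo's documented one):

```python
import json

def summarize_history(history_text: str) -> dict:
    """Compute first/last/min loss from a training-history JSON list."""
    records = [r for r in json.loads(history_text) if "loss" in r]
    losses = [float(r["loss"]) for r in records]
    if not losses:
        return {"steps": 0}
    return {"steps": len(losses), "first": losses[0],
            "last": losses[-1], "min": min(losses)}
```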
Three-model submission evidence:
- docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json
- docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json
- docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json
- docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json
- docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl
- docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json
Completed-run status snapshots:
- docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json
- docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json
- docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json
- docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json
- docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json
Legacy/local smoke logs are retained under docs/results/active_model/,
docs/results/grpo_training_cycle/, and submission_bundle/ for auditability.
Model Artifacts
The public final artifact/evidence Space is:
The tracked local manifest is:
docs/results/final_submission_evidence/manifest.json
At packaging time, the Qwen 3B run had SFT and GRPO adapter directories plus checkpoint metadata in the final Space. Qwen 0.5B and 1.5B have reports/histories in this repo, but their adapter directories were not present in the checked artifact mirrors, so they are labeled reports_only_or_partial.
The final artifact Space and this checked-in evidence mirror are the public review paths. Authenticated downloads, when needed by maintainers, are operational details rather than part of the public submission narrative.
Reproduction Paths
Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO pass, validate post-save inference, and generate local reports.
Full HF Space path: use the one-run notebook or training Space runner when you control the required Hugging Face credentials and hardware. The public evidence for review is the final curated bundle, not private training commands.