# Submission Artifact Index

This page points reviewers to the shared environment, training scripts, and
training logs/results. It is intentionally path-based so the artifacts can be
found from a fresh clone without relying on local `outputs/` or `checkpoints/`
folders.
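
Because the index is path-based, it can be checked mechanically from a fresh clone. The sketch below is a hypothetical helper, not part of the repository. It works in three steps:

1. It pulls every inline-code span out of this Markdown page.
2. It keeps the spans that look like repo-relative paths, using a simple heuristic.
3. It reports the ones that do not exist.

The suffix list and the path heuristic are assumptions. Glob-style entries such as `configs/env_*.yaml` are expanded with `pathlib.Path.glob`.

```python
import re
from pathlib import Path
from typing import Iterable

# Matches inline code spans such as `server/app.py` (assumes single backticks).
CODE_SPAN = re.compile(r"`([^`]+)`")
# Suffixes treated as file paths even without a "/"; an assumed, incomplete list.
PATH_SUFFIXES = (".py", ".yaml", ".toml", ".lock", ".txt", ".json", ".jsonl",
                 ".ipynb", ".md", ".example")


def looks_like_path(span: str) -> bool:
    """Heuristic: no spaces, and either a slash, a known suffix, or a Dockerfile."""
    if " " in span:
        return False
    return "/" in span or span.endswith(PATH_SUFFIXES) or span.startswith("Dockerfile")


def missing_paths(index_text: str, repo_root: Path, ignore: Iterable[str] = ()) -> list[str]:
    """Return indexed paths that do not resolve under repo_root, in first-seen order."""
    ignored = set(ignore)
    missing: list[str] = []
    for span in CODE_SPAN.findall(index_text):
        if span in ignored or span in missing or not looks_like_path(span):
            continue
        if any(ch in span for ch in "*?["):
            found = any(repo_root.glob(span))  # e.g. configs/env_*.yaml
        else:
            found = (repo_root / span).exists()
        if not found:
            missing.append(span)
    return missing
```

For example, `missing_paths(index_path.read_text(), Path("."), ignore={"outputs/", "checkpoints/"})` skips the two local-only folders mentioned above. Here `index_path` is a placeholder for wherever this page lives in the clone; the index does not state its own location.
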
## Environment And Runtime
Core OpenEnv/runtime files:
- `openenv.yaml` - OpenEnv package entrypoint and deployment metadata.
- `server/app.py` - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- `app/env/env_core.py` - canonical `PolyGuardEnv` reset/step/state implementation.
- `app/env/fastapi_app.py` - HTTP API, catalog, reset, step, and candidate-step routes.
- `app/env/reward_router.py` - verifier-backed reward routing.
- `app/env/reward_scaling.py` - reward clamping/rounding to `[0.001, 0.999]`.
- `app/env/anti_cheat.py` - reward-hacking and invalid-action checks.
- `app/env/catalog.py` - task preset and sub-environment catalog.
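
The clamping described for `app/env/reward_scaling.py` can be pictured with the minimal sketch below. Only the `[0.001, 0.999]` bounds come from this index. The three-digit rounding (`ROUND_DIGITS`) and the clamp-then-round order are assumptions, so the real module may differ.

```python
# Bounds taken from this index.
REWARD_MIN = 0.001
REWARD_MAX = 0.999
# Hypothetical precision; the actual value in reward_scaling.py is not stated here.
ROUND_DIGITS = 3


def scale_reward(raw: float) -> float:
    """Clamp a raw verifier reward into [REWARD_MIN, REWARD_MAX], then round."""
    clamped = min(max(raw, REWARD_MIN), REWARD_MAX)
    return round(clamped, ROUND_DIGITS)
```

Under these assumptions, clamping before rounding keeps the result in range. Both bounds are exact at three decimal places, so rounding cannot push a clamped value past either bound.
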

Dependency and container files:
- `pyproject.toml` and `uv.lock` - project dependency declaration and lockfile for the local Python environment.
- `requirements.txt` - local/runtime pip dependency export.
- `requirements-space.txt` - Hugging Face Space dependency export.
- `.env.example` - non-secret environment variable template.
- `Dockerfile` - local/container runtime.
- `Dockerfile.space` - product HF Space runtime.
- `app/hf_space/Dockerfile` - HF training/evidence Space runtime.
- `configs/sft.yaml` and `configs/grpo.yaml` - train-loop defaults.
- `configs/rewards.yaml`, `configs/curriculum.yaml`, and `configs/env_*.yaml` - environment/reward/curriculum configuration.

Secrets are not committed. Hugging Face access is supplied through `HF_TOKEN`
as an environment variable or notebook/Space secret.
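
Code that needs Hub access can read the token from the process environment. Hugging Face Spaces also expose secrets to a running app this way. The helper below is a hypothetical sketch, not a function from this repository. It assumes any notebook secret has already been exported into the environment, and it fails early with a readable message when `HF_TOKEN` is unset.

```python
import os


def require_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token from the environment, or raise with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it locally or add it as a notebook/Space secret."
        )
    return token
```
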
## Training Scripts And Notebooks
End-to-end runner notebooks:
- `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb` - one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- `notebooks/09_training_loop.ipynb` - modular walkthrough of the same loop.

Dataset and corpus scripts:
- `scripts/bootstrap_data.py`
- `scripts/build_training_corpus.py`
- `scripts/generate_sft_data.py`

SFT/GRPO training scripts:
- `scripts/train_sft_trl.py` - TRL SFT baseline.
- `scripts/train_grpo_trl.py` - TRL GRPO with environment-backed reward.
- `scripts/train_grpo_policy.py`
- `scripts/train_grpo_planner.py`
- `scripts/train_grpo_supervisor.py`
- `scripts/train_grpo_dosing.py`
- `app/training/sft_trl.py`
- `app/training/grpo_trl.py`
- `app/training/openenv_wrapper.py`
- `app/training/reward_functions.py`
- `app/training/callbacks.py`
- `app/training/checkpointing.py`

Hugging Face training/evidence scripts:
- `scripts/deploy_training_space.py` - creates/runs the GPU training Space.
- `app/hf_space/training_runner.py` - Space-side training orchestrator.
- `scripts/monitor_training_space_status.py` - Space status/log monitor.
- `scripts/pull_training_artifacts.py` - artifact puller from the HF model repo.
- `scripts/deploy_evidence_space.py` and `app/hf_space/evidence_runner.py` - evaluation-only evidence Space.
- `scripts/generate_hf_training_report.py` - training/sweep chart generation.
- `scripts/generate_submission_evidence.py` - evidence bundle generation without retraining.
- `scripts/deploy_final_artifact_space.py` - packages final public evidence/model artifacts into the final HF Space.

Post-training and inference scripts:
- `scripts/merge_adapters_safe.py`
- `scripts/test_inference_postsave.py`
- `scripts/benchmark_inference.py`
- `scripts/activate_sweep_model.py`
- `scripts/install_hf_active_bundle.py`
## Training Logs And Result Evidence
Final curated evidence:
- `docs/results/final_submission_evidence/README.md` - final evidence overview.
- `docs/results/final_submission_evidence/manifest.json` - artifact availability and final HF Space manifest.
- `docs/results/final_submission_evidence/reports/submission_summary.json` - final three-model summary.
- `docs/results/final_submission_evidence/reports/grpo_trl_run.json` - Qwen 3B GRPO training run report.
- `docs/results/final_submission_evidence/reports/postsave_inference_grpo.json` - post-save GRPO inference check.
- `docs/results/final_submission_evidence/reports/grpo_ablation_report.json` - GRPO/policy ablation report.
- `docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json` - baseline LLM-style policy vs full PolyGuard pipeline.
- `docs/results/final_submission_evidence/reports/action_traces.jsonl` - matched action traces with verifier output.
- `docs/results/final_submission_evidence/charts/curated/README.md` - visually reviewed chart index.

Per-model sweep histories:
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json`
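
The sweep directory names appear to follow a simple rule: take the Hub model ID, lowercase it, and replace every run of non-alphanumeric characters with `-`. That rule is inferred from the names above, not taken from the training code, and the `Qwen/Qwen2.5-*-Instruct` IDs are likewise assumed. Under that assumption, the expected sweep files can be derived rather than listed by hand.

```python
import re
from pathlib import Path

SWEEPS_ROOT = Path("docs/results/sweeps")
# Hub model IDs assumed to correspond to the sweep directories listed above.
MODEL_IDS = [
    "Qwen/Qwen2.5-0.5B-Instruct",
    "Qwen/Qwen2.5-1.5B-Instruct",
    "Qwen/Qwen2.5-3B-Instruct",
]


def model_slug(model_id: str) -> str:
    """Lowercase, then collapse every run of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")


def expected_history_files(model_id: str, with_grpo: bool) -> list[Path]:
    """List the sweep files this index shows for one model, in listing order."""
    stages = ["sft", "grpo"] if with_grpo else ["sft"]
    run_dir = SWEEPS_ROOT / model_slug(model_id)
    files = []
    for stage in stages:
        files.append(run_dir / f"{stage}_history.json")
        files.append(run_dir / f"{stage}_trl_run.json")
    return files
```

Only the 3B model lists GRPO files above, so `with_grpo=True` applies to it alone.
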

Three-model submission evidence:
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json`
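
The reward-component log is JSONL, one record per line. Its field names are not documented in this index, so the sketch below avoids assuming any: it averages every top-level numeric field it finds. It is a hypothetical helper that assumes each line is a flat JSON object. Booleans are skipped even though Python treats them as integers.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_numeric_fields(jsonl_path: Path) -> dict[str, float]:
    """Mean of every top-level numeric field across a JSONL file, keyed by field name."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            for key, value in record.items():
                # bool is a subclass of int, so exclude it explicitly.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    totals[key] += value
                    counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}
```
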

Completed-run status snapshots:
- `docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json`
- `docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json`

Legacy/local smoke logs are retained under `docs/results/active_model/`,
`docs/results/grpo_training_cycle/`, and `submission_bundle/` for auditability.
## Model Artifacts
The public final artifact/evidence Space is:
- https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts

The tracked local manifest is:
- `docs/results/final_submission_evidence/manifest.json`

At packaging time, Qwen 3B had SFT and GRPO adapter directories plus checkpoint
metadata in the final Space. Qwen 0.5B and 1.5B have reports/histories in this
repo, but their adapter directories were not present in the checked artifact
mirrors and are labeled `reports_only_or_partial`.

The final artifact Space and this checked-in evidence mirror are the public
review paths. Authenticated downloads, when needed by maintainers, are
operational details rather than part of the public submission narrative.
## Reproduction Paths
Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO
pass, validate post-save inference, and generate local reports.
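
The local smoke steps line up with scripts listed earlier on this page. The mapping below is a hypothetical reading of that correspondence, not a documented entrypoint. Which script produces the "local reports" is a guess. No flags are passed, because the short-run options are not described here. The sketch prints the plan by default and only executes it when `dry_run=False`.

```python
import subprocess
import sys

# Hypothetical mapping from the smoke-path steps to scripts named in this index.
# The "local reports" entry is a guess; bootstrap_data.py may also need to run first.
SMOKE_STEPS = [
    ("build small corpus", "scripts/build_training_corpus.py"),
    ("short SFT pass", "scripts/train_sft_trl.py"),
    ("short GRPO pass", "scripts/train_grpo_trl.py"),
    ("post-save inference check", "scripts/test_inference_postsave.py"),
    ("local reports", "scripts/generate_submission_evidence.py"),
]


def smoke_commands(python: str = sys.executable) -> list[list[str]]:
    """Build one command per step; short-run flags are intentionally omitted."""
    return [[python, script] for _, script in SMOKE_STEPS]


def run_smoke(dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, run them in order from the repo root."""
    for (name, _), cmd in zip(SMOKE_STEPS, smoke_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)
```

Check each script's `--help` before switching off `dry_run`. The order above only mirrors the prose.
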

Full HF Space path: use the one-run notebook or training Space runner when you
control the required Hugging Face credentials and hardware. The public evidence
for review is the final curated bundle, not private training commands.