# Submission Artifact Index

This page points reviewers to the shared environment, training scripts, and
training logs/results. It is intentionally path-based so the artifacts can be
found from a fresh clone without relying on local `outputs/` or `checkpoints/`
folders.
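Because the index is path-based, a quick sanity check from a fresh clone is to confirm that the listed paths resolve. Below is a minimal sketch. The path list is a small sample hand-copied from this page. The script name, `INDEXED_PATHS`, and `missing_paths` are hypothetical illustrations and are not part of the repository.

```python
"""Hypothetical helper: check that paths listed in this index exist in a clone."""
import sys
from pathlib import Path

# Sample of paths copied from this index (not exhaustive; extend as needed).
INDEXED_PATHS = [
    "openenv.yaml",
    "server/app.py",
    "app/env/env_core.py",
    "scripts/train_grpo_trl.py",
    "docs/results/final_submission_evidence/manifest.json",
]


def missing_paths(repo_root: Path, paths: list) -> list:
    """Return the indexed paths that do not exist under repo_root."""
    return [p for p in paths if not (repo_root / p).exists()]


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    missing = missing_paths(root, INDEXED_PATHS)
    for p in missing:
        print(f"missing: {p}")
    # Non-zero exit code makes the check usable in CI.
    sys.exit(1 if missing else 0)
```

Run it from the repository root, or pass the clone directory as the first argument.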

## Environment And Runtime

Core OpenEnv/runtime files:

- `openenv.yaml` - OpenEnv package entrypoint and deployment metadata.
- `server/app.py` - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- `app/env/env_core.py` - canonical `PolyGuardEnv` reset/step/state implementation.
- `app/env/fastapi_app.py` - HTTP API, catalog, reset, step, and candidate-step routes.
- `app/env/reward_router.py` - verifier-backed reward routing.
- `app/env/reward_scaling.py` - reward clamping/rounding to `[0.001, 0.999]`.
- `app/env/anti_cheat.py` - anti-hacking and invalid-action checks.
- `app/env/catalog.py` - task preset and sub-environment catalog.
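The clamping/rounding step in `app/env/reward_scaling.py` can be pictured roughly as follows. This is a hypothetical sketch rather than the file's actual code. The `[0.001, 0.999]` bounds come from this page. The rounding precision (`ROUND_DIGITS = 3`) and the NaN handling are assumptions.

```python
import math

# Bounds documented on this page; everything else here is an assumption.
REWARD_MIN = 0.001
REWARD_MAX = 0.999
ROUND_DIGITS = 3  # assumption: actual precision in reward_scaling.py not stated


def scale_reward(raw: float) -> float:
    """Hypothetical: clamp a raw verifier reward into [REWARD_MIN, REWARD_MAX], then round."""
    if math.isnan(raw):
        # Assumption: a NaN reward is treated as the minimum rather than propagated.
        return REWARD_MIN
    clamped = min(max(raw, REWARD_MIN), REWARD_MAX)
    return round(clamped, ROUND_DIGITS)
```

Keeping rewards strictly inside the open interval avoids exact 0 or 1 values, which some downstream consumers (for example, log-based transforms) cannot handle.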

Dependency and container files:

- `pyproject.toml` and `uv.lock` - local Python environment lock.
- `requirements.txt` - local/runtime pip dependency export.
- `requirements-space.txt` - Hugging Face Space dependency export.
- `.env.example` - non-secret environment variable template.
- `Dockerfile` - local/container runtime.
- `Dockerfile.space` - product HF Space runtime.
- `app/hf_space/Dockerfile` - HF training/evidence Space runtime.
- `configs/sft.yaml` and `configs/grpo.yaml` - train-loop defaults.
- `configs/rewards.yaml`, `configs/curriculum.yaml`, and `configs/env_*.yaml` - environment/reward/curriculum configuration.

Secrets are not committed. Hugging Face access is supplied through `HF_TOKEN`
as an environment variable or notebook/Space secret.
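The following sketch shows one way to read the token from the environment so that it never appears in committed code. `get_hf_token` is a hypothetical helper, not a function from this repository.

```python
import os
from typing import Optional


def get_hf_token(required: bool = False) -> Optional[str]:
    """Hypothetical helper: read the Hugging Face token from HF_TOKEN."""
    token = os.environ.get("HF_TOKEN") or None  # treat an empty string as unset
    if required and token is None:
        raise RuntimeError(
            "HF_TOKEN is not set; export it or add it as a notebook/Space secret."
        )
    return token
```

`huggingface_hub` also reads `HF_TOKEN` from the environment on its own, so many calls need no explicit `token=` argument once the variable is set.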

## Training Scripts And Notebooks

End-to-end runner notebooks:

- `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb` - one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- `notebooks/09_training_loop.ipynb` - modular walkthrough of the same loop.

Dataset and corpus scripts:

- `scripts/bootstrap_data.py`
- `scripts/build_training_corpus.py`
- `scripts/generate_sft_data.py`

SFT/GRPO training scripts:

- `scripts/train_sft_trl.py` - TRL SFT baseline.
- `scripts/train_grpo_trl.py` - TRL GRPO with environment-backed reward.
- `scripts/train_grpo_policy.py`
- `scripts/train_grpo_planner.py`
- `scripts/train_grpo_supervisor.py`
- `scripts/train_grpo_dosing.py`
- `app/training/sft_trl.py`
- `app/training/grpo_trl.py`
- `app/training/openenv_wrapper.py`
- `app/training/reward_functions.py`
- `app/training/callbacks.py`
- `app/training/checkpointing.py`

Hugging Face training/evidence scripts:

- `scripts/deploy_training_space.py` - creates/runs the GPU training Space.
- `app/hf_space/training_runner.py` - Space-side training orchestrator.
- `scripts/monitor_training_space_status.py` - Space status/log monitor.
- `scripts/pull_training_artifacts.py` - artifact puller from the HF model repo.
- `scripts/deploy_evidence_space.py` and `app/hf_space/evidence_runner.py` - evaluation-only evidence Space.
- `scripts/generate_hf_training_report.py` - training/sweep chart generation.
- `scripts/generate_submission_evidence.py` - evidence bundle generation without retraining.
- `scripts/deploy_final_artifact_space.py` - packages final public evidence/model artifacts into the final HF Space.

Post-training and inference scripts:

- `scripts/merge_adapters_safe.py`
- `scripts/test_inference_postsave.py`
- `scripts/benchmark_inference.py`
- `scripts/activate_sweep_model.py`
- `scripts/install_hf_active_bundle.py`

## Training Logs And Result Evidence

Final curated evidence:

- `docs/results/final_submission_evidence/README.md` - final evidence overview.
- `docs/results/final_submission_evidence/manifest.json` - artifact availability and final HF Space manifest.
- `docs/results/final_submission_evidence/reports/submission_summary.json` - final three-model summary.
- `docs/results/final_submission_evidence/reports/grpo_trl_run.json` - Qwen 3B GRPO training run report.
- `docs/results/final_submission_evidence/reports/postsave_inference_grpo.json` - post-save GRPO inference check.
- `docs/results/final_submission_evidence/reports/grpo_ablation_report.json` - GRPO/policy ablation report.
- `docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json` - baseline LLM-style policy vs full PolyGuard pipeline.
- `docs/results/final_submission_evidence/reports/action_traces.jsonl` - matched action traces with verifier output.
- `docs/results/final_submission_evidence/charts/curated/README.md` - visually reviewed chart index.
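The `.jsonl` evidence files (`action_traces.jsonl` here and `grpo_reward_components.jsonl` further below) store one JSON object per line. The sketch below is a generic reader for that format. It makes no assumption about the record fields. `iter_jsonl` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-blank line of a .jsonl file."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


if __name__ == "__main__":
    traces = Path("docs/results/final_submission_evidence/reports/action_traces.jsonl")
    records = list(iter_jsonl(traces))
    print(f"{len(records)} trace records")
    if records:
        # Field names are not documented here, so inspect them rather than assume.
        print("keys in first record:", sorted(records[0]))
```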

Per-model sweep histories:

- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json`
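The sweep directory names look like lowercased Hub model IDs with every run of non-alphanumeric characters replaced by `-`. This convention is inferred from the names above and is not confirmed by the repository. The sketch below reproduces it so that a model ID can be mapped to its sweep folder. `model_slug` and `sweep_dir` are hypothetical helpers.

```python
import re


def model_slug(model_id: str) -> str:
    """Inferred convention: lowercase, collapse non-alphanumeric runs to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", model_id.lower()).strip("-")


def sweep_dir(model_id: str) -> str:
    """Map a Hub model ID to its (assumed) sweep history folder."""
    return f"docs/results/sweeps/{model_slug(model_id)}"
```

For example, `model_slug("Qwen/Qwen2.5-0.5B-Instruct")` yields `qwen-qwen2-5-0-5b-instruct`, which matches the first folder listed above.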

Three-model submission evidence:

- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json`

Completed-run status snapshots:

- `docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json`
- `docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json`

Legacy/local smoke logs are retained under `docs/results/active_model/`,
`docs/results/grpo_training_cycle/`, and `submission_bundle/` for auditability.

## Model Artifacts

The public final artifact/evidence Space is:

- https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts
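Reviewers who want a local copy of the public Space can mirror it with `huggingface_hub.snapshot_download`. The sketch below assumes `huggingface_hub` is installed and the Space is publicly readable, so no token is passed. The `allow_patterns` filter for reports-only downloads, along with the names `FINAL_SPACE_ID` and `download_final_space`, are illustrative choices and are not part of the repository.

```python
from typing import List, Optional

# Space ID taken from the URL above.
FINAL_SPACE_ID = "adithya9903/polyguard-openenv-final-artifacts"


def download_final_space(
    local_dir: str = "final_artifacts",
    allow_patterns: Optional[List[str]] = None,
) -> str:
    """Mirror the public final artifact Space and return the local folder path."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # repo_type="space" is needed because the artifacts live in a Space, not a model repo.
    return snapshot_download(
        repo_id=FINAL_SPACE_ID,
        repo_type="space",
        local_dir=local_dir,
        allow_patterns=allow_patterns,  # e.g. ["*.json", "*.md"] to skip adapter weights
    )
```

Passing `allow_patterns=["*.json", "*.md"]` limits the download to reports and manifests, which is usually enough for review and avoids pulling adapter weights.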

The tracked local manifest is:

- `docs/results/final_submission_evidence/manifest.json`

At packaging time, Qwen 3B had SFT and GRPO adapter directories plus checkpoint
metadata in the final Space. Qwen 0.5B and 1.5B have reports/histories in this
repo, but their adapter directories were not present in the checked artifact
mirrors and are labeled `reports_only_or_partial`.

The final artifact Space and this checked-in evidence mirror are the public
review paths. Authenticated downloads, when needed by maintainers, are
operational details rather than part of the public submission narrative.

## Reproduction Paths

Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO
pass, validate post-save inference, and generate local reports.
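The smoke path can be scripted as an ordered list of commands. The sketch below maps each step to a script listed on this page. That mapping is an assumption, and so is running each script without arguments, because the flags for a "short" run are not documented here. The report-generation step is left out because this page does not name a dedicated local report script. `SMOKE_STEPS`, `smoke_commands`, and `run_smoke` are hypothetical names. The runner prints the plan by default and executes it only with `--run`.

```python
import subprocess
import sys
from typing import List

# Assumed step-to-script mapping; check each script's --help for real flags.
SMOKE_STEPS = [
    ("build corpus", "scripts/build_training_corpus.py"),
    ("short SFT", "scripts/train_sft_trl.py"),
    ("short GRPO", "scripts/train_grpo_trl.py"),
    ("post-save inference", "scripts/test_inference_postsave.py"),
]


def smoke_commands(python: str = sys.executable) -> List[List[str]]:
    """Build the command line for each smoke step, in order."""
    return [[python, script] for _, script in SMOKE_STEPS]


def run_smoke(dry_run: bool = True) -> None:
    for (name, _), cmd in zip(SMOKE_STEPS, smoke_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_smoke(dry_run="--run" not in sys.argv)
```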

Full HF Space path: use the one-run notebook or training Space runner when you
control the required Hugging Face credentials and hardware. The public evidence
for review is the final curated bundle, not private training commands.