
Submission Artifact Index

This page points reviewers to the shared environment, training scripts, and training logs/results. It is intentionally path-based so the artifacts can be found from a fresh clone without relying on local outputs/ or checkpoints/ folders.

Environment And Runtime

Core OpenEnv/runtime files:

  • openenv.yaml - OpenEnv package entrypoint and deployment metadata.
  • server/app.py - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
  • app/env/env_core.py - canonical PolyGuardEnv reset/step/state implementation.
  • app/env/fastapi_app.py - HTTP API, catalog, reset, step, and candidate-step routes.
  • app/env/reward_router.py - verifier-backed reward routing.
  • app/env/reward_scaling.py - reward clamping/rounding to [0.001, 0.999].
  • app/env/anti_cheat.py - anti-hacking and invalid-action checks.
  • app/env/catalog.py - task preset and sub-environment catalog.
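The reward clamping noted for app/env/reward_scaling.py can be sketched as a single pure function; the function name, rounding precision, and exact semantics here are assumptions, with only the [0.001, 0.999] range taken from this index:

```python
def scale_reward(raw: float, lo: float = 0.001, hi: float = 0.999,
                 ndigits: int = 3) -> float:
    """Clamp a raw verifier score into [lo, hi] and round it, so the
    policy never observes exact 0.0 or 1.0 rewards."""
    return round(min(hi, max(lo, raw)), ndigits)
```

Keeping rewards strictly inside (0, 1) avoids degenerate advantage estimates when an entire GRPO group saturates at the same extreme score.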

Dependency and container files:

  • pyproject.toml and uv.lock - Python project definition and dependency lockfile.
  • requirements.txt - local/runtime pip dependency export.
  • requirements-space.txt - Hugging Face Space dependency export.
  • .env.example - non-secret environment variable template.
  • Dockerfile - local/container runtime.
  • Dockerfile.space - production HF Space runtime.
  • app/hf_space/Dockerfile - HF training/evidence Space runtime.
  • configs/sft.yaml and configs/grpo.yaml - train-loop defaults.
  • configs/rewards.yaml, configs/curriculum.yaml, and configs/env_*.yaml - environment/reward/curriculum configuration.
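A layered configuration like this (train-loop defaults overridden by environment/reward/curriculum files) is typically combined with a recursive dict merge. A minimal sketch under that assumption; the merge semantics and sample keys are hypothetical, not read from the actual YAML files:

```python
def merge_configs(base: dict, override: dict) -> dict:
    """Recursively merge two config dicts: override values win, and
    nested dicts merge key-by-key (assumed layering semantics)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged
```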

Secrets are not committed. Hugging Face access is supplied through HF_TOKEN as an environment variable or notebook/Space secret.
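Token lookup in that pattern reduces to reading process environment variables, since notebook and Space secrets surface the same way. A minimal sketch (the fallback to `HUGGING_FACE_HUB_TOKEN` is an extra convention, not something this index specifies):

```python
import os
from typing import Optional


def resolve_hf_token() -> Optional[str]:
    """Read the Hugging Face token from the environment; never
    hard-code tokens in committed files."""
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
```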

Training Scripts And Notebooks

End-to-end runner notebooks:

  • PolyGuard_SFT_GRPO_One_Run_Runner.ipynb - one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
  • notebooks/09_training_loop.ipynb - modular walkthrough of the same loop.

Dataset and corpus scripts:

  • scripts/bootstrap_data.py
  • scripts/build_training_corpus.py
  • scripts/generate_sft_data.py

SFT/GRPO training scripts:

  • scripts/train_sft_trl.py - TRL SFT baseline.
  • scripts/train_grpo_trl.py - TRL GRPO with environment-backed reward.
  • scripts/train_grpo_policy.py
  • scripts/train_grpo_planner.py
  • scripts/train_grpo_supervisor.py
  • scripts/train_grpo_dosing.py
  • app/training/sft_trl.py
  • app/training/grpo_trl.py
  • app/training/openenv_wrapper.py
  • app/training/reward_functions.py
  • app/training/callbacks.py
  • app/training/checkpointing.py
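An environment-backed reward for TRL GRPO generally takes the shape of a callable that returns one float per completion. The sketch below uses a stub verifier in place of a real PolyGuardEnv step; the function names and stub scoring rule are assumptions, and only the [0.001, 0.999] clamp comes from this index:

```python
def stub_verifier_score(completion: str) -> float:
    # Stand-in for a real environment step plus verifier call.
    return 1.0 if completion.strip().startswith("{") else 0.0


def env_backed_reward(prompts, completions, **kwargs):
    """Reward callable in the shape TRL-style trainers expect:
    one clamped float per completion."""
    return [min(0.999, max(0.001, stub_verifier_score(c)))
            for c in completions]
```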

Hugging Face training/evidence scripts:

  • scripts/deploy_training_space.py - creates/runs the GPU training Space.
  • app/hf_space/training_runner.py - Space-side training orchestrator.
  • scripts/monitor_training_space_status.py - Space status/log monitor.
  • scripts/pull_training_artifacts.py - artifact puller from the HF model repo.
  • scripts/deploy_evidence_space.py and app/hf_space/evidence_runner.py - evaluation-only evidence Space.
  • scripts/generate_hf_training_report.py - training/sweep chart generation.
  • scripts/generate_submission_evidence.py - evidence bundle generation without retraining.
  • scripts/deploy_final_artifact_space.py - packages final public evidence/model artifacts into the final HF Space.

Post-training and inference scripts:

  • scripts/merge_adapters_safe.py
  • scripts/test_inference_postsave.py
  • scripts/benchmark_inference.py
  • scripts/activate_sweep_model.py
  • scripts/install_hf_active_bundle.py
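The kind of number an inference benchmark like scripts/benchmark_inference.py reports can be sketched as a median-latency helper; the function name and measurement shape here are assumptions, not the script's actual interface:

```python
import time


def median_latency(generate, prompt: str, runs: int = 5) -> float:
    """Median wall-clock latency of a generate() callable over
    several runs (median is robust to one-off warmup spikes)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    return sorted(samples)[runs // 2]
```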

Training Logs And Result Evidence

Final curated evidence:

  • docs/results/final_submission_evidence/README.md - final evidence overview.
  • docs/results/final_submission_evidence/manifest.json - artifact availability and final HF Space manifest.
  • docs/results/final_submission_evidence/reports/submission_summary.json - final three-model summary.
  • docs/results/final_submission_evidence/reports/grpo_trl_run.json - Qwen 3B GRPO training run report.
  • docs/results/final_submission_evidence/reports/postsave_inference_grpo.json - post-save GRPO inference check.
  • docs/results/final_submission_evidence/reports/grpo_ablation_report.json - GRPO/policy ablation report.
  • docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json - comparison of a baseline LLM-style policy against the full PolyGuard pipeline.
  • docs/results/final_submission_evidence/reports/action_traces.jsonl - matched action traces with verifier output.
  • docs/results/final_submission_evidence/charts/curated/README.md - visually reviewed chart index.

Per-model sweep histories:

  • docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json
  • docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json
  • docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json
  • docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json
  • docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json
  • docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json
  • docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json
  • docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json
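Pulling a headline number out of these histories reduces to reading the last logged value of a metric. A minimal sketch, assuming the `*_history.json` files are lists of per-step dicts (that shape is an assumption):

```python
import json


def last_logged(history, key):
    """Return the final logged value of `key` from a list of
    per-step metric dicts, or None if it never appears."""
    values = [step[key] for step in history if key in step]
    return values[-1] if values else None


# Example against an inline record in the assumed format:
history = json.loads('[{"step": 1, "loss": 2.1}, {"step": 2, "loss": 1.7}]')
```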

Three-model submission evidence:

  • docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json
  • docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json
  • docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json
  • docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json
  • docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl
  • docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json

Completed-run status snapshots:

  • docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json
  • docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json
  • docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json
  • docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json
  • docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json

Legacy/local smoke logs are retained under docs/results/active_model/, docs/results/grpo_training_cycle/, and submission_bundle/ for auditability.

Model Artifacts

The public final artifact/evidence Space is:

The tracked local manifest is:

  • docs/results/final_submission_evidence/manifest.json

At packaging time, Qwen 3B had SFT and GRPO adapter directories plus checkpoint metadata in the final Space. Qwen 0.5B and 1.5B had reports/histories in this repo, but their adapter directories were not present in the checked artifact mirrors, so they are labeled reports_only_or_partial.

The final artifact Space and this checked-in evidence mirror are the public review paths. Authenticated downloads, when needed by maintainers, are operational details rather than part of the public submission narrative.

Reproduction Paths

Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO pass, validate post-save inference, and generate local reports.
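The smoke path above can be sketched as an ordered script runner. The step-to-script mapping is inferred from this index and the per-script CLI flags are omitted because they are repo-specific; treat both as assumptions:

```python
import subprocess
import sys

# Assumed ordering of the local smoke path (flags omitted).
SMOKE_STEPS = [
    "scripts/bootstrap_data.py",
    "scripts/train_sft_trl.py",
    "scripts/train_grpo_trl.py",
    "scripts/test_inference_postsave.py",
]


def run_smoke(dry_run: bool = True) -> list:
    """Build the per-step commands and, unless dry_run, execute them
    in order, stopping at the first failure."""
    commands = [[sys.executable, step] for step in SMOKE_STEPS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```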

Full HF Space path: use the one-run notebook or training Space runner when you have the required Hugging Face credentials and hardware. The public evidence for review is the final curated bundle, not private training commands.