# Submission Artifact Index
This page points reviewers to the shared environment, training scripts, and
training logs/results. It is intentionally path-based so the artifacts can be
found from a fresh clone without relying on local `outputs/` or `checkpoints/`
folders.
## Environment And Runtime
Core OpenEnv/runtime files:
- `openenv.yaml` - OpenEnv package entrypoint and deployment metadata.
- `server/app.py` - ASGI/FastAPI bridge used by OpenEnv validation and Space deployment.
- `app/env/env_core.py` - canonical `PolyGuardEnv` reset/step/state implementation.
- `app/env/fastapi_app.py` - HTTP API, catalog, reset, step, and candidate-step routes.
- `app/env/reward_router.py` - verifier-backed reward routing.
- `app/env/reward_scaling.py` - reward clamping/rounding to `[0.001, 0.999]`.
- `app/env/anti_cheat.py` - anti-hacking and invalid-action checks.
- `app/env/catalog.py` - task preset and sub-environment catalog.
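The clamping behavior documented for `app/env/reward_scaling.py` can be sketched as follows. Only the `[0.001, 0.999]` range comes from this index; the function name and the rounding precision are illustrative assumptions, not the module's actual API.

```python
def scale_reward(raw: float, digits: int = 3) -> float:
    """Clamp a raw reward into [0.001, 0.999] and round it.

    The clamp range matches the one documented for
    app/env/reward_scaling.py; the function name and rounding
    precision are illustrative assumptions.
    """
    clamped = min(max(raw, 0.001), 0.999)
    return round(clamped, digits)

print(scale_reward(1.7))    # clamps down to 0.999
print(scale_reward(-0.25))  # clamps up to 0.001
print(scale_reward(0.5))    # already in range
```

Keeping rewards strictly inside the open unit interval avoids degenerate 0/1 rewards that can destabilize GRPO advantage estimates.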
Dependency and container files:
- `pyproject.toml` and `uv.lock` - local Python environment lock.
- `requirements.txt` - local/runtime pip dependency export.
- `requirements-space.txt` - Hugging Face Space dependency export.
- `.env.example` - non-secret environment variable template.
- `Dockerfile` - local/container runtime.
- `Dockerfile.space` - production HF Space runtime.
- `app/hf_space/Dockerfile` - HF training/evidence Space runtime.
- `configs/sft.yaml` and `configs/grpo.yaml` - train-loop defaults.
- `configs/rewards.yaml`, `configs/curriculum.yaml`, and `configs/env_*.yaml` - environment/reward/curriculum configuration.
Secrets are not committed. Hugging Face access is supplied through `HF_TOKEN`
as an environment variable or notebook/Space secret.
## Training Scripts And Notebooks
End-to-end runner notebooks:
- `PolyGuard_SFT_GRPO_One_Run_Runner.ipynb` - one-run data build, SFT, GRPO, artifact pull, inference validation, chart generation, and Space deployment.
- `notebooks/09_training_loop.ipynb` - modular walkthrough of the same loop.
Dataset and corpus scripts:
- `scripts/bootstrap_data.py`
- `scripts/build_training_corpus.py`
- `scripts/generate_sft_data.py`
SFT/GRPO training scripts:
- `scripts/train_sft_trl.py` - TRL SFT baseline.
- `scripts/train_grpo_trl.py` - TRL GRPO with environment-backed reward.
- `scripts/train_grpo_policy.py`
- `scripts/train_grpo_planner.py`
- `scripts/train_grpo_supervisor.py`
- `scripts/train_grpo_dosing.py`
- `app/training/sft_trl.py`
- `app/training/grpo_trl.py`
- `app/training/openenv_wrapper.py`
- `app/training/reward_functions.py`
- `app/training/callbacks.py`
- `app/training/checkpointing.py`
Hugging Face training/evidence scripts:
- `scripts/deploy_training_space.py` - creates/runs the GPU training Space.
- `app/hf_space/training_runner.py` - Space-side training orchestrator.
- `scripts/monitor_training_space_status.py` - Space status/log monitor.
- `scripts/pull_training_artifacts.py` - artifact puller from the HF model repo.
- `scripts/deploy_evidence_space.py` and `app/hf_space/evidence_runner.py` - evaluation-only evidence Space.
- `scripts/generate_hf_training_report.py` - training/sweep chart generation.
- `scripts/generate_submission_evidence.py` - evidence bundle generation without retraining.
- `scripts/deploy_final_artifact_space.py` - packages final public evidence/model artifacts into the final HF Space.
Post-training and inference scripts:
- `scripts/merge_adapters_safe.py`
- `scripts/test_inference_postsave.py`
- `scripts/benchmark_inference.py`
- `scripts/activate_sweep_model.py`
- `scripts/install_hf_active_bundle.py`
## Training Logs And Result Evidence
Final curated evidence:
- `docs/results/final_submission_evidence/README.md` - final evidence overview.
- `docs/results/final_submission_evidence/manifest.json` - artifact availability and final HF Space manifest.
- `docs/results/final_submission_evidence/reports/submission_summary.json` - final three-model summary.
- `docs/results/final_submission_evidence/reports/grpo_trl_run.json` - Qwen 3B GRPO training run report.
- `docs/results/final_submission_evidence/reports/postsave_inference_grpo.json` - post-save GRPO inference check.
- `docs/results/final_submission_evidence/reports/grpo_ablation_report.json` - GRPO/policy ablation report.
- `docs/results/final_submission_evidence/reports/basic_llm_vs_polyguard_report.json` - baseline LLM-style policy vs full PolyGuard pipeline.
- `docs/results/final_submission_evidence/reports/action_traces.jsonl` - matched action traces with verifier output.
- `docs/results/final_submission_evidence/charts/curated/README.md` - visually reviewed chart index.
Per-model sweep histories:
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-0-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-1-5b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/sft_trl_run.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/sweeps/qwen-qwen2-5-3b-instruct/grpo_trl_run.json`
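A reviewer could summarize one of these history files with a few lines of Python. The record shape assumed here (a JSON list of per-step dicts carrying a `reward` field) is a hypothetical schema for illustration; the actual `*_history.json` layout is defined by the training scripts, not by this index.

```python
import json
from pathlib import Path
from statistics import mean

def summarize_history(path: Path, key: str = "reward") -> float:
    """Average one metric across a sweep history file.

    Assumes the *_history.json files are JSON lists of per-step
    records; both that shape and the metric key are illustrative
    assumptions, not a documented schema.
    """
    records = json.loads(path.read_text())
    return mean(r[key] for r in records if key in r)

# Demonstrate on a synthetic file written with the assumed shape.
demo = Path("demo_history.json")
demo.write_text(json.dumps([{"step": 1, "reward": 0.25},
                            {"step": 2, "reward": 0.75}]))
print(summarize_history(demo))  # 0.5
```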
Three-model submission evidence:
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-0-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-1-5b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/sft_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_history.json`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/runs/qwen-qwen2-5-3b-instruct/grpo_reward_components.jsonl`
- `docs/results/submission_evidence_qwen_0_5b_1_5b_3b/reports/remote_stage_records.json`
Completed-run status snapshots:
- `docs/results/qwen_completed_runs/reports/remote_status/live_hf_status_snapshot.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_1_5b_completed_commands.json`
- `docs/results/qwen_completed_runs/reports/remote_status/qwen_0_5b_1_5b_remote_stage_durations.json`
- `docs/results/submission_evidence/qwen_3b_continuation/training_space_runtime_status.json`
Legacy/local smoke logs are retained under `docs/results/active_model/`,
`docs/results/grpo_training_cycle/`, and `submission_bundle/` for auditability.
## Model Artifacts
The public final artifact/evidence Space is:
- https://huggingface.co/spaces/adithya9903/polyguard-openenv-final-artifacts
The tracked local manifest is:
- `docs/results/final_submission_evidence/manifest.json`
At packaging time, Qwen 3B had SFT and GRPO adapter directories plus checkpoint
metadata in the final Space. Qwen 0.5B and 1.5B have reports and histories in
this repo, but their adapter directories were absent from the checked artifact
mirrors, so they are labeled `reports_only_or_partial`.
The final artifact Space and this checked-in evidence mirror are the public
review paths. Authenticated downloads, when needed by maintainers, are
operational details rather than part of the public submission narrative.
## Reproduction Paths
Local smoke path: build the small corpus, run a short SFT pass, run a short GRPO
pass, validate post-save inference, and generate local reports.
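The smoke sequence maps onto the repo's scripts roughly as below. The stage-to-script mapping and the argument-free invocation are assumptions; the real CLIs may require flags not documented on this page.

```python
import sys

# Ordered stages of the local smoke path, mapped to scripts listed
# above. Flags are omitted because the actual CLIs are not
# documented on this page; the mapping itself is an assumption.
SMOKE_PIPELINE = [
    "scripts/bootstrap_data.py",                # build the small corpus
    "scripts/train_sft_trl.py",                 # short SFT pass
    "scripts/train_grpo_trl.py",                # short GRPO pass
    "scripts/test_inference_postsave.py",       # post-save inference check
    "scripts/generate_submission_evidence.py",  # local reports
]

def smoke_commands(python: str = sys.executable) -> list[list[str]]:
    """Return the argv list for each stage, in order."""
    return [[python, script] for script in SMOKE_PIPELINE]
```

Each argv list would then be run in order, e.g. with `subprocess.run(cmd, check=True)`, stopping at the first failing stage.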
Full HF Space path: use the one-run notebook or training Space runner when you
control the required Hugging Face credentials and hardware. The public evidence
for review is the final curated bundle, not private training commands.