{ "cells": [
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 09 Training Loop\n", "\n", "Colab-ready end-to-end notebook for PolyGuard: install dependencies, authenticate with Hugging Face, build data, train SFT, train GRPO with environment-backed rewards, export adapters, evaluate improvement, mirror final artifacts into `docs/results/`, and optionally deploy the OpenEnv environment to a Hugging Face Space." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 0) Runtime Setup\n", "\n", "Recommended Colab runtime: GPU. Set `HF_TOKEN` as an environment variable or a Colab secret; otherwise the login cell in section 1 prompts for a token. The setup cell below clones the GitHub repo into `/content` unless the `polyguard-rl` checkout already exists there." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import json\n", "import os\n", "import shutil\n", "import subprocess\n", "\n", "REPO_URL = \"https://github.com/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK.git\"\n", "BRANCH = os.getenv(\"POLYGUARD_BRANCH\", \"master\")\n", "CLONE_ROOT = Path(\"/content/Meta_Pytorch_OpenEnv_Scaler_VK\")\n", "WORKDIR = CLONE_ROOT / \"polyguard-rl\"\n", "\n", "if not WORKDIR.exists():\n", "    subprocess.run([\"git\", \"clone\", \"--branch\", BRANCH, REPO_URL, str(CLONE_ROOT)], check=True)\n", "\n", "os.chdir(WORKDIR)\n", "print(\"PolyGuard workdir:\", Path.cwd())" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python -m pip install -U pip\n", "!python -m pip install -r requirements.txt\n", "# Optional acceleration path. If the Unsloth install fails on the selected runtime, TRL still runs through transformers.\n", "!python -m pip install \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\" || true" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Hugging Face Authentication\n", "\n", "Authentication is required to push the Space and to access private or gated models. Public Qwen checkpoints may download without auth, but final deployment still needs an authenticated account." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import login\n", "\n", "token = os.getenv(\"HF_TOKEN\")\n", "if not token:\n", "    try:  # fall back to a Colab secret named HF_TOKEN\n", "        from google.colab import userdata\n", "        token = userdata.get(\"HF_TOKEN\")\n", "    except Exception:  # not on Colab, or the secret is missing or not shared\n", "        token = None\n", "if token:\n", "    login(token=token)\n", "else:\n", "    from huggingface_hub import notebook_login\n", "    notebook_login()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2) Build Dataset And OpenEnv Assets" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python scripts/build_synthetic_patients.py\n", "!python scripts/ingest_open_drug_sources.py\n", "!python scripts/build_drug_knowledge.py\n", "!python scripts/build_retrieval_index.py\n", "!python scripts/build_scenarios.py\n", "!python scripts/bootstrap_data.py\n", "!python scripts/build_training_corpus.py --profile small --with-local --with-synthetic --with-hf" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python -m pytest tests/test_openenv_contract.py tests/test_reward_functions.py tests/test_anti_cheat.py -q\n", "!openenv validate ."
] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3) SFT Warm Start" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MODEL_ID = os.getenv(\"POLYGUARD_MODEL_ID\", \"Qwen/Qwen2.5-1.5B-Instruct\")\n", "!python scripts/train_sft_trl.py --model-id \"$MODEL_ID\" --epochs 1 --max-steps 20 --batch-size 1 --use-unsloth" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 4) GRPO With Environment Rewards" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python scripts/train_grpo_trl.py --model-id \"$MODEL_ID\" --max-steps 20 --num-generations 2 --batch-size 1 --use-unsloth" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 5) Export, Validate Inference, Evaluate" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python scripts/merge_adapters_safe.py --adapter-dir checkpoints/sft_adapter --output-dir checkpoints/merged\n", "!python scripts/test_inference_postsave.py --samples 3\n", "!python scripts/evaluate_policy_ablations.py --episodes 8\n", "!python scripts/evaluate_baselines.py\n", "!python scripts/evaluate_all.py\n", "!python scripts/evaluate_compare_runs.py --baseline outputs/reports/baselines.json --candidate outputs/reports/benchmark_report.json --output outputs/reports/improvement_report.json" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for rel in [\n", "    \"benchmark_report.json\",\n", "    \"baselines.json\",\n", "    \"grpo_ablation_report.json\",\n", "    \"grpo_trl_run.json\",\n", "    \"sft_trl_run.json\",\n", "    \"postsave_inference.json\",\n", "    \"improvement_report.json\",\n", "]:\n", "    src = Path(\"outputs/reports\") / rel\n", "    dst = Path(\"docs/results\") / rel\n", "    if src.exists():\n", "        dst.parent.mkdir(parents=True, exist_ok=True)\n", "        shutil.copy2(src, dst)\n", "\n", "for rel in [\"avg_reward.png\", \"policy_stack_avg_reward.png\", \"legality_rate.png\", \"success_rate.png\", \"avg_process_fidelity.png\"]:\n", "    src = Path(\"outputs/plots\") / rel\n", "    dst = Path(\"docs/results\") / rel\n", "    if src.exists():\n", "        dst.parent.mkdir(parents=True, exist_ok=True)\n", "        shutil.copy2(src, dst)\n", "\n", "print(json.loads(Path(\"outputs/reports/improvement_report.json\").read_text()))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 6) Optional HF Space Deployment\n", "\n", "Set `HF_SPACE_REPO_ID` to your final Space repo id. When the variable is unset, the cell below defaults to `Vishwa-docs/polyguard-openenv`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HF_SPACE_REPO_ID = os.getenv(\"HF_SPACE_REPO_ID\", \"Vishwa-docs/polyguard-openenv\")\n", "os.environ[\"HF_SPACE_REPO_ID\"] = HF_SPACE_REPO_ID\n", "!bash scripts/deploy_space.sh --repo-id \"$HF_SPACE_REPO_ID\"\n", "!hf spaces info \"$HF_SPACE_REPO_ID\" --format json > docs/results/hf_space_info.json\n", "space_url = f\"https://{HF_SPACE_REPO_ID.replace('/', '-')}.hf.space\"\n", "!openenv validate --url \"$space_url\" > docs/results/openenv_space_validate.json\n", "# IPython records the exit status of the last shell command in `_exit_code`; use it instead of assuming success.\n", "validation_ok = globals().get(\"_exit_code\") == 0\n", "verification = {\"passed\": validation_ok, \"repo_id\": HF_SPACE_REPO_ID, \"space_url\": space_url, \"space_info\": \"docs/results/hf_space_info.json\", \"openenv_validation\": \"docs/results/openenv_space_validate.json\"}\n", "Path(\"docs/results/hf_space_verification.json\").write_text(json.dumps(verification, indent=2), encoding=\"utf-8\")\n", "verification" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 7) Final Strict Gate" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.environ[\"POLYGUARD_ENFORCE_SUBMISSION_LINKS\"] = \"true\"\n", "!python scripts/acceptance_gate.py" ] }
], "metadata": { "accelerator": "GPU", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11" } }, "nbformat": 4, "nbformat_minor": 5 }