{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 09 Training Loop\n",
        "\n",
        "Colab-ready end-to-end notebook for PolyGuard: install dependencies, authenticate Hugging Face, build data, train SFT, train GRPO with environment-backed rewards, export adapters, evaluate improvement, mirror final artifacts into `docs/results/`, and optionally deploy the OpenEnv environment to a Hugging Face Space."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 0) Runtime Setup\n",
        "\n",
        "Recommended Colab runtime: GPU. Set `HF_TOKEN` in Colab secrets or run the login cell below. The next cell clones the GitHub repo into `/content/Meta_Pytorch_OpenEnv_Scaler_VK` unless a `polyguard-rl` checkout already exists there, then switches the working directory to it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from pathlib import Path\n",
        "import json\n",
        "import os\n",
        "import shutil\n",
        "import subprocess\n",
        "\n",
        "REPO_URL = \"https://github.com/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK.git\"\n",
        "BRANCH = os.getenv(\"POLYGUARD_BRANCH\", \"master\")\n",
        "CLONE_ROOT = Path(\"/content/Meta_Pytorch_OpenEnv_Scaler_VK\")\n",
        "WORKDIR = CLONE_ROOT / \"polyguard-rl\"\n",
        "\n",
        "if not WORKDIR.exists():\n",
        "    subprocess.run([\"git\", \"clone\", \"--branch\", BRANCH, REPO_URL, str(CLONE_ROOT)], check=True)\n",
        "\n",
        "os.chdir(WORKDIR)\n",
        "print(\"PolyGuard workdir:\", Path.cwd())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "!python -m pip install -U pip\n",
        "!python -m pip install -r requirements.txt\n",
        "# Optional acceleration path. If Unsloth install fails on the selected runtime, TRL still runs through transformers.\n",
        "!python -m pip install \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\" || true"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1) Hugging Face Authentication\n",
        "\n",
        "Authentication is required to push the Space and to download private or gated models. Public Qwen checkpoints may download without it, but the final deployment still needs an authenticated account."
      ]
    },
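    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from huggingface_hub import login\n",
        "\n",
        "token = os.getenv(\"HF_TOKEN\")\n",
        "if not token:\n",
        "    try:  # Colab secrets are not exported as environment variables, so read them explicitly\n",
        "        from google.colab import userdata\n",
        "        token = userdata.get(\"HF_TOKEN\")\n",
        "    except Exception:\n",
        "        token = None  # not on Colab, or the secret is missing or not shared with this notebook\n",
        "if token:\n",
        "    login(token=token)\n",
        "else:\n",
        "    from huggingface_hub import notebook_login\n",
        "    notebook_login()"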
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2) Build Dataset And OpenEnv Assets"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "!python scripts/build_synthetic_patients.py\n",
        "!python scripts/ingest_open_drug_sources.py\n",
        "!python scripts/build_drug_knowledge.py\n",
        "!python scripts/build_retrieval_index.py\n",
        "!python scripts/build_scenarios.py\n",
        "!python scripts/bootstrap_data.py\n",
        "!python scripts/build_training_corpus.py --profile small --with-local --with-synthetic --with-hf"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "!python -m pytest tests/test_openenv_contract.py tests/test_reward_functions.py tests/test_anti_cheat.py -q\n",
        "!openenv validate ."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3) SFT Warm Start"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "MODEL_ID = os.getenv(\"POLYGUARD_MODEL_ID\", \"Qwen/Qwen2.5-1.5B-Instruct\")\n",
        "!python scripts/train_sft_trl.py --model-id \"$MODEL_ID\" --epochs 1 --max-steps 20 --batch-size 1 --use-unsloth"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4) GRPO With Environment Rewards"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "!python scripts/train_grpo_trl.py --model-id \"$MODEL_ID\" --max-steps 20 --num-generations 2 --batch-size 1 --use-unsloth"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5) Export, Validate Inference, Evaluate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "!python scripts/merge_adapters_safe.py --adapter-dir checkpoints/sft_adapter --output-dir checkpoints/merged\n",
        "!python scripts/test_inference_postsave.py --samples 3\n",
        "!python scripts/evaluate_policy_ablations.py --episodes 8\n",
        "!python scripts/evaluate_baselines.py\n",
        "!python scripts/evaluate_all.py\n",
        "!python scripts/evaluate_compare_runs.py --baseline outputs/reports/baselines.json --candidate outputs/reports/benchmark_report.json --output outputs/reports/improvement_report.json"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "results_dir = Path(\"docs/results\")\n",
        "results_dir.mkdir(parents=True, exist_ok=True)  # create once so both copy loops can write into it\n",
        "\n",
        "for rel in [\n",
        "    \"benchmark_report.json\",\n",
        "    \"baselines.json\",\n",
        "    \"grpo_ablation_report.json\",\n",
        "    \"grpo_trl_run.json\",\n",
        "    \"sft_trl_run.json\",\n",
        "    \"postsave_inference.json\",\n",
        "    \"improvement_report.json\",\n",
        "]:\n",
        "    src = Path(\"outputs/reports\") / rel\n",
        "    if src.exists():\n",
        "        shutil.copy2(src, results_dir / rel)\n",
        "\n",
        "for rel in [\"avg_reward.png\", \"policy_stack_avg_reward.png\", \"legality_rate.png\", \"success_rate.png\", \"avg_process_fidelity.png\"]:\n",
        "    src = Path(\"outputs/plots\") / rel\n",
        "    if src.exists():\n",
        "        shutil.copy2(src, results_dir / rel)\n",
        "\n",
        "report_path = Path(\"outputs/reports/improvement_report.json\")\n",
        "if report_path.exists():\n",
        "    print(json.dumps(json.loads(report_path.read_text(encoding=\"utf-8\")), indent=2))\n",
        "else:\n",
        "    print(f\"{report_path} is missing; check the evaluation cell output above.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 6) Optional HF Space Deployment\n",
        "\n",
        "Skip the next cell if you do not want to deploy. Otherwise set `HF_SPACE_REPO_ID` to your final Space repo id; when it is unset, the cell falls back to `Vishwa-docs/polyguard-openenv`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "HF_SPACE_REPO_ID = os.getenv(\"HF_SPACE_REPO_ID\", \"Vishwa-docs/polyguard-openenv\")\n",
        "os.environ[\"HF_SPACE_REPO_ID\"] = HF_SPACE_REPO_ID\n",
        "Path(\"docs/results\").mkdir(parents=True, exist_ok=True)  # the redirects below need this directory\n",
        "!bash scripts/deploy_space.sh --repo-id \"$HF_SPACE_REPO_ID\"\n",
        "deploy_ok = _exit_code == 0  # IPython stores the exit status of the last `!` command in _exit_code\n",
        "!hf spaces info \"$HF_SPACE_REPO_ID\" --format json > docs/results/hf_space_info.json\n",
        "info_ok = _exit_code == 0\n",
        "space_url = f\"https://{HF_SPACE_REPO_ID.replace('/', '-')}.hf.space\"\n",
        "!openenv validate --url \"$space_url\" > docs/results/openenv_space_validate.json\n",
        "validate_ok = _exit_code == 0\n",
        "# Record a pass only when every step above exited cleanly, instead of hard-coding True.\n",
        "verification = {\"passed\": deploy_ok and info_ok and validate_ok, \"repo_id\": HF_SPACE_REPO_ID, \"space_url\": space_url, \"space_info\": \"docs/results/hf_space_info.json\", \"openenv_validation\": \"docs/results/openenv_space_validate.json\"}\n",
        "Path(\"docs/results/hf_space_verification.json\").write_text(json.dumps(verification, indent=2), encoding=\"utf-8\")\n",
        "verification"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 7) Final Strict Gate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "os.environ[\"POLYGUARD_ENFORCE_SUBMISSION_LINKS\"] = \"true\"\n",
        "!python scripts/acceptance_gate.py"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.11"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}