sh4shv4t committed
Commit 00a2188 · 1 Parent(s): 1b1c2d9

Relocate training notebooks, add BLOG and Google Colab links (SFT + GRPO HF Job), dashboard updates, and eval artifacts
.gitignore CHANGED
@@ -1,6 +1,9 @@
 # Python
 __pycache__/
+**/__pycache__/
 *.py[cod]
+*.pyc
+*.pyo
 *.pyo
 .Python
 build/
@@ -16,7 +19,9 @@ parlay.db
 telemetry.json
 data/
 models/
-results/
+results/*
+!results/eval_results.json
+!results/random_baseline.json
 
 # Environment
 .env
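The `results/*` pattern (rather than `results/`) matters here: git cannot re-include a file whose parent directory is itself excluded, so the directory's contents are ignored instead of the directory. A rough sketch of the last-match-wins semantics, using `fnmatch` as an approximation of git's matcher (not git's actual algorithm):

```python
from fnmatch import fnmatch

# (pattern, ignores?) pairs in .gitignore order; the LAST matching pattern wins.
PATTERNS = [
    ("results/*", True),                   # ignore everything inside results/
    ("results/eval_results.json", False),  # ...except the committed eval artifacts
    ("results/random_baseline.json", False),
]

def is_ignored(path: str) -> bool:
    """Approximate gitignore matching: scan all patterns, keep the last verdict."""
    ignored = False
    for pattern, ignores in PATTERNS:
        if fnmatch(path, pattern):
            ignored = ignores
    return ignored
```

With these patterns, `results/grpo_reward_curve.png` stays untracked while the two baseline JSON files survive a fresh clone.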
.pre-commit-config.yaml ADDED
@@ -0,0 +1,11 @@
+# Optional: pip install pre-commit && pre-commit install
+repos:
+  - repo: local
+    hooks:
+      - id: no-pycache-in-commit
+        name: Forbid __pycache__ and .pyc in commits
+        entry: python scripts/check_staged_not_pycache.py
+        language: system
+        pass_filenames: false
+        always_run: true
+        stages: [pre-commit]
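The hook's entry point, `scripts/check_staged_not_pycache.py`, is not part of this diff. A minimal sketch of what such a check might look like (the `offending` helper and its exact messages are assumptions, not the repo's actual script):

```python
import subprocess
import sys

def offending(paths: list[str]) -> list[str]:
    """Staged paths that touch a __pycache__ directory or compiled bytecode."""
    return [
        p for p in paths
        if "__pycache__" in p.split("/") or p.endswith((".pyc", ".pyo"))
    ]

def main() -> int:
    # `git diff --cached --name-only` lists exactly the paths staged for commit.
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    bad = offending(staged)
    for p in bad:
        print(f"refusing to commit: {p}", file=sys.stderr)
    return 1 if bad else 0  # non-zero exit makes pre-commit abort the commit

if __name__ == "__main__":
    sys.exit(main())
```

Because the hook uses `language: system` with `always_run: true` and `pass_filenames: false`, the script is responsible for querying git itself rather than receiving filenames from pre-commit.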
parlay_hf_article.md → BLOG.md RENAMED
@@ -1,5 +1,9 @@
 # ◈ Parlay — I Built an AI That Finally Beats Me at Negotiation
 
+<p align="center">
+  <img src="images/Parlay_square%20logo.png" alt="Parlay logo" width="220">
+</p>
+
 *Teaching language models to close deals under hidden information, bluffing, and a world that doesn't stand still.*
 
 ---
@@ -130,6 +134,8 @@ No prior negotiation RL paper has this layer.
 
 ## Training Pipeline
 
+**Google Colab:** [parlay_sft_colab](https://colab.research.google.com/drive/1x5uZMbdKF7XeDNm-bM5YSPdpd1srgArA?usp=sharing) · [parlay_grpo_hf_job](https://colab.research.google.com/drive/1DNYogmRlR_YJrEO6GN3YC7xj8lfycDuL?usp=sharing) (in-repo: [`training/notebooks/parlay_sft_colab.ipynb`](https://github.com/sh4shv4t/Parlay/blob/main/training/notebooks/parlay_sft_colab.ipynb), plus [`training/GRPO_HF_RUNBOOK.md`](https://github.com/sh4shv4t/Parlay/blob/main/training/GRPO_HF_RUNBOOK.md) / `scripts/hf_grpo_entry.sh` for the job-style GRPO run).
+
 ```text
 Gemini self-play (generate_data.py)
 → 80 quality-filtered episodes across 9 persona×scenario combos
README.md CHANGED
@@ -8,24 +8,27 @@ pinned: false
 tags: ["openenv", "hackathon", "rl", "gametheory"]
 ---
 
+<p align="center">
+  <img src="images/Parlay_square%20logo.png" alt="Parlay logo" width="220">
+</p>
+
 # Parlay ◈ — The Arena Where AIs Learn to Close
 
 **[▶ Play Now — HuggingFace Space](https://huggingface.co/spaces/sh4shv4t/Parlay)** |
-[Blog Post](https://huggingface.co/blog/sh4shv4t/parlay) |
+[Blog Post](BLOG.md) |
 [SFT Model](https://huggingface.co/sh4shv4t/parlay-sft-1-5b) |
 [GRPO Model](https://huggingface.co/sh4shv4t/parlay-grpo-1-5b) |
 [Dataset](https://huggingface.co/datasets/sh4shv4t/parlay-episodes) |
 [Training (HF / TRL pipeline)](training/notebooks/parlay_training.ipynb) |
 [OpenEnv reset/step rollouts](training/notebooks/openenv_rollout_training.ipynb) |
+[SFT — Colab (`parlay_sft_colab`)](https://colab.research.google.com/drive/1x5uZMbdKF7XeDNm-bM5YSPdpd1srgArA?usp=sharing) |
+[GRPO HF Job — Colab (`parlay_grpo_hf_job`)](https://colab.research.google.com/drive/1DNYogmRlR_YJrEO6GN3YC7xj8lfycDuL?usp=sharing) |
 [OpenEnv Manifest](openenv.yaml)
 
 ![Python 3.11](https://img.shields.io/badge/Python-3.11-blue)
 ![OpenEnv Compliant](https://img.shields.io/badge/OpenEnv-Compliant-00C853)
 ![MIT License](https://img.shields.io/badge/License-MIT-green)
 ![HF Spaces](https://img.shields.io/badge/HF%20Spaces-Ready-yellow)
-
-`Python 3.11` | `FastAPI` | `Gemini` | `GRPO` | `OpenEnv WebSocket`
-
 ---
 
 ## The Problem
@@ -299,6 +302,15 @@ python smoke_test.py # 7 integration tests
 pytest tests/ -v
 ```
 
+### Git hooks (optional)
+
+```bash
+pip install -r requirements-dev.txt
+pre-commit install
+```
+
+Staged `__pycache__` paths and `.pyc` / `.pyo` files are blocked by a local pre-commit check (see `scripts/check_staged_not_pycache.py`).
+
 ### Focused modules (optional)
 
 ```bash
README_SPACES.md CHANGED
@@ -9,6 +9,10 @@ pinned: true
 tags: ["openenv", "hackathon", "rl", "gametheory"]
 ---
 
+<p align="center">
+  <img src="images/Parlay_square%20logo.png" alt="Parlay logo" width="200">
+</p>
+
 Parlay is a negotiation RL environment and playable browser game where agents bargain under hidden information.
 It combines Theory-of-Mind belief tracking, dynamic ZOPA erosion under sustained tension, and tactical negotiation moves.
 This Space exposes the live game UI and OpenEnv-style WebSocket flow so you can test policies interactively.
dashboard/api.py CHANGED
@@ -101,6 +101,20 @@ class SetOpponentRequest(BaseModel):
     model: str  # "trained" | "gemini"
 
 
+class ModelChatMessage(BaseModel):
+    role: str  # "user" | "assistant"
+    text: str
+
+
+class ModelChatRequest(BaseModel):
+    message: str
+    scenario_id: str = "saas_enterprise"
+    persona: str = "shark"
+    history: list[ModelChatMessage] = []
+    temperature: float = 0.7
+    max_tokens: int = 300
+
+
 def _get_tension(state: ParlayState, player_move: Optional[TacticalMove], opponent_move: Optional[TacticalMove]) -> float:
     base = 20.0 + ((state.step_count + 1) / MAX_TURNS) * 55.0
     if player_move == TacticalMove.ANCHOR_HIGH or opponent_move == TacticalMove.ANCHOR_HIGH:
@@ -345,6 +359,23 @@ def _training_status_payload() -> dict[str, Any]:
             rnd = float(rnd)
         except Exception:  # noqa: BLE001
             has_results = False
+    if rnd is None and eval_path.is_file():
+        for baseline_path in (
+            _RESULTS_DIR / "random_baseline.json",
+            _RESULTS_DIR / "baseline.json",
+        ):
+            if not baseline_path.is_file():
+                continue
+            try:
+                blob = json.loads(baseline_path.read_text(encoding="utf-8"))
+                v = blob.get("mean_reward")
+                if v is None:
+                    v = blob.get("avg_reward")
+                if v is not None:
+                    rnd = float(v)
+                    break
+            except Exception:  # noqa: BLE001
+                continue
     repo = (os.environ.get("HF_MODEL_REPO") or "").strip() or None
 
     sft_loss_path: str | None = None
@@ -365,6 +396,14 @@ def _training_status_payload() -> dict[str, Any]:
     elif (_RESULTS_DIR / "grpo_loss_curve.png").is_file():
         grpo_loss_url = "/results/grpo_loss_curve.png"
 
+    comparison_url: str | None = None
+    if (_RESULTS_DIR / "training_curves.png").is_file():
+        comparison_url = "/results/training_curves.png"
+    elif (_IMAGES_DIR / "training_curves.png").is_file():
+        comparison_url = "/images/training_curves.png"
+    elif (_IMAGES_DIR / "comparison.png").is_file():
+        comparison_url = "/images/comparison.png"
+
     return {
         "has_results": has_results,
         "grpo_mean_reward": grpo,
@@ -375,10 +414,11 @@ def _training_status_payload() -> dict[str, Any]:
         "sft_loss_url": sft_loss_path,
         "grpo_reward_url": grpo_reward_url,
         "grpo_loss_url": grpo_loss_url,
+        "comparison_url": comparison_url,
         "plots_available": {
             "reward_curve": grpo_reward_url is not None,
             "grpo_loss": grpo_loss_url is not None,
-            "comparison": (_RESULTS_DIR / "training_curves.png").is_file(),
+            "comparison": comparison_url is not None,
             "transcript": (_RESULTS_DIR / "before_after_transcript.html").is_file(),
             "sft_loss": sft_loss_path is not None,
         },
@@ -391,6 +431,229 @@ async def get_training_status() -> dict:
     return _training_status_payload()
 
 
+# Default model ID for docs / judge UI (see openenv.yaml grpo_model)
+GRPO_MODEL_REPO_DEFAULT = "sh4shv4t/parlay-grpo-1-5b"
+
+
+@router.get("/judge-config")
+async def get_judge_config() -> dict:
+    """
+    Status for the /judge page: whether Hub weights are configured and current opponent mode.
+    """
+    repo = (os.environ.get("HF_MODEL_REPO") or "").strip() or None
+    return {
+        "hf_model_configured": bool(repo),
+        "model_repo": repo,
+        "suggested_grpo_repo": GRPO_MODEL_REPO_DEFAULT,
+        "opponent_mode": OPPONENT_MODE,
+    }
+
+
+@router.get("/model/info")
+async def model_info() -> dict:
+    """
+    Status + metadata for the /interact page.
+    Reports whether the GRPO Hub model is reachable and what repo is configured.
+    """
+    repo = (os.environ.get("HF_MODEL_REPO") or "").strip() or None
+    fallback_repo = GRPO_MODEL_REPO_DEFAULT
+    hub_url = f"https://huggingface.co/{repo or fallback_repo}"
+    return {
+        "configured": bool(repo),
+        "model_repo": repo or fallback_repo,
+        "hub_url": hub_url,
+        "base_model": "Qwen/Qwen2.5-1.5B-Instruct",
+        "training": "GRPO (TRL) on Parlay negotiation self-play episodes",
+        "note": (
+            "Model outputs structured JSON: utterance, optional offer_amount, optional tactical_move."
+            if repo
+            else (
+                "HF_MODEL_REPO is not set — using the public fallback repo. "
+                "Set HF_MODEL_REPO in Space secrets to enable local GPU inference."
+            )
+        ),
+    }
+
+
+def _build_interact_system_prompt(scenario_id: str, persona: str) -> str:
+    """Lightweight system prompt for the /interact page (no live game state)."""
+    from agent.personas import PERSONAS
+    from parlay_env.models import PersonaType
+
+    try:
+        pt = PersonaType(persona)
+    except ValueError:
+        pt = PersonaType.SHARK
+    cfg = PERSONAS[pt]
+
+    sc = get_scenario(scenario_id)
+    mid = (sc.batna_seller + sc.batna_buyer) / 2
+
+    return (
+        f"You are {cfg.name} ({cfg.emoji}), an experienced negotiator.\n\n"
+        f"SCENARIO: {sc.title}\n"
+        f"{sc.description}\n"
+        f"The deal range is roughly {sc.batna_seller:,.0f}–{sc.batna_buyer:,.0f} {sc.currency}.\n"
+        f"You are negotiating from the opposing side, targeting around {mid:,.0f}.\n\n"
+        f"YOUR STYLE:\n{cfg.style}\n\n"
+        "RULES:\n"
+        "- Stay in character at all times.\n"
+        '- Respond ONLY with valid JSON: {"utterance": "...", "offer_amount": <number or null>, "tactical_move": <string or null>}\n'
+        "- Keep utterances under 100 words.\n"
+    )
+
+
+async def _run_hf_inference(
+    system_prompt: str,
+    history: list[ModelChatMessage],
+    message: str,
+    temperature: float,
+    max_tokens: int,
+) -> dict[str, Any]:
+    """Load the Hub model (via hf_opponent._sync_generate) and run inference."""
+    from agent.hf_opponent import _sync_generate, _get_lock, _build_prompt, _parse_json_block  # noqa: PLC0415
+
+    messages = []
+    for h in history:
+        role = "user" if h.role == "user" else "model"
+        messages.append({"role": role, "parts": [h.text]})
+    messages.append({"role": "user", "parts": [message]})
+
+    loop = asyncio.get_event_loop()
+
+    repo = (os.environ.get("HF_MODEL_REPO") or "").strip() or GRPO_MODEL_REPO_DEFAULT
+    os.environ.setdefault("HF_MODEL_REPO", repo)
+
+    result = await loop.run_in_executor(
+        None,
+        lambda: _sync_generate(system_prompt, messages, min(max_tokens, 512)),
+    )
+    return result
+
+
+async def _run_hf_api_inference(
+    system_prompt: str,
+    history: list[ModelChatMessage],
+    message: str,
+    temperature: float,
+    max_tokens: int,
+    repo: str,
+) -> dict[str, Any]:
+    """
+    Call the HF Inference API for the given repo.
+    Tries the new /v1/chat/completions endpoint first, then falls back to the
+    legacy text-generation endpoint.
+    """
+    import httpx  # noqa: PLC0415
+    from agent.hf_opponent import _parse_json_block  # noqa: PLC0415
+
+    token = os.environ.get("HF_TOKEN", "")
+    headers = {"Authorization": f"Bearer {token}"} if token else {}
+
+    # Build chat messages for /v1/chat/completions
+    chat_msgs = [{"role": "system", "content": system_prompt}]
+    for h in history:
+        chat_msgs.append({"role": h.role, "content": h.text})
+    chat_msgs.append({"role": "user", "content": message})
+
+    url = f"https://api-inference.huggingface.co/models/{repo}/v1/chat/completions"
+    payload = {
+        "model": repo,
+        "messages": chat_msgs,
+        "max_tokens": min(max_tokens, 512),
+        "temperature": temperature,
+    }
+
+    async with httpx.AsyncClient(timeout=120.0) as client:
+        resp = await client.post(url, json=payload, headers=headers)
+        if resp.status_code == 200:
+            data = resp.json()
+            raw = data["choices"][0]["message"]["content"]
+            return _parse_json_block(raw)
+        # Legacy text-generation endpoint
+        legacy_url = f"https://api-inference.huggingface.co/models/{repo}"
+        # Format as ChatML for Qwen
+        eot = "<|im_end|>"
+        prompt_parts = [f"<|im_start|>system\n{system_prompt}\n{eot}\n"]
+        for h in history:
+            r = "user" if h.role == "user" else "assistant"
+            prompt_parts.append(f"<|im_start|>{r}\n{h.text}\n{eot}\n")
+        prompt_parts.append(f"<|im_start|>user\n{message}\n{eot}\n")
+        prompt_parts.append(
+            "<|im_start|>assistant\n"
+            'Respond ONLY with valid JSON: {"utterance": "...", "offer_amount": <number or null>, "tactical_move": <string or null>}\n'
+        )
+        legacy_payload = {
+            "inputs": "".join(prompt_parts),
+            "parameters": {"max_new_tokens": min(max_tokens, 256), "temperature": temperature, "return_full_text": False},
+        }
+        resp2 = await client.post(legacy_url, json=legacy_payload, headers=headers)
+        resp2.raise_for_status()
+        data2 = resp2.json()
+        raw2 = data2[0]["generated_text"] if isinstance(data2, list) else str(data2)
+        return _parse_json_block(raw2)
+
+
+@router.post("/model/chat")
+async def model_chat(req: ModelChatRequest) -> dict:
+    """
+    Direct inference against the GRPO-finetuned negotiation model.
+    Used by the /interact page for free-form chat with the model.
+    Strategy:
+    1. If torch + model weights are loadable locally (GPU Space), load and run.
+    2. Otherwise hit the HF Inference API (works on CPU Spaces, may have cold-start).
+    """
+    try:
+        scenario = get_scenario(req.scenario_id)
+    except InvalidScenarioError:
+        raise HTTPException(status_code=400, detail=f"Unknown scenario: {req.scenario_id!r}")
+
+    valid_personas = {"shark", "diplomat", "veteran"}
+    if req.persona not in valid_personas:
+        raise HTTPException(status_code=400, detail=f"Unknown persona: {req.persona!r}")
+
+    system_prompt = _build_interact_system_prompt(req.scenario_id, req.persona)
+    repo = (os.environ.get("HF_MODEL_REPO") or "").strip() or GRPO_MODEL_REPO_DEFAULT
+
+    # Attempt 1 — local model (fast on GPU Spaces, slow on CPU)
+    try:
+        import torch  # noqa: PLC0415
+        result = await _run_hf_inference(
+            system_prompt, req.history, req.message, req.temperature, req.max_tokens
+        )
+        return {
+            "utterance": result.get("utterance", ""),
+            "offer_amount": result.get("offer_amount"),
+            "tactical_move": result.get("tactical_move"),
+            "backend": "local",
+            "model_repo": repo,
+        }
+    except Exception as local_exc:
+        logger.info("Local inference unavailable, trying HF API: %s", local_exc)
+
+    # Attempt 2 — HF Inference API (no GPU needed)
+    try:
+        result = await _run_hf_api_inference(
+            system_prompt, req.history, req.message, req.temperature, req.max_tokens, repo
+        )
+        return {
+            "utterance": result.get("utterance", ""),
+            "offer_amount": result.get("offer_amount"),
+            "tactical_move": result.get("tactical_move"),
+            "backend": "hf_api",
+            "model_repo": repo,
+        }
+    except Exception as api_exc:
+        logger.warning("HF API inference failed: %s", api_exc)
+        raise HTTPException(
+            status_code=503,
+            detail=(
+                f"Model inference failed on both local and HF API backends. "
+                f"Model: {repo}. Error: {api_exc}"
+            ),
+        )
+
+
 @router.post("/set-opponent")
 async def set_opponent(req: SetOpponentRequest) -> dict:
     """
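The legacy fallback in `_run_hf_api_inference` assembles a Qwen ChatML prompt by hand. The same turn-formatting logic, isolated as a standalone helper for clarity (a sketch mirroring the diff, not code from the repo):

```python
EOT = "<|im_end|>"

def build_chatml(system_prompt: str, history: list[tuple[str, str]], message: str) -> str:
    """Assemble a Qwen-style ChatML prompt: system turn, prior turns, the new
    user turn, then an open assistant turn for the model to complete."""
    parts = [f"<|im_start|>system\n{system_prompt}\n{EOT}\n"]
    for role, text in history:
        # Any non-"user" role (the diff stores model turns as "model") maps to assistant.
        r = "user" if role == "user" else "assistant"
        parts.append(f"<|im_start|>{r}\n{text}\n{EOT}\n")
    parts.append(f"<|im_start|>user\n{message}\n{EOT}\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

The endpoint goes one step further and seeds the open assistant turn with a "Respond ONLY with valid JSON" reminder, biasing the text-generation fallback toward the structured `utterance` / `offer_amount` / `tactical_move` output the parser expects.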
dashboard/index.html CHANGED
@@ -84,7 +84,9 @@
       </div>
 
       <nav class="header-nav" aria-label="Site navigation">
-        <a href="/index.html" class="active">Game</a>
+        <a href="/" class="active">Game</a>
+        <a href="/interact">Interact</a>
+        <a href="/judge">GRPO demo</a>
       </nav>
 
       <div class="header-actions">
dashboard/interact.html ADDED
@@ -0,0 +1,639 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!DOCTYPE html>
2
+ <html lang="en" data-theme="dark">
3
+ <head>
4
+ <meta charset="UTF-8" />
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
6
+ <title>Parlay — Talk to the Model</title>
7
+ <link rel="icon" type="image/svg+xml" href="/static/favicon/favicon.svg?v=1" />
8
+ <link rel="preconnect" href="https://fonts.googleapis.com" />
9
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
10
+ <link href="https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@0,400;0,600;0,700;1,400&family=EB+Garamond:ital,wght@0,400;0,500&family=DM+Mono:wght@400;500&display=swap" rel="stylesheet" />
11
+ <style>
12
+ /* ── Design tokens (same as train_results.html) ────────────────────────── */
13
+ :root {
14
+ --felt: #1c2b1a;
15
+ --felt-light: #2a3d28;
16
+ --mahogany: #2c1810;
17
+ --mahogany-light: #3d2518;
18
+ --cream: #f5f0e8;
19
+ --gold: #c9a84c;
20
+ --smoke: #8a8070;
21
+ --ink: #1a1208;
22
+ --scarlet: #8b1a1a;
23
+ --emerald: #1a5c2a;
24
+ --ivory: #faf6ee;
25
+ --red-accent: #c0392b;
26
+ --green-accent: #27ae60;
27
+ --font-display: "Playfair Display", Georgia, serif;
28
+ --font-body: "EB Garamond", Georgia, serif;
29
+ --font-mono: "DM Mono", "Courier New", monospace;
30
+ }
31
+ *, *::before, *::after { box-sizing: border-box; }
32
+ body {
33
+ margin: 0; min-height: 100vh;
34
+ font-family: var(--font-body); font-size: 1rem;
35
+ color: var(--cream); background: var(--felt);
36
+ background-image:
37
+ repeating-linear-gradient(45deg,
38
+ rgba(255,255,255,0.018) 0, rgba(255,255,255,0.018) 1px,
39
+ transparent 1px, transparent 6px);
40
+ }
41
+
42
+ /* ── Page shell ───────────────────────────────────────────────────────── */
43
+ .page { max-width: 820px; margin: 0 auto; padding: 2.5rem 1.25rem 5rem; }
44
+ a.back {
45
+ color: var(--smoke); text-decoration: none; font-size: 0.9rem;
46
+ display: inline-flex; align-items: center; gap: 4px; margin-bottom: 1.5rem;
47
+ }
48
+ a.back:hover { color: var(--gold); }
49
+ h1 {
50
+ font-family: var(--font-display); color: var(--gold);
51
+ font-size: 2rem; font-style: italic; font-weight: 600;
52
+ margin: 0 0 0.4rem 0;
53
+ }
54
+ .subtitle { color: var(--smoke); font-size: 1.05rem; margin: 0 0 0.9rem; }
55
+ h2 {
56
+ font-family: var(--font-display); color: var(--gold); font-size: 1.3rem;
57
+ font-weight: 600; border-bottom: 1px solid rgba(201,168,76,0.3);
58
+ padding-bottom: 0.3rem; margin: 0 0 0.9rem;
59
+ }
60
+ section { margin-top: 2rem; }
61
+
62
+ /* ── Status badge ─────────────────────────────────────────────────────── */
63
+ .badge {
64
+ display: inline-block; padding: 4px 12px; border-radius: 4px;
65
+ font-size: 0.75rem; font-family: var(--font-mono); letter-spacing: 0.06em;
66
+ }
67
+ .badge.ok { background: #1a3d22; color: #a8d4b0; border: 1px solid var(--emerald); }
68
+ .badge.warn { background: #3a3018; color: #e8d49a; border: 1px solid var(--gold); }
69
+ .badge.err { background: #3d1010; color: #e8a0a0; border: 1px solid var(--scarlet); }
70
+
71
+ /* ── Model info card ──────────────────────────────────────────────────── */
72
+ .model-info-card {
73
+ background: var(--mahogany); border: 1px solid rgba(201,168,76,0.4);
74
+ border-radius: 4px; padding: 1rem 1.25rem;
75
+ display: grid; grid-template-columns: 1fr 1fr; gap: 0.6rem 1.5rem;
76
+ }
77
+ @media (max-width: 560px) { .model-info-card { grid-template-columns: 1fr; } }
78
+ .info-row { display: flex; flex-direction: column; gap: 2px; }
79
+ .info-label {
80
+ font-family: var(--font-mono); font-size: 0.68rem;
81
+ text-transform: uppercase; letter-spacing: 0.1em; color: var(--smoke);
82
+ }
83
+ .info-val { font-size: 0.9rem; color: var(--cream); word-break: break-all; }
84
+ .info-val a { color: var(--gold); text-decoration: none; }
85
+ .info-val a:hover { text-decoration: underline; }
86
+
87
+ /* ── Context pickers ──────────────────────────────────────────────────── */
88
+ .context-row {
89
+ display: grid; grid-template-columns: 1fr 1fr; gap: 1rem;
90
+ margin-bottom: 1.2rem;
91
+ }
92
+ @media (max-width: 560px) { .context-row { grid-template-columns: 1fr; } }
93
+ .context-group { display: flex; flex-direction: column; gap: 0.35rem; }
94
+ .context-group label {
95
+ font-family: var(--font-mono); font-size: 0.7rem;
96
+ text-transform: uppercase; letter-spacing: 0.1em; color: var(--smoke);
97
+ }
98
+ select {
99
+ background: var(--mahogany-light); border: 1px solid rgba(201,168,76,0.35);
100
+ color: var(--cream); font-family: var(--font-body); font-size: 0.95rem;
101
+ padding: 8px 10px; border-radius: 3px; width: 100%; cursor: pointer;
102
+ appearance: none;
103
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='8' viewBox='0 0 12 8'%3E%3Cpath fill='%23c9a84c' d='M1 1l5 5 5-5'/%3E%3C/svg%3E");
104
+ background-repeat: no-repeat; background-position: right 10px center;
105
+ }
106
+ select:focus { outline: 1px solid var(--gold); }
107
+
108
+ /* ── Chat window ──────────────────────────────────────────────────────── */
109
+ .chat-window {
110
+ background: var(--mahogany); border: 1px solid rgba(201,168,76,0.3);
111
+ border-radius: 4px; min-height: 340px; max-height: 480px;
112
+ overflow-y: auto; padding: 1rem; display: flex; flex-direction: column;
113
+ gap: 0.75rem; margin-bottom: 0.9rem;
114
+ scroll-behavior: smooth;
115
+ }
116
+ .chat-window:empty::after {
117
+ content: "Choose a scenario and persona above, then type a message to begin.";
118
+ color: var(--smoke); font-style: italic; font-size: 0.95rem;
119
+ margin: auto;
120
+ }
121
+
122
+ /* message bubbles */
123
+ .msg { display: flex; flex-direction: column; max-width: 82%; }
124
+ .msg.user { align-self: flex-end; align-items: flex-end; }
125
+ .msg.model { align-self: flex-start; align-items: flex-start; }
126
+ .msg.system { align-self: center; align-items: center; max-width: 100%; }
127
+
128
+ .msg-role {
129
+ font-family: var(--font-mono); font-size: 0.65rem;
130
+ text-transform: uppercase; letter-spacing: 0.1em;
131
+ color: var(--smoke); margin-bottom: 2px;
132
+ }
133
+ .msg-body {
134
+ padding: 0.65rem 0.9rem; border-radius: 4px;
135
+ font-size: 0.97rem; line-height: 1.5;
136
+ }
137
+ .msg.user .msg-body { background: #2d4030; border: 1px solid rgba(201,168,76,0.25); }
138
+ .msg.model .msg-body {
139
+ background: var(--mahogany-light); border: 1px solid rgba(201,168,76,0.4);
140
+ }
141
+ .msg.system .msg-body {
142
+ background: transparent; border: none;
143
+ color: var(--smoke); font-style: italic; font-size: 0.88rem; text-align: center;
144
+ }
145
+ /* offer chip inside model bubble */
146
+ .offer-chip {
147
+ display: inline-block; margin-top: 0.45rem;
148
+ background: rgba(201,168,76,0.15); border: 1px solid var(--gold);
149
+ color: var(--gold); font-family: var(--font-mono); font-size: 0.75rem;
150
+ padding: 2px 10px; border-radius: 3px;
151
+ }
152
+ .tactic-pill {
153
+ display: inline-block; margin-top: 0.35rem; margin-left: 0.4rem;
154
+ background: rgba(139,26,26,0.25); border: 1px solid var(--scarlet);
155
+ color: #e0a0a0; font-family: var(--font-mono); font-size: 0.68rem;
156
+ padding: 2px 8px; border-radius: 3px; text-transform: uppercase;
157
+ }
158
+ /* thinking pulse */
159
+ .thinking .msg-body { opacity: 0.65; }
160
+ .dot-pulse {
161
+ display: inline-flex; gap: 4px; align-items: center; padding: 2px 0;
162
+ }
163
+ .dot-pulse span {
164
+ width: 6px; height: 6px; border-radius: 50%; background: var(--gold);
165
+ animation: pulse 1.2s infinite ease-in-out;
166
+ }
167
+ .dot-pulse span:nth-child(2) { animation-delay: 0.2s; }
168
+ .dot-pulse span:nth-child(3) { animation-delay: 0.4s; }
169
+ @keyframes pulse { 0%,80%,100% { opacity: 0.2; transform: scale(0.8); } 40% { opacity: 1; transform: scale(1); } }
170
+
171
+ /* ── Input bar ────────────────────────────────────────────────────────── */
172
+ .input-bar { display: flex; gap: 0.65rem; }
173
+ .chat-input {
174
+ flex: 1; background: var(--mahogany-light);
175
+ border: 1px solid rgba(201,168,76,0.35);
+ color: var(--cream); font-family: var(--font-body); font-size: 1rem;
+ padding: 10px 14px; border-radius: 3px; resize: none; min-height: 44px;
+ }
+ .chat-input:focus { outline: 1px solid var(--gold); }
+ .chat-input::placeholder { color: var(--smoke); }
+ .btn-gold {
+ background: var(--gold); color: var(--ink);
+ border: none; padding: 10px 20px;
+ font-family: var(--font-mono); font-size: 0.85rem;
+ cursor: pointer; border-radius: 3px; font-weight: 500;
+ white-space: nowrap; align-self: flex-end;
+ }
+ .btn-gold:hover { filter: brightness(1.08); }
+ .btn-gold:disabled { opacity: 0.45; cursor: not-allowed; filter: none; }
+ .btn-ghost {
+ background: none; border: 1px solid rgba(201,168,76,0.35);
+ color: var(--smoke); padding: 8px 14px; border-radius: 3px;
+ font-family: var(--font-mono); font-size: 0.8rem; cursor: pointer;
+ }
+ .btn-ghost:hover { border-color: var(--gold); color: var(--gold); }
+
+ /* ── Footer toolbar under chat ────────────────────────────────────────── */
+ .chat-toolbar {
+ display: flex; justify-content: space-between; align-items: center;
+ margin-top: 0.55rem;
+ }
+ .char-hint { font-family: var(--font-mono); font-size: 0.7rem; color: var(--smoke); }
+ .backend-tag { font-family: var(--font-mono); font-size: 0.68rem; color: var(--smoke); }
+ .backend-tag span { color: var(--gold); }
+
+ /* ── Error box ────────────────────────────────────────────────────────── */
+ .error-box {
+ background: #3d1010; border: 1px solid var(--scarlet);
+ border-radius: 4px; padding: 0.75rem 1rem;
+ color: #e8a0a0; font-size: 0.9rem; display: none;
+ }
+
+ /* ── Explainer read-block (mirrors train_results) ─────────────────────── */
+ .read-block {
+ background: var(--mahogany); border: 1px solid rgba(201,168,76,0.28);
+ border-radius: 4px; padding: 1rem 1.15rem; margin-top: 0.8rem;
+ font-size: 0.95rem; line-height: 1.5; color: var(--cream);
+ }
+ .read-block h3 {
+ font-family: var(--font-display); color: var(--gold);
+ font-size: 1rem; margin: 0 0 0.4rem; font-style: italic;
+ }
+ .read-block p { margin: 0 0 0.6rem; }
+ .read-block p:last-child { margin-bottom: 0; }
+ .read-block code {
+ font-family: var(--font-mono); font-size: 0.82rem;
+ background: rgba(255,255,255,0.07); padding: 1px 5px; border-radius: 2px;
+ }
+
+ /* ── JSON viewer (collapsible raw output) ─────────────────────────────── */
+ details.raw-output {
+ background: var(--mahogany); border: 1px solid rgba(201,168,76,0.2);
+ border-radius: 4px; padding: 0 1rem; margin-top: 0.5rem;
+ }
+ details.raw-output summary {
+ cursor: pointer; color: var(--smoke); padding: 0.6rem 0;
+ font-family: var(--font-mono); font-size: 0.72rem; letter-spacing: 0.05em;
+ text-transform: uppercase;
+ }
+ details.raw-output pre {
+ font-family: var(--font-mono); font-size: 0.78rem; color: var(--cream);
+ background: var(--felt); padding: 0.75rem; border-radius: 3px;
+ overflow-x: auto; margin: 0 0 0.75rem;
+ }
+
+ /* ── Temperature control ──────────────────────────────────────────────── */
+ .param-row {
+ display: flex; align-items: center; gap: 0.75rem;
+ margin-top: 0.9rem;
+ }
+ .param-row label {
+ font-family: var(--font-mono); font-size: 0.7rem;
+ text-transform: uppercase; letter-spacing: 0.08em; color: var(--smoke);
+ white-space: nowrap;
+ }
+ input[type="range"] {
+ flex: 1; accent-color: var(--gold); cursor: pointer;
+ }
+ .param-val {
+ font-family: var(--font-mono); font-size: 0.82rem; color: var(--gold);
+ min-width: 28px; text-align: right;
+ }
+ </style>
264
+ </head>
+ <body>
+ <div class="page">
+
+ <p><a class="back" href="/">← Back to the Deal Room</a></p>
+
+ <header>
+ <h1>◈ Talk to the Model</h1>
+ <p class="subtitle">Direct inference with the GRPO-finetuned negotiator (Qwen2.5-1.5B)</p>
+ <p id="status-badge"></p>
+ </header>
+
+ <!-- Model info ──────────────────────────────────────────────────────── -->
+ <section>
+ <h2>Model</h2>
+ <div class="model-info-card" id="model-info-card">
+ <div class="info-row"><span class="info-label">Repo</span><span class="info-val" id="info-repo">—</span></div>
+ <div class="info-row"><span class="info-label">Base</span><span class="info-val" id="info-base">—</span></div>
+ <div class="info-row"><span class="info-label">Training</span><span class="info-val" id="info-training">—</span></div>
+ <div class="info-row"><span class="info-label">Output format</span><span class="info-val" id="info-note">—</span></div>
+ </div>
+ </section>
+
+ <!-- Chat interface ───────────────────────────────────────────────────── -->
+ <section>
+ <h2>Chat</h2>
+
+ <!-- Context pickers -->
+ <div class="context-row">
+ <div class="context-group">
+ <label for="sel-scenario">Scenario (gives the model context)</label>
+ <select id="sel-scenario">
+ <option value="saas_enterprise">Enterprise SaaS Contract — $125k–$165k ACV</option>
+ <option value="hiring_package">Senior Engineer Offer — $195k–$265k total comp</option>
+ <option value="acquisition_term_sheet">Startup Acquisition — $10.5M–$16M valuation</option>
+ </select>
+ </div>
+ <div class="context-group">
+ <label for="sel-persona">Persona (model negotiating style)</label>
+ <select id="sel-persona">
+ <option value="shark">🦈 The Shark — aggressive, anchors hard</option>
+ <option value="diplomat">🤝 The Diplomat — collaborative, reveals constraints</option>
+ <option value="veteran">🧓 The Veteran — strategic silence, k=2 ToM</option>
+ </select>
+ </div>
+ </div>
+
+ <!-- Temperature -->
+ <div class="param-row">
+ <label for="temp-slider">Temperature</label>
+ <input type="range" id="temp-slider" min="0.1" max="1.4" step="0.05" value="0.7" />
+ <span class="param-val" id="temp-val">0.7</span>
+ <button class="btn-ghost" id="btn-reset-chat" title="Start a new conversation">New chat</button>
+ </div>
+
+ <!-- Window -->
+ <div class="chat-window" id="chat-window" role="log" aria-live="polite" aria-label="Conversation with the model"></div>
+
+ <!-- Error -->
+ <div class="error-box" id="error-box"></div>
+
+ <!-- Input -->
+ <div class="input-bar">
+ <textarea
+ id="chat-input"
+ class="chat-input"
+ rows="1"
+ placeholder="Type your opening offer or message…"
+ aria-label="Your message"
+ ></textarea>
+ <button class="btn-gold" id="btn-send" type="button">Send</button>
+ </div>
+
+ <div class="chat-toolbar">
+ <span class="char-hint">Enter to send · Shift+Enter for new line</span>
+ <span class="backend-tag" id="backend-tag"></span>
+ </div>
+
+ <!-- Last raw output -->
+ <details class="raw-output" id="raw-details" style="display:none;">
+ <summary>Raw model JSON output</summary>
+ <pre id="raw-pre"></pre>
+ </details>
+ </section>
+
+ <!-- About the model ─────────────────────────────────────────────────── -->
+ <section>
+ <h2>About this model</h2>
+ <div class="read-block">
+ <h3>What it is</h3>
+ <p>
+ <strong>parlay-grpo-1-5b</strong> is a Qwen2.5-1.5B-Instruct model fine-tuned in two
+ stages: first with SFT on Gemini-generated negotiation transcripts, then with GRPO using
+ the Parlay reward function — a mix of ZOPA progress, Theory-of-Mind accuracy, tactical
+ card usage, and drift adaptation bonuses.
+ </p>
+ <h3>What it outputs</h3>
+ <p>
+ Every response is a JSON object with three fields:<br/>
+ <code>utterance</code> — the natural language negotiation turn,<br/>
+ <code>offer_amount</code> — a numeric bid (or <code>null</code> for conversational turns),<br/>
+ <code>tactical_move</code> — optional card played (<code>anchor_high</code>, <code>batna_reveal</code>, <code>silence</code>).
+ </p>
+ <h3>How to read the responses here</h3>
+ <p>
+ The <em>utterance</em> is displayed as the chat bubble. If the model includes an
+ <em>offer_amount</em>, it appears as a gold chip below the text. You can expand
+ "Raw model JSON output" to see the full structured response.
+ </p>
+ <h3>Backend</h3>
+ <p>
+ On a GPU Space the model runs locally (fast after the first load). On a CPU Space
+ inference falls back to the Hugging Face Inference API — the first request may take
+ 20–40 s while the model warms up; subsequent requests are faster.
+ </p>
+ </div>
+ </section>
+
+ </div><!-- /.page -->
+
384
+ <script>
+ // ── State ──────────────────────────────────────────────────────────────
+ let history = [];
+ let isLoading = false;
+ let lastBackend = "";
+
+ // ── Init ───────────────────────────────────────────────────────────────
+ document.addEventListener("DOMContentLoaded", () => {
+ loadModelInfo();
+
+ const sendBtn = document.getElementById("btn-send");
+ const inputEl = document.getElementById("chat-input");
+ const tempSldr = document.getElementById("temp-slider");
+ const tempVal = document.getElementById("temp-val");
+ const resetBtn = document.getElementById("btn-reset-chat");
+
+ sendBtn.addEventListener("click", sendMessage);
+
+ inputEl.addEventListener("keydown", (e) => {
+ if (e.key === "Enter" && !e.shiftKey) {
+ e.preventDefault();
+ sendMessage();
+ }
+ // auto-grow textarea
+ requestAnimationFrame(() => {
+ inputEl.style.height = "auto";
+ inputEl.style.height = Math.min(inputEl.scrollHeight, 140) + "px";
+ });
+ });
+
+ tempSldr.addEventListener("input", () => {
+ tempVal.textContent = parseFloat(tempSldr.value).toFixed(2);
+ });
+
+ resetBtn.addEventListener("click", resetChat);
+
+ // Changing scenario/persona resets the conversation
+ document.getElementById("sel-scenario").addEventListener("change", resetChat);
+ document.getElementById("sel-persona").addEventListener("change", resetChat);
+ });
+
+ // ── Load model info ────────────────────────────────────────────────────
+ async function loadModelInfo() {
+ const badge = document.getElementById("status-badge");
+ try {
+ const res = await fetch("/api/model/info");
+ const data = await res.json();
+
+ const repoEl = document.getElementById("info-repo");
+ if (data.hub_url && data.model_repo) {
+ repoEl.innerHTML = `<a href="${data.hub_url}" target="_blank" rel="noopener">${data.model_repo}</a>`;
+ }
+ document.getElementById("info-base").textContent = data.base_model || "—";
+ document.getElementById("info-training").textContent = data.training || "—";
+ document.getElementById("info-note").textContent = data.note || "—";
+
+ if (data.configured) {
+ badge.innerHTML = '<span class="badge ok">Trained model configured — inference ready</span>';
+ } else {
+ badge.innerHTML = '<span class="badge warn">Using public Hub repo — set HF_MODEL_REPO secret for local GPU inference</span>';
+ }
+ } catch (_e) {
+ badge.innerHTML = '<span class="badge err">Could not reach server</span>';
+ }
+ }
+
+ // ── Reset chat ─────────────────────────────────────────────────────────
+ function resetChat() {
+ history = [];
+ const win = document.getElementById("chat-window");
+ win.innerHTML = "";
+ setError(null);
+ document.getElementById("raw-details").style.display = "none";
+ document.getElementById("backend-tag").textContent = "";
+ addSystemMsg("Conversation reset. Send a message to begin.");
+ }
+
+ // ── Send ───────────────────────────────────────────────────────────────
+ async function sendMessage() {
+ if (isLoading) return;
+ const inputEl = document.getElementById("chat-input");
+ const text = inputEl.value.trim();
+ if (!text) return;
+
+ const scenarioId = document.getElementById("sel-scenario").value;
+ const persona = document.getElementById("sel-persona").value;
+ const temp = parseFloat(document.getElementById("temp-slider").value);
+
+ // Render user bubble
+ addMsg("user", text, null, null);
+ history.push({ role: "user", text });
+ inputEl.value = "";
+ inputEl.style.height = "auto";
+ setError(null);
+
+ // Thinking indicator
+ const thinkId = addThinking();
+ setLoading(true);
+
+ try {
+ const res = await fetch("/api/model/chat", {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify({
+ message: text,
+ scenario_id: scenarioId,
+ persona,
+ history: history.slice(0, -1), // exclude the just-added user turn
+ temperature: temp,
+ max_tokens: 300,
+ }),
+ });
+
+ removeThinking(thinkId);
+
+ if (!res.ok) {
+ const err = await res.json().catch(() => ({}));
+ throw new Error(err.detail || `HTTP ${res.status}`);
+ }
+
+ const data = await res.json();
+ lastBackend = data.backend || "";
+
+ // Update backend tag
+ const backendTag = document.getElementById("backend-tag");
+ backendTag.innerHTML = `backend: <span>${lastBackend === "local" ? "local GPU" : "HF Inference API"}</span>`;
+
+ // Render model bubble
+ const utterance = data.utterance || "(no utterance)";
+ const offer = data.offer_amount ?? null;
+ const tactic = data.tactical_move || null;
+ addMsg("model", utterance, offer, tactic);
+ history.push({ role: "assistant", text: utterance });
+
+ // Show raw output
+ const rawDetails = document.getElementById("raw-details");
+ const rawPre = document.getElementById("raw-pre");
+ rawPre.textContent = JSON.stringify({
+ utterance: data.utterance,
+ offer_amount: data.offer_amount,
+ tactical_move: data.tactical_move,
+ }, null, 2);
+ rawDetails.style.display = "block";
+
+ } catch (e) {
+ removeThinking(thinkId);
+ setError("Inference failed: " + e.message);
+ // remove last user turn from history so user can retry
+ history.pop();
+ } finally {
+ setLoading(false);
+ }
+ }
+
538
+ // ── DOM helpers ────────────────────────────────────────────────────────
+ function formatCurrency(v) {
+ if (v == null) return null;
+ const n = parseFloat(v);
+ if (isNaN(n)) return null;
+ if (n >= 1_000_000) return "$" + (n / 1_000_000).toFixed(2) + "M";
+ if (n >= 1_000) return "$" + (n / 1_000).toFixed(0) + "k";
+ return "$" + n.toFixed(0);
+ }
+
+ function addMsg(role, utterance, offerAmount, tacticMove) {
+ const win = document.getElementById("chat-window");
+
+ // Remove placeholder text if present
+ const empty = win.querySelector(".empty-hint");
+ if (empty) empty.remove();
+
+ const wrap = document.createElement("div");
+ wrap.className = `msg ${role}`;
+
+ const roleLabel = document.createElement("div");
+ roleLabel.className = "msg-role";
+ roleLabel.textContent = role === "user" ? "You" : "Model";
+ wrap.appendChild(roleLabel);
+
+ const body = document.createElement("div");
+ body.className = "msg-body";
+ body.textContent = utterance;
+
+ if (offerAmount != null) {
+ const chip = document.createElement("div");
+ chip.className = "offer-chip";
+ chip.textContent = formatCurrency(offerAmount) || String(offerAmount);
+ body.appendChild(chip);
+ }
+ if (tacticMove) {
+ const pill = document.createElement("span");
+ pill.className = "tactic-pill";
+ const labels = { anchor_high: "⚓ anchor", batna_reveal: "🃏 BATNA reveal", silence: "🤫 silence" };
+ pill.textContent = labels[tacticMove] || tacticMove;
+ body.appendChild(pill);
+ }
+
+ wrap.appendChild(body);
+ win.appendChild(wrap);
+ win.scrollTop = win.scrollHeight;
+ return wrap;
+ }
+
+ function addSystemMsg(text) {
+ const win = document.getElementById("chat-window");
+ const wrap = document.createElement("div");
+ wrap.className = "msg system";
+ const body = document.createElement("div");
+ body.className = "msg-body";
+ body.textContent = text;
+ wrap.appendChild(body);
+ win.appendChild(wrap);
+ }
+
+ let _thinkingSeq = 0;
+ function addThinking() {
+ const id = "think-" + (++_thinkingSeq);
+ const win = document.getElementById("chat-window");
+ const wrap = document.createElement("div");
+ wrap.className = "msg model thinking";
+ wrap.id = id;
+ const roleLabel = document.createElement("div");
+ roleLabel.className = "msg-role";
+ roleLabel.textContent = "Model";
+ const body = document.createElement("div");
+ body.className = "msg-body";
+ body.innerHTML = '<div class="dot-pulse"><span></span><span></span><span></span></div>';
+ wrap.appendChild(roleLabel);
+ wrap.appendChild(body);
+ win.appendChild(wrap);
+ win.scrollTop = win.scrollHeight;
+ return id;
+ }
+
+ function removeThinking(id) {
+ document.getElementById(id)?.remove();
+ }
+
+ function setLoading(on) {
+ isLoading = on;
+ document.getElementById("btn-send").disabled = on;
+ document.getElementById("chat-input").disabled = on;
+ }
+
+ function setError(msg) {
+ const box = document.getElementById("error-box");
+ if (msg) {
+ box.textContent = msg;
+ box.style.display = "block";
+ } else {
+ box.style.display = "none";
+ }
+ }
+ </script>
+ </body>
+ </html>
dashboard/judge.html ADDED
@@ -0,0 +1,441 @@
+ <!DOCTYPE html>
+ <html lang="en" data-theme="dark">
+ <head>
+ <meta charset="UTF-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="description" content="Parlay — The Deal Room. An RL-powered negotiation arena." />
+ <title>Parlay — The Deal Room</title>
+
+ <link rel="icon" type="image/svg+xml" href="/static/favicon/favicon.svg?v=1" />
+ <link rel="icon" type="image/x-icon" href="/favicon.ico" />
+
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js"
+ crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.js"
+ crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+
+ <link rel="stylesheet" href="/static/style.css?v=5" />
+ </head>
+ <body>
+
+ <!-- DEMO BANNER -->
+ <div id="demo-banner" class="demo-banner hidden" role="status">
+ Demo mode — AI responses are simulated · Add GOOGLE_API_KEY to .env for real gameplay
+ <button class="demo-banner-dismiss" type="button" id="btn-dismiss-demo" aria-label="Dismiss">✕</button>
+ </div>
+
+ <!-- LOADING OVERLAY -->
+ <div id="loading-overlay" class="loading-overlay hidden" role="status" aria-live="polite">
+ <div class="loading-card">
+ <div class="spinner" aria-hidden="true"></div>
+ <p class="loading-text">The room is thinking…</p>
+ </div>
+ </div>
+
+ <!-- ONBOARDING — STEP 1 -->
+ <div id="onboarding-step-1" class="onboarding-overlay start-active" role="dialog" aria-modal="true" aria-label="Enter your name">
+ <div class="onboarding-card">
+ <div class="onboarding-step-num">Step 1 of 3</div>
+ <h1 class="onboarding-headline">Who's at<br>the table?</h1>
+ <p class="onboarding-sub">Every deal starts with a name on the door.</p>
+ <input type="text" id="step1-name" class="onboarding-name-input"
+ placeholder="Your name…" maxlength="40" autocomplete="off" autofocus />
+ <div id="step1-error" class="onboarding-error"></div>
+ <div class="onboarding-footer">
+ <button id="step1-continue" class="btn btn-primary" type="button">Continue &rarr;</button>
+ </div>
+ </div>
+ </div>
+
+ <!-- ONBOARDING — STEP 2 -->
+ <div id="onboarding-step-2" class="onboarding-overlay" role="dialog" aria-modal="true" aria-label="Choose a scenario" inert>
+ <div class="onboarding-card wide">
+ <div class="onboarding-step-num">Step 2 of 3</div>
+ <h1 class="onboarding-headline">Choose your deal</h1>
+ <p class="onboarding-sub">Select a case from the dossier.</p>
+ <div id="scenario-dossier-grid" class="scenario-dossier-grid" role="radiogroup" aria-label="Negotiation scenarios"></div>
+ <div id="step2-error" class="onboarding-error"></div>
+ <div class="onboarding-footer">
+ <button id="step2-back" class="btn btn-ghost" type="button">&larr; Back</button>
+ <button id="step2-continue" class="btn btn-primary" type="button">Continue &rarr;</button>
+ </div>
+ </div>
+ </div>
+
+ <!-- ONBOARDING — STEP 3 -->
+ <div id="onboarding-step-3" class="onboarding-overlay" role="dialog" aria-modal="true" aria-label="Choose your opponent" inert>
+ <div class="onboarding-card wide">
+ <div class="onboarding-step-num">Step 3 of 3</div>
+ <h1 class="onboarding-headline">Choose your opponent</h1>
+ <p class="onboarding-sub">Study the faces across the table.</p>
+ <div id="persona-cards-grid" class="persona-cards-grid" role="radiogroup" aria-label="Negotiator personas"></div>
+ <div id="step3-error" class="onboarding-error"></div>
+ <div class="onboarding-footer">
+ <button id="step3-back" class="btn btn-ghost" type="button">&larr; Back</button>
+ <button id="step3-start" class="btn btn-primary" type="button">Enter the Room &rarr;</button>
+ </div>
+ </div>
+ </div>
+
+ <!-- APP HEADER — item 19 topbar polish -->
+ <header class="app-header" role="banner">
+ <div class="header-brand" aria-label="Parlay">
+ <span class="brand-par">par</span><span class="brand-gem">◈</span><span class="brand-lay">lay</span>
+ </div>
+
+ <nav class="header-nav" aria-label="Site navigation">
+ <a href="/">Game</a>
+ <a href="/interact">Interact</a>
+ <a href="/judge" class="active">GRPO demo</a>
+ </nav>
+
+ <div class="header-actions">
+ <a href="/train" target="_blank" rel="noopener" style="color: var(--smoke); font-size: 0.8125rem; text-decoration: none; letter-spacing: 0.06em;">Training Results &rarr;</a>
+ <p id="global-error" class="hidden text-red text-sm" role="alert"></p>
+ <!-- Theme toggle — item 27 -->
+ <button id="theme-toggle" class="dark-toggle" type="button"
+ aria-label="Toggle display mode" title="Toggle light/dark">●</button>
+ </div>
+ </header>
+
101
+ <!-- 3-COLUMN BODY -->
+ <main class="app-body" role="main">
+
+ <!-- LEFT COLUMN -->
+ <aside class="col-left" aria-label="Player info">
+
+ <!-- Player Card -->
+ <section class="player-card panel" aria-label="Player card">
+ <div class="player-card-header">
+ <div id="player-avatar" class="player-avatar" aria-hidden="true">P</div>
+ <div class="player-info">
+ <div id="player-name-display" class="player-name">Player</div>
+ <div class="player-rank">#— Unranked</div>
+ </div>
+ </div>
+ <div class="cp-section">
+ <div class="cp-label-row">
+ <span class="cp-label">
+ <span class="gloss-wrap">Credibility Points
+ <span class="gloss-icon" aria-label="What are Credibility Points?">ⓘ</span>
+ <span class="gloss-tip" role="tooltip">Your tactical budget. Spend them to play power moves. Regenerates each turn.</span>
+ </span>
+ </span>
+ <span id="cp-value" class="cp-value">100 / 100</span>
+ </div>
+ <div class="cp-track" role="progressbar" aria-label="Credibility Points" aria-valuemin="0" aria-valuemax="100">
+ <div id="cp-fill" class="cp-fill" style="width:100%;"></div>
+ </div>
+ </div>
+ </section>
+
+ </aside>
+
+ <!-- CENTER COLUMN -->
+ <section class="col-center" aria-label="Negotiation arena">
+
+ <!-- Scenario Header -->
+ <div class="scenario-header" id="scenario-header">
+ <div>
+ <div id="scenario-title" class="scenario-title">Waiting for game…</div>
+ <div id="scenario-meta" class="scenario-meta text-muted">Select a scenario to begin</div>
+ <div id="session-id-label" class="scenario-meta text-muted">Session: —</div>
+ </div>
+ </div>
+
+ <!-- Drift Alert -->
+ <div id="drift-alert" class="drift-alert hidden" role="alert" aria-live="assertive">
+ <span class="drift-alert-icon" aria-hidden="true">⚠️</span>
+ <span id="drift-alert-text" class="drift-alert-text">Market conditions have shifted.</span>
+ <button id="btn-dismiss-drift" class="drift-dismiss" type="button" aria-label="Dismiss drift alert">✕</button>
+ </div>
+
+ <!-- Chat Thread with briefing overlay — items 7 & 15 -->
+ <div class="chat-thread-wrap" style="position:relative;">
+ <!-- Deal Briefing Panel — item 15 -->
+ <div id="briefing-overlay" class="briefing-overlay" aria-label="Deal briefing" style="display:none;">
+ <div class="briefing-card">
+ <div class="briefing-case-num" id="briefing-case-num">CASE FILE #SaaS-001</div>
+ <div class="briefing-title" id="briefing-title">Enterprise SaaS Contract</div>
+
+ <div class="briefing-section">
+ <div class="briefing-section-label">Your Goal</div>
+ <div class="briefing-section-text" id="briefing-your-goal">
+ Close the deal above $125,000. Your ideal: $165,000.
+ </div>
+ </div>
+
+ <div class="briefing-section">
+ <div class="briefing-section-label">Their Goal</div>
+ <div class="briefing-section-text" id="briefing-their-goal">
+ Pay as little as possible. They'll push hard on price.
+ </div>
+ </div>
+
+ <div class="briefing-range" id="briefing-range">
+ A deal is possible between $125k and $165k.
+ </div>
+
+ <button class="briefing-begin" id="btn-briefing-begin" type="button">
+ Begin Negotiation &rarr;
+ </button>
+ </div>
+ </div>
+
+ <div id="chat-thread" class="chat-thread" role="log" aria-live="polite" aria-label="Negotiation conversation">
+ <!-- Initial system message — item 7: plain italic text, no highlight -->
+ <div class="system-msg">Step through the door to begin negotiating.</div>
+ </div>
+ </div>
+
+ <!-- Result Banner -->
+ <div id="result-banner" class="result-banner hidden" aria-live="polite">
+ <div class="result-title">—</div>
+ <div class="result-amount">—</div>
+ <div class="result-score">Score: —</div>
+ </div>
+
+ <!-- Input Area — item 14 (simplified) -->
+ <div class="input-area" role="form" aria-label="Your move">
+ <div id="tactical-buttons" class="tactical-bar">
+ <button class="tactic-btn" data-card="anchor_high" data-cost="0" type="button">
+ ⚓ Anchor High <span class="cp-cost">0 CP</span>
+ </button>
+ <button class="tactic-btn" data-card="batna_reveal" data-cost="20" type="button">
+ 🃏 BATNA Reveal <span class="cp-cost">20 CP</span>
+ </button>
+ <button class="tactic-btn" data-card="silence" data-cost="5" type="button">
+ 🤫 Silence <span class="cp-cost">5 CP</span>
+ </button>
+ </div>
+
+ <!-- Main text input -->
+ <div class="input-main-row">
+ <label class="sr-only" for="offer-input">Your message or offer</label>
+ <input
+ type="text"
+ id="offer-input"
+ class="offer-input"
+ placeholder="Type your message or make an offer…"
+ aria-label="Type your message or offer amount"
+ disabled
+ />
+ <button id="btn-submit" class="btn btn-primary" type="button" disabled>Send</button>
+ </div>
+
+ <!-- Inline offer stepper (appears when Make Offer clicked) -->
+ <div id="offer-stepper" class="offer-stepper" aria-label="Offer amount stepper">
+ <button class="stepper-btn" id="stepper-down" type="button" aria-label="Decrease offer">−</button>
+ <span class="stepper-value" id="stepper-value">$145,000</span>
+ <button class="stepper-btn" id="stepper-up" type="button" aria-label="Increase offer">+</button>
+ <button class="btn btn-sm btn-primary" id="stepper-use" type="button">Use</button>
+ <button class="btn btn-sm btn-ghost" id="stepper-cancel" type="button">✕</button>
+ </div>
+
+ <!-- Quick action chips -->
+ <div class="quick-actions">
+ <button class="quick-chip offer-chip-btn" id="chip-offer" type="button" disabled>
+ Make Offer ▼
+ </button>
+ <button class="quick-chip accept-chip" id="chip-accept" type="button" disabled>
+ Accept Deal ✓
+ </button>
+ <button class="quick-chip walk-chip" id="chip-walk" type="button" disabled>
+ Walk Away ✕
+ </button>
+ </div>
+ </div>
+
249
+ <!-- ZOPA Bar — item 10 -->
+ <section class="zopa-section" aria-label="Zone of Possible Agreement">
+ <div class="panel-header">
+ <span class="panel-title">
+ <span class="gloss-wrap">ZOPA
+ <span class="gloss-icon" aria-label="What is ZOPA?">ⓘ</span>
+ <span class="gloss-tip" role="tooltip">The price range where a deal is possible — between both parties' minimum acceptable prices</span>
+ </span>
+ </span>
+ <span class="text-xs text-muted">Zone of Possible Agreement</span>
+ </div>
+
+ <div id="zopa-track" class="zopa-track-outer" role="img" aria-label="ZOPA visual range">
+ <div id="zopa-zone" class="zopa-zone"></div>
+
+ <div id="marker-player" class="zopa-marker marker-player" style="left:20%;">
+ <div class="zopa-marker-triangle"></div>
+ <div class="zopa-marker-line"></div>
+ <span class="zopa-label">Your floor</span>
+ </div>
+
+ <div id="marker-opponent" class="zopa-marker marker-opponent" style="left:80%;">
+ <div class="zopa-marker-triangle"></div>
+ <div class="zopa-marker-line"></div>
+ <span class="zopa-label">Their floor</span>
+ </div>
+
+ <div id="marker-current" class="zopa-marker marker-current" style="left:50%; display:none;">
+ <div class="zopa-marker-triangle"></div>
+ <div class="zopa-marker-line"></div>
+ <span class="zopa-label">Offer</span>
+ </div>
+
+ <!-- Nash diamond with label — always visible -->
+ <div id="nash-marker" style="position:absolute; top:0; bottom:0; left:50%; transform:translateX(-50%); pointer-events:none; display:none;">
+ <div id="nash-diamond" class="nash-diamond" style="position:absolute; top:50%; transform:translate(-50%,-50%);"></div>
+ <span class="nash-label">Fair deal</span>
+ </div>
+ </div>
+
+ <div class="zopa-labels-row">
+ <span id="zopa-label-low">$0</span>
+ <span class="gloss-wrap text-amber text-xs">◆ Nash Point
+ <span class="gloss-icon" aria-label="What is the Nash Point?">ⓘ</span>
+ <span class="gloss-tip" role="tooltip">The mathematically fair deal price, where both sides gain equally</span>
+ </span>
+ <span id="zopa-label-high">$100K</span>
+ </div>
+ <div id="zopa-width-indicator" class="zopa-width-indicator">Deal zone: 100%</div>
+ </section>
+
+ <!-- Tension Meter — item 12 -->
+ <div class="tension-section" aria-label="Tension meter">
+ <span class="tension-label">
+ <span class="gloss-wrap">Tension
+ <span class="gloss-icon" aria-label="What is Tension?">ⓘ</span>
+ <span class="gloss-tip" role="tooltip">How heated the negotiation is. High tension = opponent may make mistakes or walk away.</span>
+ </span>
+ </span>
+ <div class="tension-track" role="progressbar" aria-label="Negotiation tension" aria-valuemin="0" aria-valuemax="100">
+ <div id="tension-fill" class="tension-fill" data-level="low" style="width:0%;"></div>
+ </div>
+ <span id="tension-value" class="tension-value">0%</span>
+ <span id="tension-descriptor" class="tension-descriptor">· Calm</span>
+ </div>
+
+ </section>
+
+ <!-- RIGHT COLUMN -->
+ <aside class="col-right" aria-label="Opponent and analytics">
+
+ <!-- Three.js Character Canvas — item 28 (280×380) -->
+ <section class="panel" aria-label="Opponent character">
+ <div class="panel-header">
+ <span class="panel-title">Opponent</span>
+ <span id="character-state-label" class="stat-chip blue">idle</span>
+ </div>
+ <div class="character-canvas-wrap">
+ <canvas id="character-canvas" width="280" height="380" aria-label="3D negotiator character"></canvas>
+ <div class="character-state-badge">idle</div>
+ </div>
+ <!-- Persona name plate — item 32 -->
+ <div id="persona-nameplate" class="persona-nameplate">
+ <span id="nameplate-symbol" class="nameplate-symbol" style="color: var(--gold);">◈</span>
+ <div class="nameplate-text">
+ <div id="nameplate-name" class="nameplate-name">—</div>
+ <div id="nameplate-tag" class="nameplate-tag">Choose a persona</div>
+ </div>
+ </div>
+ </section>
+
+ <!-- ToM Belief State — item 11 -->
+ <section class="panel tom-section" aria-label="Theory of Mind belief state">
+ <div class="panel-header">
+ <span class="panel-title">
+ <span class="gloss-wrap">ToM Belief State
+ <span class="gloss-icon" aria-label="What is ToM?">ⓘ</span>
+ <span class="gloss-tip" role="tooltip">Theory of Mind — what the AI thinks it knows about you</span>
+ </span>
+ </span>
+ </div>
+
+ <div class="tom-beliefs">
+ <div class="belief-row">
+ <span class="belief-label">Cooperative</span>
+ <div class="belief-track" role="progressbar" aria-label="Cooperative belief">
+ <div id="belief-cooperative-fill" class="belief-fill cooperative" style="width:50%;"></div>
+ </div>
+ <span id="belief-cooperative-pct" class="belief-pct">50%</span>
+ <div id="belief-cooperative-conf" class="belief-confidence confidence-medium"></div>
+ </div>
+
+ <div class="belief-row">
+ <span class="belief-label">Competitive</span>
+ <div class="belief-track" role="progressbar" aria-label="Competitive belief">
+ <div id="belief-competitive-fill" class="belief-fill competitive" style="width:50%;"></div>
+ </div>
+ <span id="belief-competitive-pct" class="belief-pct">50%</span>
+ <div id="belief-competitive-conf" class="belief-confidence confidence-medium"></div>
+ </div>
+
+ <div class="belief-row">
+ <span class="belief-label">Reservation</span>
+ <div class="belief-track" role="progressbar" aria-label="Reservation sensitivity">
+ <div id="belief-reservation-fill" class="belief-fill reservation" style="width:76%;"></div>
374
+ </div>
375
+ <span id="belief-reservation-pct" class="belief-pct">76%</span>
376
+ <div id="belief-reservation-conf" class="belief-confidence confidence-high"></div>
377
+ </div>
378
+
379
+ <div class="belief-row">
380
+ <span class="belief-label">Flexibility</span>
381
+ <div class="belief-track" role="progressbar" aria-label="Flexibility belief">
382
+ <div id="belief-flexibility-fill" class="belief-fill flexibility" style="width:50%;"></div>
383
+ </div>
384
+ <span id="belief-flexibility-pct" class="belief-pct">50%</span>
385
+ <div id="belief-flexibility-conf" class="belief-confidence confidence-medium"></div>
386
+ </div>
387
+ </div>
388
+
389
+ <div class="mt-4 sparkline-wrap" style="height:80px;">
390
+ <canvas id="belief-chart" class="sparkline-canvas" aria-label="Belief confidence over time"></canvas>
391
+ </div>
392
+ </section>
393
+
394
+ <!-- Offer History Sparkline -->
395
+ <section class="panel" aria-label="Offer history">
396
+ <div class="panel-header"><span class="panel-title">Offer History</span></div>
397
+ <div class="sparkline-wrap" style="height:110px;">
398
+ <canvas id="offer-sparkline" class="sparkline-canvas" aria-label="Offer history sparkline"></canvas>
399
+ </div>
400
+ <div class="sparkline-labels">
401
+ <span id="sparkline-lo" class="sparkline-label">—</span>
402
+ <span class="gloss-wrap text-amber">◆ Nash
403
+ <span class="gloss-icon" aria-label="What is the Nash Point?">ⓘ</span>
404
+ <span class="gloss-tip" role="tooltip">The mathematically fair deal price, where both sides gain equally</span>
405
+ </span>
406
+ <span id="sparkline-hi" class="sparkline-label">—</span>
407
+ </div>
408
+ </section>
409
+
410
+ <!-- Leaderboard -->
411
+ <section class="panel" aria-label="Leaderboard">
412
+ <div class="panel-header">
413
+ <span class="panel-title">Top 5</span>
414
+ <button class="btn btn-ghost btn-sm" type="button"
415
+ onclick="loadLeaderboard()" aria-label="Refresh leaderboard" title="Refresh">↻</button>
416
+ </div>
417
+ <table class="leaderboard-table" role="table" aria-label="Top players leaderboard">
418
+ <thead>
419
+ <tr>
420
+ <th scope="col">#</th>
421
+ <th scope="col">Player</th>
422
+ <th class="num" scope="col">Score</th>
423
+ <th class="num" scope="col">Deals</th>
424
+ </tr>
425
+ </thead>
426
+ <tbody id="leaderboard-body">
427
+ <tr><td colspan="4" class="empty-state text-muted">No games yet</td></tr>
428
+ </tbody>
429
+ </table>
430
+ </section>
431
+
432
+ </aside>
433
+
434
+ </main>
435
+
436
+ <script src="/static/character.js?v=5"></script>
437
+ <script src="/static/chart.js?v=5"></script>
438
+ <script src="/static/app.js?v=5"></script>
439
+
440
+ </body>
441
+ </html>
dashboard/train_results.html CHANGED
@@ -77,15 +77,6 @@
77
  }
78
  .fig-placeholder a { color: var(--gold); }
79
  .caption { font-size: 0.9rem; color: var(--smoke); margin-top: 0.5rem; }
80
- .transcript-box {
81
- max-height: 500px; overflow-y: auto;
82
- background: var(--mahogany);
83
- border: 1px solid var(--gold);
84
- padding: 1rem; border-radius: 4px;
85
- }
86
- .transcript-box::-webkit-scrollbar { width: 8px; }
87
- .transcript-box::-webkit-scrollbar-track { background: var(--felt); }
88
- .transcript-box::-webkit-scrollbar-thumb { background: #4a3d28; border-radius: 4px; }
89
  .hub-card {
90
  border: 2px solid var(--gold); background: var(--mahogany);
91
  padding: 1.25rem; border-radius: 4px;
@@ -105,11 +96,28 @@
105
  background: var(--felt); padding: 1rem; border-radius: 3px; overflow-x: auto; }
106
  a.back { color: var(--smoke); text-decoration: none; font-size: 0.9rem; }
107
  a.back:hover { color: var(--gold); }
 
 
 
 
 
 
 
 
 
 
 
108
  </style>
109
  </head>
110
  <body>
111
  <div class="page">
112
- <p><a class="back" href="/">← Back to the Deal Room</a></p>
 
 
 
 
 
 
113
 
114
  <header>
115
  <h1>◈ Training Results</h1>
@@ -119,13 +127,14 @@
119
 
120
  <section>
121
  <h2>Key Numbers</h2>
 
122
  <div class="card-row" id="key-numbers">
123
  <div class="num-card">
124
- <h3>Random Baseline</h3>
125
  <div class="val scar" id="k-random">—</div>
126
  </div>
127
  <div class="num-card">
128
- <h3>Base Model</h3>
129
  <div class="val smoke" id="k-base">—</div>
130
  </div>
131
  <div class="num-card">
@@ -161,8 +170,19 @@
161
  </section>
162
 
163
  <section>
164
- <h2>What Changed: Base Model vs Trained Agent</h2>
165
- <div id="transcript-container"></div>
 
 
 
 
 
 
 
 
 
 
 
166
  </section>
167
 
168
  <section>
@@ -192,9 +212,9 @@
192
  function renderStatus(data) {
193
  const badge = document.getElementById("status-badge");
194
  if (data.model_on_hub) {
195
- badge.innerHTML = '<span class="badge ok">Model on Hub</span>';
196
  } else {
197
- badge.innerHTML = '<span class="badge wait">Training not yet run</span>';
198
  }
199
  }
200
  function renderKeyNumbers(data) {
@@ -246,38 +266,28 @@
246
  cap.hidden = true;
247
  }
248
  }
249
- function compareSection(pa) {
250
  const el = document.getElementById("fig-compare");
251
  const cap = document.getElementById("cap-compare");
252
- if (pa.comparison) {
253
- el.innerHTML = '<img src="/results/training_curves.png" alt="Comparison" style="width:100%;border:1px solid var(--gold);border-radius:2px" />';
 
254
  cap.hidden = false;
255
  } else {
256
- el.innerHTML = '<div class="fig-placeholder">Four-bar chart will appear after evaluation.</div>';
257
  cap.hidden = true;
258
  }
259
  }
260
- async function transcriptSection(pa) {
261
- const c = document.getElementById("transcript-container");
262
- if (pa.transcript) {
263
- try {
264
- const res = await fetch("/results/before_after_transcript.html", { cache: "no-cache" });
265
- const html = await res.text();
266
- c.innerHTML = '<div class="transcript-box">' + html + '</div>';
267
- } catch (e) {
268
- c.innerHTML = '<div class="fig-placeholder">Could not load transcript.</div>';
269
- }
270
- } else {
271
- c.innerHTML = '<div class="fig-placeholder">Transcript comparison will appear after evaluation run.</div>';
272
- }
273
- }
274
  function hubSection(data) {
275
  const h = document.getElementById("hub-block");
 
 
276
  if (data.model_on_hub) {
277
  h.innerHTML =
278
  '<div class="hub-card">' +
279
- '<p><strong>Trained model available on Hugging Face Hub</strong></p>' +
280
- '<p><a href="https://huggingface.co/sh4shv4t/parlay-negotiator" target="_blank" rel="noopener">→ sh4shv4t/parlay-negotiator</a></p>' +
 
281
  '<button type="button" class="btn-gold" id="btn-try-trained">→ Play against the trained model</button>' +
282
  '</div>';
283
  document.getElementById("btn-try-trained").addEventListener("click", async () => {
@@ -293,9 +303,10 @@
293
  });
294
  } else {
295
  h.innerHTML =
296
- '<div class="hub-card muted">' +
297
- '<p>Model will be pushed to Hub after training completes.</p>' +
298
- '<p style="color:var(--smoke)">huggingface.co/sh4shv4t/parlay-negotiator</p>' +
 
299
  '</div>';
300
  }
301
  }
@@ -311,8 +322,7 @@
311
  sftSection(data.sft_loss_url);
312
  rewardSection(data.grpo_reward_url);
313
  grpoLossSection(data.grpo_loss_url);
314
- compareSection(data.plots_available);
315
- await transcriptSection(data.plots_available);
316
  hubSection(data);
317
  } catch (e) {
318
  document.getElementById("status-badge").innerHTML = '<span class="badge wait">Could not load status</span>';
 
77
  }
78
  .fig-placeholder a { color: var(--gold); }
79
  .caption { font-size: 0.9rem; color: var(--smoke); margin-top: 0.5rem; }
 
 
 
 
 
 
 
 
 
80
  .hub-card {
81
  border: 2px solid var(--gold); background: var(--mahogany);
82
  padding: 1.25rem; border-radius: 4px;
 
96
  background: var(--felt); padding: 1rem; border-radius: 3px; overflow-x: auto; }
97
  a.back { color: var(--smoke); text-decoration: none; font-size: 0.9rem; }
98
  a.back:hover { color: var(--gold); }
99
+ .read-block {
100
+ background: var(--mahogany);
101
+ border: 1px solid rgba(201, 168, 76, 0.35);
102
+ border-radius: 4px; padding: 1rem 1.15rem; margin-top: 0.9rem;
103
+ font-size: 0.95rem; line-height: 1.45; color: var(--cream);
104
+ }
105
+ .read-block h3 {
106
+ font-family: var(--font-display); color: var(--gold); font-size: 1.02rem; margin: 0 0 0.4rem 0; font-style: italic;
107
+ }
108
+ .read-block p { margin: 0 0 0.65rem 0; }
109
+ .read-block p:last-child { margin-bottom: 0; }
110
  </style>
111
  </head>
112
  <body>
113
  <div class="page">
114
+ <p>
115
+ <a class="back" href="/">← Deal Room</a>
116
+ &ensp;·&ensp;
117
+ <a class="back" href="/interact">Talk to the Model</a>
118
+ &ensp;·&ensp;
119
+ <a class="back" href="/judge">GRPO demo</a>
120
+ </p>
121
 
122
  <header>
123
  <h1>◈ Training Results</h1>
 
127
 
128
  <section>
129
  <h2>Key Numbers</h2>
130
+ <p class="caption" style="margin:0 0 0.75rem 0">Mean episode reward under the same eval protocol: random play, the frozen base (instruction) model, and your GRPO model.</p>
131
  <div class="card-row" id="key-numbers">
132
  <div class="num-card">
133
+ <h3>Random baseline</h3>
134
  <div class="val scar" id="k-random">—</div>
135
  </div>
136
  <div class="num-card">
137
+ <h3>Base model</h3>
138
  <div class="val smoke" id="k-base">—</div>
139
  </div>
140
  <div class="num-card">
 
170
  </section>
171
 
172
  <section>
173
+ <h2>What you are seeing (and what compute unlocks next)</h2>
174
+ <div class="read-block">
175
+ <h3>SFT loss</h3>
176
+ <p>Supervised training usually shows a clear downward trend early, then a gentler slope as the model matches the Parlay format and tone. A flatter tail often means the model is close to the local optimum for that data — not that learning has “stopped” entirely.</p>
177
+ <h3>GRPO mean reward</h3>
178
+ <p>Policy-gradient training optimizes a noisy signal: each batch samples different episodes and rollouts, so the curve wiggles. Uptrends mean the policy is moving toward higher rewards on average (GRPO uses group-relative advantages rather than a learned value head). Plateaus are common when the policy settles into a local optimum and the advantage estimates shrink.</p>
179
+ <h3>GRPO training loss</h3>
180
+ <p>Unlike SFT, this is not a simple cross-entropy to a single target. Loss can bump around while reward improves because the loss reflects ratios, clipping, and changing baselines, not just “closer to data.”</p>
181
+ <h3>Random vs base vs trained</h3>
182
+ <p>Random play anchors the scale: it shows what unstructured actions score under the same reward. The base model (before GRPO) reflects instruction-following without RL shaping; GRPO should lift the mean if the environment signal is learnable. Gaps that look small in absolute value can still be meaningful when rewards mix sparse bonuses and penalties.</p>
183
+ <h3>Compute</h3>
184
+ <p>We ran a compact schedule (1.5B + LoRA, modest generations per step) to keep iteration fast. With more budget, the same stack could support longer rollouts, a larger group size for less noisy advantage estimates, or additional GRPO steps to see whether reward plateaus or keeps climbing, plus full fine-tuning if we wanted to stress capacity over adapters.</p>
185
+ </div>
186
  </section>
187
 
188
  <section>
 
212
  function renderStatus(data) {
213
  const badge = document.getElementById("status-badge");
214
  if (data.model_on_hub) {
215
+ badge.innerHTML = '<span class="badge ok">In-app: trained model ready</span>';
216
  } else {
217
+ badge.innerHTML = '<span class="badge wait">Hub checkpoint is live; set <code>HF_MODEL_REPO</code> for the trained in-app opponent</span>';
218
  }
219
  }
220
  function renderKeyNumbers(data) {
 
266
  cap.hidden = true;
267
  }
268
  }
269
+ function compareSection(data) {
270
  const el = document.getElementById("fig-compare");
271
  const cap = document.getElementById("cap-compare");
272
+ const url = data.comparison_url;
273
+ if (url) {
274
+ el.innerHTML = '<img src="' + url + '" alt="Random vs Base vs GRPO comparison" style="width:100%;border:1px solid var(--gold);border-radius:2px" />';
275
  cap.hidden = false;
276
  } else {
277
+ el.innerHTML = '<div class="fig-placeholder">Four-bar chart: add <code>images/training_curves.png</code> (or <code>results/training_curves.png</code>) or run eval to generate a comparison.</div>';
278
  cap.hidden = true;
279
  }
280
  }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
281
  function hubSection(data) {
282
  const h = document.getElementById("hub-block");
283
+ const hubUrl = "https://huggingface.co/sh4shv4t/parlay-grpo-1-5b";
284
+ const hubName = "sh4shv4t/parlay-grpo-1-5b";
285
  if (data.model_on_hub) {
286
  h.innerHTML =
287
  '<div class="hub-card">' +
288
+ '<p><strong>Trained model on Hugging Face Hub</strong></p>' +
289
+ '<p><a href="' + hubUrl + '" target="_blank" rel="noopener">→ ' + hubName + '</a></p>' +
290
+ '<p class="caption" style="color:var(--smoke)">GRPO LoRA on Qwen2.5-1.5B (Parlay episodes).</p>' +
291
  '<button type="button" class="btn-gold" id="btn-try-trained">→ Play against the trained model</button>' +
292
  '</div>';
293
  document.getElementById("btn-try-trained").addEventListener("click", async () => {
 
303
  });
304
  } else {
305
  h.innerHTML =
306
+ '<div class="hub-card">' +
307
+ '<p><strong>Trained model on Hugging Face Hub</strong></p>' +
308
+ '<p><a href="' + hubUrl + '" target="_blank" rel="noopener">→ ' + hubName + '</a></p>' +
309
+ '<p class="caption" style="color:var(--smoke)">The checkpoint is live. To use it as the opponent in this app, set the server env <code>HF_MODEL_REPO</code> to <code>' + hubName + '</code> (Hugging Face Spaces: Secrets), then use the play button on the home page after switching the opponent.</p>' +
310
  '</div>';
311
  }
312
  }
 
322
  sftSection(data.sft_loss_url);
323
  rewardSection(data.grpo_reward_url);
324
  grpoLossSection(data.grpo_loss_url);
325
+ compareSection(data);
 
326
  hubSection(data);
327
  } catch (e) {
328
  document.getElementById("status-badge").innerHTML = '<span class="badge wait">Could not load status</span>';
images/Parlay_square logo.png ADDED

Git LFS Details

  • SHA256: 0e2e1a93a0513de853c7161072b2aedee5b297fa7087ded3684513e444f1157a
  • Pointer size: 132 Bytes
  • Size of remote file: 3.16 MB
images/grpo_loss_curve.png ADDED

Git LFS Details

  • SHA256: 83ea972cabea22d17f145ed43ca92795f312b834ab83284eabcedc72d6d52a8e
  • Pointer size: 130 Bytes
  • Size of remote file: 65.2 kB
images/grpo_reward_curve.png ADDED

Git LFS Details

  • SHA256: 89fecc46790b4a680a3432d6ec09c8554018062516bf0588289d9278644c8618
  • Pointer size: 130 Bytes
  • Size of remote file: 55.8 kB
images/training_curves.png ADDED

Git LFS Details

  • SHA256: 02ee29dfcc78bd403fa8b6f91272947751be1300e6a607f85b9e82296a517df1
  • Pointer size: 130 Bytes
  • Size of remote file: 18.6 kB
main.py CHANGED
@@ -112,6 +112,24 @@ async def serve_train_results() -> FileResponse:
112
  )
113
 
114
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
115
  @app.get("/favicon.ico", include_in_schema=False, response_model=None)
116
  async def favicon():
117
  """
 
112
  )
113
 
114
 
115
+ @app.get("/judge", include_in_schema=False)
116
+ async def serve_judge_demo() -> FileResponse:
117
+ """GRPO (trained) negotiator: same game UI with opponent forced to HF model."""
118
+ return FileResponse(
119
+ "dashboard/judge.html",
120
+ headers={"Cache-Control": "no-cache, must-revalidate"},
121
+ )
122
+
123
+
124
+ @app.get("/interact", include_in_schema=False)
125
+ async def serve_interact() -> FileResponse:
126
+ """Direct model inference page — talk to the GRPO model without game scaffolding."""
127
+ return FileResponse(
128
+ "dashboard/interact.html",
129
+ headers={"Cache-Control": "no-cache, must-revalidate"},
130
+ )
131
+
132
+
133
  @app.get("/favicon.ico", include_in_schema=False, response_model=None)
134
  async def favicon():
135
  """
openenv.yaml CHANGED
@@ -17,7 +17,7 @@ license: "MIT"
17
  # URLs — judges pull the env from space_url
18
  space_url: "https://huggingface.co/spaces/sh4shv4t/Parlay"
19
  repository: "https://github.com/sh4shv4t/Parlay"
20
- blog: "https://huggingface.co/blog/sh4shv4t/parlay"
21
  dataset: "https://huggingface.co/datasets/sh4shv4t/parlay-episodes"
22
  sft_model: "https://huggingface.co/sh4shv4t/parlay-sft-1-5b"
23
  grpo_model: "https://huggingface.co/sh4shv4t/parlay-grpo-1-5b"
 
17
  # URLs — judges pull the env from space_url
18
  space_url: "https://huggingface.co/spaces/sh4shv4t/Parlay"
19
  repository: "https://github.com/sh4shv4t/Parlay"
20
+ blog: "https://github.com/sh4shv4t/Parlay/blob/main/BLOG.md"
21
  dataset: "https://huggingface.co/datasets/sh4shv4t/parlay-episodes"
22
  sft_model: "https://huggingface.co/sh4shv4t/parlay-sft-1-5b"
23
  grpo_model: "https://huggingface.co/sh4shv4t/parlay-grpo-1-5b"
requirements-dev.txt ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ # Development tooling (optional)
2
+ pre-commit>=3.7.0
results/eval_results.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "random_mean_reward": 70.8231,
3
+ "base_mean_reward": null,
4
+ "grpo_mean_reward": null,
5
+ "_comment": "random from: python -m training.random_baseline --episodes 50 --output results/random_baseline.json (local, 2026-04-26). base_mean_reward and grpo_mean_reward need: Python with torch+GPU, data/episodes.jsonl with split=eval, then python -m training.evaluate --base ... --sft ... --grpo ... -n 50 -o results/eval_results.json (merges these keys)."
6
+ }
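The `_comment` above notes that `training.evaluate` merges its keys into this file. That read-modify-write merge can be sketched as follows (a hypothetical helper for illustration, not the repo's actual code):

```python
import json


def merge_eval_keys(existing_json: str, new_keys: dict) -> str:
    # Preserve keys already present (e.g. random_mean_reward) and
    # overwrite only the ones the new eval run produced.
    data = json.loads(existing_json)
    data.update(new_keys)
    return json.dumps(data, indent=2)
```

This keeps the locally computed random baseline intact while the GPU eval fills in the `null` model scores later.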
results/random_baseline.json ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "episodes_requested": 50,
3
+ "episodes_completed": 50,
4
+ "avg_reward": 70.8231,
5
+ "deal_rate": 1.0,
6
+ "avg_efficiency": 0.4836,
7
+ "avg_tom_accuracy": 0.663,
8
+ "bluffs_caught": 0
9
+ }
scripts/check_staged_not_pycache.py ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Fail if any staged file is under __pycache__ or is a .pyc / .pyo (pre-commit local hook)."""
2
+ from __future__ import annotations
3
+
4
+ import subprocess
5
+ import sys
6
+ from pathlib import Path
7
+
8
+
9
+ def main() -> int:
10
+ out = subprocess.run(
11
+ ["git", "diff", "--cached", "--name-only", "-z"],
12
+ check=True,
13
+ capture_output=True,
14
+ ).stdout
15
+ if not out:
16
+ return 0
17
+ bad: list[Path] = []
18
+ for raw in out.split(b"\0"):
19
+ if not raw:
20
+ continue
21
+ p = raw.decode("utf-8", errors="replace")
22
+ pl = p.lower()
23
+ if "__pycache__" in p:
24
+ bad.append(Path(p))
25
+ elif pl.endswith(".pyc") or pl.endswith(".pyo"):
26
+ bad.append(Path(p))
27
+ if not bad:
28
+ return 0
29
+ print("Refuse to commit bytecode or __pycache__ paths:", file=sys.stderr)
30
+ for p in bad:
31
+ print(f" {p}", file=sys.stderr)
32
+ print("Remove from the index: git reset HEAD -- <file>", file=sys.stderr)
33
+ return 1
34
+
35
+
36
+ if __name__ == "__main__":
37
+ raise SystemExit(main())
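The path filter the hook applies can be exercised standalone. The helper below mirrors its rules (case-sensitive `__pycache__` segment match, case-insensitive bytecode suffixes) on a hypothetical staged-file list:

```python
def is_forbidden(path: str) -> bool:
    # Same predicate as the hook: any __pycache__ path component,
    # or a .pyc / .pyo suffix regardless of case.
    lower = path.lower()
    return "__pycache__" in path or lower.endswith((".pyc", ".pyo"))


staged = [
    "src/app.py",
    "src/__pycache__/app.cpython-311.pyc",
    "lib/legacy.PYO",
    "docs/notes.md",
]
blocked = [p for p in staged if is_forbidden(p)]
```

With this input, only the bytecode paths land in `blocked`; regular sources and docs pass through.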
scripts/push_dataset.py CHANGED
@@ -76,7 +76,7 @@ deal_efficiency, tom_accuracy, drift_adapted
76
  [Space](https://huggingface.co/spaces/sh4shv4t/Parlay) |
77
  [GitHub](https://github.com/sh4shv4t/Parlay) |
78
  [SFT Model](https://huggingface.co/sh4shv4t/parlay-sft-1-5b) |
79
- [Blog](https://huggingface.co/blog/sh4shv4t/parlay)
80
  """
81
  with tempfile.NamedTemporaryFile(
82
  mode="w",
 
76
  [Space](https://huggingface.co/spaces/sh4shv4t/Parlay) |
77
  [GitHub](https://github.com/sh4shv4t/Parlay) |
78
  [SFT Model](https://huggingface.co/sh4shv4t/parlay-sft-1-5b) |
79
+ [Blog](https://github.com/sh4shv4t/Parlay/blob/main/BLOG.md)
80
  """
81
  with tempfile.NamedTemporaryFile(
82
  mode="w",
training/GRPO_HF_RUNBOOK.md CHANGED
@@ -140,7 +140,7 @@ GRPO already **builds charts in code** (`training/grpo_train.py` → `_save_trai
140
 
141
  ## 4. Alternative: Colab (no Jobs)
142
 
143
- Use `notebooks/parlay_grpo_colab.ipynb`. In the **config** cell, set:
144
 
145
  ```python
146
  JSONL_VIA_HF = ("sh4shv4t/parlay-episodes", "episodes_v2.jsonl")
 
140
 
141
  ## 4. Alternative: Colab (no Jobs)
142
 
143
+ Use `training/notebooks/parlay_grpo_colab.ipynb`. In the **config** cell, set:
144
 
145
  ```python
146
  JSONL_VIA_HF = ("sh4shv4t/parlay-episodes", "episodes_v2.jsonl")
{notebooks → training/notebooks}/parlay_grpo_colab.ipynb RENAMED
File without changes
training/notebooks/parlay_grpo_hf_job_log_summary.ipynb ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 5,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "gpuType": "A100"
8
+ },
9
+ "kernelspec": {
10
+ "display_name": "Python 3",
11
+ "name": "python3"
12
+ },
13
+ "language_info": {
14
+ "name": "python"
15
+ }
16
+ },
17
+ "cells": [
18
+ {
19
+ "cell_type": "markdown",
20
+ "metadata": {},
21
+ "source": [
22
+ "# Hugging Face Job — GRPO run (truncated log)\n",
23
+ "\n",
24
+ "**Why:** Stage-2 **GRPO** on top of the SFT LoRA so the policy is optimized with Parlay rewards (ToM, format, anti-capitulation, etc.) and the adapter can be **pushed to the Hub** for eval / Spaces.\n",
25
+ "\n",
26
+ "**This notebook** is not a runnable training recipe — it is a **short record** of one HF Job: intent + shell entrypoint + cherry-picked log lines."
27
+ ],
28
+ "id": "md-purpose"
29
+ },
30
+ {
31
+ "cell_type": "markdown",
32
+ "metadata": {},
33
+ "source": [
34
+ "## Command (repo root on the job, e.g. `/work`)\n",
35
+ "\n",
36
+ "After `git clone` and `pip install -r requirements-train.txt`, the job used the standard entry script with **80 steps** and **G=2** (matches console lines below)."
37
+ ],
38
+ "id": "md-cmd-intro"
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "metadata": {},
43
+ "source": [
44
+ "%%bash\n",
45
+ "# Equivalent to what the HF Job ran (set HF_TOKEN / HUGGINGFACE_HUB_TOKEN for push)\n",
46
+ "export GRPO_STEPS=80 GRPO_G=2\n",
47
+ "export DATASET_ID=sh4shv4t/parlay-episodes EPISODE_FILE=episodes_v2.jsonl\n",
48
+ "export SFT_MODEL=sh4shv4t/parlay-sft-1-5b HF_GRPO_REPO=sh4shv4t/parlay-grpo-1-5b OUTPUT_DIR=outputs/grpo_run\n",
49
+ "bash scripts/hf_grpo_entry.sh"
50
+ ],
51
+ "id": "cell-bash-cmd",
52
+ "execution_count": null,
53
+ "outputs": []
54
+ },
55
+ {
56
+ "cell_type": "markdown",
57
+ "metadata": {},
58
+ "source": [
59
+ "## Truncated log (high signal only)\n",
60
+ "\n",
61
+ "```text\n",
62
+ "Job started at 2026-04-26 00:15:37\n",
63
+ "Cloning into '/work'...\n",
64
+ "... pip install -r requirements-train.txt ... (torch 2.8, transformers 5.6, trl 1.2, ...)\n",
65
+ "\n",
66
+ "==> Downloading episodes_v2.jsonl from dataset sh4shv4t/parlay-episodes ...\n",
67
+ "==> GRPO: SFT=sh4shv4t/parlay-sft-1-5b steps=80 G=2 out=/work/outputs/grpo_run\n",
68
+ "Filtered 0 records below min_reward=-50.0, 124 remaining for GRPO\n",
69
+ "\n",
70
+ "INFO: Loading SFT LoRA: adapter=sh4shv4t/parlay-sft-1-5b base=Qwen/Qwen2.5-1.5B-Instruct\n",
71
+ "INFO: Starting GRPO training: ... prompts=124, G=2, steps=80\n",
72
+ "\n",
73
+ " ... 80/80 steps (~15s/step) ...\n",
74
+ " {'train_loss': '0.0001051', 'epoch': '5.333', ...}\n",
75
+ "\n",
76
+ "No log history to plot\n",
77
+ "INFO: Model saved to /work/outputs/grpo_run\n",
78
+ "==> Pushing to https://huggingface.co/sh4shv4t/parlay-grpo-1-5b ...\n",
79
+ "Model uploaded successfully!\n",
80
+ "==> Done.\n",
81
+ "```"
82
+ ],
83
+ "id": "md-truncated-log"
84
+ }
85
+ ]
86
+ }
training/notebooks/parlay_hf_only_eval_colab.ipynb ADDED
@@ -0,0 +1,371 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "b3dfc8be",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Parlay — Hugging Face only eval (no GitHub)\n",
9
+ "\n",
10
+ "This notebook **only** needs a Hugging Face token (Colab secret `HF_TOKEN`) and a **GPU** runtime. After the first run of the **Install dependencies** cell, use **Runtime → Restart session** and run all cells from the top (needed so the reinstalled `torch` / `torchvision` stack is loaded). If you change packages or see `torchao`, `peft`, or `BloomPreTrainedModel` import errors, restart again.\n",
11
+ "\n",
12
+ "It will:\n",
13
+ "\n",
14
+ "1. Download `episodes_v2.jsonl` from the dataset [sh4shv4t/parlay-episodes](https://huggingface.co/datasets/sh4shv4t/parlay-episodes)\n",
15
+ "2. Keep rows with `split == \"eval\"`\n",
16
+ "3. Load **Qwen2.5-1.5B-Instruct**, [sh4shv4t/parlay-sft-1-5b](https://huggingface.co/sh4shv4t/parlay-sft-1-5b), and [sh4shv4t/parlay-grpo-1-5b](https://huggingface.co/sh4shv4t/parlay-grpo-1-5b) (LoRA adapters on top of the base)\n",
17
+ "4. For each episode, generate one JSON reply in the same chat style as training (`apply_chat_template`), parse `offer_amount`, and score **terminal deal efficiency** (GAMMA=100) with buyer- vs seller-AI ZOPA logic (same rules as the Parlay `reward_fn` for efficiency)\n",
18
+ "5. Print means and a JSON blob you can paste into `results/eval_results.json`\n",
19
+ "\n",
20
+ "This is a **single-step** policy probe on the first user turn (like `training/evaluate.py` on GPU), not a full multi-turn OpenEnv roll-out."
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": null,
26
+ "id": "00322384",
27
+ "metadata": {},
28
+ "outputs": [],
29
+ "source": [
30
+ "# @title 1) Install dependencies\n",
31
+ "%%capture\n",
32
+ "# Reinstall torch+torchvision+torchaudio from ONE CUDA index (fixes mismatched cu130/cu128 and the bogus BloomPreTrainedModel peft error). If cu130 wheels fail, switch URL to .../whl/cu128.\n",
33
+ "%pip install -q --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130\n",
34
+ "%pip install -q -U transformers accelerate peft bitsandbytes safetensors huggingface_hub sentencepiece\n",
35
+ "%pip install -q -U \"torchao>=0.16.0\""
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "code",
40
+ "execution_count": null,
41
+ "metadata": {},
42
+ "outputs": [],
43
+ "source": [
44
+ "# @title 2) HF_TOKEN (only secret you need)\n",
45
+ "import os\n",
46
+ "from google.colab import userdata\n",
47
+ "\n",
48
+ "HF_TOKEN = (userdata.get(\"HF_TOKEN\") or os.environ.get(\"HF_TOKEN\") or \"\").strip()\n",
49
+ "if not HF_TOKEN:\n",
50
+ " raise RuntimeError(\n",
51
+ " \"Set Colab secret HF_TOKEN: open the key icon → add HF_TOKEN with a read token from huggingface.co/settings/tokens\"\n",
52
+ " )\n",
53
+ "os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
54
+ "os.environ[\"HUGGING_FACE_HUB_TOKEN\"] = HF_TOKEN\n",
55
+ "print(\"HF_TOKEN: OK (length)\", len(HF_TOKEN))\n"
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": null,
61
+ "metadata": {},
62
+ "outputs": [],
63
+ "source": [
64
+ "# @title 3) Config — Hub IDs (defaults match the Parlay README)\n",
65
+ "BASE_MODEL = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
66
+ "SFT_ADAPTER = \"sh4shv4t/parlay-sft-1-5b\"\n",
67
+ "GRPO_ADAPTER = \"sh4shv4t/parlay-grpo-1-5b\"\n",
68
+ "DATASET_REPO = \"sh4shv4t/parlay-episodes\"\n",
69
+ "DATASET_FILE = \"episodes_v2.jsonl\"\n",
70
+ "N_EVAL = 50 # set smaller (e.g. 10) for a quick smoke test\n",
71
+ "\n",
72
+ "import subprocess\n",
73
+ "import torch\n",
74
+ "if not torch.cuda.is_available():\n",
75
+ " print(\"WARNING: no GPU — inference will be very slow; use Runtime → Change runtime type → GPU.\")\n",
+ "else:\n",
+ " subprocess.run([\"nvidia-smi\", \"-L\"], check=False)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# @title 4) Load eval rows from the Hub dataset (JSONL, no local clone)\n",
+ "import json\n",
+ "from huggingface_hub import hf_hub_download\n",
+ "\n",
+ "path = hf_hub_download(\n",
+ " repo_id=DATASET_REPO,\n",
+ " filename=DATASET_FILE,\n",
+ " repo_type=\"dataset\",\n",
+ " token=HF_TOKEN,\n",
+ ")\n",
+ "print(\"Downloaded:\", path)\n",
+ "\n",
+ "rows = []\n",
+ "with open(path, \"r\", encoding=\"utf-8\") as f:\n",
+ " for line in f:\n",
+ " line = line.strip()\n",
+ " if not line:\n",
+ " continue\n",
+ " r = json.loads(line)\n",
+ " if r.get(\"split\") == \"eval\":\n",
+ " rows.append(r)\n",
+ "if not rows:\n",
+ " raise RuntimeError(\"No rows with split=eval in JSONL — check the dataset file name / version on the Hub.\")\n",
+ "rows = rows[:N_EVAL]\n",
+ "print(\"Eval rows:\", len(rows), \"(capped to N_EVAL)\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "43ef16ab",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# @title 5) Scoring + prompt (matches Parlay `prompts_qwen` + efficiency reward semantics)\n",
+ "import re\n",
+ "import json as _json\n",
+ "from typing import Any\n",
+ "\n",
+ "GAMMA = 100.0\n",
+ "BUYER_AI = frozenset({\"hiring_package\", \"acquisition_term_sheet\"})\n",
+ "\n",
+ "\n",
+ "def _first_user_content(conversation) -> str:\n",
+ " if not isinstance(conversation, list):\n",
+ " return (\n",
+ " 'Please make your opening offer. Reply in valid JSON: '\n",
+ " '{\"utterance\": \"...\", \"offer_amount\": <number or null>, \"tactical_move\": <string or null>}'\n",
+ " )\n",
+ " for turn in conversation:\n",
+ " if not isinstance(turn, dict):\n",
+ " continue\n",
+ " if turn.get(\"role\") in (\"user\", \"negotiator\"):\n",
+ " c = str(turn.get(\"content\", \"\")).strip()\n",
+ " if c:\n",
+ " return c\n",
+ " return (\n",
+ " 'Please make your opening offer. Reply in valid JSON: '\n",
+ " '{\"utterance\": \"...\", \"offer_amount\": <number or null>, \"tactical_move\": <string or null>}'\n",
+ " )\n",
+ "\n",
+ "\n",
+ "def build_generation_prompt(rec: dict, tokenizer) -> str:\n",
+ " system_msg = str(rec.get(\"prompt\", \"\")).strip()\n",
+ " user_msg = _first_user_content(rec.get(\"conversation\", []))\n",
+ " messages = [\n",
+ " {\"role\": \"system\", \"content\": system_msg},\n",
+ " {\"role\": \"user\", \"content\": user_msg},\n",
+ " ]\n",
+ " if hasattr(tokenizer, \"apply_chat_template\"):\n",
+ " return tokenizer.apply_chat_template(\n",
+ " messages, tokenize=False, add_generation_prompt=True\n",
+ " )\n",
+ " eot = str(\n",
+ " bytes((60, 124, 105, 109, 95, 101, 110, 100, 124, 62)), \"ascii\"\n",
+ " )\n",
+ " return (\n",
+ " f\"<|im_start|>system\\n{system_msg}{eot}\\n\"\n",
+ " f\"<|im_start|>user\\n{user_msg}{eot}\\n\"\n",
+ " f\"<|im_start|>assistant\\n\"\n",
+ " )\n",
+ "\n",
+ "\n",
+ "def parse_offer(text: str) -> float:\n",
+ " t = (text or \"\").replace(\"```json\", \"\").replace(\"```\", \"\").strip()\n",
+ " m = re.search(r\"\\{[\\s\\S]*\\}\", t)\n",
+ " if not m:\n",
+ " return 0.0\n",
+ " try:\n",
+ " d = _json.loads(m.group(0))\n",
+ " v = d.get(\"offer_amount\")\n",
+ " if v is None:\n",
+ " return 0.0\n",
+ " return float(v)\n",
+ " except Exception:\n",
+ " return 0.0\n",
+ "\n",
+ "\n",
+ "def efficiency_reward(offer: float, rec: dict) -> float:\n",
+ " \"\"\"Parlay-style E in [0,1] * GAMMA for terminal deal efficiency.\"\"\"\n",
+ " batna_seller = float(rec.get(\"batna_seller\", 0) or 0)\n",
+ " batna_buyer = float(rec.get(\"batna_buyer\", batna_seller) or batna_seller)\n",
+ " zopa = max(1.0, batna_buyer - batna_seller)\n",
+ " sid = str(rec.get(\"scenario_id\", \"\") or \"\")\n",
+ " is_buyer = sid in BUYER_AI\n",
+ " if offer <= 0:\n",
+ " return 0.0\n",
+ " if is_buyer:\n",
+ " e = max(0.0, min(1.0, (batna_buyer - offer) / zopa))\n",
+ " else:\n",
+ " e = max(0.0, min(1.0, (offer - batna_seller) / zopa))\n",
+ " return e * GAMMA\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aeefab1f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# @title 6) Load models (4-bit on GPU) — base, SFT, GRPO\n",
+ "from pathlib import Path\n",
+ "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n",
+ "from peft import PeftModel\n",
+ "from huggingface_hub import hf_hub_download\n",
+ "import json as _json\n",
+ "\n",
+ "\n",
+ "def load_tokenizer_for(repo_id: str):\n",
+ " # Prefer adapter repo tokenizer; Hub adapter repos usually ship tokenizer config\n",
+ " return AutoTokenizer.from_pretrained(\n",
+ " repo_id, trust_remote_code=True, token=HF_TOKEN\n",
+ " )\n",
+ "\n",
+ "\n",
+ "def is_adapter_repo(repo_id: str) -> bool:\n",
+ " try:\n",
+ " hf_hub_download(repo_id=repo_id, filename=\"adapter_config.json\", token=HF_TOKEN)\n",
+ " return True\n",
+ " except Exception:\n",
+ " return False\n",
+ "\n",
+ "\n",
+ "def load_causal(hub_id: str, use_4bit: bool = True):\n",
+ " use_bnb = use_4bit and torch.cuda.is_available()\n",
+ " common = dict(trust_remote_code=True, token=HF_TOKEN)\n",
+ " if use_bnb:\n",
+ " bnb = BitsAndBytesConfig(\n",
+ " load_in_4bit=True,\n",
+ " bnb_4bit_compute_dtype=torch.bfloat16,\n",
+ " bnb_4bit_use_double_quant=True,\n",
+ " bnb_4bit_quant_type=\"nf4\",\n",
+ " )\n",
+ " mkw = {**common, \"quantization_config\": bnb, \"device_map\": \"auto\"}\n",
+ " else:\n",
+ " mkw = {\n",
+ " **common,\n",
+ " \"torch_dtype\": torch.bfloat16 if torch.cuda.is_available() else torch.float32,\n",
+ " \"device_map\": \"auto\" if torch.cuda.is_available() else None,\n",
+ " }\n",
+ " if is_adapter_repo(hub_id):\n",
+ " cfg_p = Path(hf_hub_download(repo_id=hub_id, filename=\"adapter_config.json\", token=HF_TOKEN))\n",
+ " ac = _json.loads(cfg_p.read_text(encoding=\"utf-8\"))\n",
+ " base = ac.get(\"base_model_name_or_path\", BASE_MODEL)\n",
+ " base_m = AutoModelForCausalLM.from_pretrained(base, **mkw)\n",
+ " m = PeftModel.from_pretrained(base_m, hub_id, token=HF_TOKEN)\n",
+ " else:\n",
+ " m = AutoModelForCausalLM.from_pretrained(hub_id, **mkw)\n",
+ " m.eval()\n",
+ " return m\n",
+ "\n",
+ "print(\"Load helpers: OK\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2eeef38d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# @title 7) Run evaluation (reload tokenizer per checkpoint for matching templates)\n",
+ "import gc\n",
+ "\n",
+ "SPECS = [\n",
+ " (\"base\", BASE_MODEL),\n",
+ " (\"sft\", SFT_ADAPTER),\n",
+ " (\"grpo\", GRPO_ADAPTER),\n",
+ "]\n",
+ "\n",
+ "\n",
+ "@torch.inference_mode()\n",
+ "def run_eval_for_repo(name: str, repo_id: str) -> float:\n",
+ " tok = load_tokenizer_for(repo_id)\n",
+ " if tok.pad_token is None:\n",
+ " tok.pad_token = tok.eos_token\n",
+ " model = None\n",
+ " try:\n",
+ " model = load_causal(repo_id, use_4bit=True)\n",
+ " rews = []\n",
+ " for i, rec in enumerate(rows):\n",
+ " prompt = build_generation_prompt(rec, tok)\n",
+ " dev = next(model.parameters()).device\n",
+ " batch = tok(\n",
+ " prompt, return_tensors=\"pt\", max_length=4096, truncation=True\n",
+ " )\n",
+ " batch = {k: v.to(dev) for k, v in batch.items()}\n",
+ " out = model.generate(\n",
+ " **batch,\n",
+ " max_new_tokens=256,\n",
+ " do_sample=True,\n",
+ " temperature=0.7,\n",
+ " pad_token_id=tok.pad_token_id,\n",
+ " )\n",
+ " gen = out[0][batch[\"input_ids\"].shape[-1] :]\n",
+ " text = tok.decode(gen, skip_special_tokens=True)\n",
+ " offer = parse_offer(text)\n",
+ " rews.append(efficiency_reward(offer, rec))\n",
+ " return sum(rews) / max(len(rews), 1)\n",
+ " finally:\n",
+ " if model is not None:\n",
+ " del model\n",
+ " gc.collect()\n",
+ " if torch.cuda.is_available():\n",
+ " torch.cuda.empty_cache()\n",
+ " if torch.cuda.is_available():\n",
+ " torch.cuda.synchronize()\n",
+ "\n",
+ "\n",
+ "results = {}\n",
+ "for name, rid in SPECS:\n",
+ " print(\"Evaluating:\", name, rid)\n",
+ " m = run_eval_for_repo(name, rid)\n",
+ " results[name] = m\n",
+ " print(f\" mean reward (eff proxy): {m:.3f}\")\n",
+ "\n",
+ "print(\"\\n--- Summary ---\", results)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fc806987",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# @title 8) Build `eval_results.json` (copy into your repo: results/eval_results.json)\n",
+ "import json\n",
+ "out = {\n",
+ " \"base_mean_reward\": results[\"base\"],\n",
+ " \"sft_mean_reward\": results.get(\"sft\"),\n",
+ " \"grpo_mean_reward\": results.get(\"grpo\"),\n",
+ " \"n_eval\": len(rows),\n",
+ " \"dataset\": DATASET_REPO,\n",
+ " \"data_file\": DATASET_FILE,\n",
+ " \"models\": {\n",
+ " \"base\": BASE_MODEL,\n",
+ " \"sft\": SFT_ADAPTER,\n",
+ " \"grpo\": GRPO_ADAPTER,\n",
+ " },\n",
+ "}\n",
+ "print(json.dumps(out, indent=2))\n",
+ "from google.colab import files\n",
+ "path = \"/content/eval_results.json\"\n",
+ "with open(path, \"w\", encoding=\"utf-8\") as f:\n",
+ " json.dump(out, f, indent=2)\n",
+ "files.download(path)\n",
+ "print(\"Downloaded eval_results.json — add random_mean_reward manually (e.g. from random_baseline) if needed.\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }
{notebooks → training/notebooks}/parlay_sft_colab.ipynb RENAMED
File without changes