uvpatel7271 committed on
Commit 737f100
·
1 Parent(s): 5d806ad

Added inference and triage logic; improved optimization and scalability
.dockerignore CHANGED
@@ -8,6 +8,10 @@
  !models.py
  !openenv.yaml
  !pyproject.toml
+ !DEMO_SCRIPT.md
+ !triage.py
+ !triage_catalog.py
+ !triage_models.py
  !server/
  !server/**
  !tasks/
DEMO_SCRIPT.md ADDED
@@ -0,0 +1,12 @@
+ # TorchReview Copilot Demo Script
+
+ ## 60-90 Second Walkthrough
+
+ 1. Open the Hugging Face Space and introduce TorchReview Copilot as an AI-powered Python triage assistant built with PyTorch.
+ 2. Point to the single-sentence problem statement: teams lose time figuring out whether a failure is syntax, logic, or performance related.
+ 3. Select the `Fix the invoice total syntax regression` example to show the app loading a real broken code sample.
+ 4. Highlight the **Live Triage Radar** updating immediately, then call out the predicted issue class and repair risk.
+ 5. Explain that the PyTorch layer uses CodeBERTa embeddings to compare the input against known bug patterns from the OpenEnv task catalog.
+ 6. Scroll to the repair plan and note that the output is not just a label; it gives a prioritized remediation checklist and the nearest known failure pattern.
+ 7. Switch to the performance example to show the confidence profile change and emphasize that the system can distinguish runtime bottlenecks from correctness bugs.
+ 8. Close by noting that OpenEnv still powers deterministic validation under the hood, so the demo stays grounded in measurable task outcomes.
README.md CHANGED
@@ -1,189 +1,225 @@
  ---
- title: Python Code Review Environment
- emoji: snake
- colorFrom: yellow
- colorTo: blue
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
  - openenv
  - code-review
- - python
  ---

- # python_code_review_env

- `python_code_review_env` is a production-style OpenEnv environment that simulates a realistic Python code review workflow. An agent inspects broken code, edits it, runs tests, and submits a final solution against deterministic graders for syntax repair, bug fixing, and optimization/refactoring.

- ## Environment design

- - `Observation` includes task instructions, current code, syntax errors, public test output, action history, and remaining attempts.
- - `Action` is structured as `analyze_code`, `edit_code`, `run_tests`, or `submit_solution`.
- - `Reward` is shaped and non-binary. The environment awards syntax progress, test progress, correctness, and quality improvements while penalizing invalid actions, timeouts, regressions, and unchanged edits.
- - `State` exposes the internal episode snapshot through `/state`.

- ## Task set

- 1. `syntax_fix_invoice_totals` (easy)
-    Fix a syntax regression in an invoice normalization helper.
- 2. `bug_fix_session_windows` (medium)
-    Repair a session-collapsing bug using deterministic public and hidden tests.
- 3. `optimization_rank_active_users` (hard)
-    Refactor a slow ranking function and earn additional score from runtime improvement plus AST/style quality.

- ## Action schema

- ```json
- {
-   "action_type": "edit_code",
-   "code": "def function(...):\n    ..."
- }
- ```

- Supported `action_type` values:

- - `analyze_code`
- - `edit_code`
- - `run_tests`
- - `submit_solution`

- ## Observation schema

- ```json
- {
-   "task_description": "...",
-   "current_code": "...",
-   "errors": "...",
-   "test_results": "...",
-   "history": []
- }
- ```

- The full observation also includes `task_id`, `difficulty`, `task_kind`, `visible_tests`, `attempts_remaining`, `score`, `last_action_status`, `reward`, `done`, and a structured `reward_details` breakdown.

- ## Deterministic grading

- - Syntax tasks use `compile()` plus hidden behavioral checks.
- - Bug-fix tasks use deterministic function-call cases that behave like pytest assertions.
- - Optimization tasks combine correctness, runtime benchmarking, and AST/style quality scoring.
- - Infinite loops and long-running solutions are sandboxed with subprocess timeouts and receive penalties.
- - All scores are clamped to `[0.0, 1.0]`.

- ## Run locally

- Install dependencies:

- ```bash
- pip install .
- ```

- Start the API server:

- ```bash
- uvicorn server.app:app --host 0.0.0.0 --port 8000
- ```

- Smoke-test the environment:

- ```bash
- curl http://localhost:8000/health
- curl http://localhost:8000/state
- ```

- OpenEnv validation:

- ```bash
- openenv validate
- ```

- ## Docker build

- The Docker image no longer depends on `ghcr.io/meta-pytorch/openenv-base:latest`, which removes the TLS handshake failure from the original build path.

- ```bash
- # Run from repo root
- docker build -t python-code-review-env -f server/Dockerfile .
- docker run --rm -p 8000:8000 python-code-review-env
- ```

- If you run the build from inside `server/`, you must point the context at the repo root:

- ```bash
- docker build -t python-code-review-env -f Dockerfile ..
  ```

- Expected health check:

  ```bash
- curl http://localhost:8000/health
  ```

- ## Hugging Face Spaces deployment

- 1. Create a Docker Space.
- 2. Push this repository content to the Space.
- 3. Ensure port `8000` is exposed.
- 4. Wait for the container to build.
- 5. Verify `/reset` and `/health` return `200`.
-
- The image is CPU-friendly and designed for a small Hugging Face Space such as `2 vCPU / 8 GB RAM`.

- ## Inference baseline

- `inference.py` uses an OpenAI-compatible client:

- ```python
- client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
  ```

- Supported providers include:

- - Gemini through an OpenAI-compatible gateway
- - OpenRouter
- - Together AI
- - DeepSeek-compatible OpenAI endpoints

- Run it with a free/open provider:

  ```bash
- set API_BASE_URL=https://openrouter.ai/api/v1
- set API_KEY=...
- set MODEL=deepseek/deepseek-chat-v3-0324:free
- python inference.py
  ```

- If no credentials are supplied, the script falls back to a deterministic smoke-test policy that applies the reference fix for each task so the environment can still be validated end to end.

- Example output:
-
- ```text
- Task 1 Score: 1.0
- Task 2 Score: 1.0
- Task 3 Score: 0.9
- Final Score: 1.0
  ```

- ## Project structure

  ```text
  python_env/
  ├── client.py
  ├── graders/
- │   ├── bug_fix.py
- │   ├── dispatch.py
- │   ├── optimization.py
- │   ├── shared.py
- │   └── syntax.py
- ├── inference.py
- ├── models.py
- ├── openenv.yaml
- ├── README.md
  ├── server/
  │   ├── app.py
- │   ├── Dockerfile
- │   ├── env.py
- │   └── python_env_environment.py
- ├── tasks/
- │   └── catalog.py
  ```
  ---
+ title: TorchReview Copilot
+ emoji: torch
+ colorFrom: orange
+ colorTo: red
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
+ - pytorch
+ - gradio
+ - fastapi
  - openenv
  - code-review
  ---

+ # TorchReview Copilot

+ TorchReview Copilot is an **AI-powered Python code triage system built with PyTorch** that classifies the issue type, estimates repair risk, and generates an actionable remediation plan from broken code plus failure output.

+ It upgrades the original OpenEnv hackathon environment into a judge-friendly product demo: a polished Hugging Face Space on top, with the deterministic OpenEnv validation engine preserved underneath.

+ **Live demo:** [Hugging Face Space](https://huggingface.co/spaces/uvpatel7271/final-python-env)  
+ **Repository:** [uvpatel/final-python-env](https://github.com/uvpatel/final-python-env)

+ ## Problem Statement

+ Engineering teams lose time during incident response and code review because broken Python snippets often arrive with noisy traces, partial test output, and unclear ownership. Before fixing anything, someone still has to answer:

+ - Is this a syntax issue, a logic bug, or a performance regression?
+ - How risky is the repair?
+ - What should be checked first?

+ That triage step is repetitive, error-prone, and often slows down the actual fix.

+ ## Solution

+ TorchReview Copilot turns code plus traceback text into a practical triage report:

+ - **Issue classification:** syntax, logic, or performance
+ - **Repair risk:** low, medium, or high
+ - **Live Triage Radar:** confidence visualization for all issue classes
+ - **Nearest known pattern:** the closest OpenEnv task match
+ - **Fix plan:** prioritized remediation steps for the engineer

+ The result is a demo that feels like a real AI debugging assistant rather than a backend-only environment.
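The shipped low/medium/high risk logic lives in `triage.py` and is not shown in this diff; as an illustration only, one plausible way to derive such a label is from the margin between the top two class confidences. The function name and threshold values below are made-up assumptions, not the project's actual implementation:

```python
def repair_risk(confidences: dict[str, float]) -> str:
    """Hypothetical sketch: map the gap between the two highest class
    confidences to a risk label. Thresholds are illustrative only."""
    top, runner_up = sorted(confidences.values(), reverse=True)[:2]
    margin = top - runner_up
    if margin >= 0.4:    # one class clearly dominates -> straightforward repair
        return "low"
    if margin >= 0.15:   # some ambiguity between classes
        return "medium"
    return "high"        # classes nearly tied -> the diagnosis itself is uncertain

print(repair_risk({"syntax": 0.7, "logic": 0.2, "performance": 0.1}))  # low
```

A near-tie between classes is treated as high risk because the fix strategy depends on which diagnosis is correct.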

+ ## Why PyTorch Matters

+ This project uses **PyTorch for real inference**, not placeholder branching:

+ - `transformers` + `torch` load `huggingface/CodeBERTa-small-v1`
+ - the model encodes code snippets and failure context into embeddings
+ - embeddings are compared against curated OpenEnv issue prototypes
+ - the final decision blends model similarity with lightweight static-analysis signals

+ That gives the demo an actual model-backed classification path while keeping it CPU-friendly for Hugging Face Spaces.
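The prototype-similarity idea can be sketched without the model: embed the failure text and each issue prototype as vectors, then pick the class with the highest cosine similarity. The sketch below substitutes a toy bag-of-words embedding for the real CodeBERTa vectors, and the prototype strings are invented examples, not the project's curated catalog:

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy normalized bag-of-words vector (stand-in for CodeBERTa embeddings)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Invented issue prototypes, loosely modeled on the three failure classes.
prototypes = {
    "syntax": "syntaxerror invalid syntax unexpected eof while parsing",
    "logic": "assertionerror expected value got wrong value",
    "performance": "timeout slow nested loops benchmark regression",
}

failure = "AssertionError expected 3 got 2 wrong value"
vocab = sorted({w for text in [*prototypes.values(), failure.lower()] for w in text.split()})
scores = {
    label: cosine(embed(failure, vocab), embed(text, vocab))
    for label, text in prototypes.items()
}
print(max(scores, key=scores.get))  # logic
```

Swapping the toy `embed` for mean-pooled transformer hidden states is what turns this into the model-backed path the README describes.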

+ ## How It Works

+ ### Pipeline

+ `Input code + traceback -> static checks -> PyTorch embeddings -> similarity against issue prototypes -> confidence scores -> repair plan`

+ ### Detailed Flow

+ 1. The user pastes Python code and optional traceback or benchmark output.
+ 2. TorchReview extracts lightweight static signals:
+    - parser success/failure
+    - assertion-style test language
+    - performance keywords
+    - nested-loop depth
+ 3. CodeBERTa runs through PyTorch to embed the combined input.
+ 4. The embedding is compared against built-in issue prototypes derived from the OpenEnv task catalog.
+ 5. The UI returns:
+    - top issue label
+    - confidence radar
+    - repair risk
+    - nearest known bug pattern
+    - suggested next action
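The "lightweight static signals" step can be illustrated with the standard-library `ast` module: try to parse the snippet, then walk the tree for clues like nested-loop depth and assert statements. The function name and signal dictionary below are hypothetical; the shipped extraction lives in `triage.py`:

```python
import ast

def static_signals(code: str) -> dict:
    """Illustrative static-signal extraction: parseability, loop nesting, asserts."""
    signals = {"parses": True, "max_loop_depth": 0, "has_assert": False}
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Parser failure is itself a strong signal for the "syntax" class.
        signals["parses"] = False
        return signals

    def depth(node: ast.AST, current: int = 0) -> int:
        best = current
        for child in ast.iter_child_nodes(node):
            bump = 1 if isinstance(child, (ast.For, ast.While)) else 0
            best = max(best, depth(child, current + bump))
        return best

    signals["max_loop_depth"] = depth(tree)
    signals["has_assert"] = any(isinstance(n, ast.Assert) for n in ast.walk(tree))
    return signals

print(static_signals("for i in range(3):\n    for j in range(3):\n        total = i + j"))
# {'parses': True, 'max_loop_depth': 2, 'has_assert': False}
```

Deep loop nesting nudges the classifier toward "performance", while a failed parse points at "syntax" before any embedding work happens.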

+ ## Built-In Demo Scenarios

+ The app ships with three grounded examples reused from the OpenEnv tasks:

+ 1. **Syntax regression:** broken invoice normalization helper
+ 2. **Logic bug:** session window boundary failure
+ 3. **Performance bottleneck:** slow active-user ranking pipeline

+ These examples make the classification differences obvious during judging and video demos.

+ ## Tech Stack

+ - **PyTorch** for embedding inference
+ - **Transformers** for `CodeBERTa-small-v1`
+ - **Gradio** for the polished Hugging Face Space UI
+ - **FastAPI** for the app server
+ - **OpenEnv** for deterministic validation endpoints and environment compatibility
+ - **Pydantic** for typed schemas

+ ## Hugging Face Space UX

+ The root app now presents a production-style triage experience:
+
+ - a clear problem/solution hero section
+ - example scenario selector
+ - code and traceback inputs
+ - **Live Triage Radar**
+ - structured fix plan
+ - visible model/backend notes
+
+ The underlying OpenEnv endpoints remain available for compatibility and evaluation.
+
+ ## Screenshots
+
+ Add screenshots after deployment:
+
+ - `docs/screenshots/home.png` -> hero + inputs
+ - `docs/screenshots/triage-radar.png` -> confidence visualization
+ - `docs/screenshots/fix-plan.png` -> structured output panel
+
+ Suggested markdown once captured:
+
+ ```md
+ ![TorchReview Copilot Home](docs/screenshots/home.png)
+ ![Live Triage Radar](docs/screenshots/triage-radar.png)
+ ![Fix Plan Output](docs/screenshots/fix-plan.png)
  ```

+ ## Local Setup
+
+ ### 1. Install dependencies

  ```bash
+ pip install .
  ```

+ ### 2. Run the application

+ ```bash
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
+ ```

+ ### 3. Open the demo

+ Visit:

+ ```text
+ http://localhost:8000/
  ```

+ ### 4. Verify OpenEnv compatibility

+ ```bash
+ curl http://localhost:8000/health
+ curl http://localhost:8000/state
+ ```

+ ## Docker

  ```bash
+ docker build -t torchreview-copilot -f server/Dockerfile .
+ docker run --rm -p 8000:8000 torchreview-copilot
  ```

+ Expected checks:

+ ```bash
+ curl http://localhost:8000/
+ curl http://localhost:8000/health
  ```

+ ## Project Structure
  ```text
  python_env/
  ├── client.py
  ├── graders/
  ├── server/
  │   ├── app.py
+ │   ├── demo.py
+ │   └── env.py
+ ├── tasks/
+ ├── triage.py
+ ├── triage_catalog.py
+ ├── triage_models.py
+ ├── inference.py
+ └── tests/
  ```
+
+ ## OpenEnv Compatibility
+
+ The hackathon backend is still present:
+
+ - deterministic task grading
+ - structured action/observation/state models
+ - `/health`, `/state`, `/reset`, `/step`, and related environment routes
+
+ This means the product demo is not detached from evaluation; it is layered on top of the original OpenEnv system.
+
+ ## Demo Script
+
+ See [DEMO_SCRIPT.md](DEMO_SCRIPT.md) for the 60-90 second recording flow.
+
+ Short version:
+
+ 1. Open the Space and introduce the problem.
+ 2. Load the syntax example.
+ 3. Show the Live Triage Radar and issue label.
+ 4. Explain the PyTorch embedding step.
+ 5. Show the matched pattern and fix plan.
+ 6. Switch to the performance example to prove the model distinguishes issue classes.
+
+ ## Limitations
+
+ - The classifier uses pretrained embeddings plus prototype similarity, not a custom fine-tuned model.
+ - First model load may take longer on a cold Hugging Face Space.
+ - The current demo focuses on short Python snippets rather than full multi-file repositories.
+
+ ## Future Work
+
+ - fine-tune the PyTorch classifier on a larger bug triage dataset
+ - add repository-level file context and diff-aware analysis
+ - include automated patch suggestions after triage
+ - track remediation outcomes as a feedback loop for future ranking improvements
__init__.py CHANGED
@@ -9,6 +9,8 @@ from .models import (
      PythonObservation,
      PythonState,
  )
+ from .triage import CodeTriageEngine, HashingEmbeddingBackend, TransformersEmbeddingBackend, get_default_engine
+ from .triage_models import TriageResult

  __all__ = [
      "PythonAction",
@@ -19,4 +21,9 @@ __all__ = [
      "PythonCodeReviewState",
      "PythonCodeReviewEnv",
      "PythonEnv",
+     "CodeTriageEngine",
+     "HashingEmbeddingBackend",
+     "TransformersEmbeddingBackend",
+     "TriageResult",
+     "get_default_engine",
  ]
pyproject.toml CHANGED
@@ -5,14 +5,17 @@ build-backend = "setuptools.build_meta"
  [project]
  name = "openenv-python-code-review-env"
  version = "1.0.0"
- description = "Production-grade OpenEnv environment for Python code review workflows."
+ description = "TorchReview Copilot: AI-powered Python code triage with PyTorch and OpenEnv validation."
  readme = "README.md"
  requires-python = ">=3.10"
  dependencies = [
      "fastapi>=0.111.0",
+     "gradio>=5.26.0",
      "openai>=1.76.0",
      "openenv-core[core]>=0.2.2",
      "pytest>=8.0.0",
+     "torch>=2.2.0",
+     "transformers>=4.45.0",
      "uvicorn>=0.30.0",
  ]
server/Dockerfile CHANGED
@@ -6,7 +6,7 @@ ENV PYTHONDONTWRITEBYTECODE=1 \

  WORKDIR /app

- COPY pyproject.toml README.md openenv.yaml __init__.py client.py compat.py models.py inference.py /app/
+ COPY pyproject.toml README.md DEMO_SCRIPT.md openenv.yaml __init__.py client.py compat.py models.py inference.py triage.py triage_catalog.py triage_models.py /app/
  COPY server /app/server
  COPY tasks /app/tasks
  COPY graders /app/graders
server/app.py CHANGED
@@ -1,4 +1,4 @@
- """FastAPI entrypoint for python_code_review_env."""
+ """FastAPI + Gradio entrypoint for TorchReview Copilot."""

  from __future__ import annotations

@@ -9,21 +9,37 @@ except Exception as exc:  # pragma: no cover
          "openenv-core is required to run the API server. Install project dependencies first."
      ) from exc

+ try:
+     import gradio as gr
+ except Exception:
+     gr = None  # type: ignore[assignment]
+
  try:
      from ..models import PythonCodeReviewAction, PythonCodeReviewObservation
      from .env import PythonCodeReviewEnvironment
+     from .demo import build_demo
  except ImportError:
      from models import PythonCodeReviewAction, PythonCodeReviewObservation
      from server.env import PythonCodeReviewEnvironment
+     from server.demo import build_demo


- app = create_app(
-     PythonCodeReviewEnvironment,
-     PythonCodeReviewAction,
-     PythonCodeReviewObservation,
-     env_name="python_code_review_env",
-     max_concurrent_envs=4,
- )
+ def build_application():
+     """Compose the OpenEnv API with the Gradio demo frontend."""
+
+     api_app = create_app(
+         PythonCodeReviewEnvironment,
+         PythonCodeReviewAction,
+         PythonCodeReviewObservation,
+         env_name="python_code_review_env",
+         max_concurrent_envs=4,
+     )
+     if gr is None:
+         return api_app
+     return gr.mount_gradio_app(api_app, build_demo(), path="/")
+
+
+ app = build_application()


  def main(host: str = "0.0.0.0", port: int = 8000) -> None:
server/demo.py ADDED
@@ -0,0 +1,420 @@
+ """Gradio UI for TorchReview Copilot."""
+
+ from __future__ import annotations
+
+ from html import escape
+
+ import gradio as gr
+
+ try:
+     from ..triage import get_default_engine
+ except ImportError:
+     from triage import get_default_engine
+
+
+ CSS = """
+ :root {
+   --paper: #f6f1e8;
+   --ink: #162521;
+   --accent: #d95d39;
+   --panel: #fffdf8;
+   --border: #d6c4b8;
+   --muted: #5f6f67;
+   --good: #2d7d62;
+   --warn: #b76516;
+   --high: #b23a48;
+ }
+
+ body, .gradio-container {
+   background:
+     radial-gradient(circle at top left, rgba(247, 197, 159, 0.35), transparent 35%),
+     linear-gradient(135deg, #f9f6ef 0%, #efe5d3 100%);
+   color: var(--ink);
+   font-family: Georgia, "Times New Roman", serif;
+ }
+
+ .gradio-container {
+   max-width: 1260px !important;
+ }
+
+ .hero-card,
+ .metric-card,
+ .subtle-card {
+   background: rgba(255, 253, 248, 0.95);
+   border: 1px solid var(--border);
+   border-radius: 20px;
+   box-shadow: 0 16px 40px rgba(22, 37, 33, 0.08);
+ }
+
+ .hero-card {
+   padding: 28px 30px;
+   margin-bottom: 12px;
+ }
+
+ .metric-card,
+ .subtle-card {
+   padding: 20px 22px;
+ }
+
+ .eyebrow {
+   text-transform: uppercase;
+   letter-spacing: 0.12em;
+   font-size: 12px;
+   color: var(--accent);
+   margin-bottom: 10px;
+ }
+
+ .hero-title {
+   font-size: 44px;
+   line-height: 1.05;
+   margin: 0 0 10px;
+ }
+
+ .hero-copy {
+   margin: 0;
+   font-size: 18px;
+   line-height: 1.55;
+   color: var(--muted);
+ }
+
+ .summary-title {
+   display: flex;
+   justify-content: space-between;
+   gap: 12px;
+   align-items: center;
+   margin-bottom: 14px;
+ }
+
+ .pill {
+   display: inline-block;
+   padding: 6px 12px;
+   border-radius: 999px;
+   font-size: 12px;
+   text-transform: uppercase;
+   letter-spacing: 0.08em;
+   background: #efe5d3;
+ }
+
+ .pill.low { color: var(--good); }
+ .pill.medium { color: var(--warn); }
+ .pill.high { color: var(--high); }
+
+ .summary-grid {
+   display: grid;
+   grid-template-columns: repeat(2, minmax(0, 1fr));
+   gap: 12px;
+   margin-top: 16px;
+ }
+
+ .summary-stat {
+   background: #fff7ef;
+   border-radius: 14px;
+   padding: 12px 14px;
+   border: 1px solid rgba(214, 196, 184, 0.8);
+ }
+
+ .summary-stat strong {
+   display: block;
+   font-size: 12px;
+   text-transform: uppercase;
+   letter-spacing: 0.08em;
+   color: var(--muted);
+   margin-bottom: 6px;
+ }
+
+ .radar-wrap {
+   display: grid;
+   gap: 12px;
+ }
+
+ .bar {
+   display: grid;
+   gap: 6px;
+ }
+
+ .bar-head {
+   display: flex;
+   justify-content: space-between;
+   font-size: 13px;
+   color: var(--muted);
+ }
+
+ .bar-track {
+   width: 100%;
+   height: 12px;
+   background: #f2e5d6;
+   border-radius: 999px;
+   overflow: hidden;
+ }
+
+ .bar-fill {
+   height: 100%;
+   border-radius: 999px;
+ }
+
+ .matched-box {
+   background: #fff7ef;
+   border: 1px solid rgba(214, 196, 184, 0.8);
+   border-radius: 16px;
+   padding: 14px;
+ }
+
+ .how-grid {
+   display: grid;
+   grid-template-columns: repeat(4, minmax(0, 1fr));
+   gap: 12px;
+ }
+
+ .how-step {
+   background: rgba(255, 253, 248, 0.9);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 16px;
+ }
+
+ @media (max-width: 900px) {
+   .hero-title {
+     font-size: 34px;
+   }
+
+   .summary-grid,
+   .how-grid {
+     grid-template-columns: 1fr;
+   }
+ }
+ """
+
+
+ def _default_outputs() -> tuple[str, str, str, str, str]:
+     return (
+         "<div class='metric-card'><div class='eyebrow'>Awaiting Analysis</div><p class='hero-copy'>Paste Python code, add an optional traceback, or load one of the built-in examples.</p></div>",
+         "<div class='metric-card'><div class='eyebrow'>Live Triage Radar</div><p class='hero-copy'>Confidence bars will appear after the first analysis run.</p></div>",
+         "### Fix Plan\nAnalyze a sample to generate a prioritized remediation checklist.",
+         "### Known Pattern Match\nThe nearest OpenEnv task will be highlighted here after inference runs.",
+         "### Model Notes\nBackend and extracted signal details will appear here.",
+     )
+
+
+ def _summary_html(result) -> str:
+     issue = escape(result.issue_label.title())
+     summary = escape(result.summary)
+     next_action = escape(result.suggested_next_action)
+     return f"""
+     <div class="metric-card">
+       <div class="summary-title">
+         <div>
+           <div class="eyebrow">TorchReview Verdict</div>
+           <h3 style="margin:0;font-size:30px;">{issue} Issue</h3>
+         </div>
+         <span class="pill {escape(result.repair_risk)}">{escape(result.repair_risk)} repair risk</span>
+       </div>
+       <p class="hero-copy">{summary}</p>
+       <div class="summary-grid">
+         <div class="summary-stat">
+           <strong>Matched Pattern</strong>
+           {escape(result.matched_pattern.title)}
+         </div>
+         <div class="summary-stat">
+           <strong>Similarity</strong>
+           {result.matched_pattern.similarity:.0%}
+         </div>
+         <div class="summary-stat">
+           <strong>Inference Backend</strong>
+           {escape(result.model_backend)}
+         </div>
+         <div class="summary-stat">
+           <strong>Next Action</strong>
+           {next_action}
+         </div>
+       </div>
+     </div>
+     """
+
+
+ def _radar_html(result) -> str:
+     colors = {
+         "syntax": "#d95d39",
+         "logic": "#4f772d",
+         "performance": "#355070",
+     }
+     bars = []
+     for label, score in result.confidence_scores.items():
+         bars.append(
+             f"""
+             <div class="bar">
+               <div class="bar-head"><span>{escape(label.title())}</span><span>{score:.0%}</span></div>
+               <div class="bar-track">
+                 <div class="bar-fill" style="width:{score * 100:.1f}%; background:{colors.get(label, '#d95d39')};"></div>
+               </div>
+             </div>
+             """
+         )
+     return f"""
+     <div class="metric-card radar-wrap">
+       <div class="eyebrow">Live Triage Radar</div>
+       {''.join(bars)}
+       <div class="matched-box">
+         <strong>Nearest Known Pattern:</strong> {escape(result.matched_pattern.title)}<br>
+         <span style="color:#5f6f67;">{escape(result.matched_pattern.summary)}</span>
+       </div>
+     </div>
+     """
+
+
+ def _plan_markdown(result) -> str:
+     plan_lines = "\n".join(f"{index + 1}. {step}" for index, step in enumerate(result.repair_plan))
+     return (
+         "### Fix Plan\n"
+         f"**Primary issue:** `{result.issue_label}`\n\n"
+         f"{plan_lines}\n\n"
+         f"**Suggested next action:** {result.suggested_next_action}"
+     )
+
+
+ def _match_markdown(result) -> str:
+     return (
+         "### Known Pattern Match\n"
+         f"**Task:** `{result.matched_pattern.task_id}`  \n"
+         f"**Title:** {result.matched_pattern.title}  \n"
+         f"**Why it matched:** {result.matched_pattern.rationale}  \n"
+         f"**Similarity:** {result.matched_pattern.similarity:.0%}"
+     )
+
+
+ def _model_markdown(result) -> str:
+     signal_lines = "\n".join(
+         f"- `{signal.name}` -> {signal.value} ({signal.impact}, weight {signal.weight:.2f}): {signal.evidence}"
+         for signal in result.extracted_signals
+     ) or "- No strong static signals were extracted."
+     notes = "\n".join(f"- {item}" for item in result.inference_notes) or "- No additional backend notes."
+     return (
+         "### Model Notes\n"
+         f"- **Model backend:** `{result.model_backend}`\n"
+         f"- **Model id:** `{result.model_id}`\n"
+         f"- **Analysis time:** `{result.analysis_time_ms:.2f} ms`\n\n"
+         "### Extracted Signals\n"
+         f"{signal_lines}\n\n"
+         "### Backend Notes\n"
+         f"{notes}"
+     )
+
+
+ def analyze_inputs(code: str, traceback_text: str) -> tuple[str, str, str, str, str]:
+     """Run the triage engine and format outputs for the Gradio UI."""
+
+     result = get_default_engine().triage(code or "", traceback_text or "")
+     return (
+         _summary_html(result),
+         _radar_html(result),
+         _plan_markdown(result),
+         _match_markdown(result),
+         _model_markdown(result),
+     )
+
+
+ def load_example(example_key: str) -> tuple[str, str, str, str, str, str, str, str]:
+     """Populate the UI from a built-in example and immediately analyze it."""
+
+     example = get_default_engine().example_map()[example_key]
+     outputs = analyze_inputs(example.code, example.traceback_text)
+     header = (
+         f"### Example Scenario\n"
+         f"**{example.title}**  \n"
+         f"{example.summary}  \n"
+         f"Label target: `{example.label}`"
+     )
+     return (example.code, example.traceback_text, header, *outputs)
+
+
+ def build_demo() -> gr.Blocks:
+     """Create the TorchReview Copilot Gradio application."""
+
+     examples = get_default_engine().example_map()
+     first_example = next(iter(examples.values()))
+
+     with gr.Blocks(theme=gr.themes.Soft(primary_hue="orange", secondary_hue="amber"), css=CSS, title="TorchReview Copilot") as demo:
+         gr.HTML(
+             """
+             <div class="hero-card">
+               <div class="eyebrow">Meta PyTorch OpenEnv Hackathon Demo</div>
+               <h1 class="hero-title">TorchReview Copilot</h1>
+               <p class="hero-copy">
+                 AI-powered Python code triage using PyTorch to classify issue type, estimate repair risk,
+                 and turn messy failure output into an actionable fix plan. OpenEnv stays underneath as the deterministic validation engine.
+               </p>
+             </div>
+             """
+         )
+
+         with gr.Row():
+             with gr.Column(scale=6):
+                 example_choice = gr.Radio(
+                     choices=[(item.title, item.key) for item in examples.values()],
+                     value=first_example.key,
+                     label="Try a built-in failure scenario",
+                     info="Switching examples updates the Live Triage Radar immediately.",
+                 )
+                 example_header = gr.Markdown()
+                 code_input = gr.Code(
+                     value=first_example.code,
+                     language="python",
+                     lines=18,
+                     label="Python code under review",
+                 )
+                 traceback_input = gr.Textbox(
+                     value=first_example.traceback_text,
+                     lines=7,
+                     label="Optional traceback / failing test output",
+                     placeholder="Paste stack traces, assertion failures, or benchmark notes here.",
+                 )
+                 with gr.Row():
+                     analyze_button = gr.Button("Analyze With PyTorch", variant="primary")
+                     clear_button = gr.Button("Clear Inputs", variant="secondary")
+
+             with gr.Column(scale=5):
+                 summary_html = gr.HTML()
+                 radar_html = gr.HTML()
+                 plan_markdown = gr.Markdown()
+                 match_markdown = gr.Markdown()
+                 model_markdown = gr.Markdown()
+
+         gr.HTML(
+             """
+             <div class="subtle-card" style="margin-top: 12px;">
+               <div class="eyebrow">How It Works</div>
+               <div class="how-grid">
+                 <div class="how-step"><strong>Input</strong><br>Code plus optional traceback or benchmark signal.</div>
+                 <div class="how-step"><strong>Processing</strong><br>Static checks extract parser, assertion, and runtime clues.</div>
+                 <div class="how-step"><strong>Model</strong><br>CodeBERTa embeddings run through PyTorch and compare against known OpenEnv task patterns.</div>
+                 <div class="how-step"><strong>Output</strong><br>Confidence radar, nearest known issue, repair risk, and a practical remediation plan.</div>
+               </div>
+             </div>
+             """
+         )
+
+         example_choice.change(
+             fn=load_example,
+             inputs=example_choice,
+             outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="hidden",
+         )
+         analyze_button.click(
+             fn=analyze_inputs,
+             inputs=[code_input, traceback_input],
+             outputs=[summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="minimal",
+         )
+         clear_button.click(
+             fn=lambda: ("", "", "### Example Scenario\nChoose a built-in example or paste custom code.", *_default_outputs()),
+             inputs=None,
+             outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="hidden",
+         )
+         demo.load(
+             fn=load_example,
+             inputs=example_choice,
+             outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="hidden",
+         )
+
+     return demo
server/requirements.txt CHANGED
@@ -1,5 +1,8 @@
 openenv-core[core]>=0.2.2
 fastapi>=0.111.0
+gradio>=5.26.0
 uvicorn>=0.30.0
 pytest>=8.0.0
 openai>=1.76.0
+torch>=2.2.0
+transformers>=4.45.0
tests/test_triage_pipeline.py ADDED
@@ -0,0 +1,44 @@
+from __future__ import annotations
+
+from fastapi.testclient import TestClient
+
+from triage import CodeTriageEngine, HashingEmbeddingBackend
+from triage_catalog import build_examples
+
+
+def test_hashing_backend_returns_normalized_embeddings() -> None:
+    backend = HashingEmbeddingBackend(dimensions=32)
+    embeddings = backend.embed_texts(["def foo():\n return 1", "for x in items:\n pass"])
+
+    assert embeddings.shape == (2, 32)
+    for row in embeddings:
+        assert round(float(row.norm().item()), 5) == 1.0
+
+
+def test_examples_map_to_expected_labels_with_fallback_backend() -> None:
+    examples = build_examples()
+    engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+    for example in examples:
+        result = engine.triage(example.code, example.traceback_text)
+        assert result.issue_label == example.label
+
+
+def test_syntax_example_exposes_parser_signal() -> None:
+    example = next(item for item in build_examples() if item.label == "syntax")
+    engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+    result = engine.triage(example.code, example.traceback_text)
+
+    assert any(signal.name == "syntax_parse" and signal.value == "fails" for signal in result.extracted_signals)
+    assert result.matched_pattern.task_id == example.task_id
+
+
+def test_composed_app_preserves_health_route() -> None:
+    from server.app import build_application
+
+    client = TestClient(build_application())
+    response = client.get("/health")
+
+    assert response.status_code == 200
+    assert response.json()["status"] == "ok"
triage.py ADDED
@@ -0,0 +1,407 @@
+"""PyTorch-backed triage pipeline for TorchReview Copilot."""
+
+from __future__ import annotations
+
+import ast
+import hashlib
+import os
+import re
+import time
+from functools import lru_cache
+from typing import List, Sequence
+
+import torch
+import torch.nn.functional as F
+
+try:
+    from transformers import AutoModel, AutoTokenizer
+except Exception:
+    AutoModel = None  # type: ignore[assignment]
+    AutoTokenizer = None  # type: ignore[assignment]
+
+try:
+    from .triage_catalog import build_examples, build_prototypes
+    from .triage_models import (
+        IssueLabel,
+        PrototypeMatch,
+        TriageExample,
+        TriagePrototype,
+        TriageResult,
+        TriageSignal,
+    )
+except ImportError:
+    from triage_catalog import build_examples, build_prototypes
+    from triage_models import (
+        IssueLabel,
+        PrototypeMatch,
+        TriageExample,
+        TriagePrototype,
+        TriageResult,
+        TriageSignal,
+    )
+
+
+MODEL_ID = os.getenv("TRIAGE_MODEL_ID", "huggingface/CodeBERTa-small-v1")
+MODEL_MAX_LENGTH = int(os.getenv("TRIAGE_MODEL_MAX_LENGTH", "256"))
+LABELS: tuple[IssueLabel, ...] = ("syntax", "logic", "performance")
+
+
+class _LoopDepthVisitor(ast.NodeVisitor):
+    """Track the maximum loop nesting depth in a code snippet."""
+
+    def __init__(self) -> None:
+        self.depth = 0
+        self.max_depth = 0
+
+    def _visit_loop(self, node: ast.AST) -> None:
+        self.depth += 1
+        self.max_depth = max(self.max_depth, self.depth)
+        self.generic_visit(node)
+        self.depth -= 1
+
+    def visit_For(self, node: ast.For) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_While(self, node: ast.While) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_comprehension(self, node: ast.comprehension) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+
+class HashingEmbeddingBackend:
+    """Deterministic torch-native fallback when pretrained weights are unavailable."""
+
+    def __init__(self, dimensions: int = 96) -> None:
+        self.dimensions = dimensions
+        self.model_id = "hashed-token-fallback"
+        self.backend_name = "hashed-token-fallback"
+        self.notes = ["Using hashed torch embeddings because pretrained weights are unavailable."]
+
+    def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        rows = torch.zeros((len(texts), self.dimensions), dtype=torch.float32)
+        for row_index, text in enumerate(texts):
+            tokens = re.findall(r"[A-Za-z_]+|\d+|==|!=|<=|>=|\S", text.lower())[:512]
+            if not tokens:
+                rows[row_index, 0] = 1.0
+                continue
+            for token in tokens:
+                digest = hashlib.md5(token.encode("utf-8")).hexdigest()
+                bucket = int(digest[:8], 16) % self.dimensions
+                sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0
+                rows[row_index, bucket] += sign
+        return F.normalize(rows + 1e-6, dim=1)
+
+
+class TransformersEmbeddingBackend:
+    """Mean-pool CodeBERTa embeddings via torch + transformers."""
+
+    def __init__(self, model_id: str = MODEL_ID, force_fallback: bool = False) -> None:
+        self.model_id = model_id
+        self.force_fallback = force_fallback
+        self.backend_name = model_id
+        self.notes: List[str] = []
+        self._fallback = HashingEmbeddingBackend()
+        self._tokenizer = None
+        self._model = None
+        self._load_error = ""
+        if force_fallback:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes)
+
+    def _ensure_loaded(self) -> None:
+        if self.force_fallback or self._model is not None or self._load_error:
+            return
+        if AutoTokenizer is None or AutoModel is None:
+            self._load_error = "transformers is not installed."
+        else:
+            try:
+                self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
+                self._model = AutoModel.from_pretrained(self.model_id)
+                self._model.eval()
+                self.notes.append(f"Loaded pretrained encoder `{self.model_id}` for inference.")
+            except Exception as exc:
+                self._load_error = f"{type(exc).__name__}: {exc}"
+
+        if self._load_error:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes) + [f"Pretrained load failed: {self._load_error}"]
+
+    def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        self._ensure_loaded()
+        if self._model is None or self._tokenizer is None:
+            return self._fallback.embed_texts(texts)
+
+        encoded = self._tokenizer(
+            list(texts),
+            padding=True,
+            truncation=True,
+            max_length=MODEL_MAX_LENGTH,
+            return_tensors="pt",
+        )
+        with torch.no_grad():
+            outputs = self._model(**encoded)
+        hidden_state = outputs.last_hidden_state
+        mask = encoded["attention_mask"].unsqueeze(-1)
+        pooled = (hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
+        return F.normalize(pooled, dim=1)
+
+
+def _sanitize_text(value: str) -> str:
+    text = (value or "").strip()
+    return text[:4000]
+
+
+def _safe_softmax(scores: dict[IssueLabel, float]) -> dict[str, float]:
+    tensor = torch.tensor([scores[label] for label in LABELS], dtype=torch.float32)
+    probabilities = torch.softmax(tensor * 4.0, dim=0)
+    return {label: round(float(probabilities[index]), 4) for index, label in enumerate(LABELS)}
+
+
+def _loop_depth(code: str) -> int:
+    try:
+        tree = ast.parse(code)
+    except SyntaxError:
+        return 0
+    visitor = _LoopDepthVisitor()
+    visitor.visit(tree)
+    return visitor.max_depth
+
+
+def _repair_risk(label: IssueLabel, confidence: float, signal_count: int) -> str:
+    base = {"syntax": 0.25, "logic": 0.55, "performance": 0.7}[label]
+    if confidence < 0.55:
+        base += 0.12
+    if signal_count >= 4:
+        base += 0.08
+    if base < 0.4:
+        return "low"
+    if base < 0.72:
+        return "medium"
+    return "high"
+
+
+class CodeTriageEngine:
+    """Combine static signals with PyTorch embeddings to classify code issues."""
+
+    def __init__(
+        self,
+        *,
+        backend: TransformersEmbeddingBackend | HashingEmbeddingBackend | None = None,
+        prototypes: Sequence[TriagePrototype] | None = None,
+        examples: Sequence[TriageExample] | None = None,
+    ) -> None:
+        self.backend = backend or TransformersEmbeddingBackend()
+        self.prototypes = list(prototypes or build_prototypes())
+        self.examples = list(examples or build_examples())
+        self._prototype_matrix: torch.Tensor | None = None
+
+    def example_map(self) -> dict[str, TriageExample]:
+        """Return UI examples keyed by task id."""
+
+        return {example.key: example for example in self.examples}
+
+    def _build_document(self, code: str, traceback_text: str) -> str:
+        trace = _sanitize_text(traceback_text) or "No traceback supplied."
+        snippet = _sanitize_text(code) or "# No code supplied."
+        return f"Candidate code:\n{snippet}\n\nObserved failure:\n{trace}\n"
+
+    def _prototype_embeddings(self) -> torch.Tensor:
+        if self._prototype_matrix is None:
+            reference_texts = [prototype.reference_text for prototype in self.prototypes]
+            self._prototype_matrix = self.backend.embed_texts(reference_texts)
+        return self._prototype_matrix
+
+    def _extract_signals(self, code: str, traceback_text: str) -> tuple[list[TriageSignal], dict[IssueLabel, float], list[str]]:
+        trace = (traceback_text or "").lower()
+        heuristic_scores: dict[IssueLabel, float] = {label: 0.15 for label in LABELS}
+        signals: list[TriageSignal] = []
+        notes: list[str] = []
+
+        try:
+            ast.parse(code)
+            signals.append(
+                TriageSignal(
+                    name="syntax_parse",
+                    value="passes",
+                    impact="syntax",
+                    weight=0.1,
+                    evidence="Python AST parsing succeeded.",
+                )
+            )
+            heuristic_scores["logic"] += 0.05
+        except SyntaxError as exc:
+            evidence = f"{exc.msg} at line {exc.lineno}"
+            signals.append(
+                TriageSignal(
+                    name="syntax_parse",
+                    value="fails",
+                    impact="syntax",
+                    weight=0.95,
+                    evidence=evidence,
+                )
+            )
+            heuristic_scores["syntax"] += 0.85
+            notes.append(f"Parser failure detected: {evidence}")
+
+        if any(token in trace for token in ("syntaxerror", "indentationerror", "expected ':'")):
+            signals.append(
+                TriageSignal(
+                    name="traceback_keyword",
+                    value="syntaxerror",
+                    impact="syntax",
+                    weight=0.8,
+                    evidence="Traceback contains a parser error.",
+                )
+            )
+            heuristic_scores["syntax"] += 0.55
+
+        if any(token in trace for token in ("assertionerror", "expected:", "actual:", "boundary", "missing", "incorrect")):
+            signals.append(
+                TriageSignal(
+                    name="test_failure_signal",
+                    value="assertion-style failure",
+                    impact="logic",
+                    weight=0.7,
+                    evidence="Failure text points to behavioral mismatch instead of parser issues.",
+                )
+            )
+            heuristic_scores["logic"] += 0.55
+
+        if any(token in trace for token in ("timeout", "benchmark", "slow", "latency", "performance", "profiler")):
+            signals.append(
+                TriageSignal(
+                    name="performance_trace",
+                    value="latency regression",
+                    impact="performance",
+                    weight=0.85,
+                    evidence="Traceback mentions benchmark or latency pressure.",
+                )
+            )
+            heuristic_scores["performance"] += 0.7
+
+        loop_depth = _loop_depth(code)
+        if loop_depth >= 2:
+            signals.append(
+                TriageSignal(
+                    name="loop_depth",
+                    value=str(loop_depth),
+                    impact="performance",
+                    weight=0.65,
+                    evidence="Nested iteration increases runtime risk on larger fixtures.",
+                )
+            )
+            heuristic_scores["performance"] += 0.35
+
+        if "Counter(" in code or "defaultdict(" in code or "set(" in code:
+            heuristic_scores["performance"] += 0.05
+
+        if "return sessions" in code and "sessions.append" not in code:
+            signals.append(
+                TriageSignal(
+                    name="state_update_gap",
+                    value="possible missing final append",
+                    impact="logic",
+                    weight=0.45,
+                    evidence="A collection is returned without an obvious final state flush.",
+                )
+            )
+            heuristic_scores["logic"] += 0.18
+
+        return signals, heuristic_scores, notes
+
+    def _nearest_match(self, embedding: torch.Tensor) -> tuple[TriagePrototype, float, dict[str, float]]:
+        similarities = torch.matmul(embedding, self._prototype_embeddings().T)[0]
+        indexed_scores = {
+            self.prototypes[index].task_id: round(float((similarities[index] + 1.0) / 2.0), 4)
+            for index in range(len(self.prototypes))
+        }
+        best_index = int(torch.argmax(similarities).item())
+        best_prototype = self.prototypes[best_index]
+        best_similarity = float((similarities[best_index] + 1.0) / 2.0)
+        return best_prototype, best_similarity, indexed_scores
+
+    def _repair_plan(self, label: IssueLabel, matched: TriagePrototype) -> list[str]:
+        plans = {
+            "syntax": [
+                "Patch the parser break first: missing colon, bracket, or indentation before changing logic.",
+                f"Realign the implementation with the known-good pattern from `{matched.title}`.",
+                "Re-run the visible checks once the file compiles, then verify hidden edge cases.",
+            ],
+            "logic": [
+                "Reproduce the failing assertion with the smallest public example and inspect state transitions.",
+                f"Compare boundary handling against the known issue pattern `{matched.title}`.",
+                "Patch the final state update or branch condition, then rerun correctness checks before submission.",
+            ],
+            "performance": [
+                "Profile the hot path and isolate repeated full-list scans or nested loops.",
+                f"Refactor toward counting or indexing strategies similar to `{matched.title}`.",
+                "Benchmark the new implementation on a production-like fixture and confirm output stability.",
+            ],
+        }
+        return plans[label]
+
+    def triage(self, code: str, traceback_text: str = "") -> TriageResult:
+        """Run the full triage pipeline on code plus optional failure context."""
+
+        started = time.perf_counter()
+        document = self._build_document(code, traceback_text)
+        signals, heuristic_scores, notes = self._extract_signals(code, traceback_text)
+
+        candidate_embedding = self.backend.embed_texts([document])
+        matched, matched_similarity, prototype_scores = self._nearest_match(candidate_embedding)
+
+        label_similarity = {label: 0.18 for label in LABELS}
+        for prototype in self.prototypes:
+            label_similarity[prototype.label] = max(
+                label_similarity[prototype.label],
+                prototype_scores[prototype.task_id],
+            )
+
+        combined_scores = {
+            label: 0.72 * label_similarity[label] + 0.28 * heuristic_scores[label]
+            for label in LABELS
+        }
+        confidence_scores = _safe_softmax(combined_scores)
+        issue_label = max(LABELS, key=lambda label: confidence_scores[label])
+        top_confidence = confidence_scores[issue_label]
+
+        top_signal = signals[0].evidence if signals else "Model similarity dominated the decision."
+        summary = (
+            f"Detected a {issue_label} issue with {top_confidence:.0%} confidence. "
+            f"The closest known failure pattern is `{matched.title}`, which indicates {matched.summary.lower()}"
+        )
+        suggested_next_action = {
+            "syntax": "Fix the parser error first, then rerun validation before changing behavior.",
+            "logic": "Step through the smallest failing case and confirm the final branch/update behavior.",
+            "performance": "Replace repeated full-list scans with a linear-time aggregation strategy, then benchmark it.",
+        }[issue_label]
+
+        return TriageResult(
+            issue_label=issue_label,
+            confidence_scores=confidence_scores,
+            repair_risk=_repair_risk(issue_label, top_confidence, len(signals)),
+            summary=summary,
+            matched_pattern=PrototypeMatch(
+                task_id=matched.task_id,
+                title=matched.title,
+                label=matched.label,
+                similarity=round(matched_similarity, 4),
+                summary=matched.summary,
+                rationale=top_signal,
+            ),
+            repair_plan=self._repair_plan(issue_label, matched),
+            suggested_next_action=suggested_next_action,
+            extracted_signals=signals,
+            model_backend=self.backend.backend_name,
+            model_id=self.backend.model_id,
+            inference_notes=list(self.backend.notes) + notes,
+            analysis_time_ms=round((time.perf_counter() - started) * 1000.0, 2),
+        )
+
+
+@lru_cache(maxsize=1)
+def get_default_engine() -> CodeTriageEngine:
+    """Return a cached triage engine for the running process."""
+
+    return CodeTriageEngine()
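
The fallback backend above hashes tokens into signed buckets and L2-normalizes the result so cosine similarity works without any trained weights. A minimal pure-Python sketch of the same idea (stdlib only, no torch; names here are illustrative, not part of the module):

```python
import hashlib
import math
import re


def hashed_embedding(text: str, dimensions: int = 96) -> list[float]:
    """Bag-of-tokens embedding: each token hashes to a signed bucket."""
    row = [0.0] * dimensions
    tokens = re.findall(r"[A-Za-z_]+|\d+|==|!=|<=|>=|\S", text.lower())[:512]
    if not tokens:
        row[0] = 1.0
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) % dimensions      # which dimension
        sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0  # signed update
        row[bucket] += sign
    norm = math.sqrt(sum(v * v for v in row)) or 1.0
    return [v / norm for v in row]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


a = hashed_embedding("for user in users:\n    total += user.score")
b = hashed_embedding("for user in users:\n    total += user.score")
c = hashed_embedding("SyntaxError: expected ':'")
assert cosine(a, b) > 0.999          # identical inputs embed identically
assert cosine(a, c) < cosine(a, b)   # unrelated text scores lower
```

The same scheme is why `HashingEmbeddingBackend` is deterministic: identical code always lands in the same buckets, which keeps the tests and the prototype matching reproducible.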
triage_catalog.py ADDED
@@ -0,0 +1,117 @@
+"""Curated prototypes and example inputs for TorchReview Copilot."""
+
+from __future__ import annotations
+
+from typing import Dict, List
+
+try:
+    from .triage_models import IssueLabel, TriageExample, TriagePrototype
+    from .tasks import list_tasks
+except ImportError:
+    from triage_models import IssueLabel, TriageExample, TriagePrototype
+    from tasks import list_tasks
+
+
+TASK_KIND_TO_LABEL: Dict[str, IssueLabel] = {
+    "syntax_fix": "syntax",
+    "bug_fix": "logic",
+    "optimization": "performance",
+}
+
+TRACEBACK_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Traceback (most recent call last):\n"
+        " File \"services/billing/reconciliation.py\", line 3\n"
+        " for record in records\n"
+        " ^\n"
+        "SyntaxError: expected ':'"
+    ),
+    "bug_fix_session_windows": (
+        "AssertionError: collapse_sessions([{'minute': 1}, {'minute': 3}, {'minute': 8}], 4)\n"
+        "Expected: [(1, 3), (8, 8)]\n"
+        "Actual: [(1, 8)]\n"
+        "Boundary handling merges the final session instead of starting a new one."
+    ),
+    "optimization_rank_active_users": (
+        "BenchmarkWarning: rank_active_users exceeded the 450ms budget on a nightly export fixture.\n"
+        "Profiler hint: repeated scans over the full event list and nested loops dominate runtime."
+    ),
+}
+
+SUMMARY_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": "Broken parser state in a billing helper blocks reconciliation jobs.",
+    "bug_fix_session_windows": "Session-boundary logic fails on inclusive idle-timeout edges.",
+    "optimization_rank_active_users": "A nightly ranking job is correct on small fixtures but too slow at production scale.",
+}
+
+
+def _prototype_text(
+    task_id: str,
+    title: str,
+    description: str,
+    repo_summary: str,
+    goal: str,
+    visible_tests: List[str],
+    starter_code: str,
+    traceback_text: str,
+) -> str:
+    visible = "\n".join(f"- {item}" for item in visible_tests) or "- none"
+    return (
+        f"Title: {title}\n"
+        f"Problem: {description}\n"
+        f"Repo context: {repo_summary}\n"
+        f"Goal: {goal}\n"
+        f"Observed failure:\n{traceback_text}\n"
+        f"Visible checks:\n{visible}\n"
+        f"Candidate code:\n{starter_code}\n"
+        f"Task id: {task_id}\n"
+    )
+
+
+def build_examples() -> List[TriageExample]:
+    """Create stable UI examples from the task catalog."""
+
+    examples: List[TriageExample] = []
+    for task in list_tasks():
+        label = TASK_KIND_TO_LABEL[task.task_kind]
+        examples.append(
+            TriageExample(
+                key=task.task_id,
+                title=task.title,
+                label=label,
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                code=task.starter_code,
+                traceback_text=TRACEBACK_BY_TASK_ID[task.task_id],
+                task_id=task.task_id,
+            )
+        )
+    return examples
+
+
+def build_prototypes() -> List[TriagePrototype]:
+    """Build canonical triage prototypes from the OpenEnv tasks."""
+
+    prototypes: List[TriagePrototype] = []
+    for task in list_tasks():
+        traceback_text = TRACEBACK_BY_TASK_ID[task.task_id]
+        prototypes.append(
+            TriagePrototype(
+                task_id=task.task_id,
+                title=task.title,
+                label=TASK_KIND_TO_LABEL[task.task_kind],
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                reference_text=_prototype_text(
+                    task.task_id,
+                    task.title,
+                    task.task_description,
+                    task.repo_summary,
+                    task.goal,
+                    list(task.visible_tests),
+                    task.reference_code,
+                    traceback_text,
+                ),
+                starter_code=task.starter_code,
+                traceback_text=traceback_text,
+            )
+        )
+    return prototypes
triage_models.py ADDED
@@ -0,0 +1,73 @@
+"""Typed models for TorchReview Copilot outputs and examples."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Literal
+
+from pydantic import BaseModel, Field
+
+
+IssueLabel = Literal["syntax", "logic", "performance"]
+RiskLevel = Literal["low", "medium", "high"]
+
+
+class TriageSignal(BaseModel):
+    """One extracted signal used during issue classification."""
+
+    name: str
+    value: str
+    impact: Literal["syntax", "logic", "performance", "mixed"] = "mixed"
+    weight: float = Field(..., ge=0.0, le=1.0)
+    evidence: str = ""
+
+
+class PrototypeMatch(BaseModel):
+    """Nearest known bug pattern from the built-in task catalog."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    similarity: float = Field(..., ge=0.0, le=1.0)
+    summary: str
+    rationale: str
+
+
+class TriageExample(BaseModel):
+    """Example payload exposed in the demo UI."""
+
+    key: str
+    title: str
+    label: IssueLabel
+    summary: str
+    code: str
+    traceback_text: str
+    task_id: str
+
+
+class TriagePrototype(BaseModel):
+    """Canonical issue-pattern representation embedded by the triage engine."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    summary: str
+    reference_text: str
+    starter_code: str
+    traceback_text: str
+
+
+class TriageResult(BaseModel):
+    """Structured output produced by the triage pipeline."""
+
+    issue_label: IssueLabel
+    confidence_scores: Dict[str, float]
+    repair_risk: RiskLevel
+    summary: str
+    matched_pattern: PrototypeMatch
+    repair_plan: List[str]
+    suggested_next_action: str
+    extracted_signals: List[TriageSignal] = Field(default_factory=list)
+    model_backend: str
+    model_id: str
+    inference_notes: List[str] = Field(default_factory=list)
+    analysis_time_ms: float = Field(..., ge=0.0)
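
The `confidence_scores` field in `TriageResult` comes from the engine's blend of embedding similarity and static heuristics (0.72/0.28 weights, then a softmax sharpened by a factor of 4.0 in `_safe_softmax`). A stdlib-only sketch of that scoring step, with made-up input values chosen purely for illustration:

```python
import math

LABELS = ("syntax", "logic", "performance")


def blend_confidence(label_similarity: dict, heuristic: dict, temperature: float = 4.0) -> dict:
    """Weighted blend of similarity and heuristic scores, then a sharpened softmax."""
    combined = [0.72 * label_similarity[l] + 0.28 * heuristic[l] for l in LABELS]
    exps = [math.exp(temperature * s) for s in combined]
    total = sum(exps)
    return {l: round(e / total, 4) for l, e in zip(LABELS, exps)}


# Hypothetical inputs: strong syntax similarity plus a parser-failure heuristic.
scores = blend_confidence(
    {"syntax": 0.91, "logic": 0.44, "performance": 0.40},
    {"syntax": 1.00, "logic": 0.20, "performance": 0.15},
)
assert max(scores, key=scores.get) == "syntax"
assert abs(sum(scores.values()) - 1.0) < 0.01  # scores form a distribution
```

The temperature factor is what keeps the radar readable in the demo: without it, the three blended scores sit close together and every prediction looks uncertain.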