uvpatel7271 committed on
Commit 737f100
·
1 Parent(s): 5d806ad

Added inference and triage logic; improved optimization and scalability
.dockerignore CHANGED
@@ -8,6 +8,10 @@
  !models.py
  !openenv.yaml
  !pyproject.toml
+ !DEMO_SCRIPT.md
+ !triage.py
+ !triage_catalog.py
+ !triage_models.py
  !server/
  !server/**
  !tasks/
DEMO_SCRIPT.md ADDED
@@ -0,0 +1,12 @@
+ # TorchReview Copilot Demo Script
+
+ ## 60-90 Second Walkthrough
+
+ 1. Open the Hugging Face Space and introduce TorchReview Copilot as an AI-powered Python triage assistant built with PyTorch.
+ 2. Point to the single-sentence problem statement: teams lose time figuring out whether a failure is syntax, logic, or performance related.
+ 3. Select the `Fix the invoice total syntax regression` example to show the app loading a real broken code sample.
+ 4. Highlight the **Live Triage Radar** updating immediately, then call out the predicted issue class and repair risk.
+ 5. Explain that the PyTorch layer uses CodeBERTa embeddings to compare the input against known bug patterns from the OpenEnv task catalog.
+ 6. Scroll to the repair plan and note that the output is not just a label; it gives a prioritized remediation checklist and the nearest known failure pattern.
+ 7. Switch to the performance example to show the confidence profile change and emphasize that the system can distinguish runtime bottlenecks from correctness bugs.
+ 8. Close by noting that OpenEnv still powers deterministic validation under the hood, so the demo stays grounded in measurable task outcomes.
README.md CHANGED
@@ -1,189 +1,225 @@
  ---
- title: Python Code Review Environment
- emoji: snake
- colorFrom: yellow
- colorTo: blue
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
  - openenv
  - code-review
- - python
  ---

- # python_code_review_env

- `python_code_review_env` is a production-style OpenEnv environment that simulates a realistic Python code review workflow. An agent inspects broken code, edits it, runs tests, and submits a final solution against deterministic graders for syntax repair, bug fixing, and optimization/refactoring.

- ## Environment design

- - `Observation` includes task instructions, current code, syntax errors, public test output, action history, and remaining attempts.
- - `Action` is structured as `analyze_code`, `edit_code`, `run_tests`, or `submit_solution`.
- - `Reward` is shaped and non-binary. The environment awards syntax progress, test progress, correctness, and quality improvements while penalizing invalid actions, timeouts, regressions, and unchanged edits.
- - `State` exposes the internal episode snapshot through `/state`.

- ## Task set

- 1. `syntax_fix_invoice_totals` (easy)
-    Fix a syntax regression in an invoice normalization helper.
- 2. `bug_fix_session_windows` (medium)
-    Repair a session-collapsing bug using deterministic public and hidden tests.
- 3. `optimization_rank_active_users` (hard)
-    Refactor a slow ranking function and earn additional score from runtime improvement plus AST/style quality.

- ## Action schema

- ```json
- {
-   "action_type": "edit_code",
-   "code": "def function(...):\n    ..."
- }
- ```

- Supported `action_type` values:

- - `analyze_code`
- - `edit_code`
- - `run_tests`
- - `submit_solution`

- ## Observation schema

- ```json
- {
-   "task_description": "...",
-   "current_code": "...",
-   "errors": "...",
-   "test_results": "...",
-   "history": []
- }
- ```

- The full observation also includes `task_id`, `difficulty`, `task_kind`, `visible_tests`, `attempts_remaining`, `score`, `last_action_status`, `reward`, `done`, and a structured `reward_details` breakdown.

- ## Deterministic grading

- - Syntax tasks use `compile()` plus hidden behavioral checks.
- - Bug-fix tasks use deterministic function-call cases that behave like pytest assertions.
- - Optimization tasks combine correctness, runtime benchmarking, and AST/style quality scoring.
- - Infinite loops and long-running solutions are sandboxed with subprocess timeouts and receive penalties.
- - All scores are clamped to `[0.0, 1.0]`.

- ## Run locally

- Install dependencies:

- ```bash
- pip install .
- ```

- Start the API server:

- ```bash
- uvicorn server.app:app --host 0.0.0.0 --port 8000
- ```

- Smoke-test the environment:

- ```bash
- curl http://localhost:8000/health
- curl http://localhost:8000/state
- ```

- OpenEnv validation:

- ```bash
- openenv validate
- ```

- ## Docker build

- The Docker image no longer depends on `ghcr.io/meta-pytorch/openenv-base:latest`, which removes the TLS handshake failure from the original build path.

- ```bash
- # Run from repo root
- docker build -t python-code-review-env -f server/Dockerfile .
- docker run --rm -p 8000:8000 python-code-review-env
- ```

- If you run the build from inside `server/`, you must point the context at the repo root:

- ```bash
- docker build -t python-code-review-env -f Dockerfile ..
  ```

- Expected health check:

  ```bash
- curl http://localhost:8000/health
  ```

- ## Hugging Face Spaces deployment

- 1. Create a Docker Space.
- 2. Push this repository content to the Space.
- 3. Ensure port `8000` is exposed.
- 4. Wait for the container to build.
- 5. Verify `/reset` and `/health` return `200`.
-
- The image is CPU-friendly and designed for a small Hugging Face Space such as `2 vCPU / 8 GB RAM`.

- ## Inference baseline

- `inference.py` uses an OpenAI-compatible client:

- ```python
- client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
  ```

- Supported providers include:

- - Gemini through an OpenAI-compatible gateway
- - OpenRouter
- - Together AI
- - DeepSeek-compatible OpenAI endpoints

- Run it with a free/open provider:

  ```bash
- set API_BASE_URL=https://openrouter.ai/api/v1
- set API_KEY=...
- set MODEL=deepseek/deepseek-chat-v3-0324:free
- python inference.py
  ```

- If no credentials are supplied, the script falls back to a deterministic smoke-test policy that applies the reference fix for each task so the environment can still be validated end to end.

- Example output:
-
- ```text
- Task 1 Score: 1.0
- Task 2 Score: 1.0
- Task 3 Score: 0.9
- Final Score: 1.0
  ```

- ## Project structure

  ```text
  python_env/
  ├── client.py
  ├── graders/
- │   ├── bug_fix.py
- │   ├── dispatch.py
- │   ├── optimization.py
- │   ├── shared.py
- │   └── syntax.py
- ├── inference.py
- ├── models.py
- ├── openenv.yaml
- ├── README.md
  ├── server/
  │   ├── app.py
- │   ├── Dockerfile
- │   ├── env.py
- │   └── python_env_environment.py
- ├── tasks/
- │   └── catalog.py
  ```
  ---
+ title: TorchReview Copilot
+ emoji: torch
+ colorFrom: orange
+ colorTo: red
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
+ - pytorch
+ - gradio
+ - fastapi
  - openenv
  - code-review
  ---

+ # TorchReview Copilot

+ TorchReview Copilot is an **AI-powered Python code triage system built with PyTorch** that classifies the issue type, estimates repair risk, and generates an actionable remediation plan from broken code plus failure output.

+ It upgrades the original OpenEnv hackathon environment into a judge-friendly product demo: a polished Hugging Face Space on top, with the deterministic OpenEnv validation engine preserved underneath.

+ **Live demo:** [Hugging Face Space](https://huggingface.co/spaces/uvpatel7271/final-python-env)  
+ **Repository:** [uvpatel/final-python-env](https://github.com/uvpatel/final-python-env)

+ ## Problem Statement

+ Engineering teams lose time during incident response and code review because broken Python snippets often arrive with noisy traces, partial test output, and unclear ownership. Before fixing anything, someone still has to answer:

+ - Is this a syntax issue, a logic bug, or a performance regression?
+ - How risky is the repair?
+ - What should be checked first?

+ That triage step is repetitive, error-prone, and often slows down the actual fix.

+ ## Solution

+ TorchReview Copilot turns code plus traceback text into a practical triage report:

+ - **Issue classification:** syntax, logic, or performance
+ - **Repair risk:** low, medium, or high
+ - **Live Triage Radar:** confidence visualization for all issue classes
+ - **Nearest known pattern:** the closest OpenEnv task match
+ - **Fix plan:** prioritized remediation steps for the engineer

+ The result is a demo that feels like a real AI debugging assistant rather than a backend-only environment.
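The shipped low/medium/high risk logic lives in `triage.py` and is not shown in this diff; as an illustration only, one plausible way to derive such a label is from the margin between the top two class confidences. The function name and threshold values below are made-up assumptions, not the project's actual implementation:

```python
def repair_risk(confidences: dict[str, float]) -> str:
    """Hypothetical sketch: map the gap between the two highest class
    confidences to a risk label. Thresholds are illustrative only."""
    top, runner_up = sorted(confidences.values(), reverse=True)[:2]
    margin = top - runner_up
    if margin >= 0.4:    # one class clearly dominates -> straightforward repair
        return "low"
    if margin >= 0.15:   # some ambiguity between classes
        return "medium"
    return "high"        # classes nearly tied -> the diagnosis itself is uncertain

print(repair_risk({"syntax": 0.7, "logic": 0.2, "performance": 0.1}))  # low
```

A near-tie between classes is treated as high risk because the fix strategy depends on which diagnosis is correct.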

+ ## Why PyTorch Matters

+ This project uses **PyTorch for real inference**, not placeholder branching:

+ - `transformers` + `torch` load `huggingface/CodeBERTa-small-v1`
+ - the model encodes code snippets and failure context into embeddings
+ - embeddings are compared against curated OpenEnv issue prototypes
+ - the final decision blends model similarity with lightweight static-analysis signals

+ That gives the demo an actual model-backed classification path while keeping it CPU-friendly for Hugging Face Spaces.
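The prototype-similarity idea can be sketched without the model: embed the failure text and each issue prototype as vectors, then pick the class with the highest cosine similarity. The sketch below substitutes a toy bag-of-words embedding for the real CodeBERTa vectors, and the prototype strings are invented examples, not the project's curated catalog:

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy normalized bag-of-words vector (stand-in for CodeBERTa embeddings)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Invented issue prototypes, loosely modeled on the three failure classes.
prototypes = {
    "syntax": "syntaxerror invalid syntax unexpected eof while parsing",
    "logic": "assertionerror expected value got wrong value",
    "performance": "timeout slow nested loops benchmark regression",
}

failure = "AssertionError expected 3 got 2 wrong value"
vocab = sorted({w for text in [*prototypes.values(), failure.lower()] for w in text.split()})
scores = {
    label: cosine(embed(failure, vocab), embed(text, vocab))
    for label, text in prototypes.items()
}
print(max(scores, key=scores.get))  # logic
```

Swapping the toy `embed` for mean-pooled transformer hidden states is what turns this into the model-backed path the README describes.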

+ ## How It Works

+ ### Pipeline

+ `Input code + traceback -> static checks -> PyTorch embeddings -> similarity against issue prototypes -> confidence scores -> repair plan`

+ ### Detailed Flow

+ 1. The user pastes Python code and optional traceback or benchmark output.
+ 2. TorchReview extracts lightweight static signals:
+    - parser success/failure
+    - assertion-style test language
+    - performance keywords
+    - nested-loop depth
+ 3. CodeBERTa runs through PyTorch to embed the combined input.
+ 4. The embedding is compared against built-in issue prototypes derived from the OpenEnv task catalog.
+ 5. The UI returns:
+    - top issue label
+    - confidence radar
+    - repair risk
+    - nearest known bug pattern
+    - suggested next action
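The "lightweight static signals" step can be illustrated with the standard-library `ast` module: try to parse the snippet, then walk the tree for clues like nested-loop depth and assert statements. The function name and signal dictionary below are hypothetical; the shipped extraction lives in `triage.py`:

```python
import ast

def static_signals(code: str) -> dict:
    """Illustrative static-signal extraction: parseability, loop nesting, asserts."""
    signals = {"parses": True, "max_loop_depth": 0, "has_assert": False}
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Parser failure is itself a strong signal for the "syntax" class.
        signals["parses"] = False
        return signals

    def depth(node: ast.AST, current: int = 0) -> int:
        best = current
        for child in ast.iter_child_nodes(node):
            bump = 1 if isinstance(child, (ast.For, ast.While)) else 0
            best = max(best, depth(child, current + bump))
        return best

    signals["max_loop_depth"] = depth(tree)
    signals["has_assert"] = any(isinstance(n, ast.Assert) for n in ast.walk(tree))
    return signals

print(static_signals("for i in range(3):\n    for j in range(3):\n        total = i + j"))
# {'parses': True, 'max_loop_depth': 2, 'has_assert': False}
```

Deep loop nesting nudges the classifier toward "performance", while a failed parse points at "syntax" before any embedding work happens.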

+ ## Built-In Demo Scenarios

+ The app ships with three grounded examples reused from the OpenEnv tasks:

+ 1. **Syntax regression:** broken invoice normalization helper
+ 2. **Logic bug:** session window boundary failure
+ 3. **Performance bottleneck:** slow active-user ranking pipeline

+ These examples make the classification differences obvious during judging and video demos.

+ ## Tech Stack

+ - **PyTorch** for embedding inference
+ - **Transformers** for `CodeBERTa-small-v1`
+ - **Gradio** for the polished Hugging Face Space UI
+ - **FastAPI** for the app server
+ - **OpenEnv** for deterministic validation endpoints and environment compatibility
+ - **Pydantic** for typed schemas

+ ## Hugging Face Space UX

+ The root app now presents a production-style triage experience:
+
+ - a clear problem/solution hero section
+ - example scenario selector
+ - code and traceback inputs
+ - **Live Triage Radar**
+ - structured fix plan
+ - visible model/backend notes
+
+ The underlying OpenEnv endpoints remain available for compatibility and evaluation.
+
+ ## Screenshots
+
+ Add screenshots after deployment:
+
+ - `docs/screenshots/home.png` -> hero + inputs
+ - `docs/screenshots/triage-radar.png` -> confidence visualization
+ - `docs/screenshots/fix-plan.png` -> structured output panel
+
+ Suggested markdown once captured:
+
+ ```md
+ ![TorchReview Copilot Home](docs/screenshots/home.png)
+ ![Live Triage Radar](docs/screenshots/triage-radar.png)
+ ![Fix Plan Output](docs/screenshots/fix-plan.png)
  ```

+ ## Local Setup
+
+ ### 1. Install dependencies

  ```bash
+ pip install .
  ```

+ ### 2. Run the application

+ ```bash
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
+ ```

+ ### 3. Open the demo

+ Visit:

+ ```text
+ http://localhost:8000/
  ```

+ ### 4. Verify OpenEnv compatibility

+ ```bash
+ curl http://localhost:8000/health
+ curl http://localhost:8000/state
+ ```

+ ## Docker

  ```bash
+ docker build -t torchreview-copilot -f server/Dockerfile .
+ docker run --rm -p 8000:8000 torchreview-copilot
  ```

+ Expected checks:

+ ```bash
+ curl http://localhost:8000/
+ curl http://localhost:8000/health
  ```

+ ## Project Structure
  ```text
  python_env/
  ├── client.py
  ├── graders/
  ├── server/
  │   ├── app.py
+ │   ├── demo.py
+ │   └── env.py
+ ├── tasks/
+ ├── triage.py
+ ├── triage_catalog.py
+ ├── triage_models.py
+ ├── inference.py
+ └── tests/
  ```
+
+ ## OpenEnv Compatibility
+
+ The hackathon backend is still present:
+
+ - deterministic task grading
+ - structured action/observation/state models
+ - `/health`, `/state`, `/reset`, `/step`, and related environment routes
+
+ This means the product demo is not detached from evaluation; it is layered on top of the original OpenEnv system.
+
+ ## Demo Script
+
+ See [DEMO_SCRIPT.md](DEMO_SCRIPT.md) for the 60-90 second recording flow.
+
+ Short version:
+
+ 1. Open the Space and introduce the problem.
+ 2. Load the syntax example.
+ 3. Show the Live Triage Radar and issue label.
+ 4. Explain the PyTorch embedding step.
+ 5. Show the matched pattern and fix plan.
+ 6. Switch to the performance example to prove the model distinguishes issue classes.
+
+ ## Limitations
+
+ - The classifier uses pretrained embeddings plus prototype similarity, not a custom fine-tuned model.
+ - First model load may take longer on a cold Hugging Face Space.
+ - The current demo focuses on short Python snippets rather than full multi-file repositories.
+
+ ## Future Work
+
+ - fine-tune the PyTorch classifier on a larger bug triage dataset
+ - add repository-level file context and diff-aware analysis
+ - include automated patch suggestions after triage
+ - track remediation outcomes as a feedback loop for future ranking improvements
__init__.py CHANGED
@@ -9,6 +9,8 @@ from .models import (
      PythonObservation,
      PythonState,
  )
+ from .triage import CodeTriageEngine, HashingEmbeddingBackend, TransformersEmbeddingBackend, get_default_engine
+ from .triage_models import TriageResult

  __all__ = [
      "PythonAction",
@@ -19,4 +21,9 @@ __all__ = [
      "PythonCodeReviewState",
      "PythonCodeReviewEnv",
      "PythonEnv",
+     "CodeTriageEngine",
+     "HashingEmbeddingBackend",
+     "TransformersEmbeddingBackend",
+     "TriageResult",
+     "get_default_engine",
  ]
pyproject.toml CHANGED
@@ -5,14 +5,17 @@ build-backend = "setuptools.build_meta"
  [project]
  name = "openenv-python-code-review-env"
  version = "1.0.0"
- description = "Production-grade OpenEnv environment for Python code review workflows."
+ description = "TorchReview Copilot: AI-powered Python code triage with PyTorch and OpenEnv validation."
  readme = "README.md"
  requires-python = ">=3.10"
  dependencies = [
      "fastapi>=0.111.0",
+     "gradio>=5.26.0",
      "openai>=1.76.0",
      "openenv-core[core]>=0.2.2",
      "pytest>=8.0.0",
+     "torch>=2.2.0",
+     "transformers>=4.45.0",
      "uvicorn>=0.30.0",
  ]
server/Dockerfile CHANGED
@@ -6,7 +6,7 @@ ENV PYTHONDONTWRITEBYTECODE=1 \

  WORKDIR /app

- COPY pyproject.toml README.md openenv.yaml __init__.py client.py compat.py models.py inference.py /app/
+ COPY pyproject.toml README.md DEMO_SCRIPT.md openenv.yaml __init__.py client.py compat.py models.py inference.py triage.py triage_catalog.py triage_models.py /app/
  COPY server /app/server
  COPY tasks /app/tasks
  COPY graders /app/graders
server/app.py CHANGED
@@ -1,4 +1,4 @@
- """FastAPI entrypoint for python_code_review_env."""
+ """FastAPI + Gradio entrypoint for TorchReview Copilot."""

  from __future__ import annotations

@@ -9,21 +9,37 @@ except Exception as exc:  # pragma: no cover
          "openenv-core is required to run the API server. Install project dependencies first."
      ) from exc

+ try:
+     import gradio as gr
+ except Exception:
+     gr = None  # type: ignore[assignment]
+
  try:
      from ..models import PythonCodeReviewAction, PythonCodeReviewObservation
      from .env import PythonCodeReviewEnvironment
+     from .demo import build_demo
  except ImportError:
      from models import PythonCodeReviewAction, PythonCodeReviewObservation
      from server.env import PythonCodeReviewEnvironment
+     from server.demo import build_demo


- app = create_app(
-     PythonCodeReviewEnvironment,
-     PythonCodeReviewAction,
-     PythonCodeReviewObservation,
-     env_name="python_code_review_env",
-     max_concurrent_envs=4,
- )
+ def build_application():
+     """Compose the OpenEnv API with the Gradio demo frontend."""
+
+     api_app = create_app(
+         PythonCodeReviewEnvironment,
+         PythonCodeReviewAction,
+         PythonCodeReviewObservation,
+         env_name="python_code_review_env",
+         max_concurrent_envs=4,
+     )
+     if gr is None:
+         return api_app
+     return gr.mount_gradio_app(api_app, build_demo(), path="/")
+
+
+ app = build_application()


  def main(host: str = "0.0.0.0", port: int = 8000) -> None:
server/demo.py ADDED
@@ -0,0 +1,420 @@
+ """Gradio UI for TorchReview Copilot."""
+
+ from __future__ import annotations
+
+ from html import escape
+
+ import gradio as gr
+
+ try:
+     from ..triage import get_default_engine
+ except ImportError:
+     from triage import get_default_engine
+
+
+ CSS = """
+ :root {
+   --paper: #f6f1e8;
+   --ink: #162521;
+   --accent: #d95d39;
+   --panel: #fffdf8;
+   --border: #d6c4b8;
+   --muted: #5f6f67;
+   --good: #2d7d62;
+   --warn: #b76516;
+   --high: #b23a48;
+ }
+
+ body, .gradio-container {
+   background:
+     radial-gradient(circle at top left, rgba(247, 197, 159, 0.35), transparent 35%),
+     linear-gradient(135deg, #f9f6ef 0%, #efe5d3 100%);
+   color: var(--ink);
+   font-family: Georgia, "Times New Roman", serif;
+ }
+
+ .gradio-container {
+   max-width: 1260px !important;
+ }
+
+ .hero-card,
+ .metric-card,
+ .subtle-card {
+   background: rgba(255, 253, 248, 0.95);
+   border: 1px solid var(--border);
+   border-radius: 20px;
+   box-shadow: 0 16px 40px rgba(22, 37, 33, 0.08);
+ }
+
+ .hero-card {
+   padding: 28px 30px;
+   margin-bottom: 12px;
+ }
+
+ .metric-card,
+ .subtle-card {
+   padding: 20px 22px;
+ }
+
+ .eyebrow {
+   text-transform: uppercase;
+   letter-spacing: 0.12em;
+   font-size: 12px;
+   color: var(--accent);
+   margin-bottom: 10px;
+ }
+
+ .hero-title {
+   font-size: 44px;
+   line-height: 1.05;
+   margin: 0 0 10px;
+ }
+
+ .hero-copy {
+   margin: 0;
+   font-size: 18px;
+   line-height: 1.55;
+   color: var(--muted);
+ }
+
+ .summary-title {
+   display: flex;
+   justify-content: space-between;
+   gap: 12px;
+   align-items: center;
+   margin-bottom: 14px;
+ }
+
+ .pill {
+   display: inline-block;
+   padding: 6px 12px;
+   border-radius: 999px;
+   font-size: 12px;
+   text-transform: uppercase;
+   letter-spacing: 0.08em;
+   background: #efe5d3;
+ }
+
+ .pill.low { color: var(--good); }
+ .pill.medium { color: var(--warn); }
+ .pill.high { color: var(--high); }
+
+ .summary-grid {
+   display: grid;
+   grid-template-columns: repeat(2, minmax(0, 1fr));
+   gap: 12px;
+   margin-top: 16px;
+ }
+
+ .summary-stat {
+   background: #fff7ef;
+   border-radius: 14px;
+   padding: 12px 14px;
+   border: 1px solid rgba(214, 196, 184, 0.8);
+ }
+
+ .summary-stat strong {
+   display: block;
+   font-size: 12px;
+   text-transform: uppercase;
+   letter-spacing: 0.08em;
+   color: var(--muted);
+   margin-bottom: 6px;
+ }
+
+ .radar-wrap {
+   display: grid;
+   gap: 12px;
+ }
+
+ .bar {
+   display: grid;
+   gap: 6px;
+ }
+
+ .bar-head {
+   display: flex;
+   justify-content: space-between;
+   font-size: 13px;
+   color: var(--muted);
+ }
+
+ .bar-track {
+   width: 100%;
+   height: 12px;
+   background: #f2e5d6;
+   border-radius: 999px;
+   overflow: hidden;
+ }
+
+ .bar-fill {
+   height: 100%;
+   border-radius: 999px;
+ }
+
+ .matched-box {
+   background: #fff7ef;
+   border: 1px solid rgba(214, 196, 184, 0.8);
+   border-radius: 16px;
+   padding: 14px;
+ }
+
+ .how-grid {
+   display: grid;
+   grid-template-columns: repeat(4, minmax(0, 1fr));
+   gap: 12px;
+ }
+
+ .how-step {
+   background: rgba(255, 253, 248, 0.9);
+   border: 1px solid var(--border);
+   border-radius: 18px;
+   padding: 16px;
+ }
+
+ @media (max-width: 900px) {
+   .hero-title {
+     font-size: 34px;
+   }
+
+   .summary-grid,
+   .how-grid {
+     grid-template-columns: 1fr;
+   }
+ }
+ """
+
+
+ def _default_outputs() -> tuple[str, str, str, str, str]:
+     return (
+         "<div class='metric-card'><div class='eyebrow'>Awaiting Analysis</div><p class='hero-copy'>Paste Python code, add an optional traceback, or load one of the built-in examples.</p></div>",
+         "<div class='metric-card'><div class='eyebrow'>Live Triage Radar</div><p class='hero-copy'>Confidence bars will appear after the first analysis run.</p></div>",
+         "### Fix Plan\nAnalyze a sample to generate a prioritized remediation checklist.",
+         "### Known Pattern Match\nThe nearest OpenEnv task will be highlighted here after inference runs.",
+         "### Model Notes\nBackend and extracted signal details will appear here.",
+     )
+
+
+ def _summary_html(result) -> str:
+     issue = escape(result.issue_label.title())
+     summary = escape(result.summary)
+     next_action = escape(result.suggested_next_action)
+     return f"""
+     <div class="metric-card">
+       <div class="summary-title">
+         <div>
+           <div class="eyebrow">TorchReview Verdict</div>
+           <h3 style="margin:0;font-size:30px;">{issue} Issue</h3>
+         </div>
+         <span class="pill {escape(result.repair_risk)}">{escape(result.repair_risk)} repair risk</span>
+       </div>
+       <p class="hero-copy">{summary}</p>
+       <div class="summary-grid">
+         <div class="summary-stat">
+           <strong>Matched Pattern</strong>
+           {escape(result.matched_pattern.title)}
+         </div>
+         <div class="summary-stat">
+           <strong>Similarity</strong>
+           {result.matched_pattern.similarity:.0%}
+         </div>
+         <div class="summary-stat">
+           <strong>Inference Backend</strong>
+           {escape(result.model_backend)}
+         </div>
+         <div class="summary-stat">
+           <strong>Next Action</strong>
+           {next_action}
+         </div>
+       </div>
+     </div>
+     """
+
+
+ def _radar_html(result) -> str:
+     colors = {
+         "syntax": "#d95d39",
+         "logic": "#4f772d",
+         "performance": "#355070",
+     }
+     bars = []
+     for label, score in result.confidence_scores.items():
+         bars.append(
+             f"""
+             <div class="bar">
+               <div class="bar-head"><span>{escape(label.title())}</span><span>{score:.0%}</span></div>
+               <div class="bar-track">
+                 <div class="bar-fill" style="width:{score * 100:.1f}%; background:{colors.get(label, '#d95d39')};"></div>
+               </div>
+             </div>
+             """
+         )
+     return f"""
+     <div class="metric-card radar-wrap">
+       <div class="eyebrow">Live Triage Radar</div>
+       {''.join(bars)}
+       <div class="matched-box">
+         <strong>Nearest Known Pattern:</strong> {escape(result.matched_pattern.title)}<br>
+         <span style="color:#5f6f67;">{escape(result.matched_pattern.summary)}</span>
+       </div>
+     </div>
+     """
+
+
+ def _plan_markdown(result) -> str:
+     plan_lines = "\n".join(f"{index + 1}. {step}" for index, step in enumerate(result.repair_plan))
+     return (
+         "### Fix Plan\n"
+         f"**Primary issue:** `{result.issue_label}`\n\n"
+         f"{plan_lines}\n\n"
+         f"**Suggested next action:** {result.suggested_next_action}"
+     )
+
+
+ def _match_markdown(result) -> str:
+     return (
+         "### Known Pattern Match\n"
+         f"**Task:** `{result.matched_pattern.task_id}`  \n"
+         f"**Title:** {result.matched_pattern.title}  \n"
+         f"**Why it matched:** {result.matched_pattern.rationale}  \n"
+         f"**Similarity:** {result.matched_pattern.similarity:.0%}"
+     )
+
+
+ def _model_markdown(result) -> str:
+     signal_lines = "\n".join(
+         f"- `{signal.name}` -> {signal.value} ({signal.impact}, weight {signal.weight:.2f}): {signal.evidence}"
+         for signal in result.extracted_signals
+     ) or "- No strong static signals were extracted."
+     notes = "\n".join(f"- {item}" for item in result.inference_notes) or "- No additional backend notes."
+     return (
+         "### Model Notes\n"
+         f"- **Model backend:** `{result.model_backend}`\n"
+         f"- **Model id:** `{result.model_id}`\n"
+         f"- **Analysis time:** `{result.analysis_time_ms:.2f} ms`\n\n"
+         "### Extracted Signals\n"
+         f"{signal_lines}\n\n"
+         "### Backend Notes\n"
+         f"{notes}"
+     )
+
+
+ def analyze_inputs(code: str, traceback_text: str) -> tuple[str, str, str, str, str]:
+     """Run the triage engine and format outputs for the Gradio UI."""
+
+     result = get_default_engine().triage(code or "", traceback_text or "")
+     return (
+         _summary_html(result),
+         _radar_html(result),
+         _plan_markdown(result),
+         _match_markdown(result),
+         _model_markdown(result),
+     )
+
+
+ def load_example(example_key: str) -> tuple[str, str, str, str, str, str, str, str]:
+     """Populate the UI from a built-in example and immediately analyze it."""
+
+     example = get_default_engine().example_map()[example_key]
+     outputs = analyze_inputs(example.code, example.traceback_text)
+     header = (
+         f"### Example Scenario\n"
+         f"**{example.title}**  \n"
+         f"{example.summary}  \n"
+         f"Label target: `{example.label}`"
+     )
+     return (example.code, example.traceback_text, header, *outputs)
+
+
+ def build_demo() -> gr.Blocks:
+     """Create the TorchReview Copilot Gradio application."""
+
+     examples = get_default_engine().example_map()
+     first_example = next(iter(examples.values()))
+
+     with gr.Blocks(theme=gr.themes.Soft(primary_hue="orange", secondary_hue="amber"), css=CSS, title="TorchReview Copilot") as demo:
+         gr.HTML(
+             """
+             <div class="hero-card">
+               <div class="eyebrow">Meta PyTorch OpenEnv Hackathon Demo</div>
+               <h1 class="hero-title">TorchReview Copilot</h1>
+               <p class="hero-copy">
+                 AI-powered Python code triage using PyTorch to classify issue type, estimate repair risk,
+                 and turn messy failure output into an actionable fix plan. OpenEnv stays underneath as the deterministic validation engine.
+               </p>
+             </div>
+             """
+         )
+
+         with gr.Row():
+             with gr.Column(scale=6):
+                 example_choice = gr.Radio(
+                     choices=[(item.title, item.key) for item in examples.values()],
+                     value=first_example.key,
+                     label="Try a built-in failure scenario",
+                     info="Switching examples updates the Live Triage Radar immediately.",
+                 )
+                 example_header = gr.Markdown()
+                 code_input = gr.Code(
+                     value=first_example.code,
+                     language="python",
+                     lines=18,
+                     label="Python code under review",
+                 )
+                 traceback_input = gr.Textbox(
+                     value=first_example.traceback_text,
+                     lines=7,
+                     label="Optional traceback / failing test output",
+                     placeholder="Paste stack traces, assertion failures, or benchmark notes here.",
+                 )
+                 with gr.Row():
+                     analyze_button = gr.Button("Analyze With PyTorch", variant="primary")
+                     clear_button = gr.Button("Clear Inputs", variant="secondary")
+
+             with gr.Column(scale=5):
+                 summary_html = gr.HTML()
+                 radar_html = gr.HTML()
+                 plan_markdown = gr.Markdown()
+                 match_markdown = gr.Markdown()
+                 model_markdown = gr.Markdown()
+
+         gr.HTML(
+             """
+             <div class="subtle-card" style="margin-top: 12px;">
+               <div class="eyebrow">How It Works</div>
+               <div class="how-grid">
+                 <div class="how-step"><strong>Input</strong><br>Code plus optional traceback or benchmark signal.</div>
+                 <div class="how-step"><strong>Processing</strong><br>Static checks extract parser, assertion, and runtime clues.</div>
+                 <div class="how-step"><strong>Model</strong><br>CodeBERTa embeddings run through PyTorch and compare against known OpenEnv task patterns.</div>
+                 <div class="how-step"><strong>Output</strong><br>Confidence radar, nearest known issue, repair risk, and a practical remediation plan.</div>
+               </div>
+             </div>
+             """
+         )
+
+         example_choice.change(
+             fn=load_example,
+             inputs=example_choice,
+             outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="hidden",
+         )
+         analyze_button.click(
+             fn=analyze_inputs,
+             inputs=[code_input, traceback_input],
+             outputs=[summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="minimal",
+         )
+         clear_button.click(
+             fn=lambda: ("", "", "### Example Scenario\nChoose a built-in example or paste custom code.", *_default_outputs()),
+             inputs=None,
+             outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="hidden",
+         )
+         demo.load(
+             fn=load_example,
+             inputs=example_choice,
+             outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+             show_progress="hidden",
+         )
+
+     return demo
server/requirements.txt CHANGED
@@ -1,5 +1,8 @@
 openenv-core[core]>=0.2.2
 fastapi>=0.111.0
+gradio>=5.26.0
 uvicorn>=0.30.0
 pytest>=8.0.0
 openai>=1.76.0
+torch>=2.2.0
+transformers>=4.45.0
tests/test_triage_pipeline.py ADDED
@@ -0,0 +1,44 @@
+from __future__ import annotations
+
+from fastapi.testclient import TestClient
+
+from triage import CodeTriageEngine, HashingEmbeddingBackend
+from triage_catalog import build_examples
+
+
+def test_hashing_backend_returns_normalized_embeddings() -> None:
+    backend = HashingEmbeddingBackend(dimensions=32)
+    embeddings = backend.embed_texts(["def foo():\n return 1", "for x in items:\n pass"])
+
+    assert embeddings.shape == (2, 32)
+    for row in embeddings:
+        assert round(float(row.norm().item()), 5) == 1.0
+
+
+def test_examples_map_to_expected_labels_with_fallback_backend() -> None:
+    examples = build_examples()
+    engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+    for example in examples:
+        result = engine.triage(example.code, example.traceback_text)
+        assert result.issue_label == example.label
+
+
+def test_syntax_example_exposes_parser_signal() -> None:
+    example = next(item for item in build_examples() if item.label == "syntax")
+    engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+    result = engine.triage(example.code, example.traceback_text)
+
+    assert any(signal.name == "syntax_parse" and signal.value == "fails" for signal in result.extracted_signals)
+    assert result.matched_pattern.task_id == example.task_id
+
+
+def test_composed_app_preserves_health_route() -> None:
+    from server.app import build_application
+
+    client = TestClient(build_application())
+    response = client.get("/health")
+
+    assert response.status_code == 200
+    assert response.json()["status"] == "ok"
triage.py ADDED
@@ -0,0 +1,407 @@
+"""PyTorch-backed triage pipeline for TorchReview Copilot."""
+
+from __future__ import annotations
+
+import ast
+import hashlib
+import os
+import re
+import time
+from functools import lru_cache
+from typing import List, Sequence
+
+import torch
+import torch.nn.functional as F
+
+try:
+    from transformers import AutoModel, AutoTokenizer
+except Exception:
+    AutoModel = None  # type: ignore[assignment]
+    AutoTokenizer = None  # type: ignore[assignment]
+
+try:
+    from .triage_catalog import build_examples, build_prototypes
+    from .triage_models import (
+        IssueLabel,
+        PrototypeMatch,
+        TriageExample,
+        TriagePrototype,
+        TriageResult,
+        TriageSignal,
+    )
+except ImportError:
+    from triage_catalog import build_examples, build_prototypes
+    from triage_models import (
+        IssueLabel,
+        PrototypeMatch,
+        TriageExample,
+        TriagePrototype,
+        TriageResult,
+        TriageSignal,
+    )
+
+
+MODEL_ID = os.getenv("TRIAGE_MODEL_ID", "huggingface/CodeBERTa-small-v1")
+MODEL_MAX_LENGTH = int(os.getenv("TRIAGE_MODEL_MAX_LENGTH", "256"))
+LABELS: tuple[IssueLabel, ...] = ("syntax", "logic", "performance")
+
+
+class _LoopDepthVisitor(ast.NodeVisitor):
+    """Track the maximum loop nesting depth in a code snippet."""
+
+    def __init__(self) -> None:
+        self.depth = 0
+        self.max_depth = 0
+
+    def _visit_loop(self, node: ast.AST) -> None:
+        self.depth += 1
+        self.max_depth = max(self.max_depth, self.depth)
+        self.generic_visit(node)
+        self.depth -= 1
+
+    def visit_For(self, node: ast.For) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_While(self, node: ast.While) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_comprehension(self, node: ast.comprehension) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+
+class HashingEmbeddingBackend:
+    """Deterministic torch-native fallback when pretrained weights are unavailable."""
+
+    def __init__(self, dimensions: int = 96) -> None:
+        self.dimensions = dimensions
+        self.model_id = "hashed-token-fallback"
+        self.backend_name = "hashed-token-fallback"
+        self.notes = ["Using hashed torch embeddings because pretrained weights are unavailable."]
+
+    def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        rows = torch.zeros((len(texts), self.dimensions), dtype=torch.float32)
+        for row_index, text in enumerate(texts):
+            tokens = re.findall(r"[A-Za-z_]+|\d+|==|!=|<=|>=|\S", text.lower())[:512]
+            if not tokens:
+                rows[row_index, 0] = 1.0
+                continue
+            for token in tokens:
+                digest = hashlib.md5(token.encode("utf-8")).hexdigest()
+                bucket = int(digest[:8], 16) % self.dimensions
+                sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0
+                rows[row_index, bucket] += sign
+        return F.normalize(rows + 1e-6, dim=1)
+
+
+class TransformersEmbeddingBackend:
+    """Mean-pool CodeBERTa embeddings via torch + transformers."""
+
+    def __init__(self, model_id: str = MODEL_ID, force_fallback: bool = False) -> None:
+        self.model_id = model_id
+        self.force_fallback = force_fallback
+        self.backend_name = model_id
+        self.notes: List[str] = []
+        self._fallback = HashingEmbeddingBackend()
+        self._tokenizer = None
+        self._model = None
+        self._load_error = ""
+        if force_fallback:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes)
+
+    def _ensure_loaded(self) -> None:
+        if self.force_fallback or self._model is not None or self._load_error:
+            return
+        if AutoTokenizer is None or AutoModel is None:
+            self._load_error = "transformers is not installed."
+        else:
+            try:
+                self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
+                self._model = AutoModel.from_pretrained(self.model_id)
+                self._model.eval()
+                self.notes.append(f"Loaded pretrained encoder `{self.model_id}` for inference.")
+            except Exception as exc:
+                self._load_error = f"{type(exc).__name__}: {exc}"
+
+        if self._load_error:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes) + [f"Pretrained load failed: {self._load_error}"]
+
+    def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        self._ensure_loaded()
+        if self._model is None or self._tokenizer is None:
+            return self._fallback.embed_texts(texts)
+
+        encoded = self._tokenizer(
+            list(texts),
+            padding=True,
+            truncation=True,
+            max_length=MODEL_MAX_LENGTH,
+            return_tensors="pt",
+        )
+        with torch.no_grad():
+            outputs = self._model(**encoded)
+        hidden_state = outputs.last_hidden_state
+        mask = encoded["attention_mask"].unsqueeze(-1)
+        pooled = (hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
+        return F.normalize(pooled, dim=1)
+
+
+def _sanitize_text(value: str) -> str:
+    text = (value or "").strip()
+    return text[:4000]
+
+
+def _safe_softmax(scores: dict[IssueLabel, float]) -> dict[str, float]:
+    tensor = torch.tensor([scores[label] for label in LABELS], dtype=torch.float32)
+    probabilities = torch.softmax(tensor * 4.0, dim=0)
+    return {label: round(float(probabilities[index]), 4) for index, label in enumerate(LABELS)}
+
+
+def _loop_depth(code: str) -> int:
+    try:
+        tree = ast.parse(code)
+    except SyntaxError:
+        return 0
+    visitor = _LoopDepthVisitor()
+    visitor.visit(tree)
+    return visitor.max_depth
+
+
+def _repair_risk(label: IssueLabel, confidence: float, signal_count: int) -> str:
+    base = {"syntax": 0.25, "logic": 0.55, "performance": 0.7}[label]
+    if confidence < 0.55:
+        base += 0.12
+    if signal_count >= 4:
+        base += 0.08
+    if base < 0.4:
+        return "low"
+    if base < 0.72:
+        return "medium"
+    return "high"
+
+
+class CodeTriageEngine:
+    """Combine static signals with PyTorch embeddings to classify code issues."""
+
+    def __init__(
+        self,
+        *,
+        backend: TransformersEmbeddingBackend | HashingEmbeddingBackend | None = None,
+        prototypes: Sequence[TriagePrototype] | None = None,
+        examples: Sequence[TriageExample] | None = None,
+    ) -> None:
+        self.backend = backend or TransformersEmbeddingBackend()
+        self.prototypes = list(prototypes or build_prototypes())
+        self.examples = list(examples or build_examples())
+        self._prototype_matrix: torch.Tensor | None = None
+
+    def example_map(self) -> dict[str, TriageExample]:
+        """Return UI examples keyed by task id."""
+
+        return {example.key: example for example in self.examples}
+
+    def _build_document(self, code: str, traceback_text: str) -> str:
+        trace = _sanitize_text(traceback_text) or "No traceback supplied."
+        snippet = _sanitize_text(code) or "# No code supplied."
+        return f"Candidate code:\n{snippet}\n\nObserved failure:\n{trace}\n"
+
+    def _prototype_embeddings(self) -> torch.Tensor:
+        if self._prototype_matrix is None:
+            reference_texts = [prototype.reference_text for prototype in self.prototypes]
+            self._prototype_matrix = self.backend.embed_texts(reference_texts)
+        return self._prototype_matrix
+
+    def _extract_signals(self, code: str, traceback_text: str) -> tuple[list[TriageSignal], dict[IssueLabel, float], list[str]]:
+        trace = (traceback_text or "").lower()
+        heuristic_scores: dict[IssueLabel, float] = {label: 0.15 for label in LABELS}
+        signals: list[TriageSignal] = []
+        notes: list[str] = []
+
+        try:
+            ast.parse(code)
+            signals.append(
+                TriageSignal(
+                    name="syntax_parse",
+                    value="passes",
+                    impact="syntax",
+                    weight=0.1,
+                    evidence="Python AST parsing succeeded.",
+                )
+            )
+            heuristic_scores["logic"] += 0.05
+        except SyntaxError as exc:
+            evidence = f"{exc.msg} at line {exc.lineno}"
+            signals.append(
+                TriageSignal(
+                    name="syntax_parse",
+                    value="fails",
+                    impact="syntax",
+                    weight=0.95,
+                    evidence=evidence,
+                )
+            )
+            heuristic_scores["syntax"] += 0.85
+            notes.append(f"Parser failure detected: {evidence}")
+
+        if any(token in trace for token in ("syntaxerror", "indentationerror", "expected ':'")):
+            signals.append(
+                TriageSignal(
+                    name="traceback_keyword",
+                    value="syntaxerror",
+                    impact="syntax",
+                    weight=0.8,
+                    evidence="Traceback contains a parser error.",
+                )
+            )
+            heuristic_scores["syntax"] += 0.55
+
+        if any(token in trace for token in ("assertionerror", "expected:", "actual:", "boundary", "missing", "incorrect")):
+            signals.append(
+                TriageSignal(
+                    name="test_failure_signal",
+                    value="assertion-style failure",
+                    impact="logic",
+                    weight=0.7,
+                    evidence="Failure text points to behavioral mismatch instead of parser issues.",
+                )
+            )
+            heuristic_scores["logic"] += 0.55
+
+        if any(token in trace for token in ("timeout", "benchmark", "slow", "latency", "performance", "profiler")):
+            signals.append(
+                TriageSignal(
+                    name="performance_trace",
+                    value="latency regression",
+                    impact="performance",
+                    weight=0.85,
+                    evidence="Traceback mentions benchmark or latency pressure.",
+                )
+            )
+            heuristic_scores["performance"] += 0.7
+
+        loop_depth = _loop_depth(code)
+        if loop_depth >= 2:
+            signals.append(
+                TriageSignal(
+                    name="loop_depth",
+                    value=str(loop_depth),
+                    impact="performance",
+                    weight=0.65,
+                    evidence="Nested iteration increases runtime risk on larger fixtures.",
+                )
+            )
+            heuristic_scores["performance"] += 0.35
+
+        if "Counter(" in code or "defaultdict(" in code or "set(" in code:
+            heuristic_scores["performance"] += 0.05
+
+        if "return sessions" in code and "sessions.append" not in code:
+            signals.append(
+                TriageSignal(
+                    name="state_update_gap",
+                    value="possible missing final append",
+                    impact="logic",
+                    weight=0.45,
+                    evidence="A collection is returned without an obvious final state flush.",
+                )
+            )
+            heuristic_scores["logic"] += 0.18
+
+        return signals, heuristic_scores, notes
+
+    def _nearest_match(self, embedding: torch.Tensor) -> tuple[TriagePrototype, float, dict[str, float]]:
+        similarities = torch.matmul(embedding, self._prototype_embeddings().T)[0]
+        indexed_scores = {
+            self.prototypes[index].task_id: round(float((similarities[index] + 1.0) / 2.0), 4)
+            for index in range(len(self.prototypes))
+        }
+        best_index = int(torch.argmax(similarities).item())
+        best_prototype = self.prototypes[best_index]
+        best_similarity = float((similarities[best_index] + 1.0) / 2.0)
+        return best_prototype, best_similarity, indexed_scores
+
+    def _repair_plan(self, label: IssueLabel, matched: TriagePrototype) -> list[str]:
+        plans = {
+            "syntax": [
+                "Patch the parser break first: missing colon, bracket, or indentation before changing logic.",
+                f"Realign the implementation with the known-good pattern from `{matched.title}`.",
+                "Re-run the visible checks once the file compiles, then verify hidden edge cases.",
+            ],
+            "logic": [
+                "Reproduce the failing assertion with the smallest public example and inspect state transitions.",
+                f"Compare boundary handling against the known issue pattern `{matched.title}`.",
+                "Patch the final state update or branch condition, then rerun correctness checks before submission.",
+            ],
+            "performance": [
+                "Profile the hot path and isolate repeated full-list scans or nested loops.",
+                f"Refactor toward counting or indexing strategies similar to `{matched.title}`.",
+                "Benchmark the new implementation on a production-like fixture and confirm output stability.",
+            ],
+        }
+        return plans[label]
+
+    def triage(self, code: str, traceback_text: str = "") -> TriageResult:
+        """Run the full triage pipeline on code plus optional failure context."""
+
+        started = time.perf_counter()
+        document = self._build_document(code, traceback_text)
+        signals, heuristic_scores, notes = self._extract_signals(code, traceback_text)
+
+        candidate_embedding = self.backend.embed_texts([document])
+        matched, matched_similarity, prototype_scores = self._nearest_match(candidate_embedding)
+
+        label_similarity = {label: 0.18 for label in LABELS}
+        for prototype in self.prototypes:
+            label_similarity[prototype.label] = max(
+                label_similarity[prototype.label],
+                prototype_scores[prototype.task_id],
+            )
+
+        combined_scores = {
+            label: 0.72 * label_similarity[label] + 0.28 * heuristic_scores[label]
+            for label in LABELS
+        }
+        confidence_scores = _safe_softmax(combined_scores)
+        issue_label = max(LABELS, key=lambda label: confidence_scores[label])
+        top_confidence = confidence_scores[issue_label]
+
+        top_signal = signals[0].evidence if signals else "Model similarity dominated the decision."
+        summary = (
+            f"Detected a {issue_label} issue with {top_confidence:.0%} confidence. "
+            f"The closest known failure pattern is `{matched.title}`, which indicates {matched.summary.lower()}"
+        )
+        suggested_next_action = {
+            "syntax": "Fix the parser error first, then rerun validation before changing behavior.",
+            "logic": "Step through the smallest failing case and confirm the final branch/update behavior.",
+            "performance": "Replace repeated full-list scans with a linear-time aggregation strategy, then benchmark it.",
+        }[issue_label]
+
+        return TriageResult(
+            issue_label=issue_label,
+            confidence_scores=confidence_scores,
+            repair_risk=_repair_risk(issue_label, top_confidence, len(signals)),
+            summary=summary,
+            matched_pattern=PrototypeMatch(
+                task_id=matched.task_id,
+                title=matched.title,
+                label=matched.label,
+                similarity=round(matched_similarity, 4),
+                summary=matched.summary,
+                rationale=top_signal,
+            ),
+            repair_plan=self._repair_plan(issue_label, matched),
+            suggested_next_action=suggested_next_action,
+            extracted_signals=signals,
+            model_backend=self.backend.backend_name,
+            model_id=self.backend.model_id,
+            inference_notes=list(self.backend.notes) + notes,
+            analysis_time_ms=round((time.perf_counter() - started) * 1000.0, 2),
+        )
+
+
+@lru_cache(maxsize=1)
+def get_default_engine() -> CodeTriageEngine:
+    """Return a cached triage engine for the running process."""
+
+    return CodeTriageEngine()
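
The fallback backend above hashes tokens into signed buckets and L2-normalizes the result so cosine similarity works without any trained weights. A minimal pure-Python sketch of the same idea (stdlib only, no torch; names here are illustrative, not part of the module):

```python
import hashlib
import math
import re


def hashed_embedding(text: str, dimensions: int = 96) -> list[float]:
    """Bag-of-tokens embedding: each token hashes to a signed bucket."""
    row = [0.0] * dimensions
    tokens = re.findall(r"[A-Za-z_]+|\d+|==|!=|<=|>=|\S", text.lower())[:512]
    if not tokens:
        row[0] = 1.0
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) % dimensions      # which dimension
        sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0  # signed update
        row[bucket] += sign
    norm = math.sqrt(sum(v * v for v in row)) or 1.0
    return [v / norm for v in row]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


a = hashed_embedding("for user in users:\n    total += user.score")
b = hashed_embedding("for user in users:\n    total += user.score")
c = hashed_embedding("SyntaxError: expected ':'")
assert cosine(a, b) > 0.999          # identical inputs embed identically
assert cosine(a, c) < cosine(a, b)   # unrelated text scores lower
```

The same scheme is why `HashingEmbeddingBackend` is deterministic: identical code always lands in the same buckets, which keeps the tests and the prototype matching reproducible.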
triage_catalog.py ADDED
@@ -0,0 +1,117 @@
+"""Curated prototypes and example inputs for TorchReview Copilot."""
+
+from __future__ import annotations
+
+from typing import Dict, List
+
+try:
+    from .triage_models import IssueLabel, TriageExample, TriagePrototype
+    from .tasks import list_tasks
+except ImportError:
+    from triage_models import IssueLabel, TriageExample, TriagePrototype
+    from tasks import list_tasks
+
+
+TASK_KIND_TO_LABEL: Dict[str, IssueLabel] = {
+    "syntax_fix": "syntax",
+    "bug_fix": "logic",
+    "optimization": "performance",
+}
+
+TRACEBACK_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Traceback (most recent call last):\n"
+        " File \"services/billing/reconciliation.py\", line 3\n"
+        " for record in records\n"
+        " ^\n"
+        "SyntaxError: expected ':'"
+    ),
+    "bug_fix_session_windows": (
+        "AssertionError: collapse_sessions([{'minute': 1}, {'minute': 3}, {'minute': 8}], 4)\n"
+        "Expected: [(1, 3), (8, 8)]\n"
+        "Actual: [(1, 8)]\n"
+        "Boundary handling merges the final session instead of starting a new one."
+    ),
+    "optimization_rank_active_users": (
+        "BenchmarkWarning: rank_active_users exceeded the 450ms budget on a nightly export fixture.\n"
+        "Profiler hint: repeated scans over the full event list and nested loops dominate runtime."
+    ),
+}
+
+SUMMARY_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": "Broken parser state in a billing helper blocks reconciliation jobs.",
+    "bug_fix_session_windows": "Session-boundary logic fails on inclusive idle-timeout edges.",
+    "optimization_rank_active_users": "A nightly ranking job is correct on small fixtures but too slow at production scale.",
+}
+
+
+def _prototype_text(
+    task_id: str,
+    title: str,
+    description: str,
+    repo_summary: str,
+    goal: str,
+    visible_tests: List[str],
+    starter_code: str,
+    traceback_text: str,
+) -> str:
+    visible = "\n".join(f"- {item}" for item in visible_tests) or "- none"
+    return (
+        f"Title: {title}\n"
+        f"Problem: {description}\n"
+        f"Repo context: {repo_summary}\n"
+        f"Goal: {goal}\n"
+        f"Observed failure:\n{traceback_text}\n"
+        f"Visible checks:\n{visible}\n"
+        f"Candidate code:\n{starter_code}\n"
+        f"Task id: {task_id}\n"
+    )
+
+
+def build_examples() -> List[TriageExample]:
+    """Create stable UI examples from the task catalog."""
+
+    examples: List[TriageExample] = []
+    for task in list_tasks():
+        label = TASK_KIND_TO_LABEL[task.task_kind]
+        examples.append(
+            TriageExample(
+                key=task.task_id,
+                title=task.title,
+                label=label,
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                code=task.starter_code,
+                traceback_text=TRACEBACK_BY_TASK_ID[task.task_id],
+                task_id=task.task_id,
+            )
+        )
+    return examples
+
+
+def build_prototypes() -> List[TriagePrototype]:
+    """Build canonical triage prototypes from the OpenEnv tasks."""
+
+    prototypes: List[TriagePrototype] = []
+    for task in list_tasks():
+        traceback_text = TRACEBACK_BY_TASK_ID[task.task_id]
+        prototypes.append(
+            TriagePrototype(
+                task_id=task.task_id,
+                title=task.title,
+                label=TASK_KIND_TO_LABEL[task.task_kind],
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                reference_text=_prototype_text(
+                    task.task_id,
+                    task.title,
+                    task.task_description,
+                    task.repo_summary,
+                    task.goal,
+                    list(task.visible_tests),
+                    task.reference_code,
+                    traceback_text,
+                ),
+                starter_code=task.starter_code,
+                traceback_text=traceback_text,
+            )
+        )
+    return prototypes
triage_models.py ADDED
@@ -0,0 +1,73 @@
+"""Typed models for TorchReview Copilot outputs and examples."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Literal
+
+from pydantic import BaseModel, Field
+
+
+IssueLabel = Literal["syntax", "logic", "performance"]
+RiskLevel = Literal["low", "medium", "high"]
+
+
+class TriageSignal(BaseModel):
+    """One extracted signal used during issue classification."""
+
+    name: str
+    value: str
+    impact: Literal["syntax", "logic", "performance", "mixed"] = "mixed"
+    weight: float = Field(..., ge=0.0, le=1.0)
+    evidence: str = ""
+
+
+class PrototypeMatch(BaseModel):
+    """Nearest known bug pattern from the built-in task catalog."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    similarity: float = Field(..., ge=0.0, le=1.0)
+    summary: str
+    rationale: str
+
+
+class TriageExample(BaseModel):
+    """Example payload exposed in the demo UI."""
+
+    key: str
+    title: str
+    label: IssueLabel
+    summary: str
+    code: str
+    traceback_text: str
+    task_id: str
+
+
+class TriagePrototype(BaseModel):
+    """Canonical issue-pattern representation embedded by the triage engine."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    summary: str
+    reference_text: str
+    starter_code: str
+    traceback_text: str
+
+
+class TriageResult(BaseModel):
+    """Structured output produced by the triage pipeline."""
+
+    issue_label: IssueLabel
+    confidence_scores: Dict[str, float]
+    repair_risk: RiskLevel
+    summary: str
+    matched_pattern: PrototypeMatch
+    repair_plan: List[str]
+    suggested_next_action: str
+    extracted_signals: List[TriageSignal] = Field(default_factory=list)
+    model_backend: str
+    model_id: str
+    inference_notes: List[str] = Field(default_factory=list)
+    analysis_time_ms: float = Field(..., ge=0.0)
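
The `confidence_scores` field in `TriageResult` comes from the engine's blend of embedding similarity and static heuristics (0.72/0.28 weights, then a softmax sharpened by a factor of 4.0 in `_safe_softmax`). A stdlib-only sketch of that scoring step, with made-up input values chosen purely for illustration:

```python
import math

LABELS = ("syntax", "logic", "performance")


def blend_confidence(label_similarity: dict, heuristic: dict, temperature: float = 4.0) -> dict:
    """Weighted blend of similarity and heuristic scores, then a sharpened softmax."""
    combined = [0.72 * label_similarity[l] + 0.28 * heuristic[l] for l in LABELS]
    exps = [math.exp(temperature * s) for s in combined]
    total = sum(exps)
    return {l: round(e / total, 4) for l, e in zip(LABELS, exps)}


# Hypothetical inputs: strong syntax similarity plus a parser-failure heuristic.
scores = blend_confidence(
    {"syntax": 0.91, "logic": 0.44, "performance": 0.40},
    {"syntax": 1.00, "logic": 0.20, "performance": 0.15},
)
assert max(scores, key=scores.get) == "syntax"
assert abs(sum(scores.values()) - 1.0) < 0.01  # scores form a distribution
```

The temperature factor is what keeps the radar readable in the demo: without it, the three blended scores sit close together and every prediction looks uncertain.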