# Hackathon Checklist

This file translates the tutorial folder into a concrete plan for `python_env`.

It is not a generic OpenEnv summary. It is a project-specific checklist showing:

- what the tutorials are teaching
- how this repo maps to those ideas
- what is already done
- what still needs to be finished before submission

## 1. What The Tutorials Mean For This Project

### Tutorial 1: OpenEnv Pattern

Main concept:

- every environment should follow a clean pattern:
  - typed models
  - environment logic
  - client
  - FastAPI/OpenEnv app
  - Docker packaging

How `python_env` maps:

- `models.py`
  typed action/observation/config/evaluation models
- `server/code_review_environment.py`
  environment logic
- `client.py`
  Python client for reset/step/state
- `server/app.py`
  OpenEnv app plus helper routes
- `server/Dockerfile`
  container packaging

Status:

- done

What to keep in mind:

- do not break the OpenEnv contract while adding features
- treat models as the public interface
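
Since the models are the public interface, it helps to see the shape of the idea. Below is a minimal, hypothetical sketch of typed action/observation models using stdlib dataclasses; the real `models.py` in this repo uses its own field names (and likely Pydantic), so treat every field here as a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the Tutorial 1 "typed models" idea.
# Field names are assumptions, not this repo's actual schema.
@dataclass
class ReviewAction:
    comment: str                      # free-text review finding
    line_number: Optional[int] = None # line the finding refers to, if any
    finalize: bool = False            # end the review episode

@dataclass
class ReviewObservation:
    code: str            # snippet under review
    feedback: str = ""   # grader feedback for the last action
    reward: float = 0.0
    done: bool = False
```

The point is that clients, the grader, and the server all talk through these types, so changing a field is a breaking change to the public contract.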

### Tutorial 2: Deployment

Main concept:

- local development first
- Docker second
- HF Spaces deployment third
- test `/health`, `/reset`, `/docs`, `/ws`

How `python_env` maps:

- local server:
  `uvicorn server.app:app --reload --host 0.0.0.0 --port 8000`
- Docker:
  `docker build -t python_env-env:latest -f server/Dockerfile .`
- Spaces:
  `openenv push`

Status:

- app boots locally
- Dockerfile exists and now supports `HOST`, `PORT`, `WORKERS`, `MAX_CONCURRENT_ENVS`
- live Docker build still needs final verification
- Spaces deployment still needs to be executed and checked

### Tutorial 3: Scaling

Main concept:

- OpenEnv works best with WebSocket sessions
- use environment class/factory instead of a singleton for OpenEnv session handling
- support concurrency with `MAX_CONCURRENT_ENVS`

How `python_env` maps:

- `create_app(PythonEnvironment, PythonReviewAction, PythonReviewObservation, max_concurrent_envs=...)`
- `MAX_CONCURRENT_ENVS` is now read from env vars
- Docker now exposes `MAX_CONCURRENT_ENVS`
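
Reading these settings from the environment can be as simple as the sketch below. The variable names match this repo's Dockerfile; the defaults and the function name are assumptions for illustration.

```python
import os

def read_scaling_config(environ=os.environ):
    """Sketch of reading HOST/PORT/MAX_CONCURRENT_ENVS from env vars.
    Defaults here are assumptions, not this repo's actual values."""
    return {
        "host": environ.get("HOST", "0.0.0.0"),
        "port": int(environ.get("PORT", "8000")),
        "max_concurrent_envs": int(environ.get("MAX_CONCURRENT_ENVS", "4")),
    }
```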

Status:

- partially done

Important caveat:

- OpenEnv `/reset` and `/step` use the class-based session model
- custom routes such as `/history` and `/config` still use a singleton helper instance
- this is acceptable for manual tooling, but it is not a perfect unified session model

Recommendation:

- keep it for now if your priority is submission
- refactor only if it starts causing testing confusion

### Tutorial 4: RL Training And Reward Design

Main concept:

- a good RL environment needs:
  - meaningful reward
  - repeated trajectories
  - enough task diversity
  - an inference/training loop

How `python_env` maps:

- reward shaping already exists:
  - matched rubric items
  - false-positive penalties
  - duplicate penalties
  - hint penalties
  - patch bonus
  - finalize bonus
- `inference.py` already provides a baseline model-vs-env loop
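
The reward components above compose into a single scalar roughly like the sketch below. The weights are illustrative assumptions; the actual grader in this repo uses its own values.

```python
def shaped_reward(matched, false_positives, duplicates, hints_used,
                  patch_ok=False, finalized=False):
    """Illustrative combination of the reward components listed above.
    All weights are placeholders, not this repo's actual numbers."""
    reward = 1.0 * matched            # matched rubric items
    reward -= 0.5 * false_positives   # false-positive penalty
    reward -= 0.25 * duplicates       # duplicate penalty
    reward -= 0.1 * hints_used        # hint penalty
    if patch_ok:
        reward += 0.5                 # patch bonus
    if finalized:
        reward += 0.25                # finalize bonus
    return reward
```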

Status:

- partially done

Gap:

- 3 tasks are enough for hackathon minimums
- 3 tasks are not enough for serious RL learning

## 2. Current Repo Status

### Strong Areas

- real-world task: code review
- typed Pydantic/OpenEnv models
- deterministic grader
- 3 difficulty levels
- partial-progress reward shaping
- manual routes for health/tasks/review/config/history
- baseline inference script
- docs in `README.md`, `Project.md`

### Weak Areas

- benchmark still small
- Docker image build not fully verified end-to-end
- HF Spaces deployment not yet executed
- `openenv validate` still needs to be run in your actual runtime
- no large trajectory dataset yet
- custom REST state and OpenEnv session state are not fully unified

## 3. What You Need To Do To Be Submission-Ready

### Step 1: Validate Local Server

Run:

```powershell
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
```

Manually verify:

- `http://127.0.0.1:8000/docs`
- `http://127.0.0.1:8000/health`
- `POST /reset`
- `POST /step`
- `GET /tasks`
- `POST /review`
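
The manual checks above can also be scripted. Below is a hedged stdlib sketch; the request payloads are placeholders, since the real schemas live in `models.py`. Call `run_checks()` only while the server is up.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8000"

# (method, path, JSON body or None) mirroring the manual checks above.
# The empty /reset body is an assumption, not the real schema.
CHECKS = [
    ("GET", "/health", None),
    ("GET", "/tasks", None),
    ("POST", "/reset", {}),
]

def run_checks(base=BASE, checks=CHECKS):
    """Fire each check and print the HTTP status. Requires a running server."""
    for method, path, body in checks:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            base + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(method, path, resp.status)
```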

### Step 2: Run Tests

Run:

```powershell
python -m pytest tests -q
```

You want all tests green before Docker or HF deployment.

### Step 3: Run OpenEnv Validation

Run:

```powershell
openenv validate
```

This is a hard requirement.

If validation fails:

- fix schema mismatch first
- fix route mismatch second
- fix packaging third

### Step 4: Run Baseline Inference

Run:

```powershell
$env:API_BASE_URL="https://api.openai.com/v1"
$env:MODEL_NAME="gpt-4.1-mini"
$env:OPENAI_API_KEY="your_key"
$env:ENV_BASE_URL="http://127.0.0.1:8000"
python inference.py
```
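
On the Python side, reading these variables might look like the sketch below. The function name and defaults are assumptions; check `inference.py` for the names it actually uses.

```python
import os

def inference_settings(environ=os.environ):
    """Sketch of reading the variables set above. Defaults mirror the
    example commands; inference.py's actual handling may differ."""
    if not environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY must be set")
    return {
        "api_base_url": environ.get("API_BASE_URL", "https://api.openai.com/v1"),
        "model_name": environ.get("MODEL_NAME", "gpt-4.1-mini"),
        "env_base_url": environ.get("ENV_BASE_URL", "http://127.0.0.1:8000"),
        "api_key": environ["OPENAI_API_KEY"],
    }
```

Failing fast on a missing key is worth the extra lines: a silent empty key turns into a confusing 401 deep inside the run.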

You want:

- script completes without crashing
- `inference_results.json` gets written
- all 3 tasks run
- scores are reproducible

### Step 5: Verify Docker

Run:

```powershell
docker build -t python_env-env:latest -f server/Dockerfile .
docker run --rm -p 8000:8000 python_env-env:latest
```

Then test:

- `GET /health`
- `POST /reset`
- `POST /step`

### Step 6: Deploy To HF Spaces

Run:

```powershell
openenv push
```

Then verify the live Space:

- `/health`
- `/docs`
- `/reset`
- `/web`

## 4. What Will Help You “Win” Instead Of Just “Submit”

Passing minimum requirements is not enough. To be competitive, improve these areas:

### A. Increase Task Diversity

Current:

- 3 benchmark tasks

Target:

- at least 10 to 20 tasks before final submission, if time allows

Good additions:

- SQL injection review
- unsafe YAML/pickle loading
- file-handle leak
- race-condition style bug
- retry/backoff misuse
- caching bug
- logging/privacy leak
- API timeout handling
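
A new task such as the SQL injection review might be expressed roughly as below. This is a hypothetical shape: the real task format in this repo will differ, so every field name here is a placeholder.

```python
# Hypothetical benchmark task entry; field names are placeholders,
# not this repo's actual task schema.
NEW_TASK = {
    "task_id": "sql-injection-basic",
    "difficulty": "medium",
    "code": (
        "def get_user(db, name):\n"
        "    return db.execute("
        "f\"SELECT * FROM users WHERE name = '{name}'\")\n"
    ),
    "rubric": [
        {
            "category": "security",
            "issue": "SQL injection via f-string interpolation",
            "line": 2,
        },
    ],
}
```

Keeping new tasks in the same structure as the existing three means the grader and reward shaping work unchanged as the benchmark grows.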

### B. Improve Observation Context

Good RL environments provide enough context for the model to improve.

Possible improvements:

- add matched categories so far
- add a short summary of uncovered issue types
- add previous actions in structured form, not just free text
- add rubric coverage signals without leaking exact answers
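
The coverage-without-leakage idea can be sketched as a pure function: tell the model *how much* is left, not *what* is left. Names and shape below are assumptions.

```python
def coverage_signal(rubric_categories, matched_categories):
    """Sketch of a rubric-coverage hint for the observation: exposes the
    count of uncovered categories without naming them."""
    remaining = len(set(rubric_categories)) - len(set(matched_categories))
    return {
        "matched_categories": sorted(set(matched_categories)),
        "uncovered_count": max(remaining, 0),
    }
```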

### C. Collect Trajectories

You need data that shows:

- first attempt
- improved second attempt
- final attempt
- failures
- false positives
- hint usage

This is much more useful than only saving final scores.
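
One attempt per line in a JSONL file is usually enough. A minimal sketch, assuming nothing about this repo's existing logging:

```python
import json
from pathlib import Path

def append_trajectory(path, record):
    """Append one attempt record as a JSON line, so first/second/final
    attempts, failures, and hint usage are all preserved."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_trajectories(path):
    """Read all recorded attempts back as a list of dicts."""
    text = Path(path).read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

Append-only JSONL keeps partial runs recoverable: a crash mid-episode loses at most the last line.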

### D. Improve Reward Design Carefully

Current reward design is already decent.

Good refinements:

- slightly larger reward for critical security findings
- bonus for correct line numbers
- bonus for high-quality recommendation text
- penalty for vague findings with no rationale

Do not overcomplicate the reward before submission. Stability matters more.
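
If you do add severity weighting, keep it to a small table plus a couple of adjustments, as in this sketch. The weights and bonuses are illustrative assumptions, not the grader's current values.

```python
# Illustrative severity-weighted refinement of the per-finding reward.
# Weights and bonuses are placeholders for tuning, not this repo's values.
SEVERITY_WEIGHT = {"critical": 2.0, "major": 1.0, "minor": 0.5}

def finding_reward(severity, correct_line=False, has_rationale=True):
    reward = SEVERITY_WEIGHT.get(severity, 1.0)
    if correct_line:
        reward += 0.25   # bonus for pointing at the right line
    if not has_rationale:
        reward -= 0.5    # penalty for vague findings with no rationale
    return reward
```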

## 5. Recommended Immediate Priority Order

If time is limited, do the work in this order:

1. `pytest`
2. `openenv validate`
3. local inference run
4. Docker build and run
5. HF Space deployment
6. add 5 to 10 more tasks
7. collect trajectory data

## 6. One-Sentence Summary

You are already following the correct OpenEnv architecture from the tutorials; the main remaining work is not redesign but validation, deployment verification, and expanding task and data quality so the environment scores well in human review.