---
title: Python Code Review Environment Server
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
- openenv
---
# OpenEnv Python Code Review Environment
Production-ready hackathon submission for OpenEnv evaluation, deterministic validator runs, and Hugging Face Docker deployment.
## Architecture
```text
root
├── inference.py # Root validator entrypoint
├── openenv.yaml # OpenEnv manifest
├── app/
│   ├── agents/ # Action policy and fallback strategy
│   ├── env/ # RL loop runner and stdout contract
│   ├── models/ # Inference dataclasses/config
│   ├── services/ # OpenAI client wrapper with retries
│   └── utils/ # Formatting, task loading, log suppression
├── server/
│   ├── env.py # OpenEnv environment and reward shaping
│   ├── app.py # FastAPI/OpenEnv app, optional Gradio mount
│   └── Dockerfile # Hugging Face Docker image
├── graders/ # Syntax, bug-fix, optimization graders
├── tasks/ # Deterministic benchmark tasks and references
├── services/ # Multi-domain analysis services
├── analyzers/ # Domain-specific analyzers
├── models/ # Lazy-loaded PyTorch scoring model
├── schemas/ # API request/response contracts
└── tests/ # Local validation coverage
```
Runtime flow:
```text
inference.py
-> app.env.runner.InferenceRunner
-> env.reset(task_id=...)
-> ReviewAgent(action planning)
-> env.step_result(action)
-> strict [START]/[STEP]/[END] output
```
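The flow above can be sketched as a small loop. This is a minimal illustration, not the code in `app/env`: `ToyEnv`, `StepResult`, and `run_episode` are hypothetical names, the policy is reduced to a function of the step index, and the log lines are abbreviated. The only parts taken from this README are `reset(task_id=...)`, `step_result(action)`, `last_action_error`, the `while not done` loop, and the rule that `[END]` is emitted even on failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """Hypothetical result shape; the real environment may expose more fields."""
    reward: float
    done: bool
    last_action_error: Optional[str] = None


class ToyEnv:
    """Stand-in environment that finishes after three steps (assumption)."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self, task_id: str) -> None:
        self.steps = 0

    def step_result(self, action: str) -> StepResult:
        self.steps += 1
        return StepResult(reward=0.25 * self.steps, done=self.steps >= 3)


def run_episode(env, policy: Callable[[int], str], task_id: str,
                max_steps: int = 10) -> List[str]:
    """Run one episode and return abbreviated log lines; [END] is always last."""
    lines = [f"[START] task={task_id}"]
    steps = 0
    success = False
    try:
        env.reset(task_id=task_id)
        done = False
        while not done and steps < max_steps:
            action = policy(steps + 1)
            result = env.step_result(action)
            steps += 1
            done = result.done
            error = result.last_action_error or "null"
            lines.append(f"[STEP] step={steps} action={action} error={error}")
        success = done
    except Exception:
        # Any failure still falls through to the [END] line below.
        success = False
    finally:
        lines.append(f"[END] success={str(success).lower()} steps={steps}")
    return lines
```

Collecting the lines in a list keeps the sketch easy to test; a real runner would presumably print each line as it is produced.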
## What Was Fixed
- `inference.py` now lives at the repo root and delegates to a strict runner under `app/env`.
- OpenAI usage is limited to the official Python client:
`client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)`.
- `API_BASE_URL` and `MODEL_NAME` always fall back to their documented defaults; `HF_TOKEN` is read without a default and its absence is handled explicitly.
- Output now matches the required single-line contract exactly and always emits `[END]`, including failure paths.
- The RL loop now uses `reset()` plus `step_result()` in a proper `while not done` loop.
- Step errors now surface through `last_action_error` and are printed in `[STEP]`.
- Reward shaping is now dynamic in the OpenEnv environment:
code quality, test progress, runtime progress, error removal, regressions, and completion are all part of the reward.
- The API-side reward service is no longer a static weighted sum and now exposes quality, error-reduction, and completion signals.
- The Docker image now builds from the repo root, caches dependency installation more effectively, and runs `server.app:app` directly on port `8000`.
- Server startup is lighter:
the PyTorch analyzer is lazy-loaded and the Gradio demo is disabled by default.
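To make the reward-shaping bullet concrete, here is one way such a signal could be combined. It is purely illustrative: the `Snapshot` fields, every weight, and the clamping range are assumptions made for this sketch and are not taken from `server/env.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-step measurements of the code under review."""
    quality: float        # code quality score in [0, 1]
    tests_passed: float   # fraction of tests passing in [0, 1]
    runtime_score: float  # runtime score in [0, 1], higher is faster
    error_count: int      # number of outstanding errors


def shaped_reward(prev: Snapshot, curr: Snapshot, submitted: bool) -> float:
    """Combine progress signals into one reward (all weights are made up)."""
    reward = 0.0
    reward += 0.3 * (curr.quality - prev.quality)              # code quality
    reward += 0.4 * (curr.tests_passed - prev.tests_passed)    # test progress
    reward += 0.1 * (curr.runtime_score - prev.runtime_score)  # runtime progress
    reward += 0.05 * (prev.error_count - curr.error_count)     # error removal
    if curr.tests_passed < prev.tests_passed:
        reward -= 0.2                                          # regression penalty
    if submitted and curr.tests_passed == 1.0 and curr.error_count == 0:
        reward += 0.5                                          # completion bonus
    return max(-1.0, min(1.0, reward))
```

The point of the sketch is only that the reward depends on the change between two states, so the same action can score differently depending on what it fixed or broke.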
## Local Setup
Install dev dependencies:
```bash
pip install -e .[dev]
```
Run the test suite:
```bash
pytest -q
```
Run the OpenEnv server locally:
```bash
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```
Optional demo UI:
```bash
# On Windows cmd, use `set NAME=value` instead of `export`.
export ENABLE_GRADIO_DEMO=true
export ENABLE_WEB_INTERFACE=true
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```
## Inference Contract
Required environment variables:
- `API_BASE_URL`
Default: `https://router.huggingface.co/v1`
- `MODEL_NAME`
Default: `Qwen/Qwen2.5-3B-Instruct`
- `HF_TOKEN`
Required for LLM-guided runs; no default is injected
Example:
```bash
# On Windows cmd, use `set NAME=value` instead of `export`.
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-3B-Instruct
export HF_TOKEN=hf_xxx
python inference.py
```
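The variable handling described above might look roughly like this in Python. `InferenceConfig` and `load_config` are hypothetical names for this sketch; only the variable names and the two default values come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-3B-Instruct"


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: Optional[str]  # None means "no token was provided"


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read settings, applying defaults only to API_BASE_URL and MODEL_NAME."""
    token = env.get("HF_TOKEN", "").strip() or None
    return InferenceConfig(
        api_base_url=env.get("API_BASE_URL") or DEFAULT_API_BASE_URL,
        model_name=env.get("MODEL_NAME") or DEFAULT_MODEL_NAME,
        hf_token=token,  # deliberately no default is injected here
    )
```

Passing the mapping as a parameter keeps the function testable with a plain dictionary instead of the real process environment.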
Expected stdout shape:
```text
[START] task=syntax_fix_invoice_totals env=python_code_review_env model=Qwen/Qwen2.5-3B-Instruct
[STEP] step=1 action=run_tests reward=0.12 done=false error=null
[STEP] step=2 action=edit_code reward=0.96 done=false error=null
[STEP] step=3 action=run_tests reward=0.99 done=false error=null
[STEP] step=4 action=submit_solution reward=0.99 done=true error=null
[END] success=true steps=4 rewards=0.12,0.96,0.99,0.99
```
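A formatter that reproduces the lines above could be sketched as follows. The function names are hypothetical, and two details are inferred from the sample output rather than stated anywhere: rewards are printed with two decimals, and booleans and missing errors are written as lowercase `true`/`false`/`null`.

```python
from typing import Optional, Sequence


def _bool(value: bool) -> str:
    return "true" if value else "false"


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str] = None) -> str:
    # Collapse whitespace so an error message can never break the one-line rule.
    error_text = "null" if error is None else " ".join(str(error).split())
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_bool(done)} error={error_text}")


def format_end(success: bool, rewards: Sequence[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={_bool(success)} steps={len(rewards)} rewards={joined}"
```

Keeping the formatting in pure functions makes it straightforward to assert the exact output in tests before a validator run.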
## Docker
Build from the project root:
```bash
docker build -t openenv-python-code-review-env -f server/Dockerfile .
```
Run locally:
```bash
docker run --rm -p 8000:8000 \
-e API_BASE_URL=https://router.huggingface.co/v1 \
-e MODEL_NAME=Qwen/Qwen2.5-3B-Instruct \
-e HF_TOKEN=hf_xxx \
openenv-python-code-review-env
```
Container behavior:
- Base image: `python:3.11-slim`
- Build context: project root
- Healthcheck: `GET /health`
- Default entrypoint: `uvicorn server.app:app --host 0.0.0.0 --port 8000`
## Hugging Face Spaces
Recommended deployment steps:
1. Create a Docker Space.
2. Push this repository as-is.
3. Let Spaces build with `server/Dockerfile`.
4. Set Space secrets:
`HF_TOKEN`
5. Set Space variables as needed:
`API_BASE_URL`, `MODEL_NAME`, `ENABLE_GRADIO_DEMO=false`
`ENABLE_WEB_INTERFACE=false` is also supported for OpenEnv-managed deploys.
6. Confirm the app listens on port `8000`.
7. Smoke-test:
`/health`
`/reset`
`/step`
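
The smoke test in step 7 can be scripted with the standard library alone. This sketch assumes that `/health` answers `GET` and that `/reset` and `/step` accept `POST` with a JSON body; the payload shapes are not documented in this README, so the caller must supply them (see `schemas/`). `build_request` and `smoke_test` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(base_url: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if data is None else "POST",
    )


def smoke_test(base_url: str, reset_payload: Dict[str, Any],
               step_payload: Dict[str, Any], timeout: float = 10.0) -> Dict[str, int]:
    """Call /health, /reset, /step in order and return their HTTP status codes.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a normal
    return means every endpoint answered successfully.
    """
    checks = [("/health", None), ("/reset", reset_payload), ("/step", step_payload)]
    statuses: Dict[str, int] = {}
    for path, payload in checks:
        request = build_request(base_url, path, payload)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            statuses[path] = response.status
    return statuses
```

Run it against `http://localhost:8000` for a local container, or against the Space URL once the build has finished.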
## Performance Notes
- The maximum number of concurrent environments defaults to `2`, aligned with a `2 vCPU / 8 GB RAM` target.
- The analyzer model is lazy-loaded instead of being created at startup.
- The inference runner relies on short prompts, low token budgets, and limited retries.
- The policy uses deterministic reference-code fallback instead of expensive iterative code generation.
- Public validation is preferred before final submission to avoid wasted hidden-eval steps.
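
Lazy loading as mentioned in the notes above can be done with a small thread-safe wrapper. This is a generic sketch, not the project's implementation: `LazyModel` is a hypothetical name, and the factory stands in for whatever builds the PyTorch analyzer.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Build an expensive object on first use instead of at server startup."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: the lock is only taken until the first build.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._factory()
        return self._model
```

With this shape, importing the server module costs nothing; the first request that needs the model pays the load time once, and later requests reuse the cached instance.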
## Known Limitations
- If `HF_TOKEN` is absent, inference still completes with deterministic fallback actions, but LLM guidance is skipped.
- The benchmark tasks are deterministic and intentionally small; this is good for validator stability but not a full training benchmark.
- Gradio remains optional and is disabled by default to keep deployment lighter.
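
The fallback behavior in the first limitation can be pictured as follows. The action names come from the sample stdout earlier in this README; the fixed plan, the `choose_action` helper, and the `llm_suggest` callback are assumptions made for this sketch and do not describe the actual `ReviewAgent`.

```python
from typing import Callable, Optional, Sequence

ALLOWED_ACTIONS = ("run_tests", "edit_code", "submit_solution")

# Hypothetical fixed plan, mirroring the order seen in the sample output.
FALLBACK_PLAN = ("run_tests", "edit_code", "run_tests", "submit_solution")


def choose_action(step: int,
                  llm_suggest: Optional[Callable[[int], str]] = None,
                  plan: Sequence[str] = FALLBACK_PLAN) -> str:
    """Prefer a valid LLM suggestion; otherwise use the deterministic plan.

    `step` is 1-based. Pass llm_suggest=None when no HF_TOKEN is available.
    """
    if llm_suggest is not None:
        try:
            suggestion = llm_suggest(step)
            if suggestion in ALLOWED_ACTIONS:
                return suggestion
        except Exception:
            pass  # any client failure falls through to the fixed plan
    return plan[min(step, len(plan)) - 1]
```

Because the plan is indexed by step alone, a run without a token produces the same action sequence every time.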