---
title: Python Code Review Environment Server
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
  - openenv
---

OpenEnv Python Code Review Environment

Production-ready hackathon submission for OpenEnv evaluation, deterministic validator runs, and Hugging Face Docker deployment.

Architecture

root
├── inference.py                # Root validator entrypoint
├── openenv.yaml                # OpenEnv manifest
├── app/
│   ├── agents/                # Action policy and fallback strategy
│   ├── env/                   # RL loop runner and stdout contract
│   ├── models/                # Inference dataclasses/config
│   ├── services/              # OpenAI client wrapper with retries
│   └── utils/                 # Formatting, task loading, log suppression
├── server/
│   ├── env.py                 # OpenEnv environment and reward shaping
│   ├── app.py                 # FastAPI/OpenEnv app, optional Gradio mount
│   └── Dockerfile             # Hugging Face Docker image
├── graders/                   # Syntax, bug-fix, optimization graders
├── tasks/                     # Deterministic benchmark tasks and references
├── services/                  # Multi-domain analysis services
├── analyzers/                 # Domain-specific analyzers
├── models/                    # Lazy-loaded PyTorch scoring model
├── schemas/                   # API request/response contracts
└── tests/                     # Local validation coverage

Runtime flow:

inference.py
  -> app.env.runner.InferenceRunner
  -> env.reset(task_id=...)
  -> ReviewAgent (action planning)
  -> env.step_result(action)
  -> strict [START]/[STEP]/[END] output
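The flow above can be sketched as a plain Python loop. The toy environment and agent below are stand-ins: the real classes live under app/env and app/agents, and their exact signatures are assumptions.

```python
# Minimal sketch of the runner loop: reset, plan, step_result, repeat
# until done. ToyEnv/ToyAgent are illustrative stand-ins only.
class ToyEnv:
    def reset(self, task_id):
        return {"task_id": task_id}

    def step_result(self, action):
        # The episode ends when the agent submits its solution.
        return {"reward": 0.5, "done": action == "submit_solution", "error": None}

class ToyAgent:
    def plan(self, observation, step):
        # Deterministic plan: test, edit, re-test, then submit.
        plan = ["run_tests", "edit_code", "run_tests", "submit_solution"]
        return plan[min(step, len(plan) - 1)]

def run(env, agent, task_id):
    obs = env.reset(task_id=task_id)
    rewards, step, done = [], 0, False
    while not done:
        action = agent.plan(obs, step)
        result = env.step_result(action)
        rewards.append(result["reward"])
        done = result["done"]
        step += 1
    return step, rewards

steps, rewards = run(ToyEnv(), ToyAgent(), "syntax_fix_invoice_totals")
```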

What Was Fixed

  • inference.py now lives at the repo root and delegates to a strict runner under app/env.
  • OpenAI usage is limited to the official Python client: client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN).
  • API_BASE_URL and MODEL_NAME fall back to documented defaults; HF_TOKEN is read without a default and handled explicitly.
  • Output now matches the required single-line contract exactly and always emits [END], including failure paths.
  • The RL loop now uses reset() plus step_result() in a proper while not done loop.
  • Step errors now surface through last_action_error and are printed in [STEP].
  • Reward shaping is now dynamic in the OpenEnv environment: code quality, test progress, runtime progress, error removal, regressions, and completion are all part of the reward.
  • The API-side reward service is no longer a static weighted sum and now exposes quality, error-reduction, and completion signals.
  • The Docker image now builds from the repo root, caches dependency installation more effectively, and runs server.app:app directly on port 8000.
  • Server startup is lighter: the PyTorch analyzer is lazy-loaded and the Gradio demo is disabled by default.
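The dynamic reward shaping can be illustrated by combining the listed signals. The field names, weights, and the omission of the runtime-progress term are assumptions for brevity; the actual shaping lives in server/env.py.

```python
# Illustrative reward shaping over the signals listed above: quality,
# test progress, error removal, regression penalty, completion bonus.
def shape_reward(prev, curr):
    reward = 0.0
    reward += 0.2 * curr["quality"]                                 # code quality
    reward += 0.3 * (curr["tests_passed"] - prev["tests_passed"])   # test progress
    reward += 0.2 * (prev["errors"] - curr["errors"])               # error removal
    if curr["errors"] > prev["errors"]:
        reward -= 0.3                                               # regression penalty
    if curr["done"] and curr["tests_passed"] == curr["tests_total"]:
        reward += 0.5                                               # completion bonus
    return round(reward, 4)

prev = {"quality": 0.5, "tests_passed": 1, "errors": 2, "done": False, "tests_total": 3}
curr = {"quality": 0.8, "tests_passed": 3, "errors": 0, "done": True, "tests_total": 3}
reward = shape_reward(prev, curr)
```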

Local Setup

Install dev dependencies:

pip install -e .[dev]

Run the test suite:

pytest -q

Run the OpenEnv server locally:

python -m uvicorn server.app:app --host 0.0.0.0 --port 8000

Optional demo UI:

set ENABLE_GRADIO_DEMO=true
set ENABLE_WEB_INTERFACE=true
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
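The gating implied by the two env vars above can be sketched as follows. The "false" defaults match this README; the function name is an assumption about how server/app.py decides whether to mount the demo.

```python
import os

# Both flags must be truthy before the demo UI is served; anything
# else keeps startup light with no Gradio import.
def demo_enabled(env=None):
    env = os.environ if env is None else env
    return (env.get("ENABLE_GRADIO_DEMO", "false").lower() == "true"
            and env.get("ENABLE_WEB_INTERFACE", "false").lower() == "true")

enabled = demo_enabled({"ENABLE_GRADIO_DEMO": "true",
                        "ENABLE_WEB_INTERFACE": "true"})
```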

Inference Contract

Required environment variables:

  • API_BASE_URL Default: https://router.huggingface.co/v1
  • MODEL_NAME Default: Qwen/Qwen2.5-3B-Instruct
  • HF_TOKEN Mandatory, no default is injected

Example:

set API_BASE_URL=https://router.huggingface.co/v1
set MODEL_NAME=Qwen/Qwen2.5-3B-Instruct
set HF_TOKEN=hf_xxx
python inference.py
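The configuration contract above reduces to a few lines of env-var handling; this is a sketch, and the `llm_enabled` flag is an illustrative name rather than the real variable in inference.py.

```python
import os

# Defaults for API_BASE_URL and MODEL_NAME, per the contract above;
# HF_TOKEN deliberately has no default and may be None.
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-3B-Instruct")
HF_TOKEN = os.getenv("HF_TOKEN")

# Per the limitations section, a missing token disables LLM guidance
# instead of aborting the run.
llm_enabled = HF_TOKEN is not None
```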

Expected stdout shape:

[START] task=syntax_fix_invoice_totals env=python_code_review_env model=Qwen/Qwen2.5-3B-Instruct
[STEP]  step=1 action=run_tests reward=0.12 done=false error=null
[STEP]  step=2 action=edit_code reward=0.96 done=false error=null
[STEP]  step=3 action=run_tests reward=0.99 done=false error=null
[STEP]  step=4 action=submit_solution reward=0.99 done=true error=null
[END]   success=true steps=4 rewards=0.12,0.96,0.99,0.99
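An emitter for this single-line contract can be sketched directly from the sample; the helper names are assumptions, but the field order and formatting follow the output shown above.

```python
# Formatters for the strict [START]/[STEP]/[END] stdout contract.
def start_line(task, env, model):
    return f"[START] task={task} env={env} model={model}"

def step_line(step, action, reward, done, error):
    err = "null" if error is None else error
    return (f"[STEP]  step={step} action={action} "
            f"reward={reward:.2f} done={str(done).lower()} error={err}")

def end_line(success, rewards):
    return (f"[END]   success={str(success).lower()} steps={len(rewards)} "
            f"rewards={','.join(f'{r:.2f}' for r in rewards)}")

print(start_line("syntax_fix_invoice_totals", "python_code_review_env",
                 "Qwen/Qwen2.5-3B-Instruct"))
print(step_line(1, "run_tests", 0.12, False, None))
print(end_line(True, [0.12, 0.96, 0.99, 0.99]))
```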

Docker

Build from the project root:

docker build -f server/Dockerfile -t openenv-python-code-review-env .

Run locally:

docker run --rm -p 8000:8000 ^
  -e API_BASE_URL=https://router.huggingface.co/v1 ^
  -e MODEL_NAME=Qwen/Qwen2.5-3B-Instruct ^
  -e HF_TOKEN=hf_xxx ^
  openenv-python-code-review-env

Container behavior:

  • Base image: python:3.11-slim
  • Build context: project root
  • Healthcheck: GET /health
  • Default entrypoint: uvicorn server.app:app --host 0.0.0.0 --port 8000

Hugging Face Spaces

Recommended deployment steps:

  1. Create a Docker Space.
  2. Push this repository as-is.
  3. Let Spaces build with server/Dockerfile.
  4. Set Space secrets: HF_TOKEN
  5. Set Space variables as needed: API_BASE_URL and MODEL_NAME. Setting ENABLE_GRADIO_DEMO=false and ENABLE_WEB_INTERFACE=false is also supported for OpenEnv-managed deploys.
  6. Confirm the app listens on port 8000.
  7. Smoke-test: /health, /reset, /step

Performance Notes

  • Max concurrent environments default to 2, aligned with a 2 vCPU / 8 GB RAM target.
  • The analyzer model is lazy-loaded instead of being created at startup.
  • The inference runner relies on short prompts, low token budgets, and limited retries.
  • The policy uses deterministic reference-code fallback instead of expensive iterative code generation.
  • Public validation is preferred before final submission to avoid wasted hidden-eval steps.
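The lazy-loading pattern mentioned above can be sketched with `functools.lru_cache`: the scorer is built on first use rather than at startup. The loader name and the caching mechanism are assumptions about the code under models/; a plain callable stands in for the PyTorch model so the sketch stays dependency-free.

```python
from functools import lru_cache

load_calls = 0  # counts how many times the expensive construction runs

@lru_cache(maxsize=1)
def get_scorer():
    global load_calls
    load_calls += 1
    # The real code would construct the PyTorch scoring model here.
    return lambda code: min(1.0, len(code) / 100)

# Two calls, one construction: the second call hits the cache.
score_a = get_scorer()("def f():\n    return 1\n")
score_b = get_scorer()("x = 1\n")
```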

Known Limitations

  • If HF_TOKEN is absent, inference still completes with deterministic fallback actions, but LLM guidance is skipped.
  • The benchmark tasks are deterministic and intentionally small; this is good for validator stability but not a full training benchmark.
  • Gradio remains optional and is disabled by default to keep deployment lighter.