Spaces:

uvpatel7271
/

python-code-review-env

Runtime error

App Files Files Community

python-code-review-env / README.md

uvpatel7271

final fiexes

03b82c2 5 days ago

preview code

raw

history blame contribute delete

6.03 kB

	---
	title: Python Code Review Environment Server
	sdk: docker
	app_port: 8000
	base_path: /web
	pinned: false
	tags:
	- openenv
	---

	# OpenEnv Python Code Review Environment

	Production-ready hackathon submission for OpenEnv evaluation, deterministic validator runs, and Hugging Face Docker deployment.

	## Architecture

	```text
	root
	├── inference.py # Root validator entrypoint
	├── openenv.yaml # OpenEnv manifest
	├── app/
	│ ├── agents/ # Action policy and fallback strategy
	│ ├── env/ # RL loop runner and stdout contract
	│ ├── models/ # Inference dataclasses/config
	│ ├── services/ # OpenAI client wrapper with retries
	│ └── utils/ # Formatting, task loading, log suppression
	├── server/
	│ ├── env.py # OpenEnv environment and reward shaping
	│ ├── app.py # FastAPI/OpenEnv app, optional Gradio mount
	│ └── Dockerfile # Hugging Face Docker image
	├── graders/ # Syntax, bug-fix, optimization graders
	├── tasks/ # Deterministic benchmark tasks and references
	├── services/ # Multi-domain analysis services
	├── analyzers/ # Domain-specific analyzers
	├── models/ # Lazy-loaded PyTorch scoring model
	├── schemas/ # API request/response contracts
	└── tests/ # Local validation coverage
	```

	Runtime flow:

	```text
	inference.py
	-> app.env.runner.InferenceRunner
	-> env.reset(task_id=...)
	-> ReviewAgent(action planning)
	-> env.step_result(action)
	-> strict [START]/[STEP]/[END] output
	```

	## What Was Fixed

	- `inference.py` now lives at the repo root and delegates to a strict runner under `app/env`.
	- OpenAI usage is limited to the official Python client:
	`client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)`.
	- Defaulted env vars are enforced for `API_BASE_URL` and `MODEL_NAME`; `HF_TOKEN` is read without a default and handled explicitly.
	- Output now matches the required single-line contract exactly and always emits `[END]`, including failure paths.
	- The RL loop now uses `reset()` plus `step_result()` in a proper `while not done` loop.
	- Step errors now surface through `last_action_error` and are printed in `[STEP]`.
	- Reward shaping is now dynamic in the OpenEnv environment:
	code quality, test progress, runtime progress, error removal, regressions, and completion are all part of the reward.
	- The API-side reward service is no longer a static weighted sum and now exposes quality, error-reduction, and completion signals.
	- The Docker image now builds from the repo root, caches dependency installation more effectively, and runs `server.app:app` directly on port `8000`.
	- Server startup is lighter:
	the PyTorch analyzer is lazy-loaded and the Gradio demo is disabled by default.

	## Local Setup

	Install dev dependencies:

	```bash
	pip install -e .[dev]
	```

	Run the test suite:

	```bash
	pytest -q
	```

	Run the OpenEnv server locally:

	```bash
	python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
	```

	Optional demo UI:

	```bash
	set ENABLE_GRADIO_DEMO=true
	set ENABLE_WEB_INTERFACE=true
	python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
	```

	## Inference Contract

	Required environment variables:

	- `API_BASE_URL`
	Default: `https://router.huggingface.co/v1`
	- `MODEL_NAME`
	Default: `Qwen/Qwen2.5-3B-Instruct`
	- `HF_TOKEN`
	Mandatory, no default is injected

	Example:

	```bash
	set API_BASE_URL=https://router.huggingface.co/v1
	set MODEL_NAME=Qwen/Qwen2.5-3B-Instruct
	set HF_TOKEN=hf_xxx
	python inference.py
	```

	Expected stdout shape:

	```text
	[START] task=syntax_fix_invoice_totals env=python_code_review_env model=Qwen/Qwen2.5-3B-Instruct
	[STEP] step=1 action=run_tests reward=0.12 done=false error=null
	[STEP] step=2 action=edit_code reward=0.96 done=false error=null
	[STEP] step=3 action=run_tests reward=0.99 done=false error=null
	[STEP] step=4 action=submit_solution reward=0.99 done=true error=null
	[END] success=true steps=4 rewards=0.12,0.96,0.99,0.99
	```

	## Docker

	Build from the project root:

	```bash
	docker build -f server/Dockerfile .
	```

	Run locally:

	```bash
	docker run --rm -p 8000:8000 ^
	-e API_BASE_URL=https://router.huggingface.co/v1 ^
	-e MODEL_NAME=Qwen/Qwen2.5-3B-Instruct ^
	-e HF_TOKEN=hf_xxx ^
	openenv-python-code-review-env
	```

	Container behavior:

	- Base image: `python:3.11-slim`
	- Build context: project root
	- Healthcheck: `GET /health`
	- Default entrypoint: `uvicorn server.app:app --host 0.0.0.0 --port 8000`

	## Hugging Face Spaces

	Recommended deployment steps:

	1. Create a Docker Space.
	2. Push this repository as-is.
	3. Let Spaces build with `server/Dockerfile`.
	4. Set Space secrets:
	`HF_TOKEN`
	5. Set Space variables as needed:
	`API_BASE_URL`, `MODEL_NAME`, `ENABLE_GRADIO_DEMO=false`
	`ENABLE_WEB_INTERFACE=false` is also supported for OpenEnv-managed deploys.
	6. Confirm the app listens on port `8000`.
	7. Smoke-test:
	`/health`
	`/reset`
	`/step`

	## Performance Notes

	- Max concurrent environments default to `2`, aligned with a `2 vCPU / 8 GB RAM` target.
	- The analyzer model is lazy-loaded instead of being created at startup.
	- The inference runner relies on short prompts, low token budgets, and limited retries.
	- The policy uses deterministic reference-code fallback instead of expensive iterative code generation.
	- Public validation is preferred before final submission to avoid wasted hidden-eval steps.

	## Known Limitations

	- If `HF_TOKEN` is absent, inference still completes with deterministic fallback actions, but LLM guidance is skipped.
	- The benchmark tasks are deterministic and intentionally small; this is good for validator stability but not a full training benchmark.
	- Gradio remains optional and is disabled by default to keep deployment lighter.