---
title: Python Bug Fixer OpenEnv
emoji: 🐍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# Python Bug Fixer – OpenEnv
An OpenEnv-compliant environment where an AI agent must identify and fix bugs in Python code to produce correct program output. Simulates real-world software debugging and code review workflows.
## Environment Description
The agent receives a buggy Python code snippet along with a description of expected behavior. The agent's action is to return the corrected Python code. The environment executes the code and rewards the agent based on how many expected output lines are produced correctly.
### Observation Space

**Type:** Text

The observation contains:

- Task description and difficulty
- Expected stdout output (ground truth)
- The buggy Python code to fix
### Action Space

**Type:** Text

The action is raw Python code (no markdown, no code fences). It must be valid Python that can be executed with `python3`.
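Model outputs often arrive wrapped in markdown fences even when the prompt forbids them. A small sanitizer can strip the fence before the text is submitted as an action — this is an illustrative sketch, not a helper shipped with this repo:

```python
import re

def extract_code(response: str) -> str:
    """Strip an optional markdown code fence, leaving raw Python."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()
```

Applying it to a fenced reply such as `` ```python\nprint('hi')\n``` `` yields the bare `print('hi')`, which is what the `/step` endpoint expects.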
## Tasks

| Task ID | Name | Difficulty | Bugs | Max Steps |
|---|---|---|---|---|
| `task_easy` | Fix Index Errors | Easy | 2 | 5 |
| `task_medium` | Fix Binary Search | Medium | 2 | 5 |
| `task_hard` | Fix DataProcessor Class | Hard | 3 | 7 |
## Reward Function

- Reward ∈ [0.0, 1.0]
- Each expected output line is worth `1 / N`, where N is the number of expected lines
- Partial credit is awarded for partially correct fixes
- Code that crashes with a runtime error receives 0.1 partial credit if some output was produced
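The per-line scoring above can be sketched as follows. This is an illustrative reimplementation of the rule as stated, not the exact scorer in `app/tasks/base.py`:

```python
def compute_reward(actual_stdout: str, expected_stdout: str) -> float:
    """Line-by-line partial credit: each expected line is worth 1/N."""
    expected = expected_stdout.strip().splitlines()
    actual = actual_stdout.strip().splitlines()
    if not expected:
        return 0.0
    # Count expected lines reproduced at the same position in the output
    correct = sum(
        1 for i, line in enumerate(expected)
        if i < len(actual) and actual[i].strip() == line.strip()
    )
    return correct / len(expected)
```

A fix that produces two of three expected lines would score 2/3 under this scheme, matching the partial-credit behavior described above.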
## Setup & Run Locally

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
uvicorn app.main:app --host 0.0.0.0 --port 7860

# 3. Test endpoints
curl http://localhost:7860/health
curl http://localhost:7860/tasks
```
## Run Inference

```bash
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="meta-llama/Meta-Llama-3-8B-Instruct"
export HF_TOKEN="hf_YOUR_TOKEN_HERE"
export SPACE_URL="https://YOUR_USERNAME-python-bug-fixer.hf.space"

python inference.py
```
Expected output format:

```text
[START] {"task_id": "task_easy", "session_id": "...", "model": "...", "timestamp": "..."}
[STEP] {"step": 1, "reward": 1.0, "done": true, ...}
[END] {"task_id": "task_easy", "total_reward": 1.0, "steps": 1, "success": true, ...}
```
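The tagged JSON lines above are easy to post-process. A minimal parser sketch — the `[TAG] {json}` shape is taken from the sample output, but the helper name is ours and assumes each payload is valid JSON:

```python
import json

def parse_log_line(line: str) -> tuple[str, dict]:
    """Split a line like '[STEP] {"step": 1}' into its tag and JSON payload."""
    tag, _, body = line.partition(" ")
    return tag.strip("[]"), json.loads(body)
```

This lets you filter a run's log for `END` records and aggregate `total_reward` across tasks.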
## API Reference

### POST /reset

Start a new episode.

**Request:** `{ "task_id": "task_easy" }`

**Response:** `{ "session_id": "...", "task_id": "...", "observation": "...", "info": {} }`

### POST /step

Submit fixed code as an action.

**Request:** `{ "session_id": "...", "action": "def get_last_element(lst): ..." }`

**Response:** `{ "observation": "...", "reward": 1.0, "done": true, "info": {} }`

### GET /state?session_id=...

Get the current episode state without advancing it.

**Response:** `{ "session_id": "...", "task_id": "...", "steps": 1, "done": true, "current_observation": "..." }`

### GET /tasks

List all available tasks and their metadata.

### GET /health

Returns `{"status": "ok"}`.
## Docker

```bash
docker build -t python-bug-fixer .
docker run -p 7860:7860 python-bug-fixer
```
## Project Structure

```text
my-openenv/
├── inference.py          # Baseline inference script (root, required)
├── openenv.yaml          # OpenEnv specification
├── Dockerfile            # Container definition
├── requirements.txt      # Python dependencies
├── README.md
└── app/
    ├── __init__.py
    ├── main.py           # FastAPI server (reset/step/state endpoints)
    ├── models.py         # Pydantic request/response models
    └── tasks/
        ├── __init__.py   # Task registry
        ├── base.py       # BaseTask + safe code runner
        ├── task_easy.py  # Easy task (2 index bugs)
        ├── task_medium.py # Medium task (2 binary search bugs)
        └── task_hard.py  # Hard task (3 DataProcessor bugs)
```
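`base.py` is described as containing a safe code runner. A common pattern for that kind of isolation — a sketch under our own assumptions, not the repo's actual implementation — is to execute the submitted code in a separate interpreter process with a timeout:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> tuple[str, str, int]:
    """Run untrusted Python in a subprocess; return (stdout, stderr, returncode)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # Treat hangs and infinite loops as failures with no output
        return "", "timed out", -1
```

Running in a child process keeps crashes and infinite loops from taking down the FastAPI server, and the captured stdout is what the line-by-line reward comparison would consume.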