---
title: Python Bug Fixer OpenEnv
emoji: 🐛
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

# Python Bug Fixer — OpenEnv

An OpenEnv-compliant environment where an AI agent must identify and fix bugs in Python code to produce the correct program output. It simulates real-world software debugging and code-review workflows.

---

## Environment Description

The agent receives a buggy Python code snippet along with a description of the expected behavior. The agent's action is the corrected Python code. The environment executes the code and rewards the agent based on how many expected output lines are produced correctly.

---

## Observation Space

**Type:** Text

The observation contains:

- Task description and difficulty
- Expected stdout output (ground truth)
- The buggy Python code to fix

---

## Action Space

**Type:** Text

The action is raw Python code (no markdown, no code fences). It must be valid Python that can be executed with `python3`.

---

## Tasks

| Task ID | Name | Difficulty | Bugs | Max Steps |
|---------|------|------------|------|-----------|
| `task_easy` | Fix Index Errors | Easy | 2 | 5 |
| `task_medium` | Fix Binary Search | Medium | 2 | 5 |
| `task_hard` | Fix DataProcessor Class | Hard | 3 | 7 |

### Reward Function

- Reward ∈ [0.0, 1.0]
- Each of the `N` expected output lines is worth `1 / N` reward
- Partial credit is awarded for partially correct fixes
- Code that crashes with a runtime error: 0.1 partial credit if some output was produced

---

## Setup & Run Locally

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
uvicorn app.main:app --host 0.0.0.0 --port 7860

# 3.
# Test endpoints
curl http://localhost:7860/health
curl http://localhost:7860/tasks
```

---

## Run Inference

```bash
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="meta-llama/Meta-Llama-3-8B-Instruct"
export HF_TOKEN="hf_YOUR_TOKEN_HERE"
export SPACE_URL="https://YOUR_USERNAME-python-bug-fixer.hf.space"

python inference.py
```

Expected output format:

```
[START] {"task_id": "task_easy", "session_id": "...", "model": "...", "timestamp": "..."}
[STEP] {"step": 1, "reward": 1.0, "done": true, ...}
[END] {"task_id": "task_easy", "total_reward": 1.0, "steps": 1, "success": true, ...}
```

---

## API Reference

### `POST /reset`

Start a new episode.

Request:

```json
{ "task_id": "task_easy" }
```

Response:

```json
{ "session_id": "...", "task_id": "...", "observation": "...", "info": {} }
```

### `POST /step`

Submit fixed code as the action.

Request:

```json
{ "session_id": "...", "action": "def get_last_element(lst): ..." }
```

Response:

```json
{ "observation": "...", "reward": 1.0, "done": true, "info": {} }
```

### `GET /state?session_id=...`

Get the current episode state without advancing it.

Response:

```json
{ "session_id": "...", "task_id": "...", "steps": 1, "done": true, "current_observation": "..." }
```

### `GET /tasks`

List all available tasks and their metadata.

### `GET /health`

Returns `{"status": "ok"}`.

---

## Docker

```bash
docker build -t python-bug-fixer .
docker run -p 7860:7860 python-bug-fixer
```

---

## Project Structure

```
my-openenv/
├── inference.py          # Baseline inference script (root — required)
├── openenv.yaml          # OpenEnv specification
├── Dockerfile            # Container definition
├── requirements.txt      # Python dependencies
├── README.md
└── app/
    ├── __init__.py
    ├── main.py            # FastAPI server (reset/step/state endpoints)
    ├── models.py          # Pydantic request/response models
    └── tasks/
        ├── __init__.py    # Task registry
        ├── base.py        # BaseTask + safe code runner
        ├── task_easy.py   # Easy task (2 index bugs)
        ├── task_medium.py  # Medium task (2 binary search bugs)
        └── task_hard.py   # Hard task (3 DataProcessor bugs)
```
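
---

## Example: Minimal Python Client

The `/reset` and `/step` endpoints can be exercised with a short stdlib-only client. This is a minimal sketch, not the bundled `inference.py`: the `post_json`, `strip_code_fences`, and `run_episode` helpers and the `http://localhost:7860` base URL are illustrative assumptions; the request and response payloads follow the API Reference above.

```python
import json
import urllib.request

SPACE_URL = "http://localhost:7860"  # assumption: local server from "Setup & Run Locally"


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON payload to the environment server and decode the JSON reply."""
    req = urllib.request.Request(
        SPACE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def strip_code_fences(text: str) -> str:
    """The action must be raw Python, so drop any markdown fence lines a model adds."""
    lines = [ln for ln in text.strip().splitlines() if not ln.startswith("```")]
    return "\n".join(lines)


def run_episode(task_id: str, fixed_code: str) -> float:
    """Reset the environment, submit the fixed code once, and return the reward."""
    obs = post_json("/reset", {"task_id": task_id})
    step = post_json("/step", {
        "session_id": obs["session_id"],
        "action": strip_code_fences(fixed_code),
    })
    return step["reward"]
```

With the server running locally, `run_episode("task_easy", model_reply)` returns a reward in `[0.0, 1.0]` for a single-step episode.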
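
---

## Example: Reward Scoring Sketch

The per-line rule in the Reward Function section (each of the `N` expected stdout lines worth `1 / N`) can be illustrated with a small scorer. This is a hypothetical sketch of that rule only, not the server's actual implementation, and it ignores the 0.1 crash partial credit:

```python
def score_output(produced: str, expected: str) -> float:
    """Hypothetical sketch: each of the N expected stdout lines is worth 1/N,
    compared position by position against the produced output."""
    expected_lines = expected.strip().splitlines()
    produced_lines = produced.strip().splitlines()
    if not expected_lines:
        return 0.0
    correct = sum(
        1
        for i, want in enumerate(expected_lines)
        if i < len(produced_lines) and produced_lines[i] == want
    )
    return correct / len(expected_lines)
```

For example, a fix that reproduces one of two expected lines scores `0.5`, matching the partial-credit behavior described above.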