metadata
title: Python Bug Fixer OpenEnv
emoji: πŸ›
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860

Python Bug Fixer - OpenEnv

An OpenEnv-compliant environment in which an AI agent must identify and fix bugs in Python code so that the program produces the correct output. It simulates real-world software debugging and code review workflows.


Environment Description

The agent receives a buggy Python code snippet along with a description of expected behavior. The agent's action is to return the corrected Python code. The environment executes the code and rewards the agent based on how many expected output lines are produced correctly.


Observation Space

Type: Text

The observation contains:

  • Task description and difficulty
  • Expected stdout output (ground truth)
  • The buggy Python code to fix

Action Space

Type: Text

The action is raw Python code (no markdown, no code fences). It must be valid Python that can be executed with python3.
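Language models often wrap code in Markdown fences, which this action format does not accept. The sketch below shows one way an agent could normalize its output before submitting it. The helper names `to_raw_action` and `is_valid_python` are hypothetical and not part of this project, and the syntax check is only a local pre-check, not the environment's own validation.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this block fence-safe


def to_raw_action(model_output: str) -> str:
    """Return the code with one surrounding Markdown fence removed, if present.

    Hypothetical helper: assumes the model emits at most one fenced block
    that spans its whole reply.
    """
    lines = model_output.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]  # drop the opening and closing fence lines
    return "\n".join(lines)


def is_valid_python(source: str) -> bool:
    """Check locally that the action at least parses as Python."""
    try:
        compile(source, "<action>", "exec")
    except SyntaxError:
        return False
    return True
```

Running `is_valid_python(to_raw_action(reply))` before calling the environment avoids spending a step on an action that cannot execute at all.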


Tasks

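| Task ID     | Name                    | Difficulty | Bugs | Max Steps |
|-------------|-------------------------|------------|------|-----------|
| task_easy   | Fix Index Errors        | Easy       | 2    | 5         |
| task_medium | Fix Binary Search       | Medium     | 2    | 5         |
| task_hard   | Fix DataProcessor Class | Hard       | 3    | 7         |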

Reward Function

  • Reward ∈ [0.0, 1.0]
  • Each of the N expected output lines is worth 1 / N of the reward
  • Partial credit is awarded for partially correct fixes
  • Code that crashes with a runtime error earns 0.1 partial credit if it produced some output
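The rules above can be sketched as a small scoring function. This is an illustration only, not the environment's actual grader: the function name `line_match_reward` is hypothetical, lines are assumed to be compared position by position after trimming whitespace, and the 0.1 crash credit is assumed to act as a floor on the reward.

```python
def line_match_reward(expected_stdout: str, actual_stdout: str, crashed: bool = False) -> float:
    """Score actual output against expected output, one line at a time.

    Assumptions (not confirmed by the project): positional comparison,
    whitespace-trimmed lines, and 0.1 applied as a minimum after a crash.
    """
    expected = expected_stdout.strip().splitlines()
    actual = actual_stdout.strip().splitlines()
    if not expected:
        return 0.0

    # Each expected line that appears at the same position is worth 1 / N.
    matches = sum(
        1
        for i, line in enumerate(expected)
        if i < len(actual) and actual[i].strip() == line.strip()
    )
    reward = matches / len(expected)

    # Assumed crash rule: some output before a runtime error earns at least 0.1.
    if crashed and actual:
        reward = max(reward, 0.1)
    return min(1.0, reward)
```

With four expected lines of which three match, this sketch returns 0.75. A crashed run that printed one wrong line returns 0.1, and a crashed run with no output returns 0.0.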

Setup & Run Locally

# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
uvicorn app.main:app --host 0.0.0.0 --port 7860

# 3. Test endpoints
curl http://localhost:7860/health
curl http://localhost:7860/tasks

Run Inference

export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="meta-llama/Meta-Llama-3-8B-Instruct"
export HF_TOKEN="hf_YOUR_TOKEN_HERE"
export SPACE_URL="https://YOUR_USERNAME-python-bug-fixer.hf.space"

python inference.py

Expected output format:

[START] {"task_id": "task_easy", "session_id": "...", "model": "...", "timestamp": "..."}
[STEP]  {"step": 1, "reward": 1.0, "done": true, ...}
[END]   {"task_id": "task_easy", "total_reward": 1.0, "steps": 1, "success": true, ...}
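Because each log line is a tag followed by a JSON object, the output is easy to post-process. The following sketch parses such lines and collects the final reward per task. The names `parse_log_line` and `summarize` are hypothetical, and the code assumes the real payloads are complete, valid JSON (the `...` in the samples above stands for elided fields).

```python
import json
import re

# A tag in square brackets, whitespace, then a JSON object up to end of line.
LOG_LINE = re.compile(r"^\[(START|STEP|END)\]\s+(\{.*\})\s*$")


def parse_log_line(line: str):
    """Return (tag, payload) for a log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return match.group(1), json.loads(match.group(2))


def summarize(lines):
    """Map each task_id to the total_reward reported in its [END] record."""
    totals = {}
    for line in lines:
        parsed = parse_log_line(line)
        if parsed and parsed[0] == "END":
            totals[parsed[1]["task_id"]] = parsed[1]["total_reward"]
    return totals
```

For example, `summarize(open("run.log"))` would return a dictionary of per-task totals, where `run.log` is a hypothetical file holding the captured output of `inference.py`.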

API Reference

POST /reset

Start a new episode.

Request:  { "task_id": "task_easy" }
Response: { "session_id": "...", "task_id": "...", "observation": "...", "info": {} }

POST /step

Submit fixed code as an action.

Request:  { "session_id": "...", "action": "def get_last_element(lst): ..." }
Response: { "observation": "...", "reward": 1.0, "done": true, "info": {} }

GET /state?session_id=...

Get current episode state without advancing.

Response: { "session_id": "...", "task_id": "...", "steps": 1, "done": true, "current_observation": "..." }
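The three endpoints above combine into a single-step episode. The sketch below uses only the standard library and the request and response fields documented here. The helper names (`build_post`, `call`, `run_episode`) are hypothetical, and it assumes the server is reachable at `SPACE_URL`, falling back to the local address used earlier.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: SPACE_URL as exported for inference, else the local server.
BASE_URL = os.environ.get("SPACE_URL", "http://localhost:7860")


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for an endpoint such as /reset or /step."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request) -> dict:
    """Send a request (or plain URL for GET) and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def run_episode(fixed_code: str, task_id: str = "task_easy") -> dict:
    """Reset, submit one action, then read the state without advancing."""
    episode = call(build_post("/reset", {"task_id": task_id}))
    session_id = episode["session_id"]
    result = call(build_post("/step", {"session_id": session_id, "action": fixed_code}))
    query = urllib.parse.urlencode({"session_id": session_id})
    state = call(f"{BASE_URL.rstrip('/')}/state?{query}")
    return {"step": result, "state": state}
```

Nothing is sent until `run_episode` is called. A multi-step agent would instead loop on `/step` until the response reports `done` or the task's step limit is reached.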

GET /tasks

List all available tasks and metadata.

GET /health

Returns {"status": "ok"}.


Docker

docker build -t python-bug-fixer .
docker run -p 7860:7860 python-bug-fixer

Project Structure

my-openenv/
├── inference.py          # Baseline inference script (root, required)
├── openenv.yaml          # OpenEnv specification
├── Dockerfile            # Container definition
├── requirements.txt      # Python dependencies
├── README.md
└── app/
    ├── __init__.py
    ├── main.py           # FastAPI server (reset/step/state endpoints)
    ├── models.py         # Pydantic request/response models
    └── tasks/
        ├── __init__.py   # Task registry
        ├── base.py       # BaseTask + safe code runner
        ├── task_easy.py  # Easy task (2 index bugs)
        ├── task_medium.py # Medium task (2 binary search bugs)
        └── task_hard.py  # Hard task (3 DataProcessor bugs)