title: Python Bug Fixer OpenEnv
emoji: 🐛
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860

Python Bug Fixer — OpenEnv

An OpenEnv-compliant environment in which an AI agent must identify and fix bugs in Python code so that the program produces the correct output. It simulates real-world software debugging and code review workflows.

Environment Description

The agent receives a buggy Python code snippet along with a description of expected behavior. The agent's action is to return the corrected Python code. The environment executes the code and rewards the agent based on how many expected output lines are produced correctly.
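The execute-and-capture step described above could look roughly like the sketch below. It is an illustration only: the function name `run_submission` and the 5-second timeout are assumptions, and the project's own runner in `app/tasks/base.py` may work differently (for example, with stricter sandboxing).

```python
import subprocess
import sys


def run_submission(code: str, timeout_s: float = 5.0) -> tuple[str, bool]:
    """Run submitted code in a separate interpreter (hypothetical helper).

    Returns (stdout, crashed). `crashed` is True when the process exits
    with a non-zero status or exceeds the timeout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output may be bytes or str depending on the Python version.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode("utf-8", errors="replace")
        return partial, True
    return proc.stdout, proc.returncode != 0
```

Running the code in a child process keeps a crash or infinite loop in the submission from taking down the server, and the captured stdout is what gets compared against the expected output.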

Observation Space

Type: Text

The observation contains:

Task description and difficulty
Expected stdout output (ground truth)
The buggy Python code to fix

Action Space

Type: Text

The action is raw Python code (no markdown, no code fences). It must be valid Python that can be executed with python3.
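Language models often wrap code in markdown fences, which this action space does not accept. A client might clean the model output before submitting it, along the lines of the sketch below. The helper `strip_code_fences` is hypothetical and not part of this project.

```python
# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3


def strip_code_fences(text: str) -> str:
    """Return raw code from model output, removing one surrounding fence if present."""
    stripped = text.strip()
    lines = stripped.splitlines()
    # A fenced block starts with the marker (optionally followed by a language
    # tag) and ends with a line containing only the marker.
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        return "\n".join(lines[1:-1])
    return stripped
```

This only handles a single fence wrapping the whole reply; output that mixes prose and code would need a more careful extraction step.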

Tasks

Task ID	Name	Difficulty	Bugs	Max Steps
`task_easy`	Fix Index Errors	Easy	2	5
`task_medium`	Fix Binary Search	Medium	2	5
`task_hard`	Fix DataProcessor Class	Hard	3	7

Reward Function

Reward ∈ [0.0, 1.0]
Each of the N expected output lines is worth 1 / N of the reward
Partial credit is awarded for partially correct fixes
Code that crashes with a runtime error receives 0.1 partial credit if some output was produced
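The rules above can be sketched as a small scoring function. This is one possible reading, not the project's actual implementation: it assumes lines are compared by position after trimming whitespace, and that a crashed run scores a flat 0.1 when it printed anything and 0.0 otherwise. The name `score_output` is hypothetical.

```python
def score_output(expected: str, actual: str, crashed: bool = False) -> float:
    """Score captured stdout against the expected stdout (illustrative only)."""
    expected_lines = expected.strip().splitlines()
    actual_lines = actual.strip().splitlines()
    if not expected_lines:
        return 0.0
    if crashed:
        # Assumption: a crash caps the reward at 0.1, and only if output exists.
        return 0.1 if actual_lines else 0.0
    # Each expected line that matches at the same position is worth 1 / N.
    matches = sum(
        1
        for exp, act in zip(expected_lines, actual_lines)
        if exp.strip() == act.strip()
    )
    return matches / len(expected_lines)
```

For example, a submission that prints three of four expected lines correctly would score 0.75 under these assumptions.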

Setup & Run Locally

# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the server
uvicorn app.main:app --host 0.0.0.0 --port 7860

# 3. Test endpoints
curl http://localhost:7860/health
curl http://localhost:7860/tasks

Run Inference

export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="meta-llama/Meta-Llama-3-8B-Instruct"
export HF_TOKEN="hf_YOUR_TOKEN_HERE"
export SPACE_URL="https://YOUR_USERNAME-python-bug-fixer.hf.space"

python inference.py

Expected output format:

[START] {"task_id": "task_easy", "session_id": "...", "model": "...", "timestamp": "..."}
[STEP]  {"step": 1, "reward": 1.0, "done": true, ...}
[END]   {"task_id": "task_easy", "total_reward": 1.0, "steps": 1, "success": true, ...}
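Because each log line is a bracketed tag followed by a JSON object, the output is easy to post-process. The sketch below shows one way to parse a line. The function name `parse_log_line` is hypothetical, and it assumes the payload is complete JSON (the `...` placeholders in the format above stand for real values).

```python
import json

KNOWN_TAGS = {"START", "STEP", "END"}


def parse_log_line(line: str) -> tuple[str, dict]:
    """Split a line such as '[STEP]  {...}' into its tag and decoded JSON payload."""
    head, sep, payload = line.strip().partition("]")
    tag = head.lstrip("[").strip()
    if not sep or tag not in KNOWN_TAGS:
        raise ValueError(f"unrecognized log line: {line!r}")
    return tag, json.loads(payload)
```

Collecting the `reward` field from every `STEP` record, for instance, gives the per-step reward curve of an episode.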

API Reference

`POST /reset`

Start a new episode.

Request:  { "task_id": "task_easy" }
Response: { "session_id": "...", "task_id": "...", "observation": "...", "info": {} }

`POST /step`

Submit fixed code as an action.

Request:  { "session_id": "...", "action": "def get_last_element(lst): ..." }
Response: { "observation": "...", "reward": 1.0, "done": true, "info": {} }
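The reset and step calls above can be chained into a one-step episode. The following is a minimal client sketch using only the Python standard library. The helper names `post_json` and `run_episode` are hypothetical, and the request and response fields are taken from the examples in this section.

```python
import json
import urllib.request


def post_json(base_url: str, path: str, payload: dict) -> dict:
    """POST a JSON body to base_url + path and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(base_url: str, task_id: str, fixed_code: str, post=post_json) -> dict:
    """Reset the environment for task_id, submit one action, and return the step result."""
    reset = post(base_url, "/reset", {"task_id": task_id})
    return post(
        base_url,
        "/step",
        {"session_id": reset["session_id"], "action": fixed_code},
    )


# Example, assuming the server from the local setup is running:
# result = run_episode("http://localhost:7860", "task_easy", "print('fixed')")
# print(result["reward"], result["done"])
```

The `post` parameter is injectable so the flow can be exercised with a stub instead of a live server. A real agent would loop on `/step` until `done` is true or the task's maximum step count is reached.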

`GET /state?session_id=...`

Get current episode state without advancing.

Response: { "session_id": "...", "task_id": "...", "steps": 1, "done": true, "current_observation": "..." }

`GET /tasks`

List all available tasks and metadata.

`GET /health`

Returns {"status": "ok"}.

Docker

docker build -t python-bug-fixer .
docker run -p 7860:7860 python-bug-fixer

Project Structure

my-openenv/
├── inference.py          # Baseline inference script (root — required)
├── openenv.yaml          # OpenEnv specification
├── Dockerfile            # Container definition
├── requirements.txt      # Python dependencies
├── README.md
└── app/
    ├── __init__.py
    ├── main.py           # FastAPI server (reset/step/state endpoints)
    ├── models.py         # Pydantic request/response models
    └── tasks/
        ├── __init__.py   # Task registry
        ├── base.py       # BaseTask + safe code runner
        ├── task_easy.py  # Easy task (2 index bugs)
        ├── task_medium.py # Medium task (2 binary search bugs)
        └── task_hard.py  # Hard task (3 DataProcessor bugs)