
ConfigDebuggerEnv

ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: repairing broken Docker Compose, Kubernetes, and ML training configurations under a step limit.

Why this environment

Configuration bugs are expensive and common in real systems. They are often partially valid YAML but semantically wrong (type mismatches, missing units, interdependent constraints). This environment provides dense trajectory rewards so an agent can learn corrective behaviors instead of only terminal success/failure.

OpenEnv API

The server exposes the standard lifecycle:

  • POST /reset
  • POST /step
  • GET /state

Typed models

  • Action model: ConfigAction
  • Observation model: ConfigObservation
  • Reward model: ConfigReward
  • State model: EnvState

Models are defined in server/models.py and validated with Pydantic.
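The exact definitions live in server/models.py; a minimal sketch of the action and observation models, assuming the field names documented in the sections that follow, might look like:

```python
# Hypothetical sketch of the typed models; the actual definitions in
# server/models.py may differ in detail (defaults, validators, etc.).
from typing import Any, List, Literal, Optional
from pydantic import BaseModel

class ConfigAction(BaseModel):
    operation: Literal["edit", "add", "delete"]
    path: str                     # dot path, e.g. "services.web.ports.0"
    value: Optional[Any] = None   # used by edit/add, ignored by delete

class ConfigObservation(BaseModel):
    task_id: str
    task_description: str
    current_config: str           # YAML string
    syntax_valid: bool
    validation_errors: List[str]
    schema_score: float           # 0.0 to 1.0
    logic_score: float            # 0.0 to 1.0
    overall_score: float          # 0.0 to 1.0
    step_count: int
    max_steps: int
```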

Action space

ConfigAction fields:

  • operation: edit | add | delete
  • path: dot path with optional list indexes (example: spec.template.spec.containers.0.image)
  • value: JSON-serializable payload for edit/add
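A dot path with numeric segments can be resolved against the parsed YAML by treating digit segments as list indexes. A minimal sketch of applying a ConfigAction (not the environment's actual implementation):

```python
# Illustrative sketch: apply an edit/add/delete at a dot path whose
# digit segments index into lists. The real environment may handle
# missing intermediate nodes and error reporting differently.
def apply_action(config, operation, path, value=None):
    keys = [int(k) if k.isdigit() else k for k in path.split(".")]
    parent = config
    for k in keys[:-1]:
        parent = parent[k]          # descend to the enclosing node
    last = keys[-1]
    if operation in ("edit", "add"):
        parent[last] = value
    elif operation == "delete":
        del parent[last]
    return config

# Example on a Kubernetes-like fragment:
cfg = {"spec": {"containers": [{"image": "nginx:1.0"}]}}
apply_action(cfg, "edit", "spec.containers.0.image", "nginx:1.25")
```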

Observation space

ConfigObservation fields:

  • task_id
  • task_description
  • current_config (YAML string)
  • syntax_valid
  • validation_errors
  • schema_score (0.0 to 1.0)
  • logic_score (0.0 to 1.0)
  • overall_score (0.0 to 1.0)
  • step_count
  • max_steps

Tasks and graders

Three deterministic tasks are included:

  1. easy_docker (easy)
  2. medium_k8s (medium)
  3. hard_ml_config (hard)

Each task has:

  • A broken starting configuration
  • A target configuration
  • Weighted required paths for schema grading
  • Deterministic logic checks

Grading always returns normalized values in [0.0, 1.0].
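Under the assumption that schema grading compares each weighted required path between the current and target configurations, the normalization can be sketched as:

```python
# Hedged sketch of weighted schema grading; the actual grader in the
# environment may weight or compare paths differently.
def schema_score(current, target, weighted_paths):
    """weighted_paths: {dot_path: weight}; returns a value in [0.0, 1.0]."""
    def lookup(cfg, path):
        node = cfg
        for k in path.split("."):
            k = int(k) if k.isdigit() else k
            try:
                node = node[k]
            except (KeyError, IndexError, TypeError):
                return None         # path missing in this config
        return node

    total = sum(weighted_paths.values())
    earned = sum(
        w for p, w in weighted_paths.items()
        if lookup(current, p) == lookup(target, p)
    )
    return earned / total if total else 0.0
```

Dividing earned weight by total weight keeps the score normalized regardless of how many required paths a task defines.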

Reward design

The reward is dense, combining a per-step progress signal with shaping penalties:

  • Base reward is current overall score
  • Positive delta bonus on improvement
  • Regression penalty on negative delta
  • Loop penalty for repeated states
  • Penalty for invalid actions
  • Penalty for destructive top-level deletes
  • Small completion bonus when solved

This creates meaningful signals across the full episode, not only at termination.
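The bullets above can be sketched as one shaping function. The coefficients below are illustrative assumptions, not the environment's actual constants:

```python
# Hedged sketch of the dense reward shaping described above.
# All coefficient values are placeholders for illustration only.
def shaped_reward(score, prev_score, *, repeated_state=False,
                  invalid_action=False, destructive_delete=False,
                  solved=False):
    reward = score                  # base reward: current overall score
    delta = score - prev_score
    if delta > 0:
        reward += 0.5 * delta       # bonus on improvement
    elif delta < 0:
        reward += 1.0 * delta       # regression penalty (delta is negative)
    if repeated_state:
        reward -= 0.05              # loop penalty for repeated states
    if invalid_action:
        reward -= 0.1               # penalty for invalid actions
    if destructive_delete:
        reward -= 0.2               # penalty for destructive top-level deletes
    if solved:
        reward += 0.1               # small completion bonus
    return reward
```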

Project structure

  • openenv.yaml
  • Dockerfile
  • requirements.txt
  • inference.py
  • server/
    • data.py
    • env.py
    • main.py
    • models.py

Local setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Run the server:
python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
  3. Quick API check:
curl -X POST "http://localhost:8000/reset" -H "Content-Type: application/json" -d "{\"task_id\":\"easy_docker\"}"
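The same lifecycle can be driven from Python with only the standard library. The endpoint names come from this README; the action fields shown in the demo are assumptions about the server's request schema:

```python
# Minimal client sketch for the /reset, /step, and /state endpoints.
import json
import urllib.request

def make_request(base_url, endpoint, payload=None, method="POST"):
    """Build a JSON request for one lifecycle endpoint."""
    data = None if payload is None else json.dumps(payload).encode()
    return urllib.request.Request(
        f"{base_url}{endpoint}", data=data, method=method,
        headers={"Content-Type": "application/json"},
    )

def call(base_url, endpoint, payload=None, method="POST"):
    """Send the request and decode the JSON response."""
    req = make_request(base_url, endpoint, payload, method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def demo(base="http://localhost:8000"):
    # Run this with the server from "Local setup" already listening.
    print(call(base, "/reset", {"task_id": "easy_docker"}))
    print(call(base, "/step", {          # action fields are assumed
        "operation": "edit",
        "path": "services.web.image",    # hypothetical path for easy_docker
        "value": "nginx:1.25",
    }))
    print(call(base, "/state", method="GET"))
```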

Baseline inference

Heuristic baseline (fully reproducible):

python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42

OpenAI baseline (uses OpenAI Python client and OPENAI_API_KEY):

set OPENAI_API_KEY=your_key_here
(on Linux/macOS: export OPENAI_API_KEY=your_key_here)
python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42

The script evaluates all three tasks and prints per-task and average scores.

Docker

Build:

docker build -t configdebugger-env .

Run:

docker run -p 7860:7860 configdebugger-env

Hugging Face Spaces notes

  • Use Docker SDK
  • Ensure Space port maps to 7860
  • Add tag: openenv
  • Include environment variables for external evaluation if needed

Validation checklist

  • Typed Observation/Action/Reward models: yes
  • reset/step/state implemented: yes
  • 3 tasks with deterministic graders: yes
  • Reward in range [0.0, 1.0] with partial progress: yes
  • Baseline inference script with OpenAI client: yes
  • Dockerfile included: yes
  • OpenEnv metadata file included: yes