
ConfigDebuggerEnv

ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: repairing broken Docker Compose, Kubernetes, and ML training configurations under a step limit.

Why this environment

Configuration bugs are expensive and common in real systems. They are often partially valid YAML but semantically wrong (type mismatches, missing units, interdependent constraints). This environment provides dense trajectory rewards so an agent can learn corrective behaviors instead of only terminal success/failure.

OpenEnv API

The server exposes the standard lifecycle:

  • POST /reset
  • POST /step
  • GET /state

Typed models

  • Action model: ConfigAction
  • Observation model: ConfigObservation
  • Reward model: ConfigReward
  • State model: EnvState

Models are defined in server/models.py and validated with Pydantic.
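The exact definitions live in server/models.py; a minimal sketch of the action and observation models, assuming the field names documented in the sections that follow, might look like:

```python
# Hypothetical sketch of the typed models; the actual definitions in
# server/models.py may differ in detail (defaults, validators, etc.).
from typing import Any, List, Literal, Optional
from pydantic import BaseModel

class ConfigAction(BaseModel):
    operation: Literal["edit", "add", "delete"]
    path: str                     # dot path, e.g. "services.web.ports.0"
    value: Optional[Any] = None   # used by edit/add, ignored by delete

class ConfigObservation(BaseModel):
    task_id: str
    task_description: str
    current_config: str           # YAML string
    syntax_valid: bool
    validation_errors: List[str]
    schema_score: float           # 0.0 to 1.0
    logic_score: float            # 0.0 to 1.0
    overall_score: float          # 0.0 to 1.0
    step_count: int
    max_steps: int
```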

Action space

ConfigAction fields:

  • operation: edit | add | delete
  • path: dot path with optional list indexes (example: spec.template.spec.containers.0.image)
  • value: JSON-serializable payload for edit/add
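A dot path with numeric segments can be resolved against the parsed YAML by treating digit segments as list indexes. A minimal sketch of applying a ConfigAction (not the environment's actual implementation):

```python
# Illustrative sketch: apply an edit/add/delete at a dot path whose
# digit segments index into lists. The real environment may handle
# missing intermediate nodes and error reporting differently.
def apply_action(config, operation, path, value=None):
    keys = [int(k) if k.isdigit() else k for k in path.split(".")]
    parent = config
    for k in keys[:-1]:
        parent = parent[k]          # descend to the enclosing node
    last = keys[-1]
    if operation in ("edit", "add"):
        parent[last] = value
    elif operation == "delete":
        del parent[last]
    return config

# Example on a Kubernetes-like fragment:
cfg = {"spec": {"containers": [{"image": "nginx:1.0"}]}}
apply_action(cfg, "edit", "spec.containers.0.image", "nginx:1.25")
```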

Observation space

ConfigObservation fields:

  • task_id
  • task_description
  • current_config (YAML string)
  • syntax_valid
  • validation_errors
  • schema_score (0.0 to 1.0)
  • logic_score (0.0 to 1.0)
  • overall_score (0.0 to 1.0)
  • step_count
  • max_steps

Tasks and graders

Three deterministic tasks are included:

  1. easy_docker (easy)
  2. medium_k8s (medium)
  3. hard_ml_config (hard)

Each task has:

  • A broken starting configuration
  • A target configuration
  • Weighted required paths for schema grading
  • Deterministic logic checks

Grading always returns normalized values in [0.0, 1.0].
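Under the assumption that schema grading compares each weighted required path between the current and target configurations, the normalization can be sketched as:

```python
# Hedged sketch of weighted schema grading; the actual grader in the
# environment may weight or compare paths differently.
def schema_score(current, target, weighted_paths):
    """weighted_paths: {dot_path: weight}; returns a value in [0.0, 1.0]."""
    def lookup(cfg, path):
        node = cfg
        for k in path.split("."):
            k = int(k) if k.isdigit() else k
            try:
                node = node[k]
            except (KeyError, IndexError, TypeError):
                return None         # path missing in this config
        return node

    total = sum(weighted_paths.values())
    earned = sum(
        w for p, w in weighted_paths.items()
        if lookup(current, p) == lookup(target, p)
    )
    return earned / total if total else 0.0
```

Dividing earned weight by total weight keeps the score normalized regardless of how many required paths a task defines.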

Reward design

The reward is dense, combining a per-step progress signal with shaping penalties:

  • Base reward is current overall score
  • Positive delta bonus on improvement
  • Regression penalty on negative delta
  • Loop penalty for repeated states
  • Penalty for invalid actions
  • Penalty for destructive top-level deletes
  • Small completion bonus when solved

This creates meaningful signals across the full episode, not only at termination.
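The bullets above can be sketched as one shaping function. The coefficients below are illustrative assumptions, not the environment's actual constants:

```python
# Hedged sketch of the dense reward shaping described above.
# All coefficient values are placeholders for illustration only.
def shaped_reward(score, prev_score, *, repeated_state=False,
                  invalid_action=False, destructive_delete=False,
                  solved=False):
    reward = score                  # base reward: current overall score
    delta = score - prev_score
    if delta > 0:
        reward += 0.5 * delta       # bonus on improvement
    elif delta < 0:
        reward += 1.0 * delta       # regression penalty (delta is negative)
    if repeated_state:
        reward -= 0.05              # loop penalty for repeated states
    if invalid_action:
        reward -= 0.1               # penalty for invalid actions
    if destructive_delete:
        reward -= 0.2               # penalty for destructive top-level deletes
    if solved:
        reward += 0.1               # small completion bonus
    return reward
```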

Project structure

  • openenv.yaml
  • Dockerfile
  • requirements.txt
  • inference.py
  • server/
    • data.py
    • env.py
    • main.py
    • models.py

Local setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Run the server:
python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
  3. Quick API check:
curl -X POST "http://localhost:8000/reset" -H "Content-Type: application/json" -d "{\"task_id\":\"easy_docker\"}"
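The same lifecycle can be driven from Python with only the standard library. The endpoint names come from this README; the action fields shown in the demo are assumptions about the server's request schema:

```python
# Minimal client sketch for the /reset, /step, and /state endpoints.
import json
import urllib.request

def make_request(base_url, endpoint, payload=None, method="POST"):
    """Build a JSON request for one lifecycle endpoint."""
    data = None if payload is None else json.dumps(payload).encode()
    return urllib.request.Request(
        f"{base_url}{endpoint}", data=data, method=method,
        headers={"Content-Type": "application/json"},
    )

def call(base_url, endpoint, payload=None, method="POST"):
    """Send the request and decode the JSON response."""
    req = make_request(base_url, endpoint, payload, method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def demo(base="http://localhost:8000"):
    # Run this with the server from "Local setup" already listening.
    print(call(base, "/reset", {"task_id": "easy_docker"}))
    print(call(base, "/step", {          # action fields are assumed
        "operation": "edit",
        "path": "services.web.image",    # hypothetical path for easy_docker
        "value": "nginx:1.25",
    }))
    print(call(base, "/state", method="GET"))
```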

Baseline inference

Heuristic baseline (fully reproducible):

python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42

OpenAI baseline (uses OpenAI Python client and OPENAI_API_KEY):

set OPENAI_API_KEY=your_key_here
(on Linux/macOS: export OPENAI_API_KEY=your_key_here)
python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42

The script evaluates all three tasks and prints per-task and average scores.

Docker

Build:

docker build -t configdebugger-env .

Run:

docker run -p 7860:7860 configdebugger-env

Hugging Face Spaces notes

  • Use Docker SDK
  • Ensure Space port maps to 7860
  • Add tag: openenv
  • Include environment variables for external evaluation if needed

Validation checklist

  • Typed Observation/Action/Reward models: yes
  • reset/step/state implemented: yes
  • 3 tasks with deterministic graders: yes
  • Reward in range [0.0, 1.0] with partial progress: yes
  • Baseline inference script with OpenAI client: yes
  • Dockerfile included: yes
  • OpenEnv metadata file included: yes