# ConfigDebuggerEnv

ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: fixing Docker Compose, Kubernetes, and training configuration mistakes under step limits.

## Why this environment

Configuration bugs are expensive and common in real systems. They are often partially valid YAML but semantically wrong (type mismatches, missing units, interdependent constraints). This environment provides dense trajectory rewards so an agent can learn corrective behaviors instead of only terminal success/failure.

## OpenEnv API

The server exposes the standard lifecycle:

- POST /reset
- POST /step
- GET /state

### Typed models

- Action model: `ConfigAction`
- Observation model: `ConfigObservation`
- Reward model: `ConfigReward`
- State model: `EnvState`

Models are defined in `server/models.py` and validated with Pydantic.

## Action space

`ConfigAction` fields:

- `operation`: `edit` | `add` | `delete`
- `path`: dot path with optional list indexes (example: `spec.template.spec.containers.0.image`)
- `value`: JSON-serializable payload for `edit`/`add`

## Observation space

`ConfigObservation` fields:

- `task_id`
- `task_description`
- `current_config` (YAML string)
- `syntax_valid`
- `validation_errors`
- `schema_score` (0.0 to 1.0)
- `logic_score` (0.0 to 1.0)
- `overall_score` (0.0 to 1.0)
- `step_count`
- `max_steps`

## Tasks and graders

Three deterministic tasks are included:

1. `easy_docker` (easy)
2. `medium_k8s` (medium)
3. `hard_ml_config` (hard)

Each task has:

- A broken starting configuration
- A target configuration
- Weighted required paths for schema grading
- Deterministic logic checks

Grading always returns normalized values in [0.0, 1.0].
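To make the dot-path action format concrete, here is a minimal sketch of how an `edit`/`add`/`delete` operation could be resolved against a nested config. This is an illustration of the path convention described above, not the environment's actual implementation (the server's semantics for `add` vs. `edit` and its error handling may differ):

```python
def apply_action(config, operation, path, value=None):
    """Apply a ConfigAction-style operation to a nested dict/list structure.

    `path` is a dot path; purely numeric segments index into lists,
    e.g. "spec.template.spec.containers.0.image".
    """
    keys = path.split(".")
    # Walk to the parent of the target node, indexing lists on numeric keys.
    node = config
    for key in keys[:-1]:
        node = node[int(key)] if key.isdigit() else node[key]
    last = keys[-1]
    if operation in ("edit", "add"):
        # Sketch treats edit and add identically; the real env may distinguish them.
        if last.isdigit():
            node[int(last)] = value
        else:
            node[last] = value
    elif operation == "delete":
        if last.isdigit():
            node.pop(int(last))
        else:
            del node[last]
    else:
        raise ValueError(f"unknown operation: {operation}")
    return config
```

For example, `apply_action(cfg, "edit", "spec.containers.0.image", "python:3.11-slim")` replaces the first container's image in a Kubernetes-style config.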
## Reward design

The reward is dense, with progression bonuses and penalties:

- Base reward is the current overall score
- Positive delta bonus on improvement
- Regression penalty on negative delta
- Loop penalty for repeated states
- Penalty for invalid actions
- Penalty for destructive top-level deletes
- Small completion bonus when solved

This creates meaningful signals across the full episode, not only at termination.

## Project structure

- openenv.yaml
- Dockerfile
- requirements.txt
- inference.py
- server/
  - data.py
  - env.py
  - main.py
  - models.py

## Local setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Run the server:

   ```bash
   python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
   ```

3. Quick API check:

   ```bash
   curl -X POST "http://localhost:8000/reset" \
     -H "Content-Type: application/json" \
     -d '{"task_id":"easy_docker"}'
   ```

## Baseline inference

Heuristic baseline (fully reproducible):

```bash
python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42
```

OpenAI baseline (uses the OpenAI Python client and `OPENAI_API_KEY`):

```bash
export OPENAI_API_KEY=your_key_here
python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42
```

The script evaluates all three tasks and prints per-task and average scores.

## Docker

Build:

```bash
docker build -t configdebugger-env .
```

Run:

```bash
docker run -p 7860:7860 configdebugger-env
```

## Hugging Face Spaces notes

- Use the Docker SDK
- Ensure the Space port maps to 7860
- Add the tag: `openenv`
- Include environment variables for external evaluation if needed

## Validation checklist

- Typed Observation/Action/Reward models: yes
- reset/step/state implemented: yes
- 3 tasks with deterministic graders: yes
- Reward in range [0.0, 1.0] with partial progress: yes
- Baseline inference script with OpenAI client: yes
- Dockerfile included: yes
- OpenEnv metadata file included: yes
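The reward design listed above can be sketched as a single shaping function. All coefficients below are illustrative assumptions, not the environment's real weights; the final clip to [0.0, 1.0] is assumed to match the validation checklist:

```python
def shaped_reward(
    overall_score: float,
    prev_score: float,
    state_repeated: bool,
    action_valid: bool,
    destructive_delete: bool,
    solved: bool,
) -> float:
    """Sketch of the dense reward described above (coefficients are made up)."""
    reward = overall_score                 # base reward: current overall score
    delta = overall_score - prev_score
    if delta > 0:
        reward += 0.5 * delta              # positive delta bonus on improvement
    elif delta < 0:
        reward += delta                    # regression penalty (delta is negative)
    if state_repeated:
        reward -= 0.05                     # loop penalty for repeated states
    if not action_valid:
        reward -= 0.1                      # invalid action penalty
    if destructive_delete:
        reward -= 0.2                      # destructive top-level delete penalty
    if solved:
        reward += 0.1                      # small completion bonus
    return max(0.0, min(1.0, reward))      # keep reward in [0.0, 1.0]
```

The point of the shape is that an agent improving the config step by step sees a strictly better return than one that loops or makes destructive edits, even when neither episode terminates in a full solve.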