# ConfigDebuggerEnv

ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: fixing Docker Compose, Kubernetes, and training configuration mistakes under step limits.

## Why this environment

Configuration bugs are expensive and common in real systems. They are often partially valid YAML but semantically wrong (type mismatches, missing units, interdependent constraints). This environment provides dense trajectory rewards so an agent can learn corrective behaviors instead of only terminal success/failure.

## OpenEnv API

The server exposes the standard lifecycle:

- POST /reset
- POST /step
- GET /state

### Typed models

- Action model: `ConfigAction`
- Observation model: `ConfigObservation`
- Reward model: `ConfigReward`
- State model: `EnvState`

Models are defined in `server/models.py` and validated with Pydantic.

## Action space

`ConfigAction` fields:

- `operation`: `edit` | `add` | `delete`
- `path`: dot path with optional list indexes (example: `spec.template.spec.containers.0.image`)
- `value`: JSON-serializable payload for `edit`/`add`

## Observation space

`ConfigObservation` fields:

- `task_id`
- `task_description`
- `current_config` (YAML string)
- `syntax_valid`
- `validation_errors`
- `schema_score` (0.0 to 1.0)
- `logic_score` (0.0 to 1.0)
- `overall_score` (0.0 to 1.0)
- `step_count`
- `max_steps`

## Tasks and graders

Three deterministic tasks are included:

1. `easy_docker` (easy)
2. `medium_k8s` (medium)
3. `hard_ml_config` (hard)

Each task has:

- A broken starting configuration
- A target configuration
- Weighted required paths for schema grading
- Deterministic logic checks

Grading always returns normalized values in [0.0, 1.0].
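To make the dot-path action format concrete, here is a minimal sketch of how an `edit`/`add`/`delete` operation could be resolved against a nested config. This is an illustration of the path convention described above, not the environment's actual implementation (the server's semantics for `add` vs. `edit` and its error handling may differ):

```python
def apply_action(config, operation, path, value=None):
    """Apply a ConfigAction-style operation to a nested dict/list structure.

    `path` is a dot path; purely numeric segments index into lists,
    e.g. "spec.template.spec.containers.0.image".
    """
    keys = path.split(".")
    # Walk to the parent of the target node, indexing lists on numeric keys.
    node = config
    for key in keys[:-1]:
        node = node[int(key)] if key.isdigit() else node[key]
    last = keys[-1]
    if operation in ("edit", "add"):
        # Sketch treats edit and add identically; the real env may distinguish them.
        if last.isdigit():
            node[int(last)] = value
        else:
            node[last] = value
    elif operation == "delete":
        if last.isdigit():
            node.pop(int(last))
        else:
            del node[last]
    else:
        raise ValueError(f"unknown operation: {operation}")
    return config
```

For example, `apply_action(cfg, "edit", "spec.containers.0.image", "python:3.11-slim")` replaces the first container's image in a Kubernetes-style config.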
## Reward design

The reward is dense, with progression bonuses and penalties:

- Base reward is the current overall score
- Positive delta bonus on improvement
- Regression penalty on negative delta
- Loop penalty for repeated states
- Penalty for invalid actions
- Penalty for destructive top-level deletes
- Small completion bonus when solved

This creates meaningful signals across the full episode, not only at termination.

## Project structure

- openenv.yaml
- Dockerfile
- requirements.txt
- inference.py
- server/
  - data.py
  - env.py
  - main.py
  - models.py

## Local setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Run the server:

   ```bash
   python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
   ```

3. Quick API check:

   ```bash
   curl -X POST "http://localhost:8000/reset" \
     -H "Content-Type: application/json" \
     -d '{"task_id":"easy_docker"}'
   ```

## Baseline inference

Heuristic baseline (fully reproducible):

```bash
python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42
```

OpenAI baseline (uses the OpenAI Python client and `OPENAI_API_KEY`):

```bash
export OPENAI_API_KEY=your_key_here
python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42
```

The script evaluates all three tasks and prints per-task and average scores.

## Docker

Build:

```bash
docker build -t configdebugger-env .
```

Run:

```bash
docker run -p 7860:7860 configdebugger-env
```

## Hugging Face Spaces notes

- Use the Docker SDK
- Ensure the Space port maps to 7860
- Add the tag: `openenv`
- Include environment variables for external evaluation if needed

## Validation checklist

- Typed Observation/Action/Reward models: yes
- reset/step/state implemented: yes
- 3 tasks with deterministic graders: yes
- Reward in range [0.0, 1.0] with partial progress: yes
- Baseline inference script with OpenAI client: yes
- Dockerfile included: yes
- OpenEnv metadata file included: yes
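The reward design listed above can be sketched as a single shaping function. All coefficients below are illustrative assumptions, not the environment's real weights; the final clip to [0.0, 1.0] is assumed to match the validation checklist:

```python
def shaped_reward(
    overall_score: float,
    prev_score: float,
    state_repeated: bool,
    action_valid: bool,
    destructive_delete: bool,
    solved: bool,
) -> float:
    """Sketch of the dense reward described above (coefficients are made up)."""
    reward = overall_score                 # base reward: current overall score
    delta = overall_score - prev_score
    if delta > 0:
        reward += 0.5 * delta              # positive delta bonus on improvement
    elif delta < 0:
        reward += delta                    # regression penalty (delta is negative)
    if state_repeated:
        reward -= 0.05                     # loop penalty for repeated states
    if not action_valid:
        reward -= 0.1                      # invalid action penalty
    if destructive_delete:
        reward -= 0.2                      # destructive top-level delete penalty
    if solved:
        reward += 0.1                      # small completion bonus
    return max(0.0, min(1.0, reward))      # keep reward in [0.0, 1.0]
```

The point of the shape is that an agent improving the config step by step sees a strictly better return than one that loops or makes destructive edits, even when neither episode terminates in a full solve.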