hiitsesh commited on
Commit
248cbb9
·
1 Parent(s): 4e608c3

Update scenarios for tier-3 evaluations

Browse files
Files changed (10) hide show
  1. README.md +35 -65
  2. adv_rebuild.py +347 -0
  3. demo.py +0 -15
  4. openenv.yaml +2 -2
  5. src/baseline.py +31 -5
  6. src/env.py +102 -35
  7. src/environment.py +0 -0
  8. src/main.py +10 -13
  9. src/models.py +15 -10
  10. src/tasks.py +41 -3
README.md CHANGED
@@ -1,86 +1,56 @@
1
  ---
2
- title: GPUClusterEnv
3
- emoji: 🚀
4
- colorFrom: blue
5
- colorTo: green
6
  sdk: docker
7
  pinned: false
8
  ---
9
 
10
- # GPU Cluster Resource Management Environment (GPUClusterEnv)
11
 
12
- A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.
13
 
14
- Managing compute resources for incoming ML training jobs requires balancing strict budgets against Service Level Agreement (SLA) penalties for long queue times. This environment challenges agents to dynamically scale GPU resources to match fluctuating job arrival rates while maximizing overall reward.
15
 
16
- ## 🚀 Getting Started
 
 
 
17
 
18
- ### Installation
19
-
20
- 1. Clone the repository:
21
- ```bash
22
- git clone https://github.com/yourusername/GPUClusterEnv
23
- cd GPUClusterEnv
24
- ```
25
-
26
- 2. Install dependencies:
27
- ```bash
28
- pip install -r requirements.txt
29
- ```
30
-
31
- 3. Run the FastAPI server:
32
- ```bash
33
- uvicorn src.main:app --host 0.0.0.0 --port 7860
34
- ```
35
-
36
- ## 🧠 Environment Design
37
 
38
  ### Observation Space
39
 
40
- The observation space is represented as a structured dictionary containing the current state of the GPU cluster:
41
-
42
  | Feature | Description | Type |
43
  | :--- | :--- | :--- |
44
- | `time_step` | Current step in the episode. | `int` |
45
- | `active_gpus` | Number of currently provisioned GPUs. | `int` |
46
- | `queue_size` | Number of jobs waiting to be processed. | `int` |
47
- | `current_budget` | Remaining budget for the episode. | `float` |
48
- | `incoming_jobs` | Number of new jobs that arrived in the last step. | `int` |
49
-
50
- ### Action Space
51
-
52
- The agent controls the scaling of the infrastructure by specifying how many GPUs to provision or de-provision:
53
 
54
- | Feature | Description | Type | Notes |
55
- | :--- | :--- | :--- | :--- |
56
- | `gpus_to_provision` | Number of GPUs to spin up (positive) or spin down (negative). | `int` | Infrastructure scaling |
57
 
58
- ### Reward Function
59
-
60
- Instead of a sparse reward, the environment uses a shaped reward function that continuously evaluates the agent's performance based on processing jobs while minimizing costs and SLA penalties:
61
-
62
- $$Reward = (JobsProcessed \times 5.0) - (ActiveGPUs \times CostPerGPU) - (QueueSize \times Penalty)$$
63
-
64
- * **CostPerGPU**: $2.50 per step per active GPU.
65
- * **Penalty**: $1.00 SLA penalty per step for each waiting job in the queue.
66
-
67
- ### Terminal Conditions
68
-
69
- An episode ends when:
70
- 1. The maximum number of `time_steps` for the task is reached.
71
- 2. The `current_budget` drops to $0 or below.
72
 
73
- ## 📋 Tasks
74
 
75
- The environment provides 3 graded tasks with escalating difficulty:
76
 
77
- 1. **Easy** (`task_id: "easy"`): Low job arrival rate, generous budget. (Max Steps: 50)
78
- 2. **Medium** (`task_id: "medium"`): Moderate job arrival rate, standard budget. (Max Steps: 100)
79
- 3. **Hard** (`task_id: "hard"`): High, erratic job arrival rate, tight budget. (Max Steps: 200)
80
 
81
- ## 🤖 Baseline Agent
 
 
82
 
83
- To evaluate the baseline agent performance:
84
- ```bash
85
- python src/baseline.py
86
- ```
 
1
  ---
2
+ title: Desalination RL Protocol
3
+ emoji: 🌊
4
+ colorFrom: cyan
5
+ colorTo: blue
6
  sdk: docker
7
  pinned: false
8
  ---
9
 
10
+ # Advanced Municipal Desalination Plant (DesalEnv)
11
 
12
+ An incredibly unique, real-world RL environment that bridges continuous control, resource arbitrage, dynamic system physics, and environmental noise.
13
 
14
+ The agent operates an industrial reverse-osmosis water desalination plant providing drinking water to a municipality. It must balance massive trade-offs under high pressure. This goes **far** above basic control loops, presenting specific non-linear phenomena.
15
 
16
+ ### Key Mechanics ⚙️
17
+ 1. **Weather Shifts:** The environment continuously cycles through weather patterns (`Normal`, `Heatwave`, `Storm`) which violently alter both the Grid Energy Price and the sheer amount of water the city demands.
18
+ 2. **Maintenance Logistics:** Pushing water fouls the RO membranes, dragging up energy costs. You can trigger a `run_cleaning` action, however, crews are not instantly available! Doing so locks a `maintenance_cooldown`. Trying to clean while on cooldown results in idle time and fines.
19
+ 3. **Biological Safety Limits:** Overworking a fouled membrane causes micro-tears resulting in salt leakage. The agent tracks `water_salinity`. Processing high water yields while fouled raises PPM levels. Tipping above 500PPM induces strict city health department fines.
20
 
21
+ ## 🧠 Environment Structure
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
  ### Observation Space
24
 
 
 
25
  | Feature | Description | Type |
26
  | :--- | :--- | :--- |
27
+ | `reservoir_level` | Fresh water stored (Megaliters). | `float` |
28
+ | `water_salinity` | PPM of salt in the water. >500 triggers penalties. | `float` |
29
+ | `energy_price` | Fluctuating grid energy price ($/MWh). | `float` |
30
+ | `membrane_fouling` | Hardware Degradation index (0.0=clean, 1.0=blocked). | `float` |
31
+ | `city_demand` | Fluctuating water consumption for the current step. | `float` |
32
+ | `weather_condition` | String literal tracking macro-events (`Heatwave`, etc.) | `string` |
33
+ | `maintenance_cooldown` | Steps until a cleaning crew is available again. | `int` |
 
 
34
 
35
+ ### Action Space (Continuous & Discrete Hybrid)
 
 
36
 
37
+ | Feature | Description | Type |
38
+ | :--- | :--- | :--- |
39
+ | `production_rate` | Target water extraction flow rate (0.0 to 50.0). | `float` |
40
+ | `run_cleaning` | Set True to halt production and wash membranes (checks cooldown). | `bool` |
 
 
 
 
 
 
 
 
 
 
41
 
42
+ ## Tasks
43
 
44
+ Provides 6 heavily distinct curriculums across 3 difficulty tiers to truly evaluate agent robustness:
45
 
46
+ **Tier 1: Standard Evaluation**
47
+ * `easy_spring`: Generous reservoir, standard normal weather variables.
 
48
 
49
+ **Tier 2: Volatile Environmental Shifts**
50
+ * `summer_crisis`: Back-to-back heatwaves and high energy prices. The agent has to aggressively juggle cleanings and salinity.
51
+ * `hurricane_season`: Erratic grids, lower demands, but requires extreme energy arbitrage.
52
 
53
+ **Tier 3: Asymmetrical Shock Scenarios (Testing True Robustness)**
54
+ * `black_swan_drought`: Brutal. Demand stays critically high, reservoir is small. Tests the agent's ability to perfectly time maintenance cooldowns. If they miss one cleaning window, the city drys out.
55
+ * `grid_failure`: The ultimate energy arbitrage test. Standard demand, but grid energy pricing fluctuates by massive magnitudes (`price_volatility=250.0`). Pumping at the wrong time bankrupts the plant.
56
+ * `marathon_endurance`: A 500-step test where micro-degradations compound. Short-term greedy strategies (running fouled, taking salinity hits) will eventually snowball into total failure.
adv_rebuild.py ADDED
@@ -0,0 +1,347 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+
3
+ def write_file(path, content):
4
+ with open(path, "w", encoding="utf-8") as f:
5
+ f.write(content.strip() + "\n")
6
+
7
+ models_py = """
8
+ from pydantic import BaseModel, Field
9
+ from typing import Dict, Literal, List
10
+
11
+ class Observation(BaseModel):
12
+ time_step: int
13
+ reservoir_level: float = Field(description="Current fresh water stored (Megaliters)")
14
+ water_salinity: float = Field(description="PPM of salt in the water. >500 is unsafe.")
15
+ energy_price: float = Field(description="Current grid energy price ($/MWh)")
16
+ membrane_fouling: float = Field(description="0.0 is clean, 1.0 is totally blocked")
17
+ city_demand: float = Field(description="Water demand for this step (Megaliters)")
18
+ weather_condition: Literal["Normal", "Heatwave", "Storm"] = Field(description="Current weather event affecting parameters")
19
+ maintenance_cooldown: int = Field(description="Steps until a cleaning crew is available again")
20
+
21
+ class Action(BaseModel):
22
+ production_rate: float = Field(description="Desired water output (ML/step), 0.0 to 50.0")
23
+ run_cleaning: bool = Field(description="If True, halts production to chemically wash membranes (requires crew)")
24
+
25
+ class StepResult(BaseModel):
26
+ observation: Observation
27
+ reward: float
28
+ done: bool
29
+ info: Dict
30
+
31
+ class TaskConfig(BaseModel):
32
+ task_id: str
33
+ max_steps: int
34
+ reservoir_capacity: float
35
+ base_demand: float
36
+ price_volatility: float
37
+ weather_pattern: List[str]
38
+ """
39
+
40
+ env_py = """
41
+ import math
42
+ import random
43
+ from src.models import Observation, Action, StepResult, TaskConfig
44
+
45
+ class DesalEnv:
46
+ def __init__(self):
47
+ self.state = None
48
+ self.config = None
49
+ self.total_reward = 0.0
50
+
51
+ def reset(self, config: TaskConfig) -> Observation:
52
+ self.config = config
53
+ self.total_reward = 0.0
54
+
55
+ initial_weather = config.weather_pattern[0] if config.weather_pattern else "Normal"
56
+
57
+ self.state = Observation(
58
+ time_step=0,
59
+ reservoir_level=config.reservoir_capacity * 0.5,
60
+ water_salinity=300.0, # 300 PPM is superb drinking water
61
+ energy_price=50.0,
62
+ membrane_fouling=0.0,
63
+ city_demand=config.base_demand,
64
+ weather_condition=initial_weather,
65
+ maintenance_cooldown=0
66
+ )
67
+ return self.state
68
+
69
+ def step(self, action: Action) -> StepResult:
70
+ if self.state is None:
71
+ raise ValueError("Must reset prior to step")
72
+
73
+ reward = 0.0
74
+ info = {}
75
+
76
+ # 0. Apply Maintenance Cooldown
77
+ if self.state.maintenance_cooldown > 0:
78
+ self.state.maintenance_cooldown -= 1
79
+
80
+ # 1. Processing Action: Cleaning or Pumping
81
+ actual_production = 0.0
82
+ energy_used = 0.0
83
+
84
+ if action.run_cleaning:
85
+ if self.state.maintenance_cooldown == 0:
86
+ # Successful Clean
87
+ self.state.membrane_fouling = max(0.0, self.state.membrane_fouling - 0.6)
88
+ reward -= 1000.0 # High cost of washing chemicals & crew dispatch
89
+ energy_used = 5.0 # Baseline power for flushing
90
+ self.state.maintenance_cooldown = 5 # Takes 5 steps to organize the next crew
91
+ info["action_taken"] = "cleaned"
92
+ else:
93
+ # Failed clean! The crew wasn't ready, plant stayed idle wasting a step.
94
+ info["action_taken"] = "failed_clean_idle"
95
+ reward -= 100.0 # Penalty for mismanagement
96
+ else:
97
+ actual_production = min(max(0.0, action.production_rate), 50.0)
98
+ info["action_taken"] = f"produced_{actual_production:.1f}"
99
+
100
+ # Physics Engine: Energy required scales exponentially as the membrane clogs
101
+ energy_used = actual_production * (1.5 + (self.state.membrane_fouling * 8.0))
102
+
103
+ # Sub-scale Fouling Physics: pushing water increments fouling parameter
104
+ self.state.membrane_fouling = min(1.0, self.state.membrane_fouling + (actual_production * 0.002))
105
+
106
+ # 2. Water Quality (Salinity) Tracking
107
+ # Baseline is 300PPM. Pushing hard on a fouled membrane allows micro-tears leading to salt leak.
108
+ self.state.water_salinity = 300.0 + (actual_production * self.state.membrane_fouling * 15.0)
109
+
110
+ health_penalty = 0.0
111
+ if self.state.water_salinity > 500.0:
112
+ # Massive fine per unit of violation
113
+ health_penalty = (self.state.water_salinity - 500.0) * 100.0
114
+
115
+ # 3. Economy & City Demands
116
+ water_revenue = actual_production * 25.0
117
+ self.state.reservoir_level = min(self.config.reservoir_capacity, self.state.reservoir_level + actual_production)
118
+
119
+ # The city draws water
120
+ shortfall = max(0.0, self.state.city_demand - self.state.reservoir_level)
121
+ self.state.reservoir_level = max(0.0, self.state.reservoir_level - self.state.city_demand)
122
+
123
+ # 4. Calculate Immediate Reward
124
+ energy_cost = energy_used * (self.state.energy_price / 100.0)
125
+ sla_penalty = shortfall * 1500.0 # Catastrophic penalty for empty lines (No water in pipes)
126
+
127
+ step_reward = water_revenue - energy_cost - sla_penalty - health_penalty
128
+ self.total_reward += step_reward
129
+
130
+ info.update({
131
+ "energy_cost": energy_cost,
132
+ "sla_penalty": sla_penalty,
133
+ "health_penalty": health_penalty,
134
+ "revenue": water_revenue
135
+ })
136
+
137
+ # 5. Advance time and Environment changes
138
+ self.state.time_step += 1
139
+
140
+ # Environmental Stochasticity: Weather Updates
141
+ # Weather phases change every 10 steps
142
+ weather_idx = (self.state.time_step // 10) % len(self.config.weather_pattern)
143
+ self.state.weather_condition = self.config.weather_pattern[weather_idx]
144
+
145
+ demand_multiplier = 1.0
146
+ price_multiplier = 1.0
147
+
148
+ if self.state.weather_condition == "Heatwave":
149
+ demand_multiplier = 1.5 # Massive water usage
150
+ price_multiplier = 1.8 # AC units are running, grid is stressed
151
+ elif self.state.weather_condition == "Storm":
152
+ demand_multiplier = 0.8
153
+ price_multiplier = 0.4 + random.random() # Erratic energy prices
154
+
155
+ # Modulate environment bounds
156
+ self.state.energy_price = (50.0 * price_multiplier) + (math.sin(self.state.time_step / 4.0) * self.config.price_volatility) + random.uniform(-10, 10)
157
+ self.state.energy_price = max(10.0, self.state.energy_price)
158
+
159
+ self.state.city_demand = (self.config.base_demand * demand_multiplier) + (math.sin(self.state.time_step / 6.0) * (self.config.base_demand * 0.2)) + random.uniform(-2, 2)
160
+ self.state.city_demand = max(5.0, self.state.city_demand)
161
+
162
+ done = self.state.time_step >= self.config.max_steps
163
+
164
+ return StepResult(observation=self.state, reward=step_reward, done=done, info=info)
165
+ """
166
+
167
+ tasks_py = """
168
+ from src.models import TaskConfig
169
+
170
+ TASKS = {
171
+ "easy_spring": TaskConfig(
172
+ task_id="easy_spring", max_steps=50, reservoir_capacity=200.0,
173
+ base_demand=15.0, price_volatility=10.0, weather_pattern=["Normal"]
174
+ ),
175
+ "summer_crisis": TaskConfig(
176
+ task_id="summer_crisis", max_steps=100, reservoir_capacity=150.0,
177
+ base_demand=25.0, price_volatility=40.0, weather_pattern=["Normal", "Heatwave", "Heatwave", "Normal"]
178
+ ),
179
+ "hurricane_season": TaskConfig(
180
+ task_id="hurricane_season", max_steps=150, reservoir_capacity=100.0,
181
+ base_demand=20.0, price_volatility=80.0, weather_pattern=["Normal", "Storm", "Normal", "Storm", "Storm"]
182
+ ),
183
+ }
184
+ """
185
+
186
+ main_py = """
187
+ from fastapi import FastAPI, HTTPException
188
+ from src.models import Action, TaskConfig
189
+ from src.env import DesalEnv
190
+ from src.tasks import TASKS
191
+ import subprocess
192
+
193
+ app = FastAPI(title="Advanced Municipal Desalination Plant Env")
194
+ env = DesalEnv()
195
+
196
+ @app.get("/")
197
+ def health_check():
198
+ return {"status": "ok", "message": "Advanced DesalEnv is running", "features": ["weather", "salinity", "mechanics"]}
199
+
200
+ @app.post("/reset")
201
+ def reset_env(task_id: str = "easy_spring"):
202
+ if task_id not in TASKS:
203
+ raise HTTPException(status_code=404, detail="Task not found")
204
+ obs = env.reset(TASKS[task_id])
205
+ return {"observation": obs.dict()}
206
+
207
+ @app.post("/step")
208
+ def step_env(action: Action):
209
+ try:
210
+ result = env.step(action)
211
+ return result.dict()
212
+ except Exception as e:
213
+ raise HTTPException(status_code=400, detail=str(e))
214
+
215
+ @app.get("/state")
216
+ def get_state():
217
+ if env.state is None:
218
+ raise HTTPException(status_code=400, detail="Environment not initialized")
219
+ return {"observation": env.state.dict()}
220
+
221
+ @app.get("/tasks")
222
+ def list_tasks():
223
+ return {"tasks": list(TASKS.keys()), "action_schema": Action.schema()}
224
+
225
+ @app.get("/grader")
226
+ def grader():
227
+ if env.state is None:
228
+ return {"score": 0.0}
229
+ # Grade relative to typical maximum and minimum returns to generate a 0.0-1.0 range
230
+ baseline_offset = env.config.max_steps * 1000.0 # Compensate for penalties
231
+ scale_factor = env.config.max_steps * 1500.0
232
+ score = max(0.0, min(1.0, (env.total_reward + baseline_offset) / scale_factor))
233
+ return {"score": score}
234
+
235
+ @app.post("/baseline")
236
+ def run_baseline():
237
+ result = subprocess.run(["python", "src/baseline.py"], capture_output=True, text=True)
238
+ return {"output": result.stdout}
239
+ """
240
+
241
+ baseline_py = """
242
+ import requests
243
+
244
+ BASE_URL = "http://localhost:7860"
245
+
246
+ def evaluate_baseline(task_id):
247
+ requests.post(f"{BASE_URL}/reset?task_id={task_id}")
248
+ done = False
249
+
250
+ while not done:
251
+ state = requests.get(f"{BASE_URL}/state").json()["observation"]
252
+
253
+ # Advanced Heuristic logic
254
+ # If deeply fouled and crew is ready, we clean!
255
+ # Don't try to clean if cooldown is > 0
256
+ needs_cleaning = state["membrane_fouling"] > 0.65 and state["maintenance_cooldown"] == 0
257
+
258
+ if needs_cleaning:
259
+ action = {"production_rate": 0.0, "run_cleaning": True}
260
+ else:
261
+ # Weather and Salinity check
262
+ # If weather is Heatwave, demand is high, pump up.
263
+ # But if Salinity is getting dangerous (>450), throttle!
264
+ base_prod = state["city_demand"] * 1.2 # Attempt slight overproduce
265
+
266
+ if state["water_salinity"] > 450.0:
267
+ base_prod *= 0.5 # Drop production sharply to avoid fines
268
+
269
+ # Energy heuristic: if expensive, only meet immediate demand.
270
+ if state["energy_price"] > 70.0:
271
+ base_prod = min(base_prod, state["city_demand"] * 0.9)
272
+
273
+ action = {"production_rate": max(0.0, min(base_prod, 50.0)), "run_cleaning": False}
274
+
275
+ step_res = requests.post(f"{BASE_URL}/step", json=action).json()
276
+ done = step_res["done"]
277
+
278
+ score = requests.get(f"{BASE_URL}/grader").json()["score"]
279
+ print(f"Task: {task_id} | Final Score: {score:.3f}")
280
+
281
+ if __name__ == "__main__":
282
+ for task in ["easy_spring", "summer_crisis", "hurricane_season"]:
283
+ evaluate_baseline(task)
284
+ """
285
+
286
+ readme_md = """
287
+ ---
288
+ title: Desalination RL Protocol
289
+ emoji: 🌊
290
+ colorFrom: cyan
291
+ colorTo: blue
292
+ sdk: docker
293
+ pinned: false
294
+ ---
295
+
296
+ # Advanced Municipal Desalination Plant (DesalEnv)
297
+
298
+ An incredibly unique, real-world RL environment that bridges continuous control, resource arbitrage, dynamic system physics, and environmental noise.
299
+
300
+ The agent operates an industrial reverse-osmosis water desalination plant providing drinking water to a municipality. It must balance massive trade-offs under high pressure. This goes **far** above basic control loops, presenting specific non-linear phenomena.
301
+
302
+ ### Key Mechanics ⚙️
303
+ 1. **Weather Shifts:** The environment continuously cycles through weather patterns (`Normal`, `Heatwave`, `Storm`) which violently alter both the Grid Energy Price and the sheer amount of water the city demands.
304
+ 2. **Maintenance Logistics:** Pushing water fouls the RO membranes, dragging up energy costs. You can trigger a `run_cleaning` action, however, crews are not instantly available! Doing so locks a `maintenance_cooldown`. Trying to clean while on cooldown results in idle time and fines.
305
+ 3. **Biological Safety Limits:** Overworking a fouled membrane causes micro-tears resulting in salt leakage. The agent tracks `water_salinity`. Processing high water yields while fouled raises PPM levels. Tipping above 500PPM induces strict city health department fines.
306
+
307
+ ## 🧠 Environment Structure
308
+
309
+ ### Observation Space
310
+
311
+ | Feature | Description | Type |
312
+ | :--- | :--- | :--- |
313
+ | `reservoir_level` | Fresh water stored (Megaliters). | `float` |
314
+ | `water_salinity` | PPM of salt in the water. >500 triggers penalties. | `float` |
315
+ | `energy_price` | Fluctuating grid energy price ($/MWh). | `float` |
316
+ | `membrane_fouling` | Hardware Degradation index (0.0=clean, 1.0=blocked). | `float` |
317
+ | `city_demand` | Fluctuating water consumption for the current step. | `float` |
318
+ | `weather_condition` | String literal tracking macro-events (`Heatwave`, etc.) | `string` |
319
+ | `maintenance_cooldown` | Steps until a cleaning crew is available again. | `int` |
320
+
321
+ ### Action Space (Continuous & Discrete Hybrid)
322
+
323
+ | Feature | Description | Type |
324
+ | :--- | :--- | :--- |
325
+ | `production_rate` | Target water extraction flow rate (0.0 to 50.0). | `float` |
326
+ | `run_cleaning` | Set True to halt production and wash membranes (checks cooldown). | `bool` |
327
+
328
+ ## Tasks
329
+
330
+ Provides 3 heavily distinct curriculums:
331
+ - `easy_spring`: Generous reservoir, standard weather patterns.
332
+ - `summer_crisis`: Frequent extreme Heatwaves driving massive demand + peak electricity pricing.
333
+ - `hurricane_season`: Wild grid-volatility, lower demand, but requires extreme energy arbitrage.
334
+ """
335
+
336
+ files = {
337
+ "d:/KYC/src/models.py": models_py,
338
+ "d:/KYC/src/env.py": env_py,
339
+ "d:/KYC/src/tasks.py": tasks_py,
340
+ "d:/KYC/src/main.py": main_py,
341
+ "d:/KYC/src/baseline.py": baseline_py,
342
+ "d:/KYC/README.md": readme_md
343
+ }
344
+
345
+ for path, content in files.items():
346
+ write_file(path, content)
347
+ print(f"Updated advanced mechanics in {path}")
demo.py DELETED
@@ -1,15 +0,0 @@
1
- import gradio as gr
2
-
3
- def demo_function(name):
4
- return f"Hello, {name if name else 'Developer'}! The OpenEnv Hackathon Demo is running successfully with the new updates!"
5
-
6
- if __name__ == "__main__":
7
- print("Launching Gradio demo...")
8
- demo = gr.Interface(
9
- fn=demo_function,
10
- inputs=gr.Textbox(label="Enter your name", placeholder="Name..."),
11
- outputs=gr.Textbox(label="Message"),
12
- title="OpenEnv Hackathon Submission Demo (Updated v2 ✨)",
13
- description="A demo for your Hugging Face Space. This version has been updated to confirm your recent changes are now live!"
14
- )
15
- demo.launch()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
openenv.yaml CHANGED
@@ -1,6 +1,6 @@
1
- name: GPUClusterEnv
2
  version: 1.0.0
3
- description: A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.
4
  endpoints:
5
  reset: /reset
6
  step: /step
 
1
+ name: DesalEnv
2
  version: 1.0.0
3
+ description: Control a municipal desalination plant. Balance water production against fluctuating energy market prices, manage reverse-osmosis membrane degradation, and avoid catastrophic city water shortages.
4
  endpoints:
5
  reset: /reset
6
  step: /step
src/baseline.py CHANGED
@@ -1,6 +1,6 @@
1
  import requests
2
 
3
- BASE_URL = "http://localhost:7860" # Default HF space port
4
 
5
  def evaluate_baseline(task_id):
6
  requests.post(f"{BASE_URL}/reset?task_id={task_id}")
@@ -9,9 +9,27 @@ def evaluate_baseline(task_id):
9
  while not done:
10
  state = requests.get(f"{BASE_URL}/state").json()["observation"]
11
 
12
- # Simple policy: If queue is larger than active GPUs, provision more.
13
- gpus_needed = state["queue_size"] - state["active_gpus"]
14
- action = {"gpus_to_provision": max(-1, min(2, gpus_needed))} # Throttle scaling
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
 
16
  step_res = requests.post(f"{BASE_URL}/step", json=action).json()
17
  done = step_res["done"]
@@ -20,5 +38,13 @@ def evaluate_baseline(task_id):
20
  print(f"Task: {task_id} | Final Score: {score:.3f}")
21
 
22
  if __name__ == "__main__":
23
- for task in ["easy", "medium", "hard"]:
 
 
 
 
 
 
 
 
24
  evaluate_baseline(task)
 
1
  import requests
2
 
3
+ BASE_URL = "http://localhost:7860"
4
 
5
  def evaluate_baseline(task_id):
6
  requests.post(f"{BASE_URL}/reset?task_id={task_id}")
 
9
  while not done:
10
  state = requests.get(f"{BASE_URL}/state").json()["observation"]
11
 
12
+ # Advanced Heuristic logic
13
+ # If deeply fouled and crew is ready, we clean!
14
+ # Don't try to clean if cooldown is > 0
15
+ needs_cleaning = state["membrane_fouling"] > 0.65 and state["maintenance_cooldown"] == 0
16
+
17
+ if needs_cleaning:
18
+ action = {"production_rate": 0.0, "run_cleaning": True}
19
+ else:
20
+ # Weather and Salinity check
21
+ # If weather is Heatwave, demand is high, pump up.
22
+ # But if Salinity is getting dangerous (>450), throttle!
23
+ base_prod = state["city_demand"] * 1.2 # Attempt slight overproduce
24
+
25
+ if state["water_salinity"] > 450.0:
26
+ base_prod *= 0.5 # Drop production sharply to avoid fines
27
+
28
+ # Energy heuristic: if expensive, only meet immediate demand.
29
+ if state["energy_price"] > 70.0:
30
+ base_prod = min(base_prod, state["city_demand"] * 0.9)
31
+
32
+ action = {"production_rate": max(0.0, min(base_prod, 50.0)), "run_cleaning": False}
33
 
34
  step_res = requests.post(f"{BASE_URL}/step", json=action).json()
35
  done = step_res["done"]
 
38
  print(f"Task: {task_id} | Final Score: {score:.3f}")
39
 
40
  if __name__ == "__main__":
41
+ tasks_to_test = [
42
+ "easy_spring",
43
+ "summer_crisis",
44
+ "hurricane_season",
45
+ "black_swan_drought",
46
+ "grid_failure",
47
+ "marathon_endurance"
48
+ ]
49
+ for task in tasks_to_test:
50
  evaluate_baseline(task)
src/env.py CHANGED
@@ -1,57 +1,124 @@
1
- import numpy as np
 
2
  from src.models import Observation, Action, StepResult, TaskConfig
3
 
4
- class GPUClusterEnv:
5
  def __init__(self):
6
- self.config = None
7
  self.state = None
 
8
  self.total_reward = 0.0
9
 
10
  def reset(self, config: TaskConfig) -> Observation:
11
  self.config = config
12
  self.total_reward = 0.0
 
 
 
13
  self.state = Observation(
14
  time_step=0,
15
- active_gpus=1,
16
- queue_size=0,
17
- current_budget=config.initial_budget,
18
- incoming_jobs=0
 
 
 
19
  )
20
  return self.state
21
 
22
  def step(self, action: Action) -> StepResult:
23
  if self.state is None:
24
- raise ValueError("Environment must be reset before calling step.")
25
-
26
- # 1. Apply Action (Scale infrastructure)
27
- self.state.active_gpus = max(0, self.state.active_gpus + action.gpus_to_provision)
28
 
29
- # 2. Simulate incoming workloads
30
- new_jobs = np.random.poisson(self.config.job_arrival_rate)
31
- self.state.incoming_jobs = new_jobs
32
- self.state.queue_size += new_jobs
33
 
34
- # 3. Process jobs (1 GPU processes 1 job per step)
35
- jobs_processed = min(self.state.active_gpus, self.state.queue_size)
36
- self.state.queue_size -= jobs_processed
37
 
38
- # 4. Calculate Costs & Rewards
39
- gpu_cost = self.state.active_gpus * 2.5 # $2.50 per step per GPU
40
- sla_penalty = self.state.queue_size * 1.0 # $1 penalty per waiting job
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
41
 
42
- self.state.current_budget -= gpu_cost
 
43
 
44
- # Reward shaping
45
- reward = (jobs_processed * 5.0) - gpu_cost - sla_penalty
46
- self.total_reward += reward
 
 
 
 
 
47
  self.state.time_step += 1
48
-
49
- # 5. Terminal Conditions
50
- done = self.state.time_step >= self.config.max_steps or self.state.current_budget <= 0
51
-
52
- return StepResult(
53
- observation=self.state,
54
- reward=reward,
55
- done=done,
56
- info={"jobs_processed": jobs_processed, "total_reward": self.total_reward}
57
- )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import math
2
+ import random
3
  from src.models import Observation, Action, StepResult, TaskConfig
4
 
5
+ class DesalEnv:
6
  def __init__(self):
 
7
  self.state = None
8
+ self.config = None
9
  self.total_reward = 0.0
10
 
11
  def reset(self, config: TaskConfig) -> Observation:
12
  self.config = config
13
  self.total_reward = 0.0
14
+
15
+ initial_weather = config.weather_pattern[0] if config.weather_pattern else "Normal"
16
+
17
  self.state = Observation(
18
  time_step=0,
19
+ reservoir_level=config.reservoir_capacity * 0.5,
20
+ water_salinity=300.0, # 300 PPM is superb drinking water
21
+ energy_price=50.0,
22
+ membrane_fouling=0.0,
23
+ city_demand=config.base_demand,
24
+ weather_condition=initial_weather,
25
+ maintenance_cooldown=0
26
  )
27
  return self.state
28
 
29
  def step(self, action: Action) -> StepResult:
30
  if self.state is None:
31
+ raise ValueError("Must reset prior to step")
 
 
 
32
 
33
+ reward = 0.0
34
+ info = {}
 
 
35
 
36
+ # 0. Apply Maintenance Cooldown
37
+ if self.state.maintenance_cooldown > 0:
38
+ self.state.maintenance_cooldown -= 1
39
 
40
+ # 1. Processing Action: Cleaning or Pumping
41
+ actual_production = 0.0
42
+ energy_used = 0.0
43
+
44
+ if action.run_cleaning:
45
+ if self.state.maintenance_cooldown == 0:
46
+ # Successful Clean
47
+ self.state.membrane_fouling = max(0.0, self.state.membrane_fouling - 0.6)
48
+ reward -= 1000.0 # High cost of washing chemicals & crew dispatch
49
+ energy_used = 5.0 # Baseline power for flushing
50
+ self.state.maintenance_cooldown = 5 # Takes 5 steps to organize the next crew
51
+ info["action_taken"] = "cleaned"
52
+ else:
53
+ # Failed clean! The crew wasn't ready, plant stayed idle wasting a step.
54
+ info["action_taken"] = "failed_clean_idle"
55
+ reward -= 100.0 # Penalty for mismanagement
56
+ else:
57
+ actual_production = min(max(0.0, action.production_rate), 50.0)
58
+ info["action_taken"] = f"produced_{actual_production:.1f}"
59
+
60
+ # Physics Engine: Energy required scales exponentially as the membrane clogs
61
+ energy_used = actual_production * (1.5 + (self.state.membrane_fouling * 8.0))
62
+
63
+ # Sub-scale Fouling Physics: pushing water increments fouling parameter
64
+ self.state.membrane_fouling = min(1.0, self.state.membrane_fouling + (actual_production * 0.002))
65
+
66
+ # 2. Water Quality (Salinity) Tracking
67
+ # Baseline is 300PPM. Pushing hard on a fouled membrane allows micro-tears leading to salt leak.
68
+ self.state.water_salinity = 300.0 + (actual_production * self.state.membrane_fouling * 15.0)
69
+
70
+ health_penalty = 0.0
71
+ if self.state.water_salinity > 500.0:
72
+ # Massive fine per unit of violation
73
+ health_penalty = (self.state.water_salinity - 500.0) * 100.0
74
+
75
+ # 3. Economy & City Demands
76
+ water_revenue = actual_production * 25.0
77
+ self.state.reservoir_level = min(self.config.reservoir_capacity, self.state.reservoir_level + actual_production)
78
+
79
+ # The city draws water
80
+ shortfall = max(0.0, self.state.city_demand - self.state.reservoir_level)
81
+ self.state.reservoir_level = max(0.0, self.state.reservoir_level - self.state.city_demand)
82
+
83
+ # 4. Calculate Immediate Reward
84
+ energy_cost = energy_used * (self.state.energy_price / 100.0)
85
+ sla_penalty = shortfall * 1500.0 # Catastrophic penalty for empty lines (No water in pipes)
86
 
87
+ step_reward = water_revenue - energy_cost - sla_penalty - health_penalty
88
+ self.total_reward += step_reward
89
 
90
+ info.update({
91
+ "energy_cost": energy_cost,
92
+ "sla_penalty": sla_penalty,
93
+ "health_penalty": health_penalty,
94
+ "revenue": water_revenue
95
+ })
96
+
97
+ # 5. Advance time and Environment changes
98
  self.state.time_step += 1
99
+
100
+ # Environmental Stochasticity: Weather Updates
101
+ # Weather phases change every 10 steps
102
+ weather_idx = (self.state.time_step // 10) % len(self.config.weather_pattern)
103
+ self.state.weather_condition = self.config.weather_pattern[weather_idx]
104
+
105
+ demand_multiplier = 1.0
106
+ price_multiplier = 1.0
107
+
108
+ if self.state.weather_condition == "Heatwave":
109
+ demand_multiplier = 1.5 # Massive water usage
110
+ price_multiplier = 1.8 # AC units are running, grid is stressed
111
+ elif self.state.weather_condition == "Storm":
112
+ demand_multiplier = 0.8
113
+ price_multiplier = 0.4 + random.random() # Erratic energy prices
114
+
115
+ # Modulate environment bounds
116
+ self.state.energy_price = (50.0 * price_multiplier) + (math.sin(self.state.time_step / 4.0) * self.config.price_volatility) + random.uniform(-10, 10)
117
+ self.state.energy_price = max(10.0, self.state.energy_price)
118
+
119
+ self.state.city_demand = (self.config.base_demand * demand_multiplier) + (math.sin(self.state.time_step / 6.0) * (self.config.base_demand * 0.2)) + random.uniform(-2, 2)
120
+ self.state.city_demand = max(5.0, self.state.city_demand)
121
+
122
+ done = self.state.time_step >= self.config.max_steps
123
+
124
+ return StepResult(observation=self.state, reward=step_reward, done=done, info=info)
src/environment.py DELETED
File without changes
src/main.py CHANGED
@@ -1,18 +1,18 @@
1
  from fastapi import FastAPI, HTTPException
2
  from src.models import Action, TaskConfig
3
- from src.env import GPUClusterEnv
4
  from src.tasks import TASKS
5
  import subprocess
6
 
7
- app = FastAPI(title="GPU Cluster OpenEnv")
8
- env = GPUClusterEnv()
9
 
10
  @app.get("/")
11
  def health_check():
12
- return {"status": "ok", "message": "GPUClusterEnv is running"}
13
 
14
  @app.post("/reset")
15
- def reset_env(task_id: str = "easy"):
16
  if task_id not in TASKS:
17
  raise HTTPException(status_code=404, detail="Task not found")
18
  obs = env.reset(TASKS[task_id])
@@ -34,22 +34,19 @@ def get_state():
34
 
35
  @app.get("/tasks")
36
  def list_tasks():
37
- return {
38
- "tasks": list(TASKS.keys()),
39
- "action_schema": Action.schema()
40
- }
41
 
42
  @app.get("/grader")
43
  def grader():
44
- # Normalizes total reward to a 0.0 - 1.0 score based on max possible baseline
45
  if env.state is None:
46
  return {"score": 0.0}
47
- max_expected_reward = env.config.max_steps * 10 # Arbitrary max for example
48
- score = max(0.0, min(1.0, env.total_reward / max_expected_reward))
 
 
49
  return {"score": score}
50
 
51
  @app.post("/baseline")
52
  def run_baseline():
53
- # Trigger the baseline script and return results
54
  result = subprocess.run(["python", "src/baseline.py"], capture_output=True, text=True)
55
  return {"output": result.stdout}
 
1
  from fastapi import FastAPI, HTTPException
2
  from src.models import Action, TaskConfig
3
+ from src.env import DesalEnv
4
  from src.tasks import TASKS
5
  import subprocess
6
 
7
+ app = FastAPI(title="Advanced Municipal Desalination Plant Env")
8
+ env = DesalEnv()
9
 
10
  @app.get("/")
11
  def health_check():
12
+ return {"status": "ok", "message": "Advanced DesalEnv is running", "features": ["weather", "salinity", "mechanics"]}
13
 
14
  @app.post("/reset")
15
+ def reset_env(task_id: str = "easy_spring"):
16
  if task_id not in TASKS:
17
  raise HTTPException(status_code=404, detail="Task not found")
18
  obs = env.reset(TASKS[task_id])
 
34
 
35
  @app.get("/tasks")
36
  def list_tasks():
37
+ return {"tasks": list(TASKS.keys()), "action_schema": Action.schema()}
 
 
 
38
 
39
  @app.get("/grader")
40
  def grader():
 
41
  if env.state is None:
42
  return {"score": 0.0}
43
+ # Grade relative to typical maximum and minimum returns to generate a 0.0-1.0 range
44
+ baseline_offset = env.config.max_steps * 1000.0 # Compensate for penalties
45
+ scale_factor = env.config.max_steps * 1500.0
46
+ score = max(0.0, min(1.0, (env.total_reward + baseline_offset) / scale_factor))
47
  return {"score": score}
48
 
49
  @app.post("/baseline")
50
  def run_baseline():
 
51
  result = subprocess.run(["python", "src/baseline.py"], capture_output=True, text=True)
52
  return {"output": result.stdout}
src/models.py CHANGED
@@ -1,15 +1,19 @@
1
- from pydantic import BaseModel
2
- from typing import List, Dict
3
 
4
  class Observation(BaseModel):
5
  time_step: int
6
- active_gpus: int
7
- queue_size: int
8
- current_budget: float
9
- incoming_jobs: int
 
 
 
10
 
11
  class Action(BaseModel):
12
- gpus_to_provision: int # Can be positive (spin up) or negative (spin down)
 
13
 
14
  class StepResult(BaseModel):
15
  observation: Observation
@@ -19,7 +23,8 @@ class StepResult(BaseModel):
19
 
20
  class TaskConfig(BaseModel):
21
  task_id: str
22
- difficulty: str
23
  max_steps: int
24
- initial_budget: float
25
- job_arrival_rate: float # Lambda for Poisson distribution
 
 
 
1
+ from pydantic import BaseModel, Field
2
+ from typing import Dict, Literal, List
3
 
4
  class Observation(BaseModel):
5
  time_step: int
6
+ reservoir_level: float = Field(description="Current fresh water stored (Megaliters)")
7
+ water_salinity: float = Field(description="PPM of salt in the water. >500 is unsafe.")
8
+ energy_price: float = Field(description="Current grid energy price ($/MWh)")
9
+ membrane_fouling: float = Field(description="0.0 is clean, 1.0 is totally blocked")
10
+ city_demand: float = Field(description="Water demand for this step (Megaliters)")
11
+ weather_condition: Literal["Normal", "Heatwave", "Storm"] = Field(description="Current weather event affecting parameters")
12
+ maintenance_cooldown: int = Field(description="Steps until a cleaning crew is available again")
13
 
14
  class Action(BaseModel):
15
+ production_rate: float = Field(description="Desired water output (ML/step), 0.0 to 50.0")
16
+ run_cleaning: bool = Field(description="If True, halts production to chemically wash membranes (requires crew)")
17
 
18
  class StepResult(BaseModel):
19
  observation: Observation
 
23
 
24
  class TaskConfig(BaseModel):
25
  task_id: str
 
26
  max_steps: int
27
+ reservoir_capacity: float
28
+ base_demand: float
29
+ price_volatility: float
30
+ weather_pattern: List[str]
src/tasks.py CHANGED
@@ -1,7 +1,45 @@
1
  from src.models import TaskConfig
2
 
3
  TASKS = {
4
- "easy": TaskConfig(task_id="easy", difficulty="easy", max_steps=50, initial_budget=1000.0, job_arrival_rate=2.0),
5
- "medium": TaskConfig(task_id="medium", difficulty="medium", max_steps=100, initial_budget=800.0, job_arrival_rate=5.0),
6
- "hard": TaskConfig(task_id="hard", difficulty="hard", max_steps=200, initial_budget=500.0, job_arrival_rate=12.0),
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
  }
 
1
  from src.models import TaskConfig
2
 
3
  TASKS = {
4
+ # -------------------------------------------------------------
5
+ # TIER 1: Standard Evaluation (Learning the Basics)
6
+ # -------------------------------------------------------------
7
+ "easy_spring": TaskConfig(
8
+ task_id="easy_spring", max_steps=50, reservoir_capacity=200.0,
9
+ base_demand=10.0, price_volatility=10.0, weather_pattern=["Normal"]
10
+ ),
11
+
12
+ # -------------------------------------------------------------
13
+ # TIER 2: Volatile Environmental Shifts (Learning Constraints)
14
+ # -------------------------------------------------------------
15
+ "summer_crisis": TaskConfig(
16
+ task_id="summer_crisis", max_steps=100, reservoir_capacity=150.0,
17
+ base_demand=25.0, price_volatility=40.0, weather_pattern=["Normal", "Heatwave", "Heatwave", "Normal"]
18
+ ),
19
+ "hurricane_season": TaskConfig(
20
+ task_id="hurricane_season", max_steps=150, reservoir_capacity=100.0,
21
+ base_demand=20.0, price_volatility=80.0, weather_pattern=["Normal", "Storm", "Normal", "Storm", "Storm"]
22
+ ),
23
+
24
+ # -------------------------------------------------------------
25
+ # TIER 3: Asymmetrical Shock Scenarios (Testing Robustness)
26
+ # -------------------------------------------------------------
27
+ "black_swan_drought": TaskConfig(
28
+ # Brutal: Demand stays critically high, reservoir doesn't hold much, and energy volatility is high.
29
+ # Tests the agent's ability to perfectly time maintenance cooldowns. If they miss one cleaning window, the city drys out.
30
+ task_id="black_swan_drought", max_steps=200, reservoir_capacity=120.0,
31
+ base_demand=35.0, price_volatility=50.0, weather_pattern=["Heatwave", "Heatwave", "Heatwave", "Heatwave"]
32
+ ),
33
+ "grid_failure": TaskConfig(
34
+ # The ultimate energy arbitrage test. Standard demand, but grid energy pricing fluctuates by massive magnitudes.
35
+ # Producing water at the wrong step bankrupts the enterprise instantly.
36
+ task_id="grid_failure", max_steps=200, reservoir_capacity=250.0,
37
+ base_demand=15.0, price_volatility=250.0, weather_pattern=["Normal", "Storm", "Storm", "Normal"]
38
+ ),
39
+ "marathon_endurance": TaskConfig(
40
+ # 500 Steps: The agent must manage micro-degradation perfectly over a very long time horizon.
41
+ # Short-term greedy strategies (running fouled, taking salinity hits) will eventually snowball into total failure.
42
+ task_id="marathon_endurance", max_steps=500, reservoir_capacity=200.0,
43
+ base_demand=20.0, price_volatility=30.0, weather_pattern=["Normal", "Heatwave", "Storm", "Normal"]
44
+ ),
45
  }