title: Desalination RL Protocol
emoji: 🌊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
Advanced Municipal Desalination Plant (DesalEnv)
A real-world-inspired RL environment that combines continuous control, resource arbitrage, dynamic system physics, and environmental noise.
The agent operates an industrial reverse-osmosis (RO) desalination plant that supplies drinking water to a municipality. It must trade off water demand, energy cost, membrane maintenance, and water safety, and the underlying dynamics are non-linear, so the task goes well beyond a basic control loop.
Key Mechanics ⚙️
- Weather Shifts: The environment continuously cycles through weather patterns (`Normal`, `Heatwave`, `Storm`), which sharply alter both the grid energy price and the amount of water the city demands.
- Maintenance Logistics: Pushing water through the plant fouls the RO membranes, which drives up energy costs. You can trigger a `run_cleaning` action, but crews are not instantly available: cleaning starts a `maintenance_cooldown`. Trying to clean while on cooldown results in idle time and fines.
- Biological Safety Limits: Overworking a fouled membrane causes micro-tears that leak salt. The agent tracks `water_salinity`. Producing at high rates while fouled raises PPM levels, and exceeding 500 PPM triggers strict city health department fines.
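The interaction between cleaning, cooldown, and salinity described above can be sketched as a single transition function. This is a minimal illustration, not DesalEnv's actual dynamics: only the 500 PPM limit comes from this README, while `PlantState`, `step_maintenance`, and every other constant are hypothetical values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

SALINITY_LIMIT_PPM = 500.0  # threshold stated in this README

# Everything below is an illustrative assumption, not a DesalEnv constant.
COOLDOWN_STEPS = 5
FOULING_PER_UNIT_FLOW = 0.002
BASE_SALINITY_PPM = 300.0
LEAK_PPM_PER_UNIT = 10.0
COOLDOWN_FINE = 1.0
SALINITY_FINE = 2.0


@dataclass(frozen=True)
class PlantState:
    membrane_fouling: float = 0.0
    water_salinity: float = BASE_SALINITY_PPM
    maintenance_cooldown: int = 0


def step_maintenance(
    state: PlantState, production_rate: float, run_cleaning: bool
) -> tuple[PlantState, float]:
    """Return (next_state, penalty) for one step of the sketched rules."""
    penalty = 0.0
    fouling = state.membrane_fouling
    cooldown = max(0, state.maintenance_cooldown - 1)
    flow = production_rate

    if run_cleaning:
        flow = 0.0  # a cleaning request halts production either way
        if state.maintenance_cooldown > 0:
            penalty += COOLDOWN_FINE  # crew unavailable: idle step plus fine
        else:
            fouling = 0.0  # membranes washed
            cooldown = COOLDOWN_STEPS  # crew is now busy for a while
    else:
        # Producing water fouls the membranes, capped at fully blocked.
        fouling = min(1.0, fouling + FOULING_PER_UNIT_FLOW * flow)

    # Salt leakage grows with both fouling and flow.
    salinity = BASE_SALINITY_PPM + LEAK_PPM_PER_UNIT * fouling * flow
    if salinity > SALINITY_LIMIT_PPM:
        penalty += SALINITY_FINE

    return PlantState(fouling, salinity, cooldown), penalty
```

Under these assumed rules, a second cleaning request issued during the cooldown produces no water and incurs a fine, which is the trade-off an agent has to plan around.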
🧠 Environment Structure
Observation Space
| Feature | Description | Type |
|---|---|---|
| `reservoir_level` | Fresh water stored (megaliters). | float |
| `water_salinity` | PPM of salt in the water. >500 triggers penalties. | float |
| `energy_price` | Fluctuating grid energy price ($/MWh). | float |
| `membrane_fouling` | Hardware degradation index (0.0 = clean, 1.0 = blocked). | float |
| `city_demand` | Fluctuating water consumption for the current step. | float |
| `weather_condition` | String literal tracking macro-events (Heatwave, etc.). | string |
| `maintenance_cooldown` | Steps until a cleaning crew is available again. | int |
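As a rough illustration of how an agent might consume these fields, the sketch below parses an observation into a typed container. It assumes the observation arrives as a plain dict keyed by the feature names in the table. The class `DesalObservation` and its helper properties are hypothetical and not part of the environment's API.

```python
from dataclasses import dataclass
from typing import Any, Mapping

SALINITY_LIMIT_PPM = 500.0  # penalty threshold from the table above


@dataclass(frozen=True)
class DesalObservation:
    """Hypothetical typed view of one observation."""

    reservoir_level: float      # megaliters
    water_salinity: float       # PPM
    energy_price: float         # $/MWh
    membrane_fouling: float     # 0.0 = clean, 1.0 = blocked
    city_demand: float
    weather_condition: str      # e.g. "Normal", "Heatwave", "Storm"
    maintenance_cooldown: int   # steps until a crew is free

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "DesalObservation":
        # Coerce each field to the type listed in the table.
        return cls(
            reservoir_level=float(raw["reservoir_level"]),
            water_salinity=float(raw["water_salinity"]),
            energy_price=float(raw["energy_price"]),
            membrane_fouling=float(raw["membrane_fouling"]),
            city_demand=float(raw["city_demand"]),
            weather_condition=str(raw["weather_condition"]),
            maintenance_cooldown=int(raw["maintenance_cooldown"]),
        )

    @property
    def salinity_violation(self) -> bool:
        return self.water_salinity > SALINITY_LIMIT_PPM

    @property
    def crew_available(self) -> bool:
        return self.maintenance_cooldown == 0
```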
Action Space (Continuous & Discrete Hybrid)
| Feature | Description | Type |
|---|---|---|
| `production_rate` | Target water extraction flow rate (0.0 to 50.0). | float |
| `run_cleaning` | Set True to halt production and wash membranes (checks cooldown). | bool |
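A small helper can keep actions inside the documented bounds before they are sent to the environment. The sketch below is an assumption about how a client might do this. `DesalAction` and `to_dict` are hypothetical names, and zeroing the rate when `run_cleaning` is set simply mirrors the "halt production" wording in the table rather than any confirmed server behavior.

```python
from dataclasses import dataclass

MAX_PRODUCTION_RATE = 50.0  # upper bound from the action table


@dataclass(frozen=True)
class DesalAction:
    """Hypothetical client-side action with range checking."""

    production_rate: float
    run_cleaning: bool = False

    def __post_init__(self) -> None:
        # Clamp the requested flow into [0.0, 50.0].
        rate = min(max(float(self.production_rate), 0.0), MAX_PRODUCTION_RATE)
        if self.run_cleaning:
            rate = 0.0  # assumed: cleaning halts production for the step
        # The dataclass is frozen, so bypass the normal setattr.
        object.__setattr__(self, "production_rate", rate)

    def to_dict(self) -> dict:
        return {
            "production_rate": self.production_rate,
            "run_cleaning": self.run_cleaning,
        }
```

Clamping on construction means a policy that outputs an unbounded value (for example, raw network logits scaled to a flow rate) cannot submit an out-of-range action by accident.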
Tasks
The environment provides 6 distinct tasks across 3 difficulty tiers to evaluate agent robustness:
Tier 1: Standard Evaluation
- `easy_spring`: Generous reservoir and normal weather conditions.
Tier 2: Volatile Environmental Shifts
- `summer_crisis`: Back-to-back heatwaves and high energy prices. The agent has to aggressively juggle cleanings and salinity.
- `hurricane_season`: An erratic grid and lower demand, but success requires extreme energy arbitrage.
Tier 3: Asymmetrical Shock Scenarios (Testing True Robustness)
- `black_swan_drought`: Brutal. Demand stays critically high and the reservoir is small. Tests the agent's ability to time maintenance cooldowns precisely. If it misses one cleaning window, the city dries out.
- `grid_failure`: The ultimate energy arbitrage test. Standard demand, but grid energy pricing fluctuates by large magnitudes (`price_volatility=250.0`). Pumping at the wrong time bankrupts the plant.
- `marathon_endurance`: A 500-step test where micro-degradations compound. Short-term greedy strategies (running fouled, taking salinity hits) eventually snowball into total failure.
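For scripting evaluations, the task list can be captured in a small registry. The sketch below records only what this README states: the six task names, their tiers, `price_volatility=250.0` for `grid_failure`, and the 500-step length of `marathon_endurance`. The key name `max_steps` and the helper functions are hypothetical, and no other task parameters are assumed.

```python
# Task name -> difficulty tier, as listed in this README.
TASK_TIERS = {
    "easy_spring": 1,
    "summer_crisis": 2,
    "hurricane_season": 2,
    "black_swan_drought": 3,
    "grid_failure": 3,
    "marathon_endurance": 3,
}

# Only the two parameters this README mentions; "max_steps" is an assumed key name.
TASK_OVERRIDES = {
    "grid_failure": {"price_volatility": 250.0},
    "marathon_endurance": {"max_steps": 500},
}


def tasks_in_tier(tier: int) -> list:
    """Return task names for one tier, in the order listed above."""
    return [name for name, t in TASK_TIERS.items() if t == tier]


def curriculum() -> list:
    """Return all task names ordered from easiest tier to hardest."""
    return sorted(TASK_TIERS, key=lambda name: TASK_TIERS[name])
```

Because `sorted` is stable, tasks within a tier keep their listed order, so `curriculum()` starts with `easy_spring` and ends with `marathon_endurance`.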
Setup and Usage Instructions
Install dependencies:
```bash
pip install -r requirements.txt
pip install openenv-core
uv lock
```
Validate compliance:
```bash
openenv validate .
```
Run Environment Locally (Docker):
```bash
docker build -t desal_env .
docker run -p 7860:7860 desal_env
```
Baseline Scores
The baseline agent combines a heuristic expert hint with an LLM prompt and solves the tasks reliably. Typical scores fall in these ranges:
- easy_spring: ~0.90 to ~0.95
- summer_crisis: ~0.80 to ~0.85
- hurricane_season: ~0.70 to ~0.78
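The README does not spell out the heuristic expert hint, so the following is only a guess at what such a rule might look like, not the baseline's actual logic. The function `heuristic_hint` and both thresholds are hypothetical. It takes an observation dict keyed by the documented feature names and returns an action dict with the documented action fields.

```python
MAX_PRODUCTION_RATE = 50.0        # action bound documented in this README
FOULING_CLEAN_THRESHOLD = 0.6     # assumed: clean once fouling passes this
SALINITY_CAUTION_PPM = 450.0      # assumed: back off before the 500 PPM limit


def heuristic_hint(obs: dict) -> dict:
    """Suggest an action from simple threshold rules (illustrative only)."""
    crew_available = obs["maintenance_cooldown"] == 0

    # Clean only when a crew is free, to avoid the cooldown fine.
    if crew_available and obs["membrane_fouling"] >= FOULING_CLEAN_THRESHOLD:
        return {"production_rate": 0.0, "run_cleaning": True}

    # Otherwise aim to match demand for this step.
    rate = float(obs["city_demand"])

    # Halve output when salinity approaches the penalty limit.
    if obs["water_salinity"] >= SALINITY_CAUTION_PPM:
        rate *= 0.5

    rate = min(max(rate, 0.0), MAX_PRODUCTION_RATE)
    return {"production_rate": rate, "run_cleaning": False}
```

A suggestion like this could be serialized into the LLM prompt as a starting point that the model is free to override.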