--- title: Desalination RL Protocol emoji: 🌊 colorFrom: blue colorTo: indigo sdk: docker pinned: false --- # Advanced Municipal Desalination Plant (DesalEnv) An incredibly unique, real-world RL environment that bridges continuous control, resource arbitrage, dynamic system physics, and environmental noise. The agent operates an industrial reverse-osmosis water desalination plant providing drinking water to a municipality. It must balance massive trade-offs under high pressure. This goes **far** above basic control loops, presenting specific non-linear phenomena. ### Key Mechanics ⚙️ 1. **Weather Shifts:** The environment continuously cycles through weather patterns (`Normal`, `Heatwave`, `Storm`) which violently alter both the Grid Energy Price and the sheer amount of water the city demands. 2. **Maintenance Logistics:** Pushing water fouls the RO membranes, dragging up energy costs. You can trigger a `run_cleaning` action, however, crews are not instantly available! Doing so locks a `maintenance_cooldown`. Trying to clean while on cooldown results in idle time and fines. 3. **Biological Safety Limits:** Overworking a fouled membrane causes micro-tears resulting in salt leakage. The agent tracks `water_salinity`. Processing high water yields while fouled raises PPM levels. Tipping above 500PPM induces strict city health department fines. ## 🧠 Environment Structure ### Observation Space | Feature | Description | Type | | :--- | :--- | :--- | | `reservoir_level` | Fresh water stored (Megaliters). | `float` | | `water_salinity` | PPM of salt in the water. >500 triggers penalties. | `float` | | `energy_price` | Fluctuating grid energy price ($/MWh). | `float` | | `membrane_fouling` | Hardware Degradation index (0.0=clean, 1.0=blocked). | `float` | | `city_demand` | Fluctuating water consumption for the current step. | `float` | | `weather_condition` | String literal tracking macro-events (`Heatwave`, etc.) | `string` | | `maintenance_cooldown` | Steps until a cleaning crew is available again. | `int` | ### Action Space (Continuous & Discrete Hybrid) | Feature | Description | Type | | :--- | :--- | :--- | | `production_rate` | Target water extraction flow rate (0.0 to 50.0). | `float` | | `run_cleaning` | Set True to halt production and wash membranes (checks cooldown). | `bool` | ## Tasks Provides 6 heavily distinct curriculums across 3 difficulty tiers to truly evaluate agent robustness: **Tier 1: Standard Evaluation** * `easy_spring`: Generous reservoir, standard normal weather variables. **Tier 2: Volatile Environmental Shifts** * `summer_crisis`: Back-to-back heatwaves and high energy prices. The agent has to aggressively juggle cleanings and salinity. * `hurricane_season`: Erratic grids, lower demands, but requires extreme energy arbitrage. **Tier 3: Asymmetrical Shock Scenarios (Testing True Robustness)** * `black_swan_drought`: Brutal. Demand stays critically high, reservoir is small. Tests the agent's ability to perfectly time maintenance cooldowns. If they miss one cleaning window, the city drys out. * `grid_failure`: The ultimate energy arbitrage test. Standard demand, but grid energy pricing fluctuates by massive magnitudes (`price_volatility=250.0`). Pumping at the wrong time bankrupts the plant. * `marathon_endurance`: A 500-step test where micro-degradations compound. Short-term greedy strategies (running fouled, taking salinity hits) will eventually snowball into total failure. ## Setup and Usage Instructions 1. Install dependencies: \\\ash pip install -r requirements.txt pip install openenv-core uv lock \\\ 2. Validate compliance: \\\ash openenv validate . \\\ 3. Run Environment Locally (Docker): \\\ash docker build -t desal_env . docker run -p 7860:7860 desal_env \\\ ## Baseline Scores The baseline agent uses a heuristic expert hint merged with an LLM prompt to solve the tasks reliably. Scores normally range around: - **easy_spring**: ~0.90 to ~0.95 - **summer_crisis**: ~0.80 to ~0.85 - **hurricane_season**: ~0.70 to ~0.78