metadata
title: Desalination RL Protocol
emoji: 🌊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

Advanced Municipal Desalination Plant (DesalEnv)

A real-world-inspired RL environment that combines continuous control, resource arbitrage, dynamic system physics, and environmental noise.

The agent operates an industrial reverse-osmosis (RO) desalination plant that supplies drinking water to a municipality. It must trade off water production, energy cost, membrane health, and water quality at every step. The dynamics go beyond a basic control loop: fouling, salinity, and maintenance availability interact non-linearly.

Key Mechanics ⚙️

  1. Weather Shifts: The environment continuously cycles through weather patterns (Normal, Heatwave, Storm), which sharply change both the grid energy price and the amount of water the city demands.
  2. Maintenance Logistics: Pushing water through the plant fouls the RO membranes, which drives up energy costs. You can trigger a run_cleaning action, but crews are not instantly available: each cleaning starts a maintenance_cooldown. Trying to clean while on cooldown results in idle time and fines.
  3. Biological Safety Limits: Overworking a fouled membrane causes micro-tears that leak salt. The environment tracks this as water_salinity. Producing at high rates while fouled raises the PPM level, and exceeding 500 PPM triggers strict fines from the city health department.
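
The interaction between cleaning, cooldown, and salinity described above can be sketched as a toy transition function. This is only an illustration of the rules, not DesalEnv's actual dynamics: the 500 PPM limit comes from this README, while `PlantState`, `apply_action`, and every other constant (cooldown length, fine sizes, fouling and leak rates) are hypothetical placeholders.

```python
from dataclasses import dataclass

SALINITY_LIMIT_PPM = 500.0  # from the README: above this, health fines apply

# Everything below is a hypothetical placeholder, not a DesalEnv value.
COOLDOWN_STEPS = 5          # steps before a crew is available again
COOLDOWN_FINE = 5.0         # fine for requesting a clean while on cooldown
SALINITY_FINE = 10.0        # fine for exceeding the salinity limit
BASE_SALINITY_PPM = 300.0   # salinity of water from a clean membrane
LEAK_PPM_PER_UNIT = 10.0    # extra PPM per (fouling * production_rate)
FOULING_PER_UNIT = 0.001    # fouling added per unit of production


@dataclass(frozen=True)
class PlantState:
    membrane_fouling: float = 0.0
    water_salinity: float = BASE_SALINITY_PPM
    maintenance_cooldown: int = 0


def apply_action(state, production_rate, run_cleaning):
    """Return (next_state, penalty) for one illustrative step."""
    penalty = 0.0
    fouling = state.membrane_fouling
    salinity = state.water_salinity
    cooldown = max(0, state.maintenance_cooldown - 1)

    if run_cleaning:
        # Production halts whether or not a crew is available.
        if state.maintenance_cooldown > 0:
            penalty += COOLDOWN_FINE      # idle step plus a fine
        else:
            fouling = 0.0                 # membranes washed
            cooldown = COOLDOWN_STEPS     # crew is now unavailable
    else:
        # Salt leakage grows with both fouling and throughput.
        salinity = BASE_SALINITY_PPM + LEAK_PPM_PER_UNIT * fouling * production_rate
        fouling = min(1.0, fouling + FOULING_PER_UNIT * production_rate)

    if salinity > SALINITY_LIMIT_PPM:
        penalty += SALINITY_FINE

    return PlantState(fouling, salinity, cooldown), penalty
```

Under these made-up constants, running at full rate on a heavily fouled membrane crosses the salinity limit in a single step, while a clean membrane at the same rate does not. That is the trade-off the agent has to learn to time.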

🧠 Environment Structure

Observation Space

| Feature | Description | Type |
| --- | --- | --- |
| reservoir_level | Fresh water stored (megaliters). | float |
| water_salinity | Salt concentration in the water (PPM). Values above 500 trigger penalties. | float |
| energy_price | Fluctuating grid energy price ($/MWh). | float |
| membrane_fouling | Hardware degradation index (0.0 = clean, 1.0 = blocked). | float |
| city_demand | Water consumption for the current step. | float |
| weather_condition | String literal tracking macro-events (Heatwave, etc.). | string |
| maintenance_cooldown | Steps until a cleaning crew is available again. | int |
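
For agent code, the observation fields can be mirrored in a small typed container. The sketch below is an assumption about how one might model them on the client side: the field names and types come from the table above, but the class name `DesalObservation` and its two helper methods are hypothetical and are not part of the environment's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesalObservation:
    """Hypothetical client-side mirror of the observation fields."""
    reservoir_level: float       # megaliters of fresh water stored
    water_salinity: float        # PPM of salt in the water
    energy_price: float          # $/MWh
    membrane_fouling: float      # 0.0 = clean, 1.0 = blocked
    city_demand: float           # consumption for the current step
    weather_condition: str       # e.g. "Normal", "Heatwave", "Storm"
    maintenance_cooldown: int    # steps until a crew is available

    def salinity_violation(self) -> bool:
        # The README states that values above 500 PPM trigger penalties.
        return self.water_salinity > 500.0

    def crew_available(self) -> bool:
        # A cleaning can only succeed once the cooldown has expired.
        return self.maintenance_cooldown == 0


obs = DesalObservation(
    reservoir_level=120.0,
    water_salinity=510.0,
    energy_price=42.0,
    membrane_fouling=0.7,
    city_demand=30.0,
    weather_condition="Heatwave",
    maintenance_cooldown=0,
)
```

The numbers in `obs` are made up purely to show construction; they are not sample output from the environment.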

Action Space (Continuous & Discrete Hybrid)

| Feature | Description | Type |
| --- | --- | --- |
| production_rate | Target water extraction flow rate (0.0 to 50.0). | float |
| run_cleaning | Set True to halt production and wash membranes (checks cooldown). | bool |
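
A minimal sketch of building a valid hybrid action, assuming the agent constructs actions in Python. The 0.0 to 50.0 range and the two field names come from the table above. `DesalAction`, `make_action`, `heuristic_action`, and the 0.6 fouling threshold are hypothetical names and values chosen for illustration, and the rule itself is not the README's baseline agent.

```python
from dataclasses import dataclass

MAX_PRODUCTION_RATE = 50.0  # upper bound stated in the action table


@dataclass(frozen=True)
class DesalAction:
    """Hypothetical container for the two action fields."""
    production_rate: float
    run_cleaning: bool = False


def make_action(production_rate, run_cleaning=False):
    # Clamp the continuous part into the documented range.
    clamped = min(max(production_rate, 0.0), MAX_PRODUCTION_RATE)
    return DesalAction(clamped, run_cleaning)


def heuristic_action(membrane_fouling, maintenance_cooldown, city_demand,
                     fouling_threshold=0.6):
    """Clean when fouled and a crew is free; otherwise track demand."""
    if membrane_fouling >= fouling_threshold and maintenance_cooldown == 0:
        # Cleaning halts production, so request no flow this step.
        return make_action(0.0, run_cleaning=True)
    return make_action(city_demand)
```

Checking `maintenance_cooldown` before setting `run_cleaning` avoids the idle time and fines that a cleaning request on cooldown would incur.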

Tasks

The environment provides 6 distinct tasks across 3 difficulty tiers to evaluate agent robustness:

Tier 1: Standard Evaluation

  • easy_spring: Generous reservoir, standard normal weather variables.

Tier 2: Volatile Environmental Shifts

  • summer_crisis: Back-to-back heatwaves and high energy prices. The agent has to aggressively juggle cleanings and salinity.
  • hurricane_season: Erratic grid prices and lower demand, but doing well requires careful energy arbitrage.

Tier 3: Asymmetrical Shock Scenarios (Testing True Robustness)

  • black_swan_drought: Brutal. Demand stays critically high and the reservoir is small. Tests the agent's ability to time maintenance cooldowns precisely. If it misses one cleaning window, the city dries out.
  • grid_failure: The ultimate energy arbitrage test. Standard demand, but grid energy pricing fluctuates by massive magnitudes (price_volatility=250.0). Pumping at the wrong time bankrupts the plant.
  • marathon_endurance: A 500-step test where micro-degradations compound. Short-term greedy strategies (running fouled, taking salinity hits) will eventually snowball into total failure.
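
For an evaluation script, the task list above can be kept in a small lookup table. This is a sketch under assumptions: the task names, tiers, `price_volatility=250.0`, and the 500-step length come from this README, but the dictionary layout, the `max_steps` key name, and the `tasks_in_tier` helper are hypothetical. Parameters the README does not state are deliberately left out.

```python
# Task name -> difficulty tier, as listed in this README.
TASK_TIERS = {
    "easy_spring": 1,
    "summer_crisis": 2,
    "hurricane_season": 2,
    "black_swan_drought": 3,
    "grid_failure": 3,
    "marathon_endurance": 3,
}

# Only the settings the README states explicitly.
KNOWN_SETTINGS = {
    "grid_failure": {"price_volatility": 250.0},
    # "max_steps" is an assumed key name for the 500-step episode length.
    "marathon_endurance": {"max_steps": 500},
}


def tasks_in_tier(tier):
    """Return the task names in a tier, sorted alphabetically."""
    return sorted(name for name, t in TASK_TIERS.items() if t == tier)


for tier in (1, 2, 3):
    print(f"Tier {tier}: {', '.join(tasks_in_tier(tier))}")
```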

Setup and Usage Instructions

  1. Install dependencies:

     ```bash
     pip install -r requirements.txt
     pip install openenv-core
     uv lock
     ```

  2. Validate compliance:

     ```bash
     openenv validate .
     ```

  3. Run the environment locally (Docker):

     ```bash
     docker build -t desal_env .
     docker run -p 7860:7860 desal_env
     ```

Baseline Scores

The baseline agent combines a heuristic expert hint with an LLM prompt to solve the tasks reliably. Typical scores fall in these ranges:

  • easy_spring: ~0.90 to ~0.95
  • summer_crisis: ~0.80 to ~0.85
  • hurricane_season: ~0.70 to ~0.78