---
title: GridOps
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - microgrid
  - energy
---

# GridOps — Community Microgrid Bridge Operator

> A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.

**Live demo**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/) | **HF Space**: [huggingface.co/spaces/77ethers/gridops](https://huggingface.co/spaces/77ethers/gridops)

---

## At a Glance

| | |
|---|---|
| **Domain** | Real-world Indian community microgrid operation (100 homes, summer) |
| **Interface** | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models |
| **Actions** | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` |
| **Observations** | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
| **Tasks** | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| **Grading** | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
| **Reward** | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
| **Anti-gaming** | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| **Baseline** | Grok-4 LLM: 0.80/0.82/0.72 — beats hand-coded oracle on all tasks |
| **Deployment** | Docker + HF Space + `openenv validate` 6/6 pass |

---

## Why This Environment Exists

Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time. This is not a toy problem.
This is what a microgrid operator at an Indian housing society actually decides every hour:

- **Should I charge the battery now** (grid is cheap at Rs 4/kWh) **or save capacity for tonight** (price will spike to Rs 15)?
- **Should I run diesel** (Rs 25/kWh + Rs 100 startup) **or risk a blackout** (Rs 150/kWh VoLL penalty)?
- **Should I ask residents to reduce AC usage** (Rs 40/kWh + 100% rebounds next hour)?

Simple heuristics fail, as the baseline scores show. The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.

### What makes this a strong benchmark

- **Any agent can plug in immediately** — typed JSON actions in, typed observations out, no custom hacks
- **Fully deterministic** — same seed, same actions = identical trajectory every time. Leaderboard-ready.
- **Tasks differentiate agents** — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
- **Can't be gamed** — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
- **Grader = ground truth** — programmatic, deterministic, partial credit, aligned with per-step reward

---

## The Problem at a Glance

**You have:**

- **Solar panels** — 250 kW peak, free, but only during daylight
- **Community battery** — 500 kWh storage, 100 kW max charge/discharge
- **Diesel generator** — 100 kW, but Rs 25/kWh + Rs 100 startup cost
- **National grid** — auto-imports/exports as slack (capped at 200 kW)

**You control (3 continuous actions):**

| Action | Range | What it does |
|--------|-------|-------------|
| `battery_dispatch` | -1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. |
| `diesel_dispatch` | 0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if was off. |
| `demand_shedding` | 0 to 1 | Ask residents to cut 0-20% usage. **100% rebounds next hour.** Rs 40/kWh penalty. |

**You do NOT control the grid.** It automatically absorbs whatever energy gap remains after your decisions.
If the gap exceeds 200 kW, that's a **blackout** (Rs 150/kWh penalty).

---

## The Critical Bottleneck

At **8 PM every evening**, demand hits **250 kW** but the grid maxes out at **200 kW** and solar is zero. The **50 kW gap** must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark.

On a heatwave day (Task 2-3), demand spikes to **325-375 kW**. Now the gap is **125-175 kW** — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours.

---

## What the Agent Sees (Observation)

| Field | Description |
|-------|-------------|
| `hour` | Current hour in episode (0-72, starting 6 AM) |
| `demand_kw` | What the 100 homes need right now |
| `solar_kw` | Free solar power available (0 at night, up to 250 kW midday) |
| `battery_soc` | Battery charge level (0-1, i.e. 0-500 kWh) |
| `grid_price` | Current IEX electricity price (Rs 3-20/kWh) |
| `diesel_fuel_remaining` | Diesel tank level (0-1) |
| `diesel_is_on` | Was diesel running last step? (startup cost if turning on) |
| `demand_forecast_4h` | Noisy 4-hour demand forecast (±15%) |
| `solar_forecast_4h` | Noisy 4-hour solar forecast |
| `price_forecast_4h` | Noisy 4-hour price forecast |
| `cumulative_blackout_kwh` | Total blackout energy so far |
| `cumulative_cost` | Total money spent so far (Rs) |
| `flow_*` | Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) |

**Partial observability**: forecasts have ±15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes.

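
The effect of noisy forecasts can be illustrated with a short sketch. It assumes the ±15% figure is the relative standard deviation of multiplicative Gaussian noise and that the noise comes from a seeded generator, which would also explain why identical seeds give identical trajectories. The helper name `noisy_forecast`, its parameters, and the sample demand values are hypothetical and not taken from the GridOps source.

```python
import random


def noisy_forecast(true_values, rel_sigma=0.15, seed=0):
    """Return a noisy copy of a 4-hour forecast (hypothetical helper).

    ASSUMPTION: each value is scaled by (1 + e) with e ~ Normal(0, rel_sigma).
    The real environment may draw or clip its noise differently.
    """
    rng = random.Random(seed)  # private seeded RNG keeps results reproducible
    # Clamp at zero so a large negative draw never yields negative power.
    return [max(0.0, v * (1.0 + rng.gauss(0.0, rel_sigma))) for v in true_values]


true_demand_kw = [180.0, 220.0, 250.0, 230.0]  # illustrative values only

forecast_a = noisy_forecast(true_demand_kw, seed=42)
forecast_b = noisy_forecast(true_demand_kw, seed=42)
forecast_c = noisy_forecast(true_demand_kw, seed=7)

# Same seed reproduces the forecast exactly; a different seed changes the noise.
print("reproducible:", forecast_a == forecast_b)
print("seed-dependent:", forecast_a != forecast_c)
```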

---

## 3 Tasks (Each Tests a Different RL Capability)

### Task 1: Normal Summer (Easy) — *Tests basic arbitrage*

- Clear skies, standard demand (~100 kW avg, 250 kW peak)
- Grid prices Rs 3-12 with clear cheap night / expensive evening pattern
- **What the agent must learn**: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday

### Task 2: Heatwave + Price Spike (Medium) — *Tests temporal planning*

- Day 2-3 heatwave (+30% demand), intermittent clouds
- **Rs 20 price spike** on Day 2 evening — visible in 4-hour forecast
- **What the agent must learn**: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM.

### Task 3: Extreme Crisis + Grid Outage (Hard) — *Tests constraint management*

- Full 3-day heatwave, -30% solar from haze, +50% demand
- Limited diesel (33% tank = ~8 hours at full power)
- **6-hour grid outage** on Day 2 afternoon — grid cap drops to 0 kW
- **What the agent must learn**: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding.

---

## Grading (0.0 - 1.0)

```
score = 0.50 x cost_efficiency + 0.25 x reliability + 0.25 x green_score
```

| Component | Formula | What it rewards |
|-----------|---------|----------------|
| **Cost efficiency** (50%) | `1 - (agent_cost / baseline_cost)` | Spending less than a dumb "max grid import" baseline |
| **Reliability** (25%) | `(demand_met - blackout) / demand_met` | Keeping the lights on |
| **Green score** (25%) | `1 - (diesel_used / total_demand)` | Minimizing diesel emissions |

**Baseline**: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages.

**VoLL (Value of Lost Load)**: Rs 150/kWh blackout penalty.
This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally.

---

## Why Heuristics Fail

| Strategy | Why it fails |
|----------|-------------|
| "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Score collapses. |
| "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. |
| "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. |
| "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. |
| "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. |

The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. There's a clear **0.20-0.35 gap** between heuristics and the oracle, proving the environment has real optimization headroom.

---

## Anti-Gaming Design

The environment has 5 mechanisms that prevent reward hacking:

1. **Shedding is expensive** — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only.
2. **Battery degradation** — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage.
3. **Diesel startup cost** — Rs 100 per on-switch. Prevents on/off toggling.
4. **VoLL is smooth** — Rs 150/kWh with no cliff. Agent can't exploit a binary gate.
5. **Grid is capped** — 200 kW max (0 during outages). Can't just buy everything.
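
The rates quoted in the list above can be compared directly. The sketch below prices four ways of handling the same 50 kWh shortfall (the 8 PM gap sustained for one hour), using only figures stated in this README. It assumes one-hour steps, so kW and kWh are interchangeable per step, and it leaves out the shedding rebound and the cost of the energy used to charge the battery. The function names are hypothetical and are not part of the GridOps code. Under these assumptions the battery costs Rs 125, diesel from a cold start Rs 1,350, shedding Rs 2,000, and a blackout Rs 7,500, which matches the claim that shedding is costlier than diesel.

```python
# Rates quoted in this README, in Rs.
BATTERY_DEGRADATION_RS_PER_KWH = 2.5
DIESEL_RS_PER_KWH = 25.0
DIESEL_STARTUP_RS = 100.0
SHED_RS_PER_KWH = 40.0
VOLL_RS_PER_KWH = 150.0


def battery_cost(kwh):
    """Degradation only; the cost of the energy used to charge is ignored."""
    return kwh * BATTERY_DEGRADATION_RS_PER_KWH


def diesel_cost(kwh, was_off=True):
    """Fuel cost plus the one-time startup charge if the generator was off."""
    startup = DIESEL_STARTUP_RS if (was_off and kwh > 0) else 0.0
    return kwh * DIESEL_RS_PER_KWH + startup


def shedding_cost(kwh):
    """Immediate shedding penalty; the next-hour rebound is NOT modeled here."""
    return kwh * SHED_RS_PER_KWH


def blackout_cost(kwh):
    """Value-of-lost-load penalty for unserved energy."""
    return kwh * VOLL_RS_PER_KWH


# ASSUMPTION: one-hour step, so the 50 kW evening gap equals 50 kWh.
shortfall_kwh = 50.0

costs = {
    "battery": battery_cost(shortfall_kwh),
    "diesel": diesel_cost(shortfall_kwh, was_off=True),
    "shedding": shedding_cost(shortfall_kwh),
    "blackout": blackout_cost(shortfall_kwh),
}

# Print the options from cheapest to most expensive.
for name, rupees in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:>9}: Rs {rupees:,.0f}")
```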

---

## Baseline Scores

| Strategy | Task 1 | Task 2 | Task 3 | What it does |
|----------|--------|--------|--------|-------------|
| **Grok-4 (LLM)** | **0.80** | **0.82** | **0.72** | Reads observations, reasons about tradeoffs |
| **Oracle (rule-based)** | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic |
| Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can |
| Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening |
| Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money |

- **LLM beats oracle**: Grok-4 matched or exceeded the hand-coded oracle on every task
- **Deterministic**: identical scores across 3 runs (seeded RNG)
- **Oracle ceiling < 1.0**: real physics constraints, not inflated scores
- **Clear separation**: LLM > oracle >> heuristics (0.20-0.38 gap from best to worst)
- **Task 3 hardest**: grid outage makes it genuinely challenging even for frontier LLMs

---

## Key Physics

| Component | Spec | Cost |
|-----------|------|------|
| **Solar** | 250 kW peak, bell curve 6 AM - 6 PM | Free |
| **Battery** | 500 kWh, 100 kW max, 90% round-trip (sqrt each way) | Rs 2.5/kWh degradation |
| **Diesel** | 100 kW max | Rs 25/kWh + Rs 100 startup |
| **Grid** | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh |
| **Blackout** | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty |
| **Shedding** | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour |

**Energy balance every step:**

```
supply  = solar + grid_import + battery_discharge + diesel
consume = effective_demand + grid_export + battery_charge
```

Supply always equals consumption. Any unmet demand beyond grid cap = blackout.

---

## Setup & Usage

```bash
# Install
pip install -e .

# Run server
uvicorn gridops.server.app:app --port 8000

# Interactive dashboard
open http://localhost:8000/dashboard/

# Validate oracle + determinism
python scripts/oracle_test.py

# Run LLM baseline
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token"
export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
python inference.py
```

## Docker

```bash
docker build -t gridops .
docker run -p 8000:8000 gridops
```

## OpenEnv Validation

```bash
# Local structure check
openenv validate

# Runtime check (against live server)
openenv validate --url http://localhost:8000
```

---

## Project Structure

```
gridops/
├── inference.py              # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN)
├── openenv.yaml              # OpenEnv manifest
├── Dockerfile                # Docker deployment
├── server/app.py             # Root entry point (openenv validate)
├── gridops/
│   ├── models.py             # GridOpsAction, GridOpsObservation (Pydantic)
│   ├── simulation/
│   │   ├── physics.py        # Energy balance, battery, VoLL, degradation, outages
│   │   └── scenarios.py      # Demand/solar/price curve generators
│   ├── tasks/
│   │   ├── definitions.py    # 3 task configs (normal, heatwave, crisis+outage)
│   │   └── graders.py        # 0-1 scoring: cost + reliability + green
│   └── server/
│       ├── app.py            # FastAPI + OpenEnv create_app
│       ├── environment.py    # OpenEnv Environment class
│       └── static/index.html # Interactive dashboard with energy flows
└── scripts/
    └── oracle_test.py        # Oracle + heuristic validation + determinism check
```

---

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/schema` | GET | Action/observation/state JSON schemas |
| `/metadata` | GET | Environment name and description |
| `/reset` | POST | Reset environment (OpenEnv standard) |
| `/step` | POST | Execute action (OpenEnv standard) |
| `/state` | GET | Current state (OpenEnv standard) |
| `/ws` | WebSocket | Persistent session (OpenEnv standard) |
| `/api/reset` | POST | Stateful reset (dashboard) |
| `/api/step` | POST | Stateful step (dashboard) |
| `/api/state` | GET | Stateful state (dashboard) |
| `/tasks` | GET | List available tasks |
| `/dashboard/` | GET | Interactive web UI |
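
A minimal client sketch for the `/reset` and `/step` endpoints listed above, using only the Python standard library. The request body shapes are assumptions (an empty JSON object for `/reset`, and `{"action": {...}}` for `/step`) and are not confirmed by this README; query `/schema` on a running server for the authoritative format. The helper names `make_action`, `post_json`, and `run_one_step` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in Setup & Usage


def make_action(battery_dispatch, diesel_dispatch, demand_shedding):
    """Build an action dict, clamping each value to its documented range."""
    def clamp(value, low, high):
        return max(low, min(high, float(value)))

    return {
        "battery_dispatch": clamp(battery_dispatch, -1.0, 1.0),
        "diesel_dispatch": clamp(diesel_dispatch, 0.0, 1.0),
        "demand_shedding": clamp(demand_shedding, 0.0, 1.0),
    }


def post_json(path, payload):
    """POST a JSON body to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def run_one_step():
    """Reset the environment, then take one step (needs a running server)."""
    post_json("/reset", {})  # ASSUMPTION: an empty body is accepted
    action = make_action(-0.5, 0.0, 0.0)  # charge the battery at half rate
    return post_json("/step", {"action": action})  # ASSUMPTION: body shape


# Safe to run without a server: this only builds and prints a clamped payload.
example_action = make_action(1.7, -0.2, 0.1)
print(json.dumps(example_action))
```

Clamping on the client side keeps out-of-range values from reaching the server, where the typed Pydantic models might otherwise reject them. Call `run_one_step()` only after the server from Setup & Usage is up.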