---
title: GridOps
emoji: 
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - microgrid
  - energy
---

GridOps — Community Microgrid Bridge Operator

A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.

Live demo: 77ethers-gridops.hf.space/dashboard/ | HF Space: huggingface.co/spaces/77ethers/gridops


At a Glance

| Aspect | Details |
|---|---|
| Domain | Real-world Indian community microgrid operation (100 homes, summer) |
| Interface | Full OpenEnv spec: reset() -> step(action) -> state(), typed Pydantic models |
| Actions | 3D continuous: battery_dispatch [-1,1], diesel_dispatch [0,1], demand_shedding [0,1] |
| Observations | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
| Tasks | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| Grading | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
| Reward | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
| Anti-gaming | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| Baseline | Grok-4 LLM: 0.80/0.82/0.72, beating the hand-coded oracle on all tasks |
| Deployment | Docker + HF Space + openenv validate 6/6 pass |

Why This Environment Exists

Community microgrid operation is a real job in India under the RDSS (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time.

This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour:

  • Should I charge the battery now (grid is cheap at Rs 4/kWh) or save capacity for tonight (price will spike to Rs 15)?
  • Should I run diesel (Rs 25/kWh + Rs 100 startup) or risk a blackout (Rs 150/kWh VoLL penalty)?
  • Should I ask residents to reduce AC usage (Rs 40/kWh, and 100% of the shed load rebounds the next hour)?

Simple heuristics fall well short of the oracle (see Why Heuristics Fail). The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.

What makes this a strong benchmark

  • Any agent can plug in immediately — typed JSON actions in, typed observations out, no custom hacks
  • Fully deterministic — same seed, same actions = identical trajectory every time. Leaderboard-ready.
  • Tasks differentiate agents — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
  • Can't be gamed — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
  • Grader = ground truth — programmatic, deterministic, partial credit, aligned with per-step reward

The Problem at a Glance

You have:

  • Solar panels — 250 kW peak, free, but only during daylight
  • Community battery — 500 kWh storage, 100 kW max charge/discharge
  • Diesel generator — 100 kW, but Rs 25/kWh + Rs 100 startup cost
  • National grid — auto-imports/exports as slack (capped at 200 kW)

You control (3 continuous actions):

| Action | Range | What it does |
|---|---|---|
| battery_dispatch | -1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. |
| diesel_dispatch | 0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if was off. |
| demand_shedding | 0 to 1 | Ask residents to cut 0-20% usage. 100% rebounds next hour. Rs 40/kWh penalty. |

You do NOT control the grid. It automatically absorbs whatever energy gap remains after your decisions. If the gap exceeds 200 kW, the excess goes unserved: that's a blackout (Rs 150/kWh penalty).
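As a rough illustration of the action table, the sketch below maps the three normalized actions to physical quantities and applies the shedding rebound. The names `to_physical` and `apply_shedding` are hypothetical, and the exact order in which the simulator applies shedding and rebound is an assumption, not taken from the GridOps source.

```python
from dataclasses import dataclass

BATTERY_MAX_KW = 100.0      # max charge/discharge rate from the spec
DIESEL_MAX_KW = 100.0       # diesel generator rating from the spec
MAX_SHED_FRACTION = 0.20    # demand_shedding = 1 means a 20% cut


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


@dataclass
class PhysicalAction:
    battery_kw: float      # positive = discharge, negative = charge
    diesel_kw: float
    shed_fraction: float


def to_physical(battery_dispatch, diesel_dispatch, demand_shedding):
    """Clamp normalized actions to their ranges and scale to physical units."""
    return PhysicalAction(
        battery_kw=clamp(battery_dispatch, -1.0, 1.0) * BATTERY_MAX_KW,
        diesel_kw=clamp(diesel_dispatch, 0.0, 1.0) * DIESEL_MAX_KW,
        shed_fraction=clamp(demand_shedding, 0.0, 1.0) * MAX_SHED_FRACTION,
    )


def apply_shedding(demand_kw, shed_fraction, rebound_kw):
    """Return (effective demand this hour, rebound carried into the next hour).

    Assumption: shedding applies to this hour's raw demand, and 100% of the
    shed load is added back to the following hour.
    """
    shed_kw = demand_kw * shed_fraction
    return demand_kw - shed_kw + rebound_kw, shed_kw


action = to_physical(battery_dispatch=-2.0, diesel_dispatch=0.5, demand_shedding=1.0)
print(action)  # battery clamps to -100 kW, diesel is 50 kW, shedding is 20%
print(apply_shedding(200.0, action.shed_fraction, rebound_kw=0.0))
```

Under these assumptions, shedding 20% of a 200 kW load removes about 40 kW now and adds the same 40 kW to the next hour, which is why shedding only shifts the problem.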


The Critical Bottleneck

At 8 PM every evening, demand hits 250 kW but the grid maxes out at 200 kW and solar is zero.

The 50 kW gap must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark.

On a heatwave day (Task 2-3), demand spikes to 325-375 kW. Now the gap is 125-175 kW — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours.
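The gap figures above are simple arithmetic on the 250 kW normal peak and the 200 kW grid cap. A minimal sketch (the function name `evening_gap_kw` is illustrative only):

```python
NORMAL_PEAK_KW = 250   # evening peak on a normal day
GRID_CAP_KW = 200      # grid import limit


def evening_gap_kw(demand_increase_pct=0, grid_cap_kw=GRID_CAP_KW):
    """Power that battery, diesel, or shedding must cover at the evening peak (solar = 0)."""
    peak_kw = NORMAL_PEAK_KW * (100 + demand_increase_pct) / 100
    return max(0.0, peak_kw - grid_cap_kw)


print(evening_gap_kw(0))                   # normal day: 50.0
print(evening_gap_kw(30))                  # +30% heatwave: 125.0
print(evening_gap_kw(50))                  # +50% heatwave: 175.0
print(evening_gap_kw(50, grid_cap_kw=0))   # +50% peak with the grid down: 375.0
```

The last line is a worst case that assumes the peak coincides with the outage; the text above does not say that it does.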


What the Agent Sees (Observation)

| Field | Description |
|---|---|
| hour | Current hour in episode (0-72, starting 6 AM) |
| demand_kw | What the 100 homes need right now |
| solar_kw | Free solar power available (0 at night, up to 250 kW midday) |
| battery_soc | Battery charge level (0-1, i.e. 0-500 kWh) |
| grid_price | Current IEX electricity price (Rs 3-20/kWh) |
| diesel_fuel_remaining | Diesel tank level (0-1) |
| diesel_is_on | Was diesel running last step? (startup cost if turning on) |
| demand_forecast_4h | Noisy 4-hour demand forecast (±15%) |
| solar_forecast_4h | Noisy 4-hour solar forecast |
| price_forecast_4h | Noisy 4-hour price forecast |
| cumulative_blackout_kwh | Total blackout energy so far |
| cumulative_cost | Total money spent so far (Rs) |
| flow_* | Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) |

Partial observability: forecasts have ±15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes.
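One way to picture the noisy forecasts is multiplicative Gaussian noise drawn from a seeded generator. The sketch below is an assumption about the mechanism, not the GridOps implementation: it treats 15% as the relative standard deviation, and `noisy_forecast` is a hypothetical helper.

```python
import random


def noisy_forecast(true_values, rel_sigma=0.15, seed=0):
    """Perturb each true value by Gaussian noise with relative std `rel_sigma`.

    A dedicated random.Random(seed) keeps the result reproducible:
    the same seed and inputs always give the same forecast.
    """
    rng = random.Random(seed)
    return [max(0.0, v * (1.0 + rng.gauss(0.0, rel_sigma))) for v in true_values]


true_demand_kw = [180.0, 210.0, 250.0, 230.0]   # made-up next-4-hour values
print(noisy_forecast(true_demand_kw, seed=42))
print(noisy_forecast(true_demand_kw, seed=42) == noisy_forecast(true_demand_kw, seed=42))  # True
```

Seeding the generator per episode is what would make "same seed, same actions = identical trajectory" hold even with noise in the observations.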


3 Tasks (Each Tests a Different RL Capability)

Task 1: Normal Summer (Easy) — Tests basic arbitrage

  • Clear skies, standard demand (~100 kW avg, 250 kW peak)
  • Grid prices Rs 3-12 with clear cheap night / expensive evening pattern
  • What the agent must learn: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday

Task 2: Heatwave + Price Spike (Medium) — Tests temporal planning

  • Day 2-3 heatwave (+30% demand), intermittent clouds
  • Rs 20 price spike on Day 2 evening — visible in 4-hour forecast
  • What the agent must learn: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM.
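The hold-versus-discharge decision can be sketched as a small rule. This is an illustration only: it assumes the price forecast arrives as a list of hourly prices, and `hold_margin` and `min_soc` are hypothetical tuning knobs, not GridOps parameters.

```python
def battery_dispatch_with_forecast(price_now, price_forecast, soc,
                                   hold_margin=2.0, min_soc=0.1):
    """Return a battery_dispatch value in [-1, 1].

    Hold (0.0) while a clearly higher price is forecast within the window;
    discharge fully (1.0) once the current price is about as good as it gets.
    """
    if soc <= min_soc:
        return 0.0                      # nothing useful left to discharge
    best_future = max(price_forecast, default=price_now)
    if best_future > price_now + hold_margin:
        return 0.0                      # a better price is coming: hold
    return 1.0                          # current price is near the peak: discharge


# Mid-afternoon with the Rs 20 spike visible in the forecast: hold.
print(battery_dispatch_with_forecast(9.0, [10.0, 12.0, 20.0, 18.0], soc=0.8))   # 0.0
# During the spike, with prices falling afterwards: discharge.
print(battery_dispatch_with_forecast(20.0, [18.0, 12.0, 8.0, 6.0], soc=0.8))    # 1.0
```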

Task 3: Extreme Crisis + Grid Outage (Hard) — Tests constraint management

  • Full 3-day heatwave, -30% solar from haze, +50% demand
  • Limited diesel (33% tank = ~8 hours at full power)
  • 6-hour grid outage on Day 2 afternoon — grid cap drops to 0 kW
  • What the agent must learn: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding.
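A back-of-envelope bound shows why all three levers matter while islanded. The sketch uses only the ratings quoted in this README (100 kW battery, 100 kW diesel, 20% maximum shedding, 250 kW solar peak reduced by 30%, 250 kW demand peak raised by 50%). It assumes the worst-case demand actually occurs during the outage, which the task description does not state, and `island_shortfall_kw` is a hypothetical name.

```python
def island_shortfall_kw(demand_kw, solar_kw, battery_kw=100, diesel_kw=100, shed_pct=20):
    """Unserved power (kW) with the grid at 0 kW and every local source at full output."""
    target_kw = demand_kw * (100 - shed_pct) / 100   # demand left after shedding
    supply_kw = solar_kw + battery_kw + diesel_kw
    return max(0.0, target_kw - supply_kw)


crisis_peak_kw = 250 * 150 / 100    # +50% demand -> 375.0 kW
hazy_solar_kw = 250 * 70 / 100      # -30% solar  -> 175.0 kW at best

print(island_shortfall_kw(crisis_peak_kw, hazy_solar_kw))           # 0.0
print(island_shortfall_kw(crisis_peak_kw, 0))                       # 100.0
print(island_shortfall_kw(crisis_peak_kw, 0, shed_pct=0))           # 175.0
```

With good solar the island can just about hold; without it, even full battery, diesel, and shedding leave a shortfall, so the agent's job becomes minimizing blackout energy rather than avoiding it.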

Grading (0.0 - 1.0)

score = 0.50 × cost_efficiency + 0.25 × reliability + 0.25 × green_score

| Component | Formula | What it rewards |
|---|---|---|
| Cost efficiency (50%) | 1 - (agent_cost / baseline_cost) | Spending less than a dumb "max grid import" baseline |
| Reliability (25%) | (demand_met - blackout) / demand_met | Keeping the lights on |
| Green score (25%) | 1 - (diesel_used / total_demand) | Minimizing diesel emissions |

Baseline: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages.
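The scoring formula can be written out as a short function. This is a sketch of the formula as stated in the table, not the code in graders.py: clamping each component to [0, 1] is an assumption made here so the total stays in range, and the input numbers are made up for illustration.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))


def grade(agent_cost, baseline_cost, demand_met_kwh, blackout_kwh,
          diesel_kwh, total_demand_kwh):
    """Weighted episode score: 50% cost, 25% reliability, 25% green.

    Assumes baseline_cost, demand_met_kwh and total_demand_kwh are positive.
    """
    cost_efficiency = clamp01(1.0 - agent_cost / baseline_cost)
    reliability = clamp01((demand_met_kwh - blackout_kwh) / demand_met_kwh)
    green_score = clamp01(1.0 - diesel_kwh / total_demand_kwh)
    return 0.50 * cost_efficiency + 0.25 * reliability + 0.25 * green_score


# Illustrative inputs only: 40% cheaper than baseline, no blackout, 5% diesel share.
score = grade(agent_cost=60_000, baseline_cost=100_000,
              demand_met_kwh=7_200, blackout_kwh=0,
              diesel_kwh=360, total_demand_kwh=7_200)
print(round(score, 4))  # approximately 0.6875
```

Because cost carries half the weight, an agent with perfect reliability and zero diesel still tops out at 0.50 unless it also undercuts the baseline's cost.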

VoLL (Value of Lost Load): Rs 150/kWh blackout penalty. This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally.


Why Heuristics Fail

| Strategy | Why it fails |
|---|---|
| "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Scores no better than Do-Nothing. |
| "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. |
| "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. |
| "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. |
| "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. |

The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. In the baseline table, the heuristics trail the oracle by roughly 0.20-0.39 depending on task and strategy, which indicates the environment has real optimization headroom.
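For reference, the fixed strategies in the table are trivial to write as policies. The sketch below assumes observations arrive as a dict keyed by the field names from the observation table and that actions are returned as a dict; both shapes are assumptions, and the threshold value is arbitrary.

```python
def make_action(battery=0.0, diesel=0.0, shed=0.0):
    return {"battery_dispatch": battery, "diesel_dispatch": diesel, "demand_shedding": shed}


def do_nothing(obs):
    return make_action()                 # grid covers whatever it can


def always_discharge(obs):
    return make_action(battery=1.0)      # drains the battery before the evening peak


def always_diesel(obs):
    return make_action(diesel=1.0)       # pays Rs 25/kWh around the clock


def price_threshold(obs, threshold=10.0):
    # Discharges on price alone and ignores battery_soc and the forecasts.
    return make_action(battery=1.0 if obs["grid_price"] > threshold else 0.0)


obs = {"grid_price": 12.0, "battery_soc": 0.4}
for policy in (do_nothing, always_discharge, always_diesel, price_threshold):
    print(policy.__name__, policy(obs))
```

None of these policies looks at more than one observation field, which is the common reason they fall behind the oracle.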


Anti-Gaming Design

The environment has 5 mechanisms that prevent reward hacking:

  1. Shedding is expensive — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only.
  2. Battery degradation — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage.
  3. Diesel startup cost — Rs 100 per on-switch. Prevents on/off toggling.
  4. VoLL is smooth — Rs 150/kWh with no cliff. Agent can't exploit a binary gate.
  5. Grid is capped — 200 kW max (0 during outages). Can't just buy everything.
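The per-unit costs behind these mechanisms can be combined into one penalty function to see how they discourage exploits. This is a sketch using the rates quoted above; `step_penalty_rs` is a hypothetical helper, it omits the grid energy bill, and charging degradation on absolute battery throughput is an assumption.

```python
DEGRADATION_RS_PER_KWH = 2.5
DIESEL_RS_PER_KWH = 25.0
DIESEL_STARTUP_RS = 100.0
SHEDDING_RS_PER_KWH = 40.0
VOLL_RS_PER_KWH = 150.0


def step_penalty_rs(battery_kwh, diesel_kwh, diesel_was_on, shed_kwh, blackout_kwh):
    """Non-grid costs for one step, in Rs."""
    cost = abs(battery_kwh) * DEGRADATION_RS_PER_KWH
    if diesel_kwh > 0:
        cost += diesel_kwh * DIESEL_RS_PER_KWH
        if not diesel_was_on:
            cost += DIESEL_STARTUP_RS        # paid only on an off -> on switch
    cost += shed_kwh * SHEDDING_RS_PER_KWH
    cost += blackout_kwh * VOLL_RS_PER_KWH   # linear in unmet energy: no cliff
    return cost


print(step_penalty_rs(0, 10, diesel_was_on=False, shed_kwh=0, blackout_kwh=0))  # 350.0
print(step_penalty_rs(0, 10, diesel_was_on=True, shed_kwh=0, blackout_kwh=0))   # 250.0
print(step_penalty_rs(0, 0, diesel_was_on=False, shed_kwh=10, blackout_kwh=0))  # 400.0
print(step_penalty_rs(0, 0, diesel_was_on=False, shed_kwh=0, blackout_kwh=10))  # 1500.0
```

Covering the same 10 kWh costs Rs 250-350 with diesel, Rs 400 with shedding, and Rs 1500 as a blackout, so the ordering of last resorts is built into the prices.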

Baseline Scores

| Strategy | Task 1 | Task 2 | Task 3 | What it does |
|---|---|---|---|---|
| Grok-4 (LLM) | 0.80 | 0.82 | 0.72 | Reads observations, reasons about tradeoffs |
| Oracle (rule-based) | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic |
| Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can |
| Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening |
| Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money |

  • LLM beats oracle: Grok-4 edges out the hand-coded oracle on every task, by 0.01-0.02
  • Deterministic: identical scores across 3 runs (seeded RNG)
  • Oracle ceiling < 1.0: real physics constraints, not inflated scores
  • Clear separation: LLM > oracle >> heuristics (best-to-worst gap of 0.28-0.40 per task)
  • Task 3 hardest: grid outage makes it genuinely challenging even for frontier LLMs

Key Physics

| Component | Spec | Cost |
|---|---|---|
| Solar | 250 kW peak, bell curve 6 AM - 6 PM | Free |
| Battery | 500 kWh, 100 kW max, 90% round-trip efficiency (√0.9 ≈ 94.9% each way) | Rs 2.5/kWh degradation |
| Diesel | 100 kW max | Rs 25/kWh + Rs 100 startup |
| Grid | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh |
| Blackout | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty |
| Shedding | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour |
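The battery's "sqrt each way" efficiency means each direction loses about 5%, so a full cycle returns 90% of the energy put in. A minimal sketch of a state-of-charge update under that reading (the function `update_soc` is hypothetical, and it simply clips at the bounds rather than limiting the dispatch as a full simulator presumably would):

```python
import math

CAPACITY_KWH = 500.0
ONE_WAY_EFF = math.sqrt(0.90)   # about 0.9487 in each direction


def update_soc(soc, battery_kw, hours=1.0):
    """Return the new SOC (0-1). battery_kw > 0 discharges, battery_kw < 0 charges."""
    stored_kwh = soc * CAPACITY_KWH
    if battery_kw >= 0:
        # Delivering X kWh to the bus drains X / eff from storage.
        stored_kwh -= battery_kw * hours / ONE_WAY_EFF
    else:
        # Drawing X kWh from the bus stores only X * eff.
        stored_kwh += -battery_kw * hours * ONE_WAY_EFF
    return min(1.0, max(0.0, stored_kwh / CAPACITY_KWH))


print(round(update_soc(0.5, -100.0), 4))   # one hour of full charging
print(round(update_soc(0.5, 100.0), 4))    # one hour of full discharging
print(round(100.0 * ONE_WAY_EFF * ONE_WAY_EFF, 6))  # kWh recovered per 100 kWh bought: 90.0
```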

Energy balance every step:

supply  = solar + grid_import + battery_discharge + diesel
consume = effective_demand + grid_export + battery_charge

Supply always equals consumption. Any unmet demand beyond grid cap = blackout.
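The balance above can be sketched as a function that treats the grid as the slack variable. This is an illustration, not the code in physics.py: it assumes that surplus beyond the export cap is curtailed and that "consume" counts only the demand actually served.

```python
def balance(effective_demand_kw, solar_kw, battery_kw, diesel_kw, grid_cap_kw=200.0):
    """Resolve one step. battery_kw > 0 discharges, battery_kw < 0 charges."""
    battery_discharge = max(0.0, battery_kw)
    battery_charge = max(0.0, -battery_kw)

    # Positive gap = local shortfall, negative gap = local surplus.
    gap = (effective_demand_kw + battery_charge) - (solar_kw + battery_discharge + diesel_kw)

    grid_import = min(max(gap, 0.0), grid_cap_kw)
    grid_export = min(max(-gap, 0.0), grid_cap_kw)
    blackout = max(gap - grid_cap_kw, 0.0)       # shortfall the grid cannot cover
    curtailed = max(-gap - grid_cap_kw, 0.0)     # surplus the grid cannot absorb (assumed)

    supply = solar_kw - curtailed + grid_import + battery_discharge + diesel_kw
    consume = (effective_demand_kw - blackout) + grid_export + battery_charge
    return {"grid_import": grid_import, "grid_export": grid_export,
            "blackout": blackout, "curtailed": curtailed,
            "supply": supply, "consume": consume}


# 8 PM on a normal day, battery idle: the grid caps out and 50 kW goes dark.
print(balance(250.0, 0.0, battery_kw=0.0, diesel_kw=0.0))
# Same hour with 50 kW of battery discharge: no blackout.
print(balance(250.0, 0.0, battery_kw=50.0, diesel_kw=0.0))
```

Setting `grid_cap_kw=0.0` reproduces the Task 3 outage: every kW of shortfall then lands in `blackout`.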


Setup & Usage

# Install
pip install -e .

# Run server
uvicorn gridops.server.app:app --port 8000

# Interactive dashboard
open http://localhost:8000/dashboard/

# Validate oracle + determinism
python scripts/oracle_test.py

# Run LLM baseline
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token"
export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
python inference.py

Docker

docker build -t gridops .
docker run -p 8000:8000 gridops

OpenEnv Validation

# Local structure check
openenv validate

# Runtime check (against live server)
openenv validate --url http://localhost:8000

Project Structure

gridops/
├── inference.py                 # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN)
├── openenv.yaml                 # OpenEnv manifest
├── Dockerfile                   # Docker deployment
├── server/app.py                # Root entry point (openenv validate)
├── gridops/
│   ├── models.py                # GridOpsAction, GridOpsObservation (Pydantic)
│   ├── simulation/
│   │   ├── physics.py           # Energy balance, battery, VoLL, degradation, outages
│   │   └── scenarios.py         # Demand/solar/price curve generators
│   ├── tasks/
│   │   ├── definitions.py       # 3 task configs (normal, heatwave, crisis+outage)
│   │   └── graders.py           # 0-1 scoring: cost + reliability + green
│   └── server/
│       ├── app.py               # FastAPI + OpenEnv create_app
│       ├── environment.py       # OpenEnv Environment class
│       └── static/index.html    # Interactive dashboard with energy flows
└── scripts/
    └── oracle_test.py           # Oracle + heuristic validation + determinism check

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /schema | GET | Action/observation/state JSON schemas |
| /metadata | GET | Environment name and description |
| /reset | POST | Reset environment (OpenEnv standard) |
| /step | POST | Execute action (OpenEnv standard) |
| /state | GET | Current state (OpenEnv standard) |
| /ws | WebSocket | Persistent session (OpenEnv standard) |
| /api/reset | POST | Stateful reset (dashboard) |
| /api/step | POST | Stateful step (dashboard) |
| /api/state | GET | Stateful state (dashboard) |
| /tasks | GET | List available tasks |
| /dashboard/ | GET | Interactive web UI |
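A client can talk to these endpoints with nothing but the standard library. The sketch below only builds the requests; the endpoint paths and methods come from the table, but the JSON body shape (an `action` object wrapping the three fields) is an assumption, so check the `/schema` endpoint for the real payload before relying on it.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"   # matches the uvicorn/Docker port above


def build_health_request(base_url=BASE_URL):
    return Request(f"{base_url}/health", method="GET")


def build_step_request(battery_dispatch, diesel_dispatch, demand_shedding, base_url=BASE_URL):
    # Assumed body shape; the authoritative schema is served at /schema.
    body = {"action": {"battery_dispatch": battery_dispatch,
                       "diesel_dispatch": diesel_dispatch,
                       "demand_shedding": demand_shedding}}
    return Request(f"{base_url}/step",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="POST")


req = build_step_request(0.5, 0.0, 0.0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the server running, passing either request to `urllib.request.urlopen` sends it.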