
title: GridOps
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - microgrid
  - energy

GridOps — Community Microgrid Bridge Operator

A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.

Live demo: 77ethers-gridops.hf.space/dashboard/ | HF Space: huggingface.co/spaces/77ethers/gridops

At a Glance


| Aspect | Details |
|---|---|
| Domain | Real-world Indian community microgrid operation (100 homes, summer) |
| Interface | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models |
| Actions | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` |
| Observations | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
| Tasks | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| Grading | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
| Reward | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
| Anti-gaming | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| Baseline | Grok-4 LLM: 0.80/0.82/0.72, beats hand-coded oracle on all tasks |
| Deployment | Docker + HF Space + `openenv validate` 6/6 pass |

Why This Environment Exists

Community microgrid operation is a real job in India under the RDSS (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time.

This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour:

Should I charge the battery now (grid is cheap at Rs 4/kWh) or save capacity for tonight (price will spike to Rs 15)?
Should I run diesel (Rs 25/kWh + Rs 100 startup) or risk a blackout (Rs 150/kWh VoLL penalty)?
Should I ask residents to reduce AC usage (Rs 40/kWh + 100% rebounds next hour)?

Simple heuristics fail here, as the baseline scores further down show. The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.

What makes this a strong benchmark

Any agent can plug in immediately — typed JSON actions in, typed observations out, no custom hacks
Fully deterministic — same seed, same actions = identical trajectory every time. Leaderboard-ready.
Tasks differentiate agents — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
Can't be gamed — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
Grader = ground truth — programmatic, deterministic, partial credit, aligned with per-step reward
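To make the determinism claim concrete, here is a minimal sketch of the kind of check a leaderboard could run: roll out the same policy several times with the same seed and compare the reward trajectories. `ToyEnv` is a hypothetical stand-in with seeded randomness, not the GridOps environment, and the function names are illustrative.

```python
import random
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in with seeded randomness; not the GridOps environment."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)  # all randomness flows from this one seed
        self.t = 0

    def step(self, action: float) -> Tuple[float, bool]:
        self.t += 1
        demand = 100.0 + self.rng.gauss(0.0, 10.0)
        reward = -abs(demand - 100.0 * action)
        return reward, self.t >= 72  # 72 hourly steps, i.e. a 3-day episode


def rollout(seed: int, policy: Callable[[int], float]) -> List[float]:
    """Run one full episode and return the per-step rewards."""
    env, rewards, done = ToyEnv(seed), [], False
    while not done:
        reward, done = env.step(policy(env.t))
        rewards.append(reward)
    return rewards


def is_deterministic(seed: int, policy: Callable[[int], float], runs: int = 3) -> bool:
    """True if every repeated rollout matches the first one exactly."""
    first = rollout(seed, policy)
    return all(rollout(seed, policy) == first for _ in range(runs - 1))


print(is_deterministic(42, lambda t: 1.0))  # True
```

The same pattern should carry over to a real environment by swapping `ToyEnv` for a client that calls `reset()` with a fixed seed.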

The Problem at a Glance

You have:

Solar panels — 250 kW peak, free, but only during daylight
Community battery — 500 kWh storage, 100 kW max charge/discharge
Diesel generator — 100 kW, but Rs 25/kWh + Rs 100 startup cost
National grid — auto-imports/exports as slack (capped at 200 kW)

You control (3 continuous actions):

| Action | Range | What it does |
|---|---|---|
| `battery_dispatch` | -1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. |
| `diesel_dispatch` | 0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if the generator was off. |
| `demand_shedding` | 0 to 1 | Ask residents to cut 0-20% usage. 100% rebounds next hour. Rs 40/kWh penalty. |

You do NOT control the grid. It automatically absorbs whatever energy gap remains after your decisions. If the gap exceeds 200 kW, that's a blackout (Rs 150/kWh penalty).
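As an illustration of how the three normalized actions map to physical quantities, the sketch below converts an action into kW and a shedding fraction using the limits stated above. The `Action` dataclass, the `to_physical` helper, and the choice to clip out-of-range values (rather than reject them) are assumptions for this example; the real models live in `gridops/models.py`.

```python
from dataclasses import dataclass
from typing import Dict

BATTERY_MAX_KW = 100.0     # max charge/discharge rate
DIESEL_MAX_KW = 100.0      # max generator output
MAX_SHED_FRACTION = 0.20   # residents cut at most 20% of usage


@dataclass(frozen=True)
class Action:
    """Hypothetical mirror of the three action fields."""
    battery_dispatch: float  # -1 (charge) .. +1 (discharge)
    diesel_dispatch: float   # 0 .. 1
    demand_shedding: float   # 0 .. 1


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def to_physical(a: Action) -> Dict[str, float]:
    """Map normalized actions to kW and a shed fraction (clipping is an assumption)."""
    return {
        "battery_kw": clip(a.battery_dispatch, -1.0, 1.0) * BATTERY_MAX_KW,
        "diesel_kw": clip(a.diesel_dispatch, 0.0, 1.0) * DIESEL_MAX_KW,
        "shed_fraction": clip(a.demand_shedding, 0.0, 1.0) * MAX_SHED_FRACTION,
    }


print(to_physical(Action(-1.5, 0.5, 1.0)))
```

A positive `battery_kw` means discharge and a negative value means charge, matching the sign convention in the table.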

The Critical Bottleneck

At 8 PM every evening, demand hits 250 kW but the grid maxes out at 200 kW and solar is zero.

The 50 kW gap must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark.

On a heatwave day (Task 2-3), demand spikes to 325-375 kW. Now the gap is 125-175 kW — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours.
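The gap figures above follow from simple arithmetic: whatever demand is left after solar and the grid cap must come from battery, diesel, or shedding. A small sketch (the function name is illustrative) reproduces the 50, 125, and 175 kW numbers:

```python
GRID_CAP_KW = 200.0


def evening_gap_kw(demand_kw: float, solar_kw: float = 0.0,
                   grid_cap_kw: float = GRID_CAP_KW) -> float:
    """Demand the grid cannot cover once solar is accounted for."""
    return max(0.0, demand_kw - solar_kw - grid_cap_kw)


scenarios = [("normal peak", 250.0), ("heatwave +30%", 325.0), ("crisis +50%", 375.0)]
for label, demand in scenarios:
    print(f"{label}: gap = {evening_gap_kw(demand):.0f} kW")

# During a grid outage the cap is 0, so the whole demand becomes the gap.
print(f"crisis during outage: gap = {evening_gap_kw(375.0, grid_cap_kw=0.0):.0f} kW")
```

With the battery and diesel each limited to 100 kW, the outage case (375 kW gap) cannot be covered by dispatch alone, which is where shedding and pre-planning come in.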

What the Agent Sees (Observation)

| Field | Description |
|---|---|
| `hour` | Current hour in episode (0-72, starting 6 AM) |
| `demand_kw` | What the 100 homes need right now |
| `solar_kw` | Free solar power available (0 at night, up to 250 kW midday) |
| `battery_soc` | Battery charge level (0-1, i.e. 0-500 kWh) |
| `grid_price` | Current IEX electricity price (Rs 3-20/kWh) |
| `diesel_fuel_remaining` | Diesel tank level (0-1) |
| `diesel_is_on` | Was diesel running last step? (startup cost if turning on) |
| `demand_forecast_4h` | Noisy 4-hour demand forecast (±15%) |
| `solar_forecast_4h` | Noisy 4-hour solar forecast |
| `price_forecast_4h` | Noisy 4-hour price forecast |
| `cumulative_blackout_kwh` | Total blackout energy so far |
| `cumulative_cost` | Total money spent so far (Rs) |
| `flow_*` | Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) |

Partial observability: forecasts have ±15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes.
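One plausible way to produce such forecasts is multiplicative Gaussian noise drawn from a seeded generator, sketched below. Treating "15%" as the relative standard deviation, clamping at zero, and the `noisy_forecast` name are all assumptions; the actual generator is in `gridops/simulation/scenarios.py`.

```python
import random
from typing import List


def noisy_forecast(true_values: List[float], rng: random.Random,
                   rel_sigma: float = 0.15) -> List[float]:
    """Perturb each true value by Gaussian noise with relative std rel_sigma (assumed)."""
    return [max(0.0, v * (1.0 + rng.gauss(0.0, rel_sigma))) for v in true_values]


true_demand = [180.0, 220.0, 250.0, 230.0]  # next 4 hours, illustrative numbers

# Same seed gives the same noise, so the forecast stays reproducible.
first = noisy_forecast(true_demand, random.Random(7))
second = noisy_forecast(true_demand, random.Random(7))
print(first == second)  # True
```

Because the noise comes from a seeded `random.Random`, partial observability does not conflict with deterministic grading.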

3 Tasks (Each Tests a Different RL Capability)

Task 1: Normal Summer (Easy) — Tests basic arbitrage

Clear skies, standard demand (~100 kW avg, 250 kW peak)
Grid prices Rs 3-12 with clear cheap night / expensive evening pattern
What the agent must learn: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday
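A back-of-the-envelope check shows why this arbitrage only pays off when the price spread is wide enough. The sketch assumes the 90% round-trip efficiency and Rs 2.5/kWh degradation stated elsewhere in this README, and further assumes that degradation is charged once per kWh delivered; the environment may account for throughput differently.

```python
ROUND_TRIP_EFFICIENCY = 0.90     # from the Key Physics table
DEGRADATION_RS_PER_KWH = 2.5     # assumed to apply per kWh delivered


def arbitrage_margin(buy_price: float, sell_price: float) -> float:
    """Net Rs earned per kWh bought from the grid, stored, and discharged later."""
    delivered_kwh = ROUND_TRIP_EFFICIENCY          # 1 kWh in, 0.9 kWh out
    avoided_cost = delivered_kwh * sell_price      # grid import displaced at peak
    degradation = DEGRADATION_RS_PER_KWH * delivered_kwh
    return avoided_cost - buy_price - degradation


print(round(arbitrage_margin(3.0, 12.0), 2))  # wide spread: profitable
print(round(arbitrage_margin(5.0, 7.0), 2))   # narrow spread: loses money
```

Under these assumptions a Rs 3 to Rs 12 cycle nets about Rs 5.55 per kWh bought, while a Rs 5 to Rs 7 cycle loses about Rs 0.95, which is the effect the degradation cost is meant to have.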

Task 2: Heatwave + Price Spike (Medium) — Tests temporal planning

Day 2-3 heatwave (+30% demand), intermittent clouds
Rs 20 price spike on Day 2 evening — visible in 4-hour forecast
What the agent must learn: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM.

Task 3: Extreme Crisis + Grid Outage (Hard) — Tests constraint management

Full 3-day heatwave, -30% solar from haze, +50% demand
Limited diesel (33% tank = ~8 hours at full power)
6-hour grid outage on Day 2 afternoon — grid cap drops to 0 kW
What the agent must learn: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding.
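To see why pre-charging matters, it helps to bound how much energy the battery and diesel can supply during the 6-hour outage. The sketch below is a rough upper bound that ignores battery efficiency losses and any solar output; it takes the "33% tank = ~8 hours at full power" figure to mean roughly 800 kWh of diesel energy, which is an inference rather than a documented constant.

```python
from typing import Tuple

BATTERY_CAPACITY_KWH = 500.0
BATTERY_MAX_KW = 100.0
DIESEL_MAX_KW = 100.0


def outage_budget(soc: float, diesel_kwh_available: float,
                  outage_hours: float) -> Tuple[float, float, float]:
    """Upper bound on (battery, diesel, total) kWh deliverable during an outage."""
    battery_kwh = min(soc * BATTERY_CAPACITY_KWH, BATTERY_MAX_KW * outage_hours)
    diesel_kwh = min(diesel_kwh_available, DIESEL_MAX_KW * outage_hours)
    return battery_kwh, diesel_kwh, battery_kwh + diesel_kwh


# Full battery vs. half-charged battery going into a 6-hour outage.
print(outage_budget(1.0, 800.0, 6))
print(outage_budget(0.5, 800.0, 6))
```

Entering the outage at 50% instead of 100% state of charge removes 250 kWh from the budget, and any diesel burned during the outage is no longer available for later evening peaks.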

Grading (0.0 - 1.0)

score = 0.50 x cost_efficiency + 0.25 x reliability + 0.25 x green_score

| Component | Formula | What it rewards |
|---|---|---|
| Cost efficiency (50%) | `1 - (agent_cost / baseline_cost)` | Spending less than a dumb "max grid import" baseline |
| Reliability (25%) | `(demand_met - blackout) / demand_met` | Keeping the lights on |
| Green score (25%) | `1 - (diesel_used / total_demand)` | Minimizing diesel emissions |

Baseline: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages.
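The weighted formula can be sketched as a small function. The argument names follow the formulas in the grading table; clamping each component to the 0-1 range is an assumption made here so the total stays within 0.0-1.0, and the real logic lives in `gridops/tasks/graders.py`.

```python
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def episode_score(agent_cost: float, baseline_cost: float,
                  demand_met_kwh: float, blackout_kwh: float,
                  diesel_used_kwh: float, total_demand_kwh: float) -> float:
    """0.50 * cost efficiency + 0.25 * reliability + 0.25 * green score."""
    cost_efficiency = clamp01(1.0 - agent_cost / baseline_cost)
    reliability = clamp01((demand_met_kwh - blackout_kwh) / demand_met_kwh)
    green_score = clamp01(1.0 - diesel_used_kwh / total_demand_kwh)
    return 0.50 * cost_efficiency + 0.25 * reliability + 0.25 * green_score


# An agent that spends 60% of the baseline cost, with no blackouts and no diesel.
print(round(episode_score(60.0, 100.0, 1000.0, 0.0, 0.0, 1000.0), 3))
```

Under this reading, an agent that merely matches the baseline cost earns no cost-efficiency credit and tops out near 0.5, which is in line with the Do-Nothing scores reported in this README.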

VoLL (Value of Lost Load): Rs 150/kWh blackout penalty. This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally.

Why Heuristics Fail

| Strategy | Why it fails |
|---|---|
| "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Score collapses. |
| "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. |
| "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. |
| "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. |
| "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. |

The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. In the Baseline Scores table it sits 0.20-0.39 above the simple heuristics on each task, which indicates that the environment rewards planning over fixed rules.

Anti-Gaming Design

The environment has 5 mechanisms that prevent reward hacking:

Shedding is expensive — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only.
Battery degradation — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage.
Diesel startup cost — Rs 100 per on-switch. Prevents on/off toggling.
VoLL is smooth — Rs 150/kWh with no cliff. Agent can't exploit a binary gate.
Grid is capped — 200 kW max (0 during outages). Can't just buy everything.
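Putting the stated prices together, a per-step cost could look like the sketch below. It uses only the rates quoted in this README (Rs 2.5/kWh degradation, Rs 25/kWh diesel, Rs 100 startup, Rs 40/kWh shedding, Rs 150/kWh VoLL). Crediting exports at the same market price as imports, and the function signature itself, are assumptions rather than the environment's actual accounting.

```python
def step_cost(price: float, grid_import_kwh: float = 0.0, grid_export_kwh: float = 0.0,
              battery_throughput_kwh: float = 0.0, diesel_kwh: float = 0.0,
              diesel_started: bool = False, shed_kwh: float = 0.0,
              blackout_kwh: float = 0.0) -> float:
    """Rs spent in one hourly step under the rates listed in this README."""
    grid = price * (grid_import_kwh - grid_export_kwh)   # export credit is assumed
    degradation = 2.5 * battery_throughput_kwh
    diesel = 25.0 * diesel_kwh + (100.0 if diesel_started else 0.0)
    shedding = 40.0 * shed_kwh
    voll = 150.0 * blackout_kwh                          # linear, no cliff
    return grid + degradation + diesel + shedding + voll


# Toggling diesel on for a small 10 kWh burst costs more in startup than it seems.
print(step_cost(price=5.0, diesel_kwh=10.0, diesel_started=True))   # 350.0
print(step_cost(price=5.0, diesel_kwh=10.0, diesel_started=False))  # 250.0
```

Because every term is linear in energy, each kWh of avoided blackout or avoided shedding changes the cost by a fixed amount, which is what keeps the signal smooth.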

Baseline Scores

| Strategy | Task 1 | Task 2 | Task 3 | What it does |
|---|---|---|---|---|
| Grok-4 (LLM) | 0.80 | 0.82 | 0.72 | Reads observations, reasons about tradeoffs |
| Oracle (rule-based) | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic |
| Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can |
| Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening |
| Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money |

LLM beats oracle: Grok-4 scored 0.01-0.02 higher than the hand-coded oracle on every task
Deterministic: identical scores across 3 runs (seeded RNG)
Oracle ceiling < 1.0: real physics constraints, not inflated scores
Clear separation: LLM > oracle >> heuristics (0.28-0.40 gap between the best and worst strategy on each task)
Task 3 hardest: grid outage makes it genuinely challenging even for frontier LLMs
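For readers who want a starting point, here is a minimal rule-based policy in the spirit of a "time-of-day + price + SOC" heuristic. It is not the repository's oracle: every threshold below is a hypothetical value chosen for illustration, and the only detail taken from this README is that episode hour 0 corresponds to 6 AM.

```python
from typing import Dict


def hour_of_day(episode_hour: int) -> int:
    """Episodes start at 6 AM, so episode hour 0 maps to 06:00."""
    return (6 + episode_hour) % 24


def rule_based_action(episode_hour: int, grid_price: float,
                      battery_soc: float) -> Dict[str, float]:
    """Hypothetical heuristic: charge when cheap, discharge in the evening peak."""
    clock = hour_of_day(episode_hour)
    if grid_price <= 5.0 and battery_soc < 0.9:
        battery = -1.0                      # cheap grid: charge at full rate
    elif 18 <= clock <= 22 and battery_soc > 0.1:
        battery = 1.0                       # evening peak: discharge at full rate
    elif grid_price >= 12.0 and battery_soc > 0.5:
        battery = 0.5                       # pricey but off-peak: discharge gently
    else:
        battery = 0.0
    return {"battery_dispatch": battery, "diesel_dispatch": 0.0, "demand_shedding": 0.0}


print(rule_based_action(14, 15.0, 0.8))  # 8 PM, expensive grid, charged battery
```

A policy like this never touches diesel or shedding, so it would need extra rules to handle the grid outage in Task 3.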

Key Physics

| Component | Spec | Cost |
|---|---|---|
| Solar | 250 kW peak, bell curve 6 AM - 6 PM | Free |
| Battery | 500 kWh, 100 kW max, 90% round-trip (sqrt each way) | Rs 2.5/kWh degradation |
| Diesel | 100 kW max | Rs 25/kWh + Rs 100 startup |
| Grid | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh |
| Blackout | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty |
| Shedding | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour |
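The "sqrt each way" note means the 90% round-trip efficiency is split evenly between charging and discharging, so each direction has an efficiency of about 0.949. A short sketch with illustrative function names:

```python
import math

ROUND_TRIP_EFFICIENCY = 0.90
ONE_WAY_EFFICIENCY = math.sqrt(ROUND_TRIP_EFFICIENCY)  # about 0.9487 per direction


def stored_from_charging(energy_in_kwh: float) -> float:
    """Energy that actually lands in the battery after charging losses."""
    return energy_in_kwh * ONE_WAY_EFFICIENCY


def delivered_from_stored(stored_kwh: float) -> float:
    """Energy that reaches the load after discharging losses."""
    return stored_kwh * ONE_WAY_EFFICIENCY


stored = stored_from_charging(100.0)
delivered = delivered_from_stored(stored)
print(f"one-way efficiency: {ONE_WAY_EFFICIENCY:.4f}")
print(f"100 kWh in -> {stored:.2f} kWh stored -> {delivered:.2f} kWh out")
```

Charging 100 kWh therefore stores about 94.87 kWh and returns about 90 kWh on discharge.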

Energy balance every step:

supply  = solar + grid_import + battery_discharge + diesel
consume = effective_demand + grid_export + battery_charge

Supply always equals consumption. Any unmet demand beyond grid cap = blackout.
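The grid's role as a slack variable can be sketched as follows: after solar, battery, and diesel are applied, the grid imports or exports the remainder up to its cap, and anything it cannot import becomes blackout. With hourly steps, kW and kWh per step are interchangeable. The function name and the treatment of surplus beyond the export cap (counted here as curtailed) are assumptions.

```python
from typing import Dict


def settle_grid(effective_demand_kw: float, solar_kw: float, battery_kw: float,
                diesel_kw: float, grid_cap_kw: float = 200.0) -> Dict[str, float]:
    """Resolve the residual with the grid; battery_kw > 0 discharges, < 0 charges."""
    net_kw = effective_demand_kw - solar_kw - diesel_kw - battery_kw
    if net_kw >= 0.0:
        grid_import = min(net_kw, grid_cap_kw)
        return {"grid_import_kw": grid_import, "grid_export_kw": 0.0,
                "blackout_kw": net_kw - grid_import, "curtailed_kw": 0.0}
    surplus = -net_kw
    grid_export = min(surplus, grid_cap_kw)
    return {"grid_import_kw": 0.0, "grid_export_kw": grid_export,
            "blackout_kw": 0.0, "curtailed_kw": surplus - grid_export}


# 8 PM on a normal day with an idle battery: 50 kW goes unserved.
print(settle_grid(250.0, 0.0, 0.0, 0.0))
# Same hour with 50 kW of battery discharge: the grid covers the rest.
print(settle_grid(250.0, 0.0, 50.0, 0.0))
```

Passing `grid_cap_kw=0.0` models an outage, in which case every kW not covered by solar, battery, or diesel shows up as blackout.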

Setup & Usage

# Install
pip install -e .

# Run server
uvicorn gridops.server.app:app --port 8000

# Interactive dashboard
open http://localhost:8000/dashboard/

# Validate oracle + determinism
python scripts/oracle_test.py

# Run LLM baseline
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token"
export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
python inference.py

Docker

docker build -t gridops .
docker run -p 8000:8000 gridops

OpenEnv Validation

# Local structure check
openenv validate

# Runtime check (against live server)
openenv validate --url http://localhost:8000

Project Structure

gridops/
├── inference.py                 # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN)
├── openenv.yaml                 # OpenEnv manifest
├── Dockerfile                   # Docker deployment
├── server/app.py                # Root entry point (openenv validate)
├── gridops/
│   ├── models.py                # GridOpsAction, GridOpsObservation (Pydantic)
│   ├── simulation/
│   │   ├── physics.py           # Energy balance, battery, VoLL, degradation, outages
│   │   └── scenarios.py         # Demand/solar/price curve generators
│   ├── tasks/
│   │   ├── definitions.py       # 3 task configs (normal, heatwave, crisis+outage)
│   │   └── graders.py           # 0-1 scoring: cost + reliability + green
│   └── server/
│       ├── app.py               # FastAPI + OpenEnv create_app
│       ├── environment.py       # OpenEnv Environment class
│       └── static/index.html    # Interactive dashboard with energy flows
└── scripts/
    └── oracle_test.py           # Oracle + heuristic validation + determinism check

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/schema` | GET | Action/observation/state JSON schemas |
| `/metadata` | GET | Environment name and description |
| `/reset` | POST | Reset environment (OpenEnv standard) |
| `/step` | POST | Execute action (OpenEnv standard) |
| `/state` | GET | Current state (OpenEnv standard) |
| `/ws` | WebSocket | Persistent session (OpenEnv standard) |
| `/api/reset` | POST | Stateful reset (dashboard) |
| `/api/step` | POST | Stateful step (dashboard) |
| `/api/state` | GET | Stateful state (dashboard) |
| `/tasks` | GET | List available tasks |
| `/dashboard/` | GET | Interactive web UI |
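A client can talk to these endpoints with nothing but the Python standard library. The sketch below is a guess at the request shape: wrapping the three action fields in an `"action"` object and sending an empty body to `/reset` are assumptions, so check `GET /schema` on a running server for the authoritative format. The helper names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict


def build_step_payload(battery_dispatch: float, diesel_dispatch: float,
                       demand_shedding: float) -> Dict[str, Any]:
    """Assumed request body for POST /step; verify against GET /schema."""
    return {"action": {"battery_dispatch": battery_dispatch,
                       "diesel_dispatch": diesel_dispatch,
                       "demand_shedding": demand_shedding}}


def post_json(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def run_one_step(base_url: str = "http://localhost:8000") -> Dict[str, Any]:
    """Reset, then charge the battery at full rate for one step (needs a live server)."""
    post_json(f"{base_url}/reset", {})
    return post_json(f"{base_url}/step", build_step_payload(-1.0, 0.0, 0.0))


print(json.dumps(build_step_payload(-1.0, 0.0, 0.0)))
```

`run_one_step()` is not called here because it requires the server from the Setup section to be running. The `/api/*` variants in the table are described as stateful, so they may be the better fit for multi-step episodes over plain HTTP.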