# GridOps — Community Microgrid Bridge Operator

> A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.

**Live demo**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/) | **HF Space**: [huggingface.co/spaces/77ethers/gridops](https://huggingface.co/spaces/77ethers/gridops)

---
## At a Glance

| | |
|---|---|
| **Domain** | Real-world Indian community microgrid operation (100 homes, summer) |
| **Interface** | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models |
| **Actions** | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` |
| **Observations** | 30+ fields: demand, solar, price, battery state of charge (SOC), forecasts, energy flows. Partial observability (noisy forecasts). |
| **Tasks** | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| **Grading** | Deterministic, programmatic, 0.0-1.0. Same seed and same actions = same score, every run. |
| **Reward** | Dense per-step signal, aligned with the episode grader (50% cost + 25% reliability + 25% green) |
| **Anti-gaming** | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| **Baseline** | Grok-4 LLM: 0.80/0.82/0.72, beating the hand-coded oracle on all tasks |
| **Deployment** | Docker + HF Space; `openenv validate`: 6/6 pass |
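To make the action space and reward weighting in the table concrete, here is a minimal sketch. It uses a stdlib `dataclass` in place of the Pydantic models the environment ships, and the names `GridAction`, `clipped`, and `blended_score` are hypothetical; only the three field names, their ranges, and the 50/25/25 weights come from the table. It assumes each sub-score is already normalised to [0, 1]; how GridOps actually computes its cost, reliability, and green scores is not shown here.

```python
from dataclasses import dataclass


def _clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass(frozen=True)
class GridAction:
    """Hypothetical mirror of the 3D continuous action from the table.

    battery_dispatch: [-1, 1] (which sign means charge vs discharge is not assumed here)
    diesel_dispatch:  [0, 1]
    demand_shedding:  [0, 1]
    """
    battery_dispatch: float
    diesel_dispatch: float
    demand_shedding: float

    def clipped(self) -> "GridAction":
        """Return a copy with every component clamped into its documented range."""
        return GridAction(
            battery_dispatch=_clip(self.battery_dispatch, -1.0, 1.0),
            diesel_dispatch=_clip(self.diesel_dispatch, 0.0, 1.0),
            demand_shedding=_clip(self.demand_shedding, 0.0, 1.0),
        )


# Weights from the Reward row: 50% cost, 25% reliability, 25% green.
WEIGHTS = {"cost": 0.50, "reliability": 0.25, "green": 0.25}


def blended_score(cost: float, reliability: float, green: float) -> float:
    """Weighted sum of three sub-scores, each assumed to be normalised to [0, 1]."""
    parts = {"cost": cost, "reliability": reliability, "green": green}
    total = sum(WEIGHTS[name] * _clip(value, 0.0, 1.0) for name, value in parts.items())
    return _clip(total, 0.0, 1.0)
```

For example, sub-scores of 0.8 (cost), 1.0 (reliability), and 0.4 (green) blend to 0.5 × 0.8 + 0.25 × 1.0 + 0.25 × 0.4 = 0.75. Because the weights sum to 1, the blended value stays inside the grader's 0.0-1.0 range.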
---

## Why This Environment Exists

Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX (Indian Energy Exchange) prosumer bidding is live. Over 50 million Indian homes are projected to have rooftop solar by 2030, and someone, or some agent, needs to manage the battery-grid-diesel tradeoff in real time.

This is not a toy problem. It is what a microgrid operator at an Indian housing society actually decides every hour:

- **Should I charge the battery now** (grid is cheap at Rs 4/kWh) **or save capacity for tonight** (price will spike to Rs 15/kWh)?
- **Should I run diesel** (Rs 25/kWh + Rs 100 startup) **or risk a blackout** (a value-of-lost-load, or VoLL, penalty of Rs 150/kWh)?
- **Should I ask residents to reduce AC usage** (Rs 40/kWh, plus a 100% rebound of the shed demand tomorrow)?

Simple heuristics fall short: the environment requires multi-hour planning, price forecasting, and constraint management under partial observability.
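The prices in the list above can be turned into a rough per-decision cost comparison. This is a simplified sketch, not the environment's cost model: it ignores battery degradation, forecast noise, and the grid cap, and the function names plus the `rebound_price` and `round_trip_efficiency` parameters are hypothetical.

```python
# Prices quoted in the decision list (Rs). Everything else is an illustrative assumption.
GRID_OFFPEAK = 4.0      # Rs/kWh, cheap daytime grid
GRID_PEAK = 15.0        # Rs/kWh, evening spike
DIESEL_PER_KWH = 25.0   # Rs/kWh of diesel output
DIESEL_STARTUP = 100.0  # Rs per cold start
VOLL = 150.0            # Rs/kWh of unserved load (value of lost load)
SHED_PER_KWH = 40.0     # Rs/kWh of curtailed demand


def diesel_cost(kwh: float, already_running: bool = False) -> float:
    """Cost of covering `kwh` with diesel, adding the startup fee if the genset is cold."""
    if kwh <= 0:
        return 0.0
    return DIESEL_PER_KWH * kwh + (0.0 if already_running else DIESEL_STARTUP)


def blackout_cost(kwh: float) -> float:
    """VoLL penalty for leaving `kwh` of demand unserved."""
    return VOLL * max(kwh, 0.0)


def shed_cost(kwh: float, rebound_price: float = GRID_PEAK) -> float:
    """Shedding fee now plus buying back the 100% rebound later.

    `rebound_price` (what the rebounded energy costs when it returns) is hypothetical.
    """
    kwh = max(kwh, 0.0)
    return SHED_PER_KWH * kwh + rebound_price * kwh


def diesel_breakeven_kwh() -> float:
    """Shortfall above which a cold diesel start is cheaper than paying VoLL."""
    return DIESEL_STARTUP / (VOLL - DIESEL_PER_KWH)


def arbitrage_margin(round_trip_efficiency: float = 1.0) -> float:
    """Rs saved per kWh discharged at peak after charging off-peak.

    round_trip_efficiency is hypothetical (1.0 = lossless battery).
    """
    return GRID_PEAK - GRID_OFFPEAK / round_trip_efficiency
```

Under these simplifications a cold diesel start beats a blackout once the shortfall exceeds 100 / (150 - 25) = 0.8 kWh, and covering a 10 kWh gap costs Rs 350 with diesel, Rs 1,500 as unserved load, or Rs 550 by shedding if the rebound is bought back at Rs 15/kWh. A lossless battery earns at most Rs 11 per kWh shifted from off-peak to peak; the real decision also depends on forecasts and the anti-gaming terms.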
### What makes this a strong benchmark

- **Any agent can plug in immediately**: typed JSON actions in, typed observations out, no custom hacks
- **Fully deterministic**: same seed, same actions = identical trajectory every time. Leaderboard-ready.
- **Tasks differentiate agents**: Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
- **Can't be gamed**: 5 anti-exploit mechanisms prevent reward hacking (detailed below)
- **Grader = ground truth**: programmatic, deterministic, partial credit, aligned with per-step reward
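The determinism property comes down to routing every source of randomness (here, forecast noise) through an explicitly seeded generator. The toy `rollout` below is a hypothetical illustration of that pattern, not GridOps code; its dynamics and sign convention are invented for the example.

```python
import random
from typing import List, Sequence


def rollout(seed: int, actions: Sequence[float], horizon: int = 24) -> List[float]:
    """Toy episode whose only randomness is a seeded forecast-noise stream.

    Hypothetical stand-in for an environment episode; not the GridOps API.
    """
    rng = random.Random(seed)  # private, seeded RNG: no dependence on global random state
    soc = 0.5                  # toy battery state of charge, as a fraction of capacity
    trajectory: List[float] = []
    for t in range(horizon):
        noise = rng.gauss(0.0, 0.1)         # stand-in for noisy forecasts
        action = actions[t % len(actions)]  # cycle through the supplied actions
        # In this toy, a positive action drains the battery; the convention is arbitrary.
        soc = min(1.0, max(0.0, soc - 0.05 * action + 0.01 * noise))
        trajectory.append(soc)
    return trajectory
```

Calling `rollout(42, [0.2, -0.5])` twice returns identical lists, while switching to another seed changes the noise stream and therefore the trajectory. Keeping the generator local, rather than using module-level `random` calls, is what makes replaying a leaderboard submission reproducible.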
---