77ethers committed on
Commit a439f7a · verified · 1 Parent(s): 1873b55

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +33 -6
README.md CHANGED
@@ -14,21 +14,48 @@ tags:
 
  # GridOps — Community Microgrid Bridge Operator
 
- > Keep the lights on for 100 homes. Don't go broke. Don't pollute.
 
- An OpenEnv RL environment where an AI agent operates a **community microgrid in an Indian city during summer**. Every hour for 3 days, the agent decides how to use the battery, whether to run diesel, and whether to ask residents to cut usage — while the national grid automatically covers whatever is left over (up to its 200 kW limit).
 
- **Live dashboard**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/)
 
  ---
 
  ## Why This Environment Exists
 
- Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone needs to manage the battery-grid-diesel tradeoff in real time.
 
- This environment captures the core tension: **sell surplus solar during expensive evening peaks for profit — but if a heatwave continues tomorrow and your battery is empty, the neighborhood goes dark.**
 
- Simple heuristics ("always discharge at peak", "always run diesel when low") provably fail. The agent must learn multi-hour planning, price forecasting, and constraint management.
 
  ---
 
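The original description's rule that the national grid "automatically covers whatever is left over (up to its 200 kW limit)" amounts to a per-hour energy balance. The following is a minimal sketch under assumptions that the README does not state. It assumes hourly steps, so the average kW over the hour equals kWh. It also assumes that a positive `battery_kw` means discharge and that the same 200 kW cap applies to exports. `settle_hour` is a hypothetical helper, not part of GridOps.

```python
from typing import Tuple

GRID_CAP_KW = 200.0  # grid limit quoted in the original description


def settle_hour(demand_kw: float, solar_kw: float, battery_kw: float,
                diesel_kw: float, shed_kw: float) -> Tuple[float, float]:
    """Return (grid_kw, unserved_kw) for one hourly step.

    Assumed conventions for this sketch: battery_kw > 0 is discharge into the
    microgrid, grid_kw > 0 is import and < 0 is export, and exports share the
    same 200 kW cap as imports.
    """
    # Whatever local sources and shedding do not cover is left for the grid.
    net_kw = demand_kw - shed_kw - solar_kw - battery_kw - diesel_kw
    grid_kw = max(-GRID_CAP_KW, min(net_kw, GRID_CAP_KW))
    # Demand beyond the grid cap goes unserved (the blackout case).
    unserved_kw = max(0.0, net_kw - GRID_CAP_KW)
    return grid_kw, unserved_kw
```

For example, 250 kW of demand with no local supply leaves 200 kW for the grid and 50 kW unserved.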
 
 
  # GridOps — Community Microgrid Bridge Operator
 
+ > A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.
 
+ **Live demo**: [77ethers-gridops.hf.space/dashboard/](https://77ethers-gridops.hf.space/dashboard/) | **HF Space**: [huggingface.co/spaces/77ethers/gridops](https://huggingface.co/spaces/77ethers/gridops)
 
+ ---
+
+ ## At a Glance
+
+ | | |
+ |---|---|
+ | **Domain** | Real-world Indian community microgrid operation (100 homes, summer) |
+ | **Interface** | Full OpenEnv spec: `reset()` -> `step(action)` -> `state()`, typed Pydantic models |
+ | **Actions** | 3D continuous: `battery_dispatch [-1,1]`, `diesel_dispatch [0,1]`, `demand_shedding [0,1]` |
+ | **Observations** | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
+ | **Tasks** | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
+ | **Grading** | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
+ | **Reward** | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
+ | **Anti-gaming** | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
+ | **Baseline** | Grok-4 LLM: 0.80/0.82/0.72, beating the hand-coded oracle on all tasks |
+ | **Deployment** | Docker + HF Space + `openenv validate` 6/6 pass |
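The table's action row can be mirrored in a small typed model that validates ranges before anything is sent to the environment. This is a minimal sketch that uses a stdlib dataclass instead of the Pydantic models the table mentions. The class name `MicrogridAction`, the helper `clip_action`, and the choice to reject out-of-range values instead of clipping them silently are assumptions made for illustration. They are not the GridOps API. The README also does not say which sign of `battery_dispatch` means charging.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MicrogridAction:
    """Hypothetical mirror of the 3D continuous action (not the GridOps model)."""

    battery_dispatch: float  # [-1, 1]; which sign means charging is not documented here
    diesel_dispatch: float   # [0, 1]
    demand_shedding: float   # [0, 1]

    def __post_init__(self) -> None:
        # Reject out-of-range values so bad actions fail loudly.
        bounds = {
            "battery_dispatch": (-1.0, 1.0),
            "diesel_dispatch": (0.0, 1.0),
            "demand_shedding": (0.0, 1.0),
        }
        for name, (lo, hi) in bounds.items():
            value = getattr(self, name)
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} outside [{lo}, {hi}]")

    def to_json(self) -> str:
        # Plain JSON payload with one key per action dimension.
        return json.dumps(asdict(self))


def clip_action(battery: float, diesel: float, shedding: float) -> MicrogridAction:
    """Clamp raw policy outputs (e.g. parsed LLM numbers) into the valid box."""

    def clamp(x: float, lo: float, hi: float) -> float:
        return max(lo, min(hi, x))

    return MicrogridAction(
        battery_dispatch=clamp(battery, -1.0, 1.0),
        diesel_dispatch=clamp(diesel, 0.0, 1.0),
        demand_shedding=clamp(shedding, 0.0, 1.0),
    )
```

Clamping at the policy boundary keeps a noisy agent within the action box, while the strict constructor still catches programming errors.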
 
  ---
 
  ## Why This Environment Exists
 
+ Community microgrid operation is a **real job** in India under the [RDSS](https://rdss.gov.in/) (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time.
+
+ This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour:
+
+ - **Should I charge the battery now** (grid is cheap at Rs 4/kWh) **or save capacity for tonight** (price will spike to Rs 15)?
+ - **Should I run diesel** (Rs 25/kWh + Rs 100 startup) **or risk a blackout** (Rs 150/kWh VoLL penalty)?
+ - **Should I ask residents to reduce AC usage** (Rs 40/kWh, and 100% of the shed demand rebounds tomorrow)?
+
+ Simple heuristics provably fail. The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.
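The per-hour tradeoffs in these bullets reduce to simple arithmetic with the quoted prices. This sketch is deliberately simplified to a single hour. It assumes no battery losses or degradation, no rebound from shedding, a diesel startup fee paid once, and enough sheddable demand to cover the whole shortfall. These assumptions apply only to this illustration. The function names are hypothetical. For example, a fresh diesel start beats a blackout only when the shortfall exceeds 100 / (150 - 25) = 0.8 kWh.

```python
# Prices quoted in the bullets above (Rs). Simplifying assumptions for this
# illustration only: one isolated hour, no battery losses or degradation,
# no rebound from shedding, a diesel start pays its fee once, and the whole
# shortfall could be shed if chosen.
GRID_OFFPEAK = 4.0      # Rs/kWh, cheap grid hour
GRID_PEAK = 15.0        # Rs/kWh, evening price spike
DIESEL = 25.0           # Rs/kWh of diesel generation
DIESEL_STARTUP = 100.0  # Rs per generator start
SHEDDING = 40.0         # Rs/kWh of demand residents are asked to cut
VOLL = 150.0            # Rs/kWh of unserved energy (blackout penalty)


def arbitrage_saving(kwh: float) -> float:
    """Rs saved by storing kwh at the off-peak price instead of buying it at peak."""
    return kwh * (GRID_PEAK - GRID_OFFPEAK)


def diesel_cost(kwh: float) -> float:
    """Rs to cover kwh with a fresh diesel start (zero if nothing is needed)."""
    return DIESEL_STARTUP + DIESEL * kwh if kwh > 0 else 0.0


def cheapest_cover(shortfall_kwh: float) -> str:
    """Cheapest single option for a shortfall the grid cannot supply."""
    costs = {
        "diesel": diesel_cost(shortfall_kwh),
        "shed": SHEDDING * shortfall_kwh,
        "blackout": VOLL * shortfall_kwh,
    }
    return min(costs, key=costs.__getitem__)


# Break-even shortfalls for a fresh diesel start:
#   vs blackout: 100 + 25*E = 150*E  ->  E = 100 / 125 = 0.8 kWh
#   vs shedding: 100 + 25*E =  40*E  ->  E = 100 / 15, about 6.7 kWh
DIESEL_VS_BLACKOUT_KWH = DIESEL_STARTUP / (VOLL - DIESEL)
DIESEL_VS_SHED_KWH = DIESEL_STARTUP / (SHEDDING - DIESEL)
```

In this one-hour view, shedding (Rs 40/kWh) always undercuts a blackout (Rs 150/kWh), and diesel overtakes shedding only above roughly 6.7 kWh. Over a full episode, the 100% rebound that this sketch omits would make shedding more expensive.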
 
+ ### What makes this a strong benchmark
 
+ - **Any agent can plug in immediately**: typed JSON actions in, typed observations out, no custom hacks
+ - **Fully deterministic** — same seed, same actions = identical trajectory every time. Leaderboard-ready.
+ - **Tasks differentiate agents** — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
+ - **Can't be gamed** — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
+ - **Grader = ground truth** — programmatic, deterministic, partial credit, aligned with per-step reward
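The determinism and grading claims can be illustrated with a toy stand-in. `ToyMicrogrid` below is hypothetical and is not GridOps. It copies only the `reset()` -> `step(action)` -> `state()` shape and the 50/25/25 reward blend from the table. Its dynamics, sub-scores, and single action dimension are invented. Sending every random draw through one `random.Random(seed)` instance is one common way to get the "same seed, same actions, identical trajectory" property. This sketch does not claim that GridOps works this way internally.

```python
import random
from typing import List, Tuple

# Hypothetical toy environment, NOT GridOps. It only mimics the
# reset() -> step(action) -> state() shape and the "same seed, same actions,
# same trajectory" property; all dynamics and sub-scores are invented.

WEIGHTS = {"cost": 0.50, "reliability": 0.25, "green": 0.25}  # 50/25/25 from the table


def blended_score(cost: float, reliability: float, green: float) -> float:
    """Weight three sub-scores in [0, 1] into a single score in [0, 1]."""
    return (WEIGHTS["cost"] * cost
            + WEIGHTS["reliability"] * reliability
            + WEIGHTS["green"] * green)


class ToyMicrogrid:
    def __init__(self, seed: int, hours: int = 72) -> None:  # 72 = 3 days of hourly steps
        self.seed = seed
        self.hours = hours
        self.reset()

    def reset(self) -> dict:
        # Every random draw comes from this one seeded generator, so a rerun
        # with the same seed and actions replays exactly the same numbers.
        self._rng = random.Random(self.seed)
        self._t = 0
        self._soc = 0.5
        return self.state()

    def step(self, battery_dispatch: float) -> Tuple[dict, float, bool]:
        # In this toy, positive dispatch drains the battery.
        noise = self._rng.gauss(0.0, 0.02)  # stand-in for stochastic demand/solar
        self._soc = min(1.0, max(0.0, self._soc - 0.1 * battery_dispatch + noise))
        self._t += 1
        reward = blended_score(
            cost=1.0 - 0.2 * abs(battery_dispatch),  # invented sub-scores
            reliability=self._soc,
            green=1.0,
        )
        return self.state(), reward, self._t >= self.hours

    def state(self) -> dict:
        return {"t": self._t, "soc": self._soc}


def rollout(seed: int, actions: List[float]) -> List[Tuple[dict, float]]:
    env = ToyMicrogrid(seed=seed, hours=len(actions))
    env.reset()
    trajectory = []
    for action in actions:
        obs, reward, _done = env.step(action)
        trajectory.append((obs, reward))
    return trajectory
```

Two rollouts with the same seed and action list produce equal trajectories. Changing only the seed changes them, which is the property a leaderboard relies on.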
 
  ---