docs: clarify scenario count, OPENGRID_MODE flag; drop runtime/epoch info

- README/blog: remove training-runtime numbers (159.6 min, 2.5 hours, etc.)
  and "3 epochs" annotations from the metrics table and prose.
- README: replace misleading "## The four scenarios" heading with
  "## The scenarios" and split into two tables — 4 base grids and 3 Karnataka
  difficulty variants — for an accurate 7-task picture.
- README: add a "Docker / Hugging Face Space — server vs training mode"
  subsection explaining the OPENGRID_MODE env var (unset/server = live demo,
  training = run GRPO and serve results) with local docker run examples.

Made-with: Cursor
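The unset/server vs. training dispatch summarized in the last bullet can be sketched as a tiny selector. This is a sketch only: the commit does not show the actual container entrypoint, and `pick_mode` is a hypothetical helper name.

```python
def pick_mode(env):
    """Which behaviour the container would run for a given environment.

    Mirrors the OPENGRID_MODE table in the README diff below: unset or
    "server" serves the live dashboard; "training" backgrounds the UI,
    then runs GRPO. (Hypothetical helper; the real entrypoint may differ.)
    """
    mode = env.get("OPENGRID_MODE", "server")
    return "train-then-serve" if mode == "training" else "serve"

print(pick_mode({}))                             # serve (variable unset)
print(pick_mode({"OPENGRID_MODE": "training"}))  # train-then-serve
```

Defaulting to server mode when the variable is unset is what keeps the public Space in live-demo mode unless someone deliberately flips it.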
README.md

````diff
@@ -78,7 +78,11 @@ Agents talk to the grid over HTTP. Any language, any framework — it's just `PO
 
 ---
 
-## The four scenarios
+## The scenarios
+
+Seven scenarios in total — four base grids and three difficulty variants of the Karnataka topology used for curriculum learning.
+
+**Base grids**
 
 | Task | Buses | Agents | Renewables | What's hard about it |
 |---|---|---|---|---|
@@ -87,9 +91,15 @@ Agents talk to the grid over HTTP. Any language, any framework — it's just `PO
 | `task_hard` | 14 | 3 | 70% | Tight margins. Small mistakes blow up. |
 | `task_karnataka` | 15 | 4 | Real mix | The actual KPTCL grid with GPS coordinates. |
 
-
+**Karnataka stress-test variants** — same 15-bus topology, different operating conditions:
 
-
+| Task | Renewables | Load | Line capacity |
+|---|---|---|---|
+| `karnataka_easy` | 0.3× | 0.6× | 1.5× |
+| `karnataka_medium` | 0.7× | 1.0× | 1.0× |
+| `karnataka_hard` | 1.3× | 1.4× | 0.75× |
+
+Episodes run for 50 steps. Scores land between **0.02 and 0.98** (higher = better).
 
 ---
 
@@ -149,13 +159,32 @@ Or open one of the Colab notebooks in Google Colab (free T4 works for both):
 
 Both notebooks produce the same `training/outputs/summary.json` schema, with a `framework` field identifying which path was used.
 
-### Docker
+### Docker / Hugging Face Space — server vs training mode
+
+The same image powers both the live control room and the GRPO training run.
+The behaviour is selected by a single environment variable, **`OPENGRID_MODE`**:
+
+| `OPENGRID_MODE` | What runs |
+|---|---|
+| *unset* (default) — or `server` | Boots `uvicorn app:app` on port 7860 — the live control-room dashboard. **This is what the public HF Space serves.** |
+| `training` | Starts the UI server in the background (so the HF health-check passes), then runs `python run_training.py` in the foreground. When training finishes, plots and `summary.json` are written to `training/outputs/` and the already-running UI keeps serving them. |
+
+So, locally:
 
 ```bash
 docker build -t opengrid .
-
+
+docker run -p 7860:7860 opengrid                              # server mode (default)
+docker run -p 7860:7860 -e OPENGRID_MODE=training opengrid    # train, then serve results
 ```
 
+On Hugging Face Spaces, the variable is set under
+*Settings → Variables and secrets* — flip it to `training` to retrain on a GPU
+Space, flip it back to `server` (or remove it) to go back to live demo mode.
+The shipped `summary.json` and plots in this repo were produced exactly that
+way: a one-off `OPENGRID_MODE=training` run on an A10G Space, after which the
+variable was reset so the Space serves the trained results.
+
 ---
 
 ## The API in 30 seconds
@@ -277,8 +306,7 @@ We fine-tuned **Qwen/Qwen2.5-1.5B-Instruct** on `task_karnataka` using GRPO (Gro
 | Framework | TRL `GRPOTrainer` + bitsandbytes 4-bit + PEFT LoRA |
 | LoRA | rank=16, alpha=32, dropout=0.05 |
 | Hardware | NVIDIA A10G (23.9 GB) |
-
-| Steps | 449 across 600 prompts (3 epochs) |
+| Steps | 449 across 600 prompts |
 | Optimizer | paged_adamw_8bit, lr=2e-5, cosine |
 
 ### What happened
````
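The multipliers in the new variants table compose as plain scale factors. A minimal sketch, assuming a dict-based scenario config: the key names, base values, and the `scale_scenario` helper are illustrative; only the multiplier values come from the table.

```python
# Multipliers from the README's Karnataka stress-test table; how the
# environment actually applies them is an assumption made for illustration.
VARIANTS = {
    "karnataka_easy":   {"renewables": 0.3, "load": 0.6, "line_capacity": 1.5},
    "karnataka_medium": {"renewables": 0.7, "load": 1.0, "line_capacity": 1.0},
    "karnataka_hard":   {"renewables": 1.3, "load": 1.4, "line_capacity": 0.75},
}

def scale_scenario(base, variant):
    """Scale each parameter of a base scenario by the variant's multiplier."""
    m = VARIANTS[variant]
    return {key: base[key] * m[key] for key in base}

# Made-up base values, just to show the direction of each variant:
base = {"renewables": 100.0, "load": 100.0, "line_capacity": 100.0}
print(scale_scenario(base, "karnataka_hard"))
```

The hard variant pushes renewables and load up while shrinking line capacity, which is exactly why it is the tightest of the three operating conditions.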
blog.md

````diff
@@ -309,8 +309,7 @@ After all that setup, the actual training was almost anticlimactic.
 
 - Model: Qwen2.5-1.5B-Instruct
 - Hardware: NVIDIA A10G (23.9 GB)
-
-- Steps: 449 (across 600 prompts × 3 epochs)
+- Steps: 449 (across 600 prompts)
 - LR: 2e-5, cosine schedule
 - Batch: 4 per device × 4 grad accum × 4 generations = effective 64
 
@@ -379,7 +378,7 @@ If any of this sounds interesting, here are three things you can do right now, i
 
 **Medium** — point an LLM at it. The whole grid is exposed as REST endpoints. You don't even need Python — `curl` works. See [the README](README.md) for examples.
 
-**Hard** — train your own agent. The code is at [github.com/krishnagoyal099/Opengrid_env](https://github.com/krishnagoyal099/Opengrid_env). The Colab notebook walks through the whole thing.
+**Hard** — train your own agent. The code is at [github.com/krishnagoyal099/Opengrid_env](https://github.com/krishnagoyal099/Opengrid_env). The Colab notebook walks through the whole thing.
 
 ---
 
````
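The batch bullet retained in the blog diff is a product of three factors. Spelled out as a quick check (the variable names are mine; the numbers are from the bullet list):

```python
# Effective batch size from the blog's training bullets:
# 4 per device x 4 gradient-accumulation steps x 4 generations per prompt.
per_device_batch = 4
grad_accum_steps = 4
num_generations = 4

effective_batch = per_device_batch * grad_accum_steps * num_generations
print(effective_batch)  # 64, matching "effective 64" in the bullet
```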