K446 committed on
Commit 2f2ff77 · 1 parent: e81353d

docs: clarify scenario count, OPENGRID_MODE flag; drop runtime/epoch info

- README/blog: remove training-runtime numbers (159.6 min, 2.5 hours, etc.)
and "3 epochs" annotations from the metrics table and prose.
- README: replace misleading "## The four scenarios" heading with
"## The scenarios" and split into two tables — 4 base grids and 3 Karnataka
difficulty variants — for an accurate 7-task picture.
- README: add a "Docker / Hugging Face Space — server vs training mode"
subsection explaining the OPENGRID_MODE env var (unset/server = live demo,
training = run GRPO and serve results) with local docker run examples.

Made-with: Cursor

Files changed (2)
  1. README.md +35 -7
  2. blog.md +2 -3
README.md CHANGED
@@ -78,7 +78,11 @@ Agents talk to the grid over HTTP. Any language, any framework — it's just `PO
 
 ---
 
-## The four scenarios
+## The scenarios
+
+Seven scenarios in total — four base grids and three difficulty variants of the Karnataka topology used for curriculum learning.
+
+**Base grids**
 
 | Task | Buses | Agents | Renewables | What's hard about it |
 |---|---|---|---|---|
@@ -87,9 +91,15 @@ Agents talk to the grid over HTTP. Any language, any framework — it's just `PO
 | `task_hard` | 14 | 3 | 70% | Tight margins. Small mistakes blow up. |
 | `task_karnataka` | 15 | 4 | Real mix | The actual KPTCL grid with GPS coordinates. |
 
-Episodes run for 50 steps. Scores land between **0.02 and 0.98** (higher = better).
+**Karnataka stress-test variants** (same 15-bus topology, different operating conditions):
 
-There are also three "stress test" variants of Karnataka — `karnataka_easy`, `karnataka_medium`, `karnataka_hard` — that crank the volatility, fault rates, and renewable share progressively.
+| Task | Renewables | Load | Line capacity |
+|---|---|---|---|
+| `karnataka_easy` | 0.3× | 0.6× | 1.5× |
+| `karnataka_medium` | 0.7× | 1.0× | 1.0× |
+| `karnataka_hard` | 1.3× | 1.4× | 0.75× |
+
+Episodes run for 50 steps. Scores land between **0.02 and 0.98** (higher = better).
 
 ---
 
@@ -149,13 +159,32 @@ Or open one of the Colab notebooks in Google Colab (free T4 works for both):
 
 Both notebooks produce the same `training/outputs/summary.json` schema, with a `framework` field identifying which path was used.
 
-### Docker
+### Docker / Hugging Face Space — server vs training mode
+
+The same image powers both the live control room and the GRPO training run.
+The behaviour is selected by a single environment variable, **`OPENGRID_MODE`**:
+
+| `OPENGRID_MODE` | What runs |
+|---|---|
+| *unset* (default) — or `server` | Boots `uvicorn app:app` on port 7860 — the live control-room dashboard. **This is what the public HF Space serves.** |
+| `training` | Starts the UI server in the background (so the HF health check passes), then runs `python run_training.py` in the foreground. When training finishes, plots and `summary.json` are written to `training/outputs/` and the already-running UI keeps serving them. |
+
+So, locally:
 
 ```bash
 docker build -t opengrid .
-docker run -p 7860:7860 opengrid
+
+docker run -p 7860:7860 opengrid                           # server mode (default)
+docker run -p 7860:7860 -e OPENGRID_MODE=training opengrid # train, then serve results
 ```
 
+On Hugging Face Spaces, the variable is set under
+*Settings → Variables and secrets* — flip it to `training` to retrain on a GPU
+Space, flip it back to `server` (or remove it) to go back to live demo mode.
+The shipped `summary.json` and plots in this repo were produced exactly that
+way: a one-off `OPENGRID_MODE=training` run on an A10G Space, after which the
+variable was reset so the Space serves the trained results.
+
 ---
 
 ## The API in 30 seconds
@@ -277,8 +306,7 @@ We fine-tuned **Qwen/Qwen2.5-1.5B-Instruct** on `task_karnataka` using GRPO (Gro
 | Framework | TRL `GRPOTrainer` + bitsandbytes 4-bit + PEFT LoRA |
 | LoRA | rank=16, alpha=32, dropout=0.05 |
 | Hardware | NVIDIA A10G (23.9 GB) |
-| Time | 159.6 minutes |
-| Steps | 449 across 600 prompts (3 epochs) |
+| Steps | 449 across 600 prompts |
 | Optimizer | paged_adamw_8bit, lr=2e-5, cosine |
 
 ### What happened
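The `OPENGRID_MODE` dispatch this commit documents could be implemented with an entrypoint along these lines — a minimal sketch inferred from the mode table, not the repo's actual entrypoint. The function name `opengrid_cmds` is hypothetical, and the commands are echoed rather than exec'd so the branching is easy to see:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the OPENGRID_MODE dispatch described in the diff.
# The repository's real entrypoint may differ; OPENGRID_MODE is the contract.

opengrid_cmds() {
  case "${1:-server}" in
    training)
      # UI in the background (keeps the HF Space health check green),
      # GRPO training in the foreground; outputs land in training/outputs/.
      echo "uvicorn app:app --host 0.0.0.0 --port 7860 &"
      echo "python run_training.py"
      ;;
    *)
      # Unset or "server": serve the live control-room dashboard only.
      echo "uvicorn app:app --host 0.0.0.0 --port 7860"
      ;;
  esac
}

opengrid_cmds "$OPENGRID_MODE"
```

Any unrecognised value falls through to server mode, matching the "unset (default) — or `server`" row in the table above.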
blog.md CHANGED
@@ -309,8 +309,7 @@ After all that setup, the actual training was almost anticlimactic.
 
 - Model: Qwen2.5-1.5B-Instruct
 - Hardware: NVIDIA A10G (23.9 GB)
-- Time: ~160 minutes
-- Steps: 449 (across 600 prompts × 3 epochs)
+- Steps: 449 (across 600 prompts)
 - LR: 2e-5, cosine schedule
 - Batch: 4 per device × 4 grad accum × 4 generations = effective 64
 
@@ -379,7 +378,7 @@ If any of this sounds interesting, here are three things you can do right now, i
 
 **Medium** — point an LLM at it. The whole grid is exposed as REST endpoints. You don't even need Python — `curl` works. See [the README](README.md) for examples.
 
-**Hard** — train your own agent. The code is at [github.com/krishnagoyal099/Opengrid_env](https://github.com/krishnagoyal099/Opengrid_env). The Colab notebook walks through the whole thing. A T4 will do it overnight. An A10G will do it in 2.5 hours.
+**Hard** — train your own agent. The code is at [github.com/krishnagoyal099/Opengrid_env](https://github.com/krishnagoyal099/Opengrid_env). The Colab notebook walks through the whole thing.
 
 ---
 