K446 committed on
Commit 2f2ff77 · 1 parent: e81353d

docs: clarify scenario count, OPENGRID_MODE flag; drop runtime/epoch info

- README/blog: remove training-runtime numbers (159.6 min, 2.5 hours, etc.)
and "3 epochs" annotations from the metrics table and prose.
- README: replace misleading "## The four scenarios" heading with
"## The scenarios" and split into two tables — 4 base grids and 3 Karnataka
difficulty variants — for an accurate 7-task picture.
- README: add a "Docker / Hugging Face Space — server vs training mode"
subsection explaining the OPENGRID_MODE env var (unset/server = live demo,
training = run GRPO and serve results) with local docker run examples.

Made-with: Cursor

Files changed (2)
  1. README.md +35 -7
  2. blog.md +2 -3
README.md CHANGED
@@ -78,7 +78,11 @@ Agents talk to the grid over HTTP. Any language, any framework — it's just `PO
 
 ---
 
-## The four scenarios
+## The scenarios
+
+Seven scenarios in total — four base grids and three difficulty variants of the Karnataka topology used for curriculum learning.
+
+**Base grids**
 
 | Task | Buses | Agents | Renewables | What's hard about it |
 |---|---|---|---|---|
@@ -87,9 +91,15 @@ Agents talk to the grid over HTTP. Any language, any framework — it's just `PO
 | `task_hard` | 14 | 3 | 70% | Tight margins. Small mistakes blow up. |
 | `task_karnataka` | 15 | 4 | Real mix | The actual KPTCL grid with GPS coordinates. |
 
-Episodes run for 50 steps. Scores land between **0.02 and 0.98** (higher = better).
+**Karnataka stress-test variants** (same 15-bus topology, different operating conditions):
 
-There are also three "stress test" variants of Karnataka — `karnataka_easy`, `karnataka_medium`, `karnataka_hard` — that crank the volatility, fault rates, and renewable share progressively.
+| Task | Renewables | Load | Line capacity |
+|---|---|---|---|
+| `karnataka_easy` | 0.3× | 0.6× | 1.5× |
+| `karnataka_medium` | 0.7× | 1.0× | 1.0× |
+| `karnataka_hard` | 1.3× | 1.4× | 0.75× |
+
+Episodes run for 50 steps. Scores land between **0.02 and 0.98** (higher = better).
 
 ---
 
@@ -149,13 +159,32 @@ Or open one of the Colab notebooks in Google Colab (free T4 works for both):
 
 Both notebooks produce the same `training/outputs/summary.json` schema, with a `framework` field identifying which path was used.
 
-### Docker
+### Docker / Hugging Face Space — server vs training mode
+
+The same image powers both the live control room and the GRPO training run.
+The behaviour is selected by a single environment variable, **`OPENGRID_MODE`**:
+
+| `OPENGRID_MODE` | What runs |
+|---|---|
+| *unset* (default) — or `server` | Boots `uvicorn app:app` on port 7860 — the live control-room dashboard. **This is what the public HF Space serves.** |
+| `training` | Starts the UI server in the background (so the HF health check passes), then runs `python run_training.py` in the foreground. When training finishes, plots and `summary.json` are written to `training/outputs/` and the already-running UI keeps serving them. |
+
+So, locally:
 
 ```bash
 docker build -t opengrid .
-docker run -p 7860:7860 opengrid
+
+docker run -p 7860:7860 opengrid                           # server mode (default)
+docker run -p 7860:7860 -e OPENGRID_MODE=training opengrid # train, then serve results
 ```
 
+On Hugging Face Spaces, the variable is set under
+*Settings → Variables and secrets* — flip it to `training` to retrain on a GPU
+Space, flip it back to `server` (or remove it) to go back to live demo mode.
+The shipped `summary.json` and plots in this repo were produced exactly that
+way: a one-off `OPENGRID_MODE=training` run on an A10G Space, after which the
+variable was reset so the Space serves the trained results.
+
 ---
 
 ## The API in 30 seconds
@@ -277,8 +306,7 @@ We fine-tuned **Qwen/Qwen2.5-1.5B-Instruct** on `task_karnataka` using GRPO (Gro
 | Framework | TRL `GRPOTrainer` + bitsandbytes 4-bit + PEFT LoRA |
 | LoRA | rank=16, alpha=32, dropout=0.05 |
 | Hardware | NVIDIA A10G (23.9 GB) |
-| Time | 159.6 minutes |
-| Steps | 449 across 600 prompts (3 epochs) |
+| Steps | 449 across 600 prompts |
 | Optimizer | paged_adamw_8bit, lr=2e-5, cosine |
 
 ### What happened
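The `OPENGRID_MODE` dispatch this commit documents could be implemented with an entrypoint along these lines — a minimal sketch inferred from the mode table, not the repo's actual entrypoint. The function name `opengrid_cmds` is hypothetical, and the commands are echoed rather than exec'd so the branching is easy to see:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the OPENGRID_MODE dispatch described in the diff.
# The repository's real entrypoint may differ; OPENGRID_MODE is the contract.

opengrid_cmds() {
  case "${1:-server}" in
    training)
      # UI in the background (keeps the HF Space health check green),
      # GRPO training in the foreground; outputs land in training/outputs/.
      echo "uvicorn app:app --host 0.0.0.0 --port 7860 &"
      echo "python run_training.py"
      ;;
    *)
      # Unset or "server": serve the live control-room dashboard only.
      echo "uvicorn app:app --host 0.0.0.0 --port 7860"
      ;;
  esac
}

opengrid_cmds "$OPENGRID_MODE"
```

Any unrecognised value falls through to server mode, matching the "unset (default) — or `server`" row in the table above.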
blog.md CHANGED
@@ -309,8 +309,7 @@ After all that setup, the actual training was almost anticlimactic.
 
 - Model: Qwen2.5-1.5B-Instruct
 - Hardware: NVIDIA A10G (23.9 GB)
-- Time: ~160 minutes
-- Steps: 449 (across 600 prompts × 3 epochs)
+- Steps: 449 (across 600 prompts)
 - LR: 2e-5, cosine schedule
 - Batch: 4 per device × 4 grad accum × 4 generations = effective 64
 
@@ -379,7 +378,7 @@ If any of this sounds interesting, here are three things you can do right now, i
 
 **Medium** — point an LLM at it. The whole grid is exposed as REST endpoints. You don't even need Python — `curl` works. See [the README](README.md) for examples.
 
-**Hard** — train your own agent. The code is at [github.com/krishnagoyal099/Opengrid_env](https://github.com/krishnagoyal099/Opengrid_env). The Colab notebook walks through the whole thing. A T4 will do it overnight. An A10G will do it in 2.5 hours.
+**Hard** — train your own agent. The code is at [github.com/krishnagoyal099/Opengrid_env](https://github.com/krishnagoyal099/Opengrid_env). The Colab notebook walks through the whole thing.
 
 ---
 