Upload folder using huggingface_hub
- README.md +18 -11
- notebooks/Copy_of_logiflow_grpo_colab.ipynb +0 -0
README.md
CHANGED
@@ -158,13 +158,13 @@ ensure every aspect of logistics performance is measured independently.

| Component | Weight | What It Measures |
|-----------|--------|-----------------|
- | Bottleneck avoidance |
- | Network balance |
- | Step reward |
- | Retail delivery |
- | SLA compliance |
| Disruption recovery | 10% | How quickly the network stabilised after each disruption |
- | Action validity |

### Training Reward (`action_reward` in `train_grpo.py`)

@@ -266,9 +266,16 @@ the model starts producing valid JSON immediately and reward climbs from the fir
*Figure 3: Detailed metrics breakdown — overall score, SLA rate, retail delivered, invalid
actions, and bottlenecks — for all three policies across all three tasks.*

- ![Training Loss]
-
---

## What the Trained Agent Thinks

@@ -395,7 +402,7 @@ crisis_logistics_env/
|----------|------|
| 🤗 HuggingFace Space (live environment) | https://roshan5emerald-logiflow-rl.hf.space/ | (Visualizer) | https://huggingface.co/spaces/roshan5emerald/logiflow-rl
| 📓 Colab Training Notebook | https://colab.research.google.com/drive/1wGXYNNYp13emNE1ThX3aqpIM3ppcU_Ty?usp=sharing |
- | 📝 HuggingFace Blog Post |

---

@@ -420,10 +427,10 @@ prove that teaching is measurable.
```bibtex
@misc{logiflow-rl-2026,
title = {LogiFlow-RL: Training LLMs for Proactive Supply Chain Crisis Management},
- author = {
year = {2026},
howpublished = {OpenEnv Hackathon India 2026 — Theme \#2: Long-Horizon Planning},
- url = {https://huggingface.co/spaces/
}
```

| Component | Weight | What It Measures |
|-----------|--------|-----------------|
+ | Bottleneck avoidance | 12% | How often any node exceeded capacity |
+ | Network balance | 10% | Average load-gap between most and least loaded nodes |
+ | Step reward | 10% | Average per-step reward across the episode |
+ | Retail delivery | 32% | Freight actually delivered to retail nodes vs target |
+ | SLA compliance | 20% | Deliveries arriving within their deadline window |
| Disruption recovery | 10% | How quickly the network stabilised after each disruption |
+ | Action validity | 6% | Fraction of legal (connected) routing decisions |

### Training Reward (`action_reward` in `train_grpo.py`)

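The weight table above defines the evaluation score as a weighted sum of seven components whose weights (12 + 10 + 10 + 32 + 20 + 10 + 6) total 100%. Below is a minimal Python sketch of that combination, assuming each component has already been normalised to [0, 1] with higher meaning better. The key names and `composite_score` are hypothetical illustrations, not LogiFlow-RL's actual grader code.

```python
# Hypothetical sketch: combine per-component scores into one episode score
# using the weights from the README table. ASSUMPTION: every component is
# already normalised to [0, 1] (higher is better); key names are illustrative.
import math

WEIGHTS = {
    "bottleneck_avoidance": 0.12,
    "network_balance": 0.10,
    "step_reward": 0.10,
    "retail_delivery": 0.32,
    "sla_compliance": 0.20,
    "disruption_recovery": 0.10,
    "action_validity": 0.06,
}

# The table's weights should sum to 100%.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def composite_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each clamped to [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(components[name], 0.0), 1.0)  # clamp into [0, 1]
        total += weight * value
    return total


if __name__ == "__main__":
    perfect = {name: 1.0 for name in WEIGHTS}
    print(round(composite_score(perfect), 6))  # 1.0
    # Full credit everywhere except network balance and action validity:
    partial = dict(perfect, network_balance=0.0, action_validity=0.0)
    print(round(composite_score(partial), 6))  # 0.84
```

Under these weights, Retail delivery and SLA compliance together account for 52% of the score, while Action validity contributes only 6%.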
*Figure 3: Detailed metrics breakdown — overall score, SLA rate, retail delivered, invalid
actions, and bottlenecks — for all three policies across all three tasks.*

+ [image]
---
+ Training was run on a Colab free-tier T4 GPU with Qwen2.5-0.5B-Instruct.
+ The most concrete evidence of learning is the **invalid-action reduction
+ on Hard difficulty: 24 → 7 (71% reduction)**, confirming that the model
+ learned the legal route topology of the network.
+ Overall episode score improvement is modest at this model scale:
+ the environment is intentionally hard enough that meaningful capability
+ gains require a 7B+ model with 500+ GRPO steps.
+

## What the Trained Agent Thinks

|----------|------|
| 🤗 HuggingFace Space (live environment) | https://roshan5emerald-logiflow-rl.hf.space/ | (Visualizer) | https://huggingface.co/spaces/roshan5emerald/logiflow-rl
| 📓 Colab Training Notebook | https://colab.research.google.com/drive/1wGXYNNYp13emNE1ThX3aqpIM3ppcU_Ty?usp=sharing |
+ | 📝 HuggingFace Blog Post | https://huggingface.co/spaces/roshan5emerald/logiflow-rl/blob/main/HF_MINI_BLOG.md |

---

```bibtex
@misc{logiflow-rl-2026,
title = {LogiFlow-RL: Training LLMs for Proactive Supply Chain Crisis Management},
+ author = {S. Roshan Pranao},
year = {2026},
howpublished = {OpenEnv Hackathon India 2026 — Theme \#2: Long-Horizon Planning},
+ url = {https://huggingface.co/spaces/roshan5emerald/logiflow-rl}
}
```

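The added README text reports an invalid-action drop on Hard difficulty from 24 to 7, and the Action validity component is described as the fraction of legal (connected) routing decisions. The following is a hedged sketch of both calculations, assuming actions are directed (source, destination) pairs checked against a fixed set of legal routes. The node names, the edge set, and the helper functions are hypothetical illustrations, not the environment's actual action schema.

```python
# Hypothetical sketch: score routing actions against a fixed route topology.
# ASSUMPTION: routes are directed (source, destination) pairs; the node names
# and edges below are invented and do not reflect LogiFlow-RL's real network.
Edge = tuple[str, str]

ROUTES: set[Edge] = {
    ("port", "warehouse_a"),
    ("port", "warehouse_b"),
    ("warehouse_a", "retail_1"),
    ("warehouse_b", "retail_2"),
}


def count_invalid(actions: list[Edge], routes: set[Edge] = ROUTES) -> int:
    """Number of actions whose (source, destination) pair is not a legal route."""
    return sum(1 for action in actions if action not in routes)


def action_validity(actions: list[Edge], routes: set[Edge] = ROUTES) -> float:
    """Fraction of legal actions; treated as 1.0 for an empty episode."""
    if not actions:
        return 1.0
    return 1.0 - count_invalid(actions, routes) / len(actions)


def relative_reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after`, e.g. invalid-action counts."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    episode = [
        ("port", "warehouse_a"),       # legal
        ("warehouse_a", "retail_2"),   # no such route: invalid
        ("warehouse_b", "retail_2"),   # legal
    ]
    print(count_invalid(episode))              # 1
    print(round(action_validity(episode), 3))  # 0.667
    # The README's Hard-difficulty figures: 24 -> 7 invalid actions.
    print(f"{relative_reduction(24, 7):.0%}")  # 71%
```

With the README's Hard-difficulty counts, `relative_reduction(24, 7)` is 17/24 ≈ 0.708, which rounds to the reported 71%.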
notebooks/Copy_of_logiflow_grpo_colab.ipynb
ADDED
The diff for this file is too large to render.