roshan5emerald committed on
Commit 7dc0e0a · verified · 1 Parent(s): cee292a

Upload folder using huggingface_hub

README.md CHANGED
@@ -158,13 +158,13 @@ ensure every aspect of logistics performance is measured independently.
 
  | Component | Weight | What It Measures |
  |-----------|--------|-----------------|
- | Bottleneck avoidance | 18% | How often any node exceeded capacity |
- | Network balance | 18% | Average load-gap between most and least loaded nodes |
- | Step reward | 14% | Average per-step reward across the episode |
- | Retail delivery | 20% | Freight actually delivered to retail nodes vs target |
- | SLA compliance | 15% | Deliveries arriving within their deadline window |
  | Disruption recovery | 10% | How quickly the network stabilised after each disruption |
- | Action validity | 5% | Fraction of legal (connected) routing decisions |
 
  ### Training Reward (`action_reward` in `train_grpo.py`)
 
@@ -266,9 +266,16 @@ the model starts producing valid JSON immediately and reward climbs from the fir
  *Figure 3: Detailed metrics breakdown — overall score, SLA rate, retail delivered, invalid
  actions, and bottlenecks — for all three policies across all three tasks.*
 
- ![Training Loss] (artifacts/Training_loss.png)
-
  ---
 
  ## What the Trained Agent Thinks
 
@@ -395,7 +402,7 @@ crisis_logistics_env/
  |----------|------|
  | 🤗 HuggingFace Space (live environment) | https://roshan5emerald-logiflow-rl.hf.space/ | (Visualizer) | https://huggingface.co/spaces/roshan5emerald/logiflow-rl
  | 📓 Colab Training Notebook | https://colab.research.google.com/drive/1wGXYNNYp13emNE1ThX3aqpIM3ppcU_Ty?usp=sharing |
- | 📝 HuggingFace Blog Post | [Add your blog URL] |
 
  ---
 
@@ -420,10 +427,10 @@ prove that teaching is measurable.
  ```bibtex
  @misc{logiflow-rl-2026,
  title = {LogiFlow-RL: Training LLMs for Proactive Supply Chain Crisis Management},
- author = {Your Name},
  year = {2026},
  howpublished = {OpenEnv Hackathon India 2026 — Theme \#2: Long-Horizon Planning},
- url = {https://huggingface.co/spaces/<your-space-url>}
  }
  ```
 
 
 
  | Component | Weight | What It Measures |
  |-----------|--------|-----------------|
+ | Bottleneck avoidance | 12% | How often any node exceeded capacity |
+ | Network balance | 10% | Average load-gap between most and least loaded nodes |
+ | Step reward | 10% | Average per-step reward across the episode |
+ | Retail delivery | 32% | Freight actually delivered to retail nodes vs target |
+ | SLA compliance | 20% | Deliveries arriving within their deadline window |
  | Disruption recovery | 10% | How quickly the network stabilised after each disruption |
+ | Action validity | 6% | Fraction of legal (connected) routing decisions |
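
A minimal sketch of how the weights on the added (`+`) side of the table above could combine into one episode score. It assumes each component is already normalised to the range [0, 1] and that the overall score is a plain weighted sum; the diff does not show the actual scoring code, so the key names and the `overall_score` function are hypothetical.

```python
# Weights in percent, copied from the added (+) rows of the table.
# The key names are hypothetical; they are not taken from the repository.
WEIGHTS = {
    "bottleneck_avoidance": 12,
    "network_balance": 10,
    "step_reward": 10,
    "retail_delivery": 32,
    "sla_compliance": 20,
    "disruption_recovery": 10,
    "action_validity": 6,
}


def overall_score(components: dict) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


# The seven weights should account for the whole score.
assert sum(WEIGHTS.values()) == 100

perfect = {name: 1.0 for name in WEIGHTS}
print(overall_score(perfect))  # 1.0
```

Under this assumption, retail delivery and SLA compliance together account for 52% of the score after the change, up from 35% before it.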
 
  ### Training Reward (`action_reward` in `train_grpo.py`)
 
 
  *Figure 3: Detailed metrics breakdown — overall score, SLA rate, retail delivered, invalid
  actions, and bottlenecks — for all three policies across all three tasks.*
 
+ ![Training Loss](artifacts/Training_loss.png)
  ---
+ Training was run on Colab free-tier T4 GPU with Qwen2.5-0.5B-Instruct.
+ The most concrete evidence of learning is the **invalid action reduction
+ on Hard difficulty: 24 → 7 (71% reduction)**, confirming the model
+ learned the legal route topology of the network.
+ Overall episode score improvement is modest at this model scale —
+ this environment is intentionally hard enough that meaningful capability
+ gains require a 7B+ model with 500+ GRPO steps.
+
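
The 71% figure follows directly from the two counts quoted in the added paragraph. A small check of that arithmetic, where `relative_reduction` is a hypothetical helper name used only for illustration:

```python
def relative_reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Invalid actions on Hard difficulty, as reported above: 24 -> 7.
print(f"{relative_reduction(24, 7):.0%}")  # 71%
```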
 
  ## What the Trained Agent Thinks
 
 
  |----------|------|
  | 🤗 HuggingFace Space (live environment) | https://roshan5emerald-logiflow-rl.hf.space/ | (Visualizer) | https://huggingface.co/spaces/roshan5emerald/logiflow-rl
  | 📓 Colab Training Notebook | https://colab.research.google.com/drive/1wGXYNNYp13emNE1ThX3aqpIM3ppcU_Ty?usp=sharing |
+ | 📝 HuggingFace Blog Post | https://huggingface.co/spaces/roshan5emerald/logiflow-rl/blob/main/HF_MINI_BLOG.md |
 
  ---
 
 
  ```bibtex
  @misc{logiflow-rl-2026,
  title = {LogiFlow-RL: Training LLMs for Proactive Supply Chain Crisis Management},
+ author = {S. Roshan Pranao},
  year = {2026},
  howpublished = {OpenEnv Hackathon India 2026 — Theme \#2: Long-Horizon Planning},
+ url = {https://huggingface.co/spaces/roshan5emerald/logiflow-rl}
  }
  ```
 
notebooks/Copy_of_logiflow_grpo_colab.ipynb ADDED
The diff for this file is too large to render. See raw diff