roshan5emerald committed on
Commit 86f42f7 · verified · 1 Parent(s): 2cce423

Update HF_MINI_BLOG.md

  1. HF_MINI_BLOG.md +2 -2
HF_MINI_BLOG.md CHANGED
@@ -202,7 +202,7 @@ The reward curve below shows the model improving from first steps after the SFT
  warm-up. Because the model already knows JSON format, reward is non-zero from
  step 1 and climbs steadily.
 
- ![Reward Curve](reward_curve.png)
+ ![Reward Curve](outputs/logiflow-grpo-script/artifacts/reward_curve.png)
  *GRPO training reward over 200 logging steps. After SFT warm-up,
  the model starts producing valid structured actions immediately.*
 
@@ -223,7 +223,7 @@ solving the cold-start problem.
 
  ### Before vs After GRPO
 
- ![Before vs After](before_after_comparison.png)
+ ![Before vs After](artifacts/before_after_comparison.png)
  *Policy comparison across all three task difficulties.
  Green = trained model. Blue = base model. Amber = heuristic baseline.*
 
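
Review note: both changes in this commit repoint Markdown image links, so a quick check that each target exists on disk may be useful before merging. The sketch below is a hypothetical helper and not part of this commit or the repository. The function name `missing_images`, the regular expression, and the assumption that image paths are resolved relative to the directory containing `HF_MINI_BLOG.md` are all illustrative.

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target path.
# Assumption: targets contain no spaces or parentheses.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def missing_images(markdown_text, base_dir):
    """Return local image targets that do not exist under base_dir."""
    missing = []
    for target in IMAGE_LINK.findall(markdown_text):
        # Remote images cannot be checked on disk, so skip them.
        if target.startswith(("http://", "https://")):
            continue
        if not (Path(base_dir) / target).is_file():
            missing.append(target)
    return missing
```

Under these assumptions, calling `missing_images(Path("HF_MINI_BLOG.md").read_text(), ".")` from the repository root would list any image link whose file is absent. An empty list suggests the new paths resolve.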