Commit fca5bb6 (verified), committed by sanjuhs
1 Parent(s): 9d75b6b

Update CADForge blog results and image links

Files changed (1)
  1. CADFORGE_BLOG.md +3 -2
CADFORGE_BLOG.md CHANGED
@@ -105,6 +105,7 @@ The real run used Unsloth for LoRA SFT and TRL GRPO for environment reward train
  | Run | Result |
  |---|---|
  | Qwen3.5-2B SFT | train loss `1.4480 -> 0.1658`, eval loss `0.4477 -> 0.2676` |
+ | Qwen3.5-2B dense GRPO | mean reward `0.3387`, best `0.5303`; useful reward signal but too forgiving on broken builds |
  | Qwen3.5-9B SFT | train loss `2.6020 -> 0.1413`, eval loss `0.3650 -> 0.2398` |
  | Qwen3.5-9B strict GRPO | `320` completions, `96` buildable, `30.0%` build rate |
  | Strict 9B quick eval | `2/3` held-out prompts built successfully |
@@ -116,9 +117,9 @@ The strict GRPO run produced:
  - mean per-step reward trend: `+0.003549 / step`
  - held-out eval build rate: `66.7%`
 
- ![Strict GRPO reward curve](training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_reward_curve.png)
+ ![Strict GRPO reward curve](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_reward_curve.png)
 
- ![Strict GRPO code health](training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_code_health.png)
+ ![Strict GRPO code health](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_code_health.png)
 
  ## What The Model Learned
 
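
The two build rates quoted in the results table follow directly from the counts next to them (`96` of `320` completions, and `2/3` held-out prompts). A minimal sketch of that arithmetic is below; `build_rate` is a hypothetical helper name used only for illustration and is not taken from the CADForge repository.

```python
def build_rate(buildable: int, total: int) -> float:
    """Return the share of completions that built, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * buildable / total, 1)


# Counts as reported in the results table of the diff.
grpo_rate = build_rate(96, 320)  # strict 9B GRPO completions
eval_rate = build_rate(2, 3)     # strict 9B quick eval on held-out prompts

print(f"strict GRPO build rate: {grpo_rate}%")
print(f"held-out eval build rate: {eval_rate}%")
```

Note that the held-out figure rests on only three prompts, so a single additional success or failure would shift it by roughly 33 percentage points.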