Space: sanjuhs/cadforge-cadquery-openenv

Commit fca5bb6 (verified), committed by sanjuhs 14 days ago
Parent: 9d75b6b

Update CADForge blog results and image links

Files changed (1): CADFORGE_BLOG.md (changed)

@@ -105,6 +105,7 @@ The real run used Unsloth for LoRA SFT and TRL GRPO for environment reward train
 | Run | Result |
 |---|---|
 | Qwen3.5-2B SFT | train loss `1.4480 -> 0.1658`, eval loss `0.4477 -> 0.2676` |
+| Qwen3.5-2B dense GRPO | mean reward `0.3387`, best `0.5303`; useful reward signal but too forgiving on broken builds |
 | Qwen3.5-9B SFT | train loss `2.6020 -> 0.1413`, eval loss `0.3650 -> 0.2398` |
 | Qwen3.5-9B strict GRPO | `320` completions, `96` buildable, `30.0%` build rate |
 | Strict 9B quick eval | `2/3` held-out prompts built successfully |
@@ -116,9 +117,9 @@ The strict GRPO run produced:
 - mean per-step reward trend: `+0.003549 / step`
 - held-out eval build rate: `66.7%`
-![Strict GRPO reward curve](training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_reward_curve.png)
-![Strict GRPO code health](training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_code_health.png)
+![Strict GRPO reward curve](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_reward_curve.png)
+![Strict GRPO code health](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_code_health.png)
 ## What The Model Learned
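The headline numbers in this change are simple ratios plus a trend: `96` buildable out of `320` completions gives the `30.0%` strict GRPO build rate, and `2/3` held-out prompts gives the `66.7%` quick-eval build rate. The Python sketch below reproduces those two ratios. It also shows one way a per-step reward trend could be computed, assuming the reported `+0.003549 / step` is an ordinary least-squares slope of mean reward against step index. Finally, it includes a hypothetical build-gated reward to illustrate how a strict reward can be less forgiving of broken builds than a dense one. The function names (`build_rate`, `reward_trend`, `gated_reward`) and the gating rule are assumptions for illustration, not CADForge's actual reward or reporting code.

```python
from statistics import mean


def build_rate(num_buildable: int, num_completions: int) -> float:
    """Fraction of generated programs that built successfully."""
    if num_completions <= 0:
        raise ValueError("num_completions must be positive")
    return num_buildable / num_completions


def reward_trend(step_means: list[float]) -> float:
    """Least-squares slope of mean reward vs. step index (reward units per step).

    Assumption: this is how a 'mean per-step reward trend' might be defined;
    the blog does not state its exact formula.
    """
    n = len(step_means)
    if n < 2:
        raise ValueError("need at least two steps to fit a trend")
    xs = range(n)
    x_bar = (n - 1) / 2          # mean of 0, 1, ..., n-1
    y_bar = mean(step_means)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, step_means))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def gated_reward(dense_score: float, built: bool) -> float:
    """Hypothetical strict variant: a completion that fails to build earns 0.0,
    no matter how much partial credit the dense score would have given it."""
    return dense_score if built else 0.0


if __name__ == "__main__":
    print(f"strict GRPO build rate: {build_rate(96, 320):.1%}")  # 30.0%
    print(f"held-out quick eval:    {build_rate(2, 3):.1%}")     # 66.7%
```

Zeroing unbuildable completions is one simple way to keep a reward from being "too forgiving on broken builds"; the actual CADForge strict reward may weight build success differently.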