Update CADForge blog results and image links
CADFORGE_BLOG.md +3 -2
@@ -105,6 +105,7 @@ The real run used Unsloth for LoRA SFT and TRL GRPO for environment reward train
 | Run | Result |
 |---|---|
 | Qwen3.5-2B SFT | train loss `1.4480 -> 0.1658`, eval loss `0.4477 -> 0.2676` |
+| Qwen3.5-2B dense GRPO | mean reward `0.3387`, best `0.5303`; useful reward signal but too forgiving on broken builds |
 | Qwen3.5-9B SFT | train loss `2.6020 -> 0.1413`, eval loss `0.3650 -> 0.2398` |
 | Qwen3.5-9B strict GRPO | `320` completions, `96` buildable, `30.0%` build rate |
 | Strict 9B quick eval | `2/3` held-out prompts built successfully |
@@ -116,9 +117,9 @@ The strict GRPO run produced:
 - mean per-step reward trend: `+0.003549 / step`
 - held-out eval build rate: `66.7%`
 
-
+
 
-
+
 
 ## What The Model Learned
 
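The build-rate figures in the table can be checked directly from the counts it reports. The sketch below is only a sanity check: `build_rate` is a hypothetical helper, not part of the CADForge code, and rounding to one decimal place is an assumption based on how the percentages are printed in the blog.

```python
def build_rate(buildable: int, total: int) -> float:
    """Return the share of completions that built, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * buildable / total, 1)


# Counts taken from the results table in the diff.
grpo_rate = build_rate(96, 320)  # strict 9B GRPO: 96 buildable out of 320 completions
eval_rate = build_rate(2, 3)     # quick eval: 2 of 3 held-out prompts built

print(f"strict GRPO build rate: {grpo_rate}%")
print(f"held-out eval build rate: {eval_rate}%")
```

Both values agree with the `30.0%` and `66.7%` figures quoted in the diff. The held-out number rests on only three prompts.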