sanjuhs committed on
Commit
5fa6fc8
·
verified ·
1 Parent(s): f6a99e5

Add rendered blog diagrams and Colab-safe notebook

docs/detailed-blog/cadforge-detailed-blog.md CHANGED
@@ -105,6 +105,8 @@ flowchart TD
105
  H --> B
106
  ```
107
 
 
 
108
  Every episode can write:
109
 
110
  - generated CadQuery code,
@@ -234,6 +236,8 @@ flowchart TD
234
  J --> A
235
  ```
236
 
 
 
237
  This is the environment "fighting back." If the model keeps clipping code before `fixture`, the next batch includes more syntax-closure repairs. If it invents APIs, the next batch asks it to rewrite using conservative primitives. If it builds but misses semantics, the next batch asks for named subassemblies and recognizable features.
238
 
239
  The adaptive curriculum generator produced a 180-row repair set from 320 strict GRPO rollouts. It found:
@@ -274,6 +278,8 @@ flowchart LR
274
  J --> K
275
  ```
276
 
 
 
277
  This is the scalable route. A human does not need to hand-design every CAD target. The pipeline can generate many task prompts, create reference images, turn them into GLBs, extract similarity signals, and train models to write editable CadQuery that approximates the object.
278
 
279
  The wedge is important:
@@ -510,6 +516,9 @@ All must-have visuals are now present locally in `docs/detailed-blog/rendered-as
510
  | GLB reference beside generated CadQuery render | present |
511
  | compact build-rate chart | present |
512
  | adaptive curriculum failure-class chart | present |
513
  | self-improvement loop summary | present |
514
 
515
  The only additional images that would make the case stronger are:
 
105
  H --> B
106
  ```
107
 
108
+ ![Rendered CADForge environment loop](./rendered-assets/environment-loop-rendered.png)
109
+
110
  Every episode can write:
111
 
112
  - generated CadQuery code,
 
236
  J --> A
237
  ```
238
 
239
+ ![Rendered adaptive repair curriculum loop](./rendered-assets/adaptive-repair-loop-rendered.png)
240
+
241
  This is the environment "fighting back." If the model keeps clipping code before `fixture`, the next batch includes more syntax-closure repairs. If it invents APIs, the next batch asks it to rewrite using conservative primitives. If it builds but misses semantics, the next batch asks for named subassemblies and recognizable features.
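As a reading aid, the routing described in this paragraph can be sketched in Python. This is a minimal illustration under stated assumptions, not the CADForge curriculum generator: the rollout fields (`built`, `code`, `error`, `semantic_score`), the failure-class names, the repair prompt wording, and the `0.5` threshold are all hypothetical and chosen only to mirror the three cases named above.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical failure classes mapped to hypothetical repair prompts.
REPAIR_TEMPLATES: Dict[str, str] = {
    "truncated_before_fixture": "Complete the code so every bracket closes and `fixture` is assigned.",
    "invented_api": "Rewrite the part using conservative primitives only.",
    "weak_semantics": "Rebuild with named subassemblies and recognizable features.",
}

# Assumed cutoff below which a buildable candidate counts as semantically weak.
SEMANTIC_THRESHOLD = 0.5


def classify_failure(rollout: dict) -> Optional[str]:
    """Return a failure class for one rollout, or None if nothing matches."""
    code = rollout.get("code", "")
    error = rollout.get("error", "")
    if not rollout.get("built", False):
        # Code clipped before the final `fixture` assignment.
        if "fixture" not in code:
            return "truncated_before_fixture"
        # Heuristic: calling a method that does not exist raises AttributeError.
        if "AttributeError" in error:
            return "invented_api"
        return None
    # Built successfully but scored poorly on semantics.
    if rollout.get("semantic_score", 1.0) < SEMANTIC_THRESHOLD:
        return "weak_semantics"
    return None


def build_repair_batch(rollouts: List[dict], batch_size: int) -> List[dict]:
    """Allocate repair rows roughly in proportion to observed failure counts."""
    counts = Counter(c for c in map(classify_failure, rollouts) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return []
    batch: List[dict] = []
    for failure_class, count in counts.most_common():
        # Every observed class gets at least one row.
        quota = max(1, (batch_size * count) // total)
        batch.extend(
            {"failure_class": failure_class, "prompt": REPAIR_TEMPLATES[failure_class]}
            for _ in range(quota)
        )
    return batch[:batch_size]
```

The design point is only that the most frequent failure class receives the largest share of the next batch; the actual classifier and allocation rule used for the 180-row repair set are not shown in this diff.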
242
 
243
  The adaptive curriculum generator produced a 180-row repair set from 320 strict GRPO rollouts. It found:
 
278
  J --> K
279
  ```
280
 
281
+ ![Rendered prompt-image-GLB reference loop](./rendered-assets/reference-generation-loop-rendered.png)
282
+
283
  This is the scalable route. A human does not need to hand-design every CAD target. The pipeline can generate many task prompts, create reference images, turn them into GLBs, extract similarity signals, and train models to write editable CadQuery that approximates the object.
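One way to picture a "similarity signal" between a reference GLB and a generated CadQuery shape is a bounding-box proportion score. The sketch below is an assumption for illustration only: it is not the metric CADForge uses, the function names are hypothetical, and it expects plain `(x, y, z)` vertex tuples however they were loaded from the reference mesh and the candidate.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def bbox_extents(points: Iterable[Point]) -> Tuple[float, float, float]:
    """Axis-aligned bounding-box size along x, y, and z."""
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))


def proportion_similarity(reference: List[Point], candidate: List[Point]) -> float:
    """Compare overall proportions, ignoring absolute scale and axis order.

    Extents are sorted and divided by the largest one, so a shape and a
    uniformly scaled copy of it score 1.0. Degenerate inputs score 0.0.
    """
    ref = sorted(bbox_extents(reference), reverse=True)
    cand = sorted(bbox_extents(candidate), reverse=True)
    if ref[0] == 0 or cand[0] == 0:
        return 0.0
    ref_norm = [e / ref[0] for e in ref]
    cand_norm = [e / cand[0] for e in cand]
    mean_gap = sum(abs(a - b) for a, b in zip(ref_norm, cand_norm)) / 3.0
    return 1.0 - mean_gap
```

A score like this is cheap and dense, which is the property the paragraph relies on, but it says nothing about holes, subassemblies, or surface detail, so it could only be one signal among several.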
284
 
285
  The wedge is important:
 
516
  | GLB reference beside generated CadQuery render | present |
517
  | compact build-rate chart | present |
518
  | adaptive curriculum failure-class chart | present |
519
+ | rendered environment loop diagram | present |
520
+ | rendered adaptive repair loop diagram | present |
521
+ | rendered prompt-image-GLB loop diagram | present |
522
  | self-improvement loop summary | present |
523
 
524
  The only additional images that would make the case stronger are:
docs/detailed-blog/rendered-assets/adaptive-repair-loop-rendered.png ADDED
docs/detailed-blog/rendered-assets/environment-loop-rendered.png ADDED
docs/detailed-blog/rendered-assets/reference-generation-loop-rendered.png ADDED
docs/detailed-blog/rendered-assets/self-improvement-loop-summary.png CHANGED
src/main.js CHANGED
@@ -7,6 +7,8 @@ import { renderScadToGroup } from "./scadRenderer.js";
7
 
8
  const app = document.querySelector("#app");
9
  const route = window.location.pathname;
10
  const isLandingPage = route === "/" || route === "/index.html";
11
  const isCadQueryGeneratorPage = route === "/cadquery";
12
  const isCadQueryRendererPage = route === "/cadquery-renderer";
@@ -41,6 +43,11 @@ app.innerHTML = `
41
  <strong>Reward environment</strong>
42
  <small>Generate, revise, render, and score Markus-chair CadQuery candidates against the ideal code and GLB reference.</small>
43
  </a>
44
  </div>
45
  </section>
46
  </main>
@@ -53,6 +60,7 @@ app.innerHTML = `
53
  <a class="${isCadQueryGeneratorPage ? "active" : ""}" href="/cadquery">CadQuery</a>
54
  <a class="${isCadQueryRendererPage ? "active" : ""}" href="/cadquery-renderer">Renderer</a>
55
  <a class="${isCadQueryEnvPage ? "active" : ""}" href="/cadquery-env">Env</a>
56
  </nav>
57
  <div>
58
  <p class="eyebrow">CADForge Experiment 2</p>
 
7
 
8
  const app = document.querySelector("#app");
9
  const route = window.location.pathname;
10
+ const detailedBlogUrl =
11
+ "https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/blob/main/docs/detailed-blog/cadforge-detailed-blog.md";
12
  const isLandingPage = route === "/" || route === "/index.html";
13
  const isCadQueryGeneratorPage = route === "/cadquery";
14
  const isCadQueryRendererPage = route === "/cadquery-renderer";
 
43
  <strong>Reward environment</strong>
44
  <small>Generate, revise, render, and score Markus-chair CadQuery candidates against the ideal code and GLB reference.</small>
45
  </a>
46
+ <a class="route-card" href="${detailedBlogUrl}" target="_blank" rel="noreferrer">
47
+ <span>Submission</span>
48
+ <strong>Detailed blog</strong>
49
+ <small>Read the full hackathon story: frontier model failures, reward design, SFT/GRPO evidence, and self-improvement loops.</small>
50
+ </a>
51
  </div>
52
  </section>
53
  </main>
 
60
  <a class="${isCadQueryGeneratorPage ? "active" : ""}" href="/cadquery">CadQuery</a>
61
  <a class="${isCadQueryRendererPage ? "active" : ""}" href="/cadquery-renderer">Renderer</a>
62
  <a class="${isCadQueryEnvPage ? "active" : ""}" href="/cadquery-env">Env</a>
63
+ <a href="${detailedBlogUrl}" target="_blank" rel="noreferrer">Detailed Blog</a>
64
  </nav>
65
  <div>
66
  <p class="eyebrow">CADForge Experiment 2</p>
training/cadforge_openenv_training_colab.ipynb CHANGED
@@ -1,496 +1,522 @@
1
  {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "metadata": {},
6
- "source": [
7
- "# CADForge OpenEnv Training Run Notebook\n",
8
- "\n",
9
- "[Open this notebook in Google Colab](https://colab.research.google.com/github/sanjuhs/open-env-meta-final-hackathon/blob/main/training/cadforge_openenv_training_colab.ipynb) | [Notebook in the HF Space repo](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/blob/main/training/cadforge_openenv_training_colab.ipynb)\n",
10
- "\n",
11
- "This is the public, judge-runnable training notebook for CADForge. It is intentionally split into two parts:\n",
12
- "\n",
13
- "1. **Colab smoke run**: runs a tiny 3-step SFT training job and validates the CADForge reward pipeline so judges can verify the code path without renting an H200.\n",
14
- "2. **Production run evidence**: links to the exact scripts, reports, model adapters, dataset, and full H200 commands used for the submitted runs.\n",
15
- "\n",
16
- "The final submitted models were trained outside Colab on a RunPod H200 because Qwen3.5 9B + CADForge GRPO is too large for free Colab. The notebook does not pretend to rerun the full job; it proves the same training scripts and public dataset execute end-to-end on a tiny sample."
17
- ]
18
- },
19
- {
20
- "cell_type": "markdown",
21
- "metadata": {},
22
- "source": [
23
- "## What judges should check\n",
24
- "\n",
25
- "- The repo clone and dependency install succeed.\n",
26
- "- The OpenEnv app validates.\n",
27
- "- The public HF dataset loads.\n",
28
- "- The CADForge reward smoke test compiles and scores real CadQuery.\n",
29
- "- The 3-step SFT smoke run calls `training/train_sft_unsloth.py`, the same script used for the production SFT runs.\n",
30
- "- The optional GRPO cell calls `training/train_grpo_cadforge.py`, the same script used for strict build-gated GRPO.\n",
31
- "- The final report cells show the committed metrics and generated artifacts from the full H200 training run."
32
- ]
33
- },
34
- {
35
- "cell_type": "markdown",
36
- "metadata": {},
37
- "source": [
38
- "## Public artifacts\n",
39
- "\n",
40
- "- Project repo: https://github.com/sanjuhs/open-env-meta-final-hackathon\n",
41
- "- Hugging Face Space repo: https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv\n",
42
- "- Training dataset: https://huggingface.co/datasets/sanjuhs/cadforge-cadquery-agentic-traces\n",
43
- "- 2B SFT adapter: https://huggingface.co/sanjuhs/qwen35-2b-cadforge-sft-lora\n",
44
- "- 2B GRPO adapter: https://huggingface.co/sanjuhs/qwen35-2b-cadforge-grpo-lora\n",
45
- "- 9B SFT adapter: https://huggingface.co/sanjuhs/qwen35-9b-cadforge-sft-lora\n",
46
- "- 9B GRPO adapter: https://huggingface.co/sanjuhs/qwen35-9b-cadforge-grpo-lora\n",
47
- "- 9B strict build-gated GRPO adapter: https://huggingface.co/sanjuhs/qwen35-9b-cadforge-grpo-strict-build-lora"
48
- ]
49
- },
50
- {
51
- "cell_type": "markdown",
52
- "metadata": {},
53
- "source": [
54
- "## 1. Configure\n",
55
- "\n",
56
- "For a normal judge rerun, keep `RUN_SFT_SMOKE = True` and `RUN_GRPO_CADFORGE_SMOKE = False`.\n",
57
- "\n",
58
- "The SFT smoke is the best Colab-friendly proof that the training script is real. The CADForge reward smoke separately proves that the OpenEnv/CadQuery reward backend is real. GRPO is left optional because even a tiny CADForge GRPO run can be slow or memory-heavy on Colab T4/L4 runtimes."
59
- ]
60
- },
61
- {
62
- "cell_type": "code",
63
- "execution_count": null,
64
- "metadata": {},
65
- "outputs": [],
66
- "source": [
67
- "import os\n",
68
- "from pathlib import Path\n",
69
- "\n",
70
- "REPO_URL = \"https://github.com/sanjuhs/open-env-meta-final-hackathon.git\"\n",
71
- "REPO_DIR = Path(\"/content/open-env-meta-final-hackathon\")\n",
72
- "\n",
73
- "# Public dataset produced for CADForge.\n",
74
- "DATASET = \"sanjuhs/cadforge-cadquery-agentic-traces\"\n",
75
- "\n",
76
- "# Keep these small for public judge reruns.\n",
77
- "RUN_SFT_SMOKE = True\n",
78
- "RUN_GRPO_CADFORGE_SMOKE = False\n",
79
- "SFT_SMOKE_STEPS = 3\n",
80
- "\n",
81
- "# Make the flags visible to %%bash cells.\n",
82
- "os.environ[\"RUN_SFT_SMOKE\"] = \"1\" if RUN_SFT_SMOKE else \"0\"\n",
83
- "os.environ[\"RUN_GRPO_CADFORGE_SMOKE\"] = \"1\" if RUN_GRPO_CADFORGE_SMOKE else \"0\"\n",
84
- "\n",
85
- "# Optional. Only needed if you want to push adapters or access private repos.\n",
86
- "os.environ.setdefault(\"HF_TOKEN\", \"\")\n",
87
- "\n",
88
- "BASE_2B = \"unsloth/Qwen3.5-2B\"\n",
89
- "BASE_9B = \"unsloth/Qwen3.5-9B\"\n",
90
- "print(\"Configured notebook smoke run\")"
91
- ]
92
- },
93
- {
94
- "cell_type": "markdown",
95
- "metadata": {},
96
- "source": [
97
- "## 2. Clone and install\n",
98
- "\n",
99
- "This uses `uv` to match the production RunPod environment. The install can take several minutes on a fresh Colab runtime."
100
- ]
101
- },
102
- {
103
- "cell_type": "code",
104
- "execution_count": null,
105
- "metadata": {},
106
- "outputs": [],
107
- "source": [
108
- "%%bash\n",
109
- "set -euo pipefail\n",
110
- "\n",
111
- "if [ ! -d /content/open-env-meta-final-hackathon/.git ]; then\n",
112
- " rm -rf /content/open-env-meta-final-hackathon\n",
113
- " git clone --depth 1 https://github.com/sanjuhs/open-env-meta-final-hackathon.git /content/open-env-meta-final-hackathon\n",
114
- "else\n",
115
- " cd /content/open-env-meta-final-hackathon\n",
116
- " git fetch --depth 1 origin main\n",
117
- " git reset --hard origin/main\n",
118
- "fi\n",
119
- "\n",
120
- "cd /content/open-env-meta-final-hackathon\n",
121
- "curl -LsSf https://astral.sh/uv/install.sh | sh || true\n",
122
- "export PATH=\"$HOME/.local/bin:$PATH\"\n",
123
- "\n",
124
- "export UV_CACHE_DIR=/content/.uv-cache\n",
125
- "export HF_HOME=/content/.cache/huggingface\n",
126
- "export TORCH_HOME=/content/.cache/torch\n",
127
- "export TRITON_CACHE_DIR=/content/.cache/triton\n",
128
- "export VLLM_CACHE_ROOT=/content/.cache/vllm\n",
129
- "export UV_LINK_MODE=copy\n",
130
- "export HF_HUB_ENABLE_HF_TRANSFER=1\n",
131
- "export HF_HUB_DISABLE_XET=1\n",
132
- "\n",
133
- "uv sync --project experiment-2-cadforge\n",
134
- "uv run --project experiment-2-cadforge python - <<'PY'\n",
135
- "import numpy, trimesh, scipy, cadquery\n",
136
- "print('CADForge project env OK:', numpy.__version__)\n",
137
- "PY\n"
138
- ]
139
- },
140
- {
141
- "cell_type": "markdown",
142
- "metadata": {},
143
- "source": [
144
- "## 3. Validate the OpenEnv submission and inline reward backend\n",
145
- "\n",
146
- "This validates the OpenEnv app, then evaluates a small pasted CadQuery program through the real CADForge reward environment.\n",
147
- "\n",
148
- "The old root-level smoke command is kept as comments inside the cell because it can fail in Colab when `cadquery_env.py` runs outside the `experiment-2-cadforge` virtualenv. The working path is `uv run --project experiment-2-cadforge ...`, which gives the reward code access to `numpy`, `trimesh`, `scipy`, and `cadquery`.\n"
149
- ]
150
- },
151
- {
152
- "cell_type": "code",
153
- "execution_count": null,
154
- "metadata": {},
155
- "outputs": [],
156
- "source": [
157
- "%%bash\n",
158
- "set -euo pipefail\n",
159
- "cd /content/open-env-meta-final-hackathon\n",
160
- "export PATH=\"$HOME/.local/bin:$PATH\"\n",
161
- "\n",
162
- "uv run --project experiment-2-cadforge openenv validate experiment-2-cadforge\n",
163
- "\n",
164
- "# Kept for reference only. In Colab this can run from the root uv env and miss\n",
165
- "# numpy/cadquery/trimesh, which causes ModuleNotFoundError inside cadquery_env.py.\n",
166
- "# uv run training/smoke_cadforge_reward.py --reward-mode fast\n",
167
- "\n",
168
- "cat > /tmp/cadforge_inline_candidate.py <<'CANDIDATEPY'\n",
169
- "import cadquery as cq\n",
170
- "\n",
171
- "# Tiny editable wall-mounted J-hook for the notebook smoke test.\n",
172
- "# This is deliberately simple so judges can verify the CADForge reward path quickly.\n",
173
- "plate_w = 46.0\n",
174
- "plate_h = 72.0\n",
175
- "plate_t = 6.0\n",
176
- "hook_len = 58.0\n",
177
- "hook_t = 12.0\n",
178
- "hook_drop = 34.0\n",
179
- "rib_t = 8.0\n",
180
- "hole_d = 6.0\n",
181
- "\n",
182
- "plate = cq.Workplane(\"XY\").box(plate_w, plate_t, plate_h).translate((0, 0, plate_h / 2.0))\n",
183
- "\n",
184
- "upper_hole = (\n",
185
- " cq.Workplane(\"XY\")\n",
186
- " .circle(hole_d / 2.0)\n",
187
- " .extrude(plate_t + 2.0)\n",
188
- " .translate((0, -1.0, plate_h * 0.72))\n",
189
- ")\n",
190
- "lower_hole = (\n",
191
- " cq.Workplane(\"XY\")\n",
192
- " .circle(hole_d / 2.0)\n",
193
- " .extrude(plate_t + 2.0)\n",
194
- " .translate((0, -1.0, plate_h * 0.28))\n",
195
- ")\n",
196
- "plate = plate.cut(upper_hole).cut(lower_hole)\n",
197
- "\n",
198
- "arm = cq.Workplane(\"XY\").box(hook_len, hook_t, hook_t).translate((hook_len / 2.0, 0, plate_h * 0.52))\n",
199
- "tip = cq.Workplane(\"XY\").box(hook_t, hook_t, hook_drop).translate((hook_len, 0, plate_h * 0.52 - hook_drop / 2.0))\n",
200
- "rib = cq.Workplane(\"XY\").box(hook_len * 0.65, rib_t, rib_t).translate((hook_len * 0.33, 0, plate_h * 0.38))\n",
201
- "\n",
202
- "fixture = plate.union(arm).union(tip).union(rib).clean()\n",
203
- "CANDIDATEPY\n",
204
- "\n",
205
- "uv run --project experiment-2-cadforge python \\\n",
206
- " experiment-2-cadforge/python_tools/cadquery_env.py evaluate \\\n",
207
- " --code-file /tmp/cadforge_inline_candidate.py \\\n",
208
- " --episode-id notebook-inline-smoke \\\n",
209
- " --step-id pasted-cadquery-candidate \\\n",
210
- " --task-spec wall_j_hook_120n \\\n",
211
- " --task-prompt \"Design a simple editable wall mounted J hook with a wall plate, hook arm, support rib, and screw holes.\" \\\n",
212
- " --reward-mode fast\n"
213
- ]
214
- },
215
- {
216
- "cell_type": "markdown",
217
- "metadata": {},
218
- "source": [
219
- "## 4. Inspect the public training data\n",
220
- "\n",
221
- "The production scripts download from the same public Hugging Face dataset if local JSONL files are absent."
222
- ]
223
- },
224
- {
225
- "cell_type": "code",
226
- "execution_count": null,
227
- "metadata": {},
228
- "outputs": [],
229
- "source": [
230
- "%%bash\n",
231
- "set -euo pipefail\n",
232
- "cd /content/open-env-meta-final-hackathon\n",
233
- "export PATH=\"$HOME/.local/bin:$PATH\"\n",
234
- "\n",
235
- "uv run --with datasets --with huggingface_hub python - <<'PY'\n",
236
- "from datasets import load_dataset\n",
237
- "\n",
238
- "repo = \"sanjuhs/cadforge-cadquery-agentic-traces\"\n",
239
- "files = {\n",
240
- " \"sft_train\": \"data/sft/prompt_to_cadquery_train.jsonl\",\n",
241
- " \"repair_train\": \"data/repair/cadquery_repair_train.jsonl\",\n",
242
- " \"rl_rollouts\": \"data/rl/cadquery_rollouts.jsonl\",\n",
243
- "}\n",
244
- "for name, path in files.items():\n",
245
- " ds = load_dataset(\"json\", data_files={name: f\"hf://datasets/{repo}/{path}\"}, split=name)\n",
246
- " print(name, len(ds), list(ds[0].keys()))\n",
247
- "PY"
248
- ]
249
- },
250
- {
251
- "cell_type": "markdown",
252
- "metadata": {},
253
- "source": [
254
- "## 5. Three-step SFT smoke training\n",
255
- "\n",
256
- "This runs the same SFT script used for the full runs, but with a tiny row limit and 3 optimizer steps. It is the Colab-friendly proof of the training entrypoint.\n",
257
- "\n",
258
- "The production SFT runs used longer schedules and larger context on RunPod H200. Those commands and reports are shown below."
259
- ]
260
- },
261
- {
262
- "cell_type": "code",
263
- "execution_count": null,
264
- "metadata": {},
265
- "outputs": [],
266
- "source": [
267
- "%%bash\n",
268
- "set -euo pipefail\n",
269
- "cd /content/open-env-meta-final-hackathon\n",
270
- "export PATH=\"$HOME/.local/bin:$PATH\"\n",
271
- "export HF_HUB_DISABLE_XET=1\n",
272
- "\n",
273
- "if [ \"${RUN_SFT_SMOKE:-1}\" != \"1\" ]; then\n",
274
- " echo \"SFT smoke disabled in config.\"\n",
275
- " exit 0\n",
276
- "fi\n",
277
- "\n",
278
- "CUDA_READY=$(python - <<'PY'\n",
279
- "import torch\n",
280
- "print('1' if torch.cuda.is_available() else '0')\n",
281
- "PY\n",
282
- ")\n",
283
- "echo \"CUDA available: $CUDA_READY\"\n",
284
- "if [ \"$CUDA_READY\" != \"1\" ]; then\n",
285
- " echo \"Skipping SFT smoke: switch Colab runtime to GPU, then rerun this cell.\"\n",
286
- " exit 0\n",
287
- "fi\n",
288
- "\n",
289
- "uv run training/train_sft_unsloth.py \\\n",
290
- " --model unsloth/Qwen3.5-2B \\\n",
291
- " --output-dir outputs/notebook-qwen35-2b-cadforge-sft-3step \\\n",
292
- " --max-steps 3 \\\n",
293
- " --limit-train-rows 12 \\\n",
294
- " --limit-val-rows 4 \\\n",
295
- " --max-seq-length 2048 \\\n",
296
- " --per-device-train-batch-size 1 \\\n",
297
- " --gradient-accumulation-steps 1 \\\n",
298
- " --eval-steps 3 \\\n",
299
- " --save-steps 3 \\\n",
300
- " --lora-r 16 \\\n",
301
- " --lora-alpha 32 \\\n",
302
- " --load-in-4bit \\\n",
303
- " --run-name notebook-qwen35-2b-sft-3step"
304
- ]
305
- },
306
- {
307
- "cell_type": "markdown",
308
- "metadata": {},
309
- "source": [
310
- "## 6. Optional 3-step GRPO smoke\n",
311
- "\n",
312
- "This calls the actual GRPO training script with strict build gating and the real CADForge reward backend. Leave it disabled for quick judging; enable it only on a strong GPU runtime."
313
- ]
314
- },
315
- {
316
- "cell_type": "code",
317
- "execution_count": null,
318
- "metadata": {},
319
- "outputs": [],
320
- "source": [
321
- "if not RUN_GRPO_CADFORGE_SMOKE:\n",
322
- " print(\"GRPO CADForge smoke disabled by default. Set RUN_GRPO_CADFORGE_SMOKE = True in the config cell to run it.\")"
323
- ]
324
- },
325
- {
326
- "cell_type": "code",
327
- "execution_count": null,
328
- "metadata": {},
329
- "outputs": [],
330
- "source": [
331
- "%%bash\n",
332
- "set -euo pipefail\n",
333
- "cd /content/open-env-meta-final-hackathon\n",
334
- "export PATH=\"$HOME/.local/bin:$PATH\"\n",
335
- "export HF_HUB_DISABLE_XET=1\n",
336
- "export CADFORGE_PYTHON=/content/open-env-meta-final-hackathon/experiment-2-cadforge/.venv/bin/python\n",
337
- "\n",
338
- "GRPO_READY=$(python - <<'PY'\n",
339
- "import os, torch\n",
340
- "run = os.environ.get('RUN_GRPO_CADFORGE_SMOKE', '').lower() in {'1', 'true', 'yes'}\n",
341
- "print('1' if run and torch.cuda.is_available() else '0')\n",
342
- "PY\n",
343
- ")\n",
344
- "echo \"GRPO ready: $GRPO_READY\"\n",
345
- "if [ \"$GRPO_READY\" != \"1\" ]; then\n",
346
- " echo \"Skipping optional GRPO smoke. Change RUN_GRPO_CADFORGE_SMOKE=1 in this cell and use a GPU runtime to run it.\"\n",
347
- " exit 0\n",
348
- "fi\n",
349
- "\n",
350
- "uv run training/train_grpo_cadforge.py \\\n",
351
- " --model unsloth/Qwen3.5-2B \\\n",
352
- " --adapter outputs/notebook-qwen35-2b-cadforge-sft-3step \\\n",
353
- " --output-dir outputs/notebook-qwen35-2b-cadforge-grpo-3step \\\n",
354
- " --reward-backend cadforge \\\n",
355
- " --strict-build-gate \\\n",
356
- " --cadforge-python \"$CADFORGE_PYTHON\" \\\n",
357
- " --cadforge-reward-mode fast \\\n",
358
- " --limit-prompts 2 \\\n",
359
- " --max-steps 3 \\\n",
360
- " --num-generations 2 \\\n",
361
- " --max-prompt-length 2048 \\\n",
362
- " --max-completion-length 1024 \\\n",
363
- " --max-seq-length 4096 \\\n",
364
- " --per-device-train-batch-size 2 \\\n",
365
- " --gradient-accumulation-steps 1 \\\n",
366
- " --learning-rate 5e-6 \\\n",
367
- " --debug-completions-jsonl training/logs/notebook-grpo-3step-completions.jsonl \\\n",
368
- " --run-name notebook-qwen35-2b-grpo-3step"
369
- ]
370
- },
371
- {
372
- "cell_type": "markdown",
373
- "metadata": {},
374
- "source": [
375
- "## 7. Exact production scripts\n",
376
- "\n",
377
- "The full submitted run used the scripts below, not notebook-only code:\n",
378
- "\n",
379
- "- `training/train_sft_unsloth.py`\n",
380
- "- `training/train_grpo_cadforge.py`\n",
381
- "- `training/evaluate_cadforge_model.py`\n",
382
- "- `training/make_training_report.py`\n",
383
- "- `training/cadforge_training_pipeline.sh`\n",
384
- "- `training/run_strict_9b_grpo.sh`\n",
385
- "\n",
386
- "The final strict GRPO run used build gating: failed CAD builds received negative reward, while successful builds unlocked dense topology, semantic, reference, and editability rewards."
387
- ]
388
- },
389
- {
390
- "cell_type": "code",
391
- "execution_count": null,
392
- "metadata": {},
393
- "outputs": [],
394
- "source": [
395
- "%%bash\n",
396
- "set -euo pipefail\n",
397
- "cd /content/open-env-meta-final-hackathon\n",
398
- "export PATH=\"$HOME/.local/bin:$PATH\"\n",
399
- "\n",
400
- "printf '\\n--- SFT script CLI ---\\n'\n",
401
- "uv run training/train_sft_unsloth.py --help | sed -n '1,120p'\n",
402
- "\n",
403
- "printf '\\n--- GRPO script CLI ---\\n'\n",
404
- "uv run training/train_grpo_cadforge.py --help | sed -n '1,180p'\n",
405
- "\n",
406
- "printf '\\n--- Strict 9B production wrapper ---\\n'\n",
407
- "sed -n '1,220p' training/run_strict_9b_grpo.sh"
408
- ]
409
- },
410
- {
411
- "cell_type": "markdown",
412
- "metadata": {},
413
- "source": [
414
- "## 8. Production run evidence\n",
415
- "\n",
416
- "These reports are committed in the public repo and mirrored in the Hugging Face Space repo. They are the evidence for the real H200 training run."
417
- ]
418
- },
419
- {
420
- "cell_type": "code",
421
- "execution_count": null,
422
- "metadata": {},
423
- "outputs": [],
424
- "source": [
425
- "from IPython.display import Image, Markdown, display\n",
426
- "from pathlib import Path\n",
427
- "\n",
428
- "repo = Path('/content/open-env-meta-final-hackathon')\n",
429
- "report_dir = repo / 'training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build'\n",
430
- "\n",
431
- "for image_name in ['grpo_reward_curve.png', 'grpo_code_health.png', 'grpo_error_breakdown.png']:\n",
432
- " path = report_dir / image_name\n",
433
- " if path.exists():\n",
434
- " display(Markdown(f'### {image_name}'))\n",
435
- " display(Image(filename=str(path)))\n",
436
- " else:\n",
437
- " print('Missing', path)"
438
- ]
439
- },
440
- {
441
- "cell_type": "code",
442
- "execution_count": null,
443
- "metadata": {},
444
- "outputs": [],
445
- "source": [
446
- "from IPython.display import Markdown, display\n",
447
- "from pathlib import Path\n",
448
- "\n",
449
- "repo = Path('/content/open-env-meta-final-hackathon')\n",
450
- "for rel in [\n",
451
- " 'training/RUNPOD_SMOKE_REPORT.md',\n",
452
- " 'training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/training_curve_report.md',\n",
453
- " 'training/eval/qwen35-9b-cadforge-grpo-strict-build-20260426-strict-build/eval_report.md',\n",
454
- "]:\n",
455
- " path = repo / rel\n",
456
- " display(Markdown(f'## {rel}'))\n",
457
- " if path.exists():\n",
458
- " display(Markdown(path.read_text()))\n",
459
- " else:\n",
460
- " print('Missing', path)"
461
- ]
462
- },
463
- {
464
- "cell_type": "markdown",
465
- "metadata": {},
466
- "source": [
467
- "## 9. Summary of the submitted training run\n",
468
- "\n",
469
- "- Qwen3.5-2B SFT: train loss `1.4480 -> 0.1658`, eval loss `0.4477 -> 0.2676`.\n",
470
- "- Qwen3.5-9B SFT: train loss `2.6020 -> 0.1413`, eval loss `0.3650 -> 0.2398`.\n",
471
- "- Qwen3.5-9B strict GRPO: `320` completions, `96` buildable completions, best CADForge score `0.9352`.\n",
472
- "- Quick held-out eval after strict GRPO: `2/3` prompts built successfully.\n",
473
- "\n",
474
- "The 3-step Colab run above is a reproducibility smoke test. The linked reports and adapters are the full training artifacts."
475
- ]
476
- }
477
- ],
478
- "metadata": {
479
- "accelerator": "GPU",
480
- "colab": {
481
- "include_colab_link": true,
482
- "provenance": []
483
- },
484
- "kernelspec": {
485
- "display_name": "Python 3",
486
- "language": "python",
487
- "name": "python3"
488
- },
489
- "language_info": {
490
- "name": "python",
491
- "version": "3.x"
492
- }
493
  },
494
- "nbformat": 4,
495
- "nbformat_minor": 5
496
- }
1
  {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# CADForge OpenEnv Training Run Notebook\n",
8
+ "\n",
9
+ "[Open this notebook in Google Colab](https://colab.research.google.com/github/sanjuhs/open-env-meta-final-hackathon/blob/main/training/cadforge_openenv_training_colab.ipynb) | [Notebook in the HF Space repo](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/blob/main/training/cadforge_openenv_training_colab.ipynb)\n",
10
+ "\n",
11
+ "This is the public, judge-runnable training notebook for CADForge. It is intentionally split into two parts:\n",
12
+ "\n",
13
+ "1. **Colab smoke run**: runs a tiny 3-step SFT training job and validates the CADForge reward pipeline so judges can verify the code path without renting an H200.\n",
14
+ "2. **Production run evidence**: links to the exact scripts, reports, model adapters, dataset, and full H200 commands used for the submitted runs.\n",
15
+ "\n",
16
+ "The final submitted models were trained outside Colab on a RunPod H200 because Qwen3.5 9B + CADForge GRPO is too large for free Colab. The notebook does not pretend to rerun the full job; it proves the same training scripts and public dataset execute end-to-end on a tiny sample."
17
+ ]
18
  },
19
+ {
20
+ "cell_type": "markdown",
21
+ "metadata": {},
22
+ "source": [
23
+ "## What judges should check\n",
24
+ "\n",
25
+ "- The repo clone and dependency install succeed.\n",
26
+ "- The OpenEnv app validates.\n",
27
+ "- The public HF dataset loads.\n",
28
+ "- The CADForge reward smoke test compiles and scores real CadQuery.\n",
29
+ "- The 3-step SFT smoke run calls `training/train_sft_unsloth.py`, the same script used for the production SFT runs.\n",
30
+ "- The optional GRPO cell calls `training/train_grpo_cadforge.py`, the same script used for strict build-gated GRPO.\n",
31
+ "- The final report cells show the committed metrics and generated artifacts from the full H200 training run."
32
+ ]
33
+ },
34
+ {
35
+ "cell_type": "markdown",
36
+ "metadata": {},
37
+ "source": [
38
+ "## Public artifacts\n",
39
+ "\n",
40
+ "- Project repo: https://github.com/sanjuhs/open-env-meta-final-hackathon\n",
41
+ "- Hugging Face Space repo: https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv\n",
42
+ "- Training dataset: https://huggingface.co/datasets/sanjuhs/cadforge-cadquery-agentic-traces\n",
43
+ "- 2B SFT adapter: https://huggingface.co/sanjuhs/qwen35-2b-cadforge-sft-lora\n",
44
+ "- 2B GRPO adapter: https://huggingface.co/sanjuhs/qwen35-2b-cadforge-grpo-lora\n",
45
+ "- 9B SFT adapter: https://huggingface.co/sanjuhs/qwen35-9b-cadforge-sft-lora\n",
46
+ "- 9B GRPO adapter: https://huggingface.co/sanjuhs/qwen35-9b-cadforge-grpo-lora\n",
47
+ "- 9B strict build-gated GRPO adapter: https://huggingface.co/sanjuhs/qwen35-9b-cadforge-grpo-strict-build-lora"
48
+ ]
49
+ },
50
+ {
51
+ "cell_type": "markdown",
52
+ "metadata": {},
53
+ "source": [
54
+ "## 1. Configure\n",
55
+ "\n",
56
+ "For a normal judge rerun, keep `RUN_SFT_SMOKE = True` and `RUN_GRPO_CADFORGE_SMOKE = False`.\n",
57
+ "\n",
58
+ "The SFT smoke is the best Colab-friendly proof that the training script is real. The CADForge reward smoke separately proves that the OpenEnv/CadQuery reward backend is real. GRPO is left optional because even a tiny CADForge GRPO run can be slow or memory-heavy on Colab T4/L4 runtimes."
59
+ ]
60
+ },
61
+ {
62
+ "cell_type": "code",
63
+ "execution_count": null,
64
+ "metadata": {},
65
+ "outputs": [],
66
+ "source": [
67
+ "import os\n",
68
+ "from pathlib import Path\n",
69
+ "\n",
70
+ "REPO_URL = \"https://github.com/sanjuhs/open-env-meta-final-hackathon.git\"\n",
71
+ "REPO_DIR = Path(\"/content/open-env-meta-final-hackathon\")\n",
72
+ "\n",
73
+ "# Public dataset produced for CADForge.\n",
74
+ "DATASET = \"sanjuhs/cadforge-cadquery-agentic-traces\"\n",
75
+ "\n",
76
+ "# Keep these small for public judge reruns.\n",
77
+ "RUN_SFT_SMOKE = True\n",
78
+ "RUN_GRPO_CADFORGE_SMOKE = False\n",
79
+ "SFT_SMOKE_STEPS = 3\n",
80
+ "\n",
81
+ "# Make the flags visible to subprocess-backed shell cells.\n",
82
+ "os.environ[\"RUN_SFT_SMOKE\"] = \"1\" if RUN_SFT_SMOKE else \"0\"\n",
83
+ "os.environ[\"RUN_GRPO_CADFORGE_SMOKE\"] = \"1\" if RUN_GRPO_CADFORGE_SMOKE else \"0\"\n",
84
+ "\n",
85
+ "# Optional. Only needed if you want to push adapters or access private repos.\n",
86
+ "os.environ.setdefault(\"HF_TOKEN\", \"\")\n",
87
+ "\n",
88
+ "BASE_2B = \"unsloth/Qwen3.5-2B\"\n",
89
+ "BASE_9B = \"unsloth/Qwen3.5-9B\"\n",
90
+ "print(\"Configured notebook smoke run\")\n",
91
+ "\n",
92
+ "import subprocess\n",
93
+ "import textwrap\n",
94
+ "\n",
95
+ "def run_bash(script: str, check: bool = True) -> None:\n",
96
+ " \"\"\"Run a bash snippet from a normal Python Colab cell.\n",
97
+ "\n",
98
+ " Colab occasionally makes bash-magic cells awkward to save or rerun from a\n",
99
+ " shared notebook. Keeping shell commands behind this helper makes every cell\n",
100
+ " valid Python while still running the exact production commands.\n",
101
+ " \"\"\"\n",
102
+ " env = os.environ.copy()\n",
103
+ " env[\"PATH\"] = f\"{Path.home()}/.local/bin:\" + env.get(\"PATH\", \"\")\n",
104
+ " subprocess.run(\n",
105
+ " textwrap.dedent(script),\n",
106
+ " shell=True,\n",
107
+ " executable=\"/bin/bash\",\n",
108
+ " check=check,\n",
109
+ " env=env,\n",
110
+ " )\n"
111
+ ]
112
+ },
113
+ {
114
+ "cell_type": "markdown",
115
+ "metadata": {},
116
+ "source": [
117
+ "## 2. Clone and install\n",
118
+ "\n",
119
+ "This uses `uv` to match the production RunPod environment. The install can take several minutes on a fresh Colab runtime."
120
+ ]
121
+ },
122
+ {
123
+ "cell_type": "code",
124
+ "execution_count": null,
125
+ "metadata": {},
126
+ "outputs": [],
127
+ "source": [
128
+ "run_bash(r\"\"\"\n",
129
+ "set -euo pipefail\n",
130
+ "\n",
131
+ "if [ ! -d /content/open-env-meta-final-hackathon/.git ]; then\n",
132
+ " rm -rf /content/open-env-meta-final-hackathon\n",
133
+ " git clone --depth 1 https://github.com/sanjuhs/open-env-meta-final-hackathon.git /content/open-env-meta-final-hackathon\n",
134
+ "else\n",
135
+ " cd /content/open-env-meta-final-hackathon\n",
136
+ " git fetch --depth 1 origin main\n",
137
+ " git reset --hard origin/main\n",
138
+ "fi\n",
139
+ "\n",
140
+ "cd /content/open-env-meta-final-hackathon\n",
141
+ "curl -LsSf https://astral.sh/uv/install.sh | sh || true\n",
142
+ "export PATH=\"$HOME/.local/bin:$PATH\"\n",
143
+ "\n",
144
+ "export UV_CACHE_DIR=/content/.uv-cache\n",
145
+ "export HF_HOME=/content/.cache/huggingface\n",
146
+ "export TORCH_HOME=/content/.cache/torch\n",
147
+ "export TRITON_CACHE_DIR=/content/.cache/triton\n",
148
+ "export VLLM_CACHE_ROOT=/content/.cache/vllm\n",
149
+ "export UV_LINK_MODE=copy\n",
150
+ "export HF_HUB_ENABLE_HF_TRANSFER=1\n",
151
+ "export HF_HUB_DISABLE_XET=1\n",
152
+ "\n",
153
+ "uv sync --project experiment-2-cadforge\n",
154
+ "uv run --project experiment-2-cadforge python - <<'PY'\n",
155
+ "import numpy, trimesh, scipy, cadquery\n",
156
+ "print('CADForge project env OK:', numpy.__version__)\n",
157
+ "PY\n",
158
+ "\"\"\")\n"
159
+ ]
160
+ },
161
+ {
162
+ "cell_type": "markdown",
163
+ "metadata": {},
164
+ "source": [
165
+ "## 3. Validate the OpenEnv submission and inline reward backend\n",
166
+ "\n",
167
+ "This validates the OpenEnv app, then evaluates a small pasted CadQuery program through the real CADForge reward environment.\n",
168
+ "\n",
169
+ "The old root-level smoke command is left commented out inside the cell for reference only. It can fail in Colab because `cadquery_env.py` then runs outside the `experiment-2-cadforge` virtualenv. The working path is `uv run --project experiment-2-cadforge ...`, which gives the reward code access to `numpy`, `trimesh`, `scipy`, and `cadquery`.\n"
170
+ ]
171
+ },
172
+ {
173
+ "cell_type": "code",
174
+ "execution_count": null,
175
+ "metadata": {},
176
+ "outputs": [],
177
+ "source": [
178
+ "run_bash(r\"\"\"\n",
179
+ "set -euo pipefail\n",
180
+ "cd /content/open-env-meta-final-hackathon\n",
181
+ "export PATH=\"$HOME/.local/bin:$PATH\"\n",
182
+ "\n",
183
+ "uv run --project experiment-2-cadforge openenv validate experiment-2-cadforge\n",
184
+ "\n",
185
+ "# Kept for reference only. In Colab this can run from the root uv env and miss\n",
186
+ "# numpy/cadquery/trimesh, which causes ModuleNotFoundError inside cadquery_env.py.\n",
187
+ "# uv run training/smoke_cadforge_reward.py --reward-mode fast\n",
188
+ "\n",
189
+ "cat > /tmp/cadforge_inline_candidate.py <<'CANDIDATEPY'\n",
190
+ "import cadquery as cq\n",
191
+ "\n",
192
+ "# Tiny editable wall-mounted J-hook for the notebook smoke test.\n",
193
+ "# This is deliberately simple so judges can verify the CADForge reward path quickly.\n",
194
+ "plate_w = 46.0\n",
195
+ "plate_h = 72.0\n",
196
+ "plate_t = 6.0\n",
197
+ "hook_len = 58.0\n",
198
+ "hook_t = 12.0\n",
199
+ "hook_drop = 34.0\n",
200
+ "rib_t = 8.0\n",
201
+ "hole_d = 6.0\n",
202
+ "\n",
203
+ "plate = cq.Workplane(\"XY\").box(plate_w, plate_t, plate_h).translate((0, 0, plate_h / 2.0))\n",
204
+ "\n",
205
+ "upper_hole = (\n",
206
+ " cq.Workplane(\"XY\")\n",
207
+ " .circle(hole_d / 2.0)\n",
208
+ " .extrude(plate_t + 2.0)\n",
209
+ " .translate((0, -1.0, plate_h * 0.72))\n",
210
+ ")\n",
211
+ "lower_hole = (\n",
212
+ " cq.Workplane(\"XY\")\n",
213
+ " .circle(hole_d / 2.0)\n",
214
+ " .extrude(plate_t + 2.0)\n",
215
+ " .translate((0, -1.0, plate_h * 0.28))\n",
216
+ ")\n",
217
+ "plate = plate.cut(upper_hole).cut(lower_hole)\n",
218
+ "\n",
219
+ "arm = cq.Workplane(\"XY\").box(hook_len, hook_t, hook_t).translate((hook_len / 2.0, 0, plate_h * 0.52))\n",
220
+ "tip = cq.Workplane(\"XY\").box(hook_t, hook_t, hook_drop).translate((hook_len, 0, plate_h * 0.52 - hook_drop / 2.0))\n",
221
+ "rib = cq.Workplane(\"XY\").box(hook_len * 0.65, rib_t, rib_t).translate((hook_len * 0.33, 0, plate_h * 0.38))\n",
222
+ "\n",
223
+ "fixture = plate.union(arm).union(tip).union(rib).clean()\n",
224
+ "CANDIDATEPY\n",
225
+ "\n",
226
+ "uv run --project experiment-2-cadforge python \\\n",
227
+ " experiment-2-cadforge/python_tools/cadquery_env.py evaluate \\\n",
228
+ " --code-file /tmp/cadforge_inline_candidate.py \\\n",
229
+ " --episode-id notebook-inline-smoke \\\n",
230
+ " --step-id pasted-cadquery-candidate \\\n",
231
+ " --task-spec wall_j_hook_120n \\\n",
232
+ " --task-prompt \"Design a simple editable wall mounted J hook with a wall plate, hook arm, support rib, and screw holes.\" \\\n",
233
+ " --reward-mode fast\n",
234
+ "\"\"\")\n"
235
+ ]
236
+ },
237
+ {
238
+ "cell_type": "markdown",
239
+ "metadata": {},
240
+ "source": [
241
+ "## 4. Inspect the public training data\n",
242
+ "\n",
243
+ "The production scripts download from the same public Hugging Face dataset if local JSONL files are absent."
244
+ ]
245
+ },
246
+ {
247
+ "cell_type": "code",
248
+ "execution_count": null,
249
+ "metadata": {},
250
+ "outputs": [],
251
+ "source": [
252
+ "run_bash(r\"\"\"\n",
253
+ "set -euo pipefail\n",
254
+ "cd /content/open-env-meta-final-hackathon\n",
255
+ "export PATH=\"$HOME/.local/bin:$PATH\"\n",
256
+ "\n",
257
+ "uv run --with datasets --with huggingface_hub python - <<'PY'\n",
258
+ "from datasets import load_dataset\n",
259
+ "\n",
260
+ "repo = \"sanjuhs/cadforge-cadquery-agentic-traces\"\n",
261
+ "files = {\n",
262
+ " \"sft_train\": \"data/sft/prompt_to_cadquery_train.jsonl\",\n",
263
+ " \"repair_train\": \"data/repair/cadquery_repair_train.jsonl\",\n",
264
+ " \"rl_rollouts\": \"data/rl/cadquery_rollouts.jsonl\",\n",
265
+ "}\n",
266
+ "for name, path in files.items():\n",
267
+ " ds = load_dataset(\"json\", data_files={name: f\"hf://datasets/{repo}/{path}\"}, split=name)\n",
268
+ " print(name, len(ds), list(ds[0].keys()))\n",
269
+ "PY\n",
270
+ "\"\"\")\n"
271
+ ]
272
+ },
273
+ {
274
+ "cell_type": "markdown",
275
+ "metadata": {},
276
+ "source": [
277
+ "## 5. Three-step SFT smoke training\n",
278
+ "\n",
279
+ "This runs the same SFT script used for the full runs, but with a tiny row limit and 3 optimizer steps. It is the Colab-friendly proof of the training entrypoint.\n",
280
+ "\n",
281
+ "The production SFT runs used longer schedules and a larger context length on a RunPod H200. Those commands and reports are shown below."
282
+ ]
283
+ },
284
+ {
285
+ "cell_type": "code",
286
+ "execution_count": null,
287
+ "metadata": {},
288
+ "outputs": [],
289
+ "source": [
290
+ "run_bash(r\"\"\"\n",
291
+ "set -euo pipefail\n",
292
+ "cd /content/open-env-meta-final-hackathon\n",
293
+ "export PATH=\"$HOME/.local/bin:$PATH\"\n",
294
+ "export HF_HUB_DISABLE_XET=1\n",
295
+ "\n",
296
+ "if [ \"${RUN_SFT_SMOKE:-1}\" != \"1\" ]; then\n",
297
+ " echo \"SFT smoke disabled in config.\"\n",
298
+ " exit 0\n",
299
+ "fi\n",
300
+ "\n",
301
+ "CUDA_READY=$(python - <<'PY'\n",
302
+ "import torch\n",
303
+ "print('1' if torch.cuda.is_available() else '0')\n",
304
+ "PY\n",
305
+ ")\n",
306
+ "echo \"CUDA available: $CUDA_READY\"\n",
307
+ "if [ \"$CUDA_READY\" != \"1\" ]; then\n",
308
+ " echo \"Skipping SFT smoke: switch Colab runtime to GPU, then rerun this cell.\"\n",
309
+ " exit 0\n",
310
+ "fi\n",
311
+ "\n",
312
+ "uv run training/train_sft_unsloth.py \\\n",
313
+ " --model unsloth/Qwen3.5-2B \\\n",
314
+ " --output-dir outputs/notebook-qwen35-2b-cadforge-sft-3step \\\n",
315
+ " --max-steps 3 \\\n",
316
+ " --limit-train-rows 12 \\\n",
317
+ " --limit-val-rows 4 \\\n",
318
+ " --max-seq-length 2048 \\\n",
319
+ " --per-device-train-batch-size 1 \\\n",
320
+ " --gradient-accumulation-steps 1 \\\n",
321
+ " --eval-steps 3 \\\n",
322
+ " --save-steps 3 \\\n",
323
+ " --lora-r 16 \\\n",
324
+ " --lora-alpha 32 \\\n",
325
+ " --load-in-4bit \\\n",
326
+ " --run-name notebook-qwen35-2b-sft-3step\n",
327
+ "\"\"\")\n"
328
+ ]
329
+ },
330
+ {
331
+ "cell_type": "markdown",
332
+ "metadata": {},
333
+ "source": [
334
+ "## 6. Optional 3-step GRPO smoke\n",
335
+ "\n",
336
+ "This calls the actual GRPO training script with strict build gating and the real CADForge reward backend. Leave it disabled for quick judging; enable it only on a strong GPU runtime."
337
+ ]
338
+ },
339
+ {
340
+ "cell_type": "code",
341
+ "execution_count": null,
342
+ "metadata": {},
343
+ "outputs": [],
344
+ "source": [
345
+ "if not RUN_GRPO_CADFORGE_SMOKE:\n",
346
+ " print(\"GRPO CADForge smoke disabled by default. Set RUN_GRPO_CADFORGE_SMOKE = True in the config cell to run it.\")"
347
+ ]
348
+ },
349
+ {
350
+ "cell_type": "code",
351
+ "execution_count": null,
352
+ "metadata": {},
353
+ "outputs": [],
354
+ "source": [
355
+ "run_bash(r\"\"\"\n",
356
+ "set -euo pipefail\n",
357
+ "cd /content/open-env-meta-final-hackathon\n",
358
+ "export PATH=\"$HOME/.local/bin:$PATH\"\n",
359
+ "export HF_HUB_DISABLE_XET=1\n",
360
+ "export CADFORGE_PYTHON=/content/open-env-meta-final-hackathon/experiment-2-cadforge/.venv/bin/python\n",
361
+ "\n",
362
+ "GRPO_READY=$(python - <<'PY'\n",
363
+ "import os, torch\n",
364
+ "run = os.environ.get('RUN_GRPO_CADFORGE_SMOKE', '').lower() in {'1', 'true', 'yes'}\n",
365
+ "print('1' if run and torch.cuda.is_available() else '0')\n",
366
+ "PY\n",
367
+ ")\n",
368
+ "echo \"GRPO ready: $GRPO_READY\"\n",
369
+ "if [ \"$GRPO_READY\" != \"1\" ]; then\n",
370
+ " echo \"Skipping optional GRPO smoke. Set RUN_GRPO_CADFORGE_SMOKE = True in the config cell, rerun that cell, and use a GPU runtime to run it.\"\n",
371
+ " exit 0\n",
372
+ "fi\n",
373
+ "\n",
374
+ "uv run training/train_grpo_cadforge.py \\\n",
375
+ " --model unsloth/Qwen3.5-2B \\\n",
376
+ " --adapter outputs/notebook-qwen35-2b-cadforge-sft-3step \\\n",
377
+ " --output-dir outputs/notebook-qwen35-2b-cadforge-grpo-3step \\\n",
378
+ " --reward-backend cadforge \\\n",
379
+ " --strict-build-gate \\\n",
380
+ " --cadforge-python \"$CADFORGE_PYTHON\" \\\n",
381
+ " --cadforge-reward-mode fast \\\n",
382
+ " --limit-prompts 2 \\\n",
383
+ " --max-steps 3 \\\n",
384
+ " --num-generations 2 \\\n",
385
+ " --max-prompt-length 2048 \\\n",
386
+ " --max-completion-length 1024 \\\n",
387
+ " --max-seq-length 4096 \\\n",
388
+ " --per-device-train-batch-size 2 \\\n",
389
+ " --gradient-accumulation-steps 1 \\\n",
390
+ " --learning-rate 5e-6 \\\n",
391
+ " --debug-completions-jsonl training/logs/notebook-grpo-3step-completions.jsonl \\\n",
392
+ " --run-name notebook-qwen35-2b-grpo-3step\n",
393
+ "\"\"\")\n"
394
+ ]
395
+ },
396
+ {
397
+ "cell_type": "markdown",
398
+ "metadata": {},
399
+ "source": [
400
+ "## 7. Exact production scripts\n",
401
+ "\n",
402
+ "The full submitted run used the scripts below, not notebook-only code:\n",
403
+ "\n",
404
+ "- `training/train_sft_unsloth.py`\n",
405
+ "- `training/train_grpo_cadforge.py`\n",
406
+ "- `training/evaluate_cadforge_model.py`\n",
407
+ "- `training/make_training_report.py`\n",
408
+ "- `training/cadforge_training_pipeline.sh`\n",
409
+ "- `training/run_strict_9b_grpo.sh`\n",
410
+ "\n",
411
+ "The final strict GRPO run used build gating: failed CAD builds received negative reward, while successful builds unlocked dense topology, semantic, reference, and editability rewards."
412
+ ]
413
+ },
414
+ {
415
+ "cell_type": "code",
416
+ "execution_count": null,
417
+ "metadata": {},
418
+ "outputs": [],
419
+ "source": [
420
+ "run_bash(r\"\"\"\n",
421
+ "set -euo pipefail\n",
422
+ "cd /content/open-env-meta-final-hackathon\n",
423
+ "export PATH=\"$HOME/.local/bin:$PATH\"\n",
424
+ "\n",
425
+ "printf '\\n--- SFT script CLI ---\\n'\n",
426
+ "uv run training/train_sft_unsloth.py --help | sed -n '1,120p'\n",
427
+ "\n",
428
+ "printf '\\n--- GRPO script CLI ---\\n'\n",
429
+ "uv run training/train_grpo_cadforge.py --help | sed -n '1,180p'\n",
430
+ "\n",
431
+ "printf '\\n--- Strict 9B production wrapper ---\\n'\n",
432
+ "sed -n '1,220p' training/run_strict_9b_grpo.sh\n",
433
+ "\"\"\")\n"
434
+ ]
435
+ },
436
+ {
437
+ "cell_type": "markdown",
438
+ "metadata": {},
439
+ "source": [
440
+ "## 8. Production run evidence\n",
441
+ "\n",
442
+ "These reports are committed in the public repo and mirrored in the Hugging Face Space repo. They are the evidence for the real H200 training run."
443
+ ]
444
+ },
445
+ {
446
+ "cell_type": "code",
447
+ "execution_count": null,
448
+ "metadata": {},
449
+ "outputs": [],
450
+ "source": [
451
+ "from IPython.display import Image, Markdown, display\n",
452
+ "from pathlib import Path\n",
453
+ "\n",
454
+ "repo = Path('/content/open-env-meta-final-hackathon')\n",
455
+ "report_dir = repo / 'training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build'\n",
456
+ "\n",
457
+ "for image_name in ['grpo_reward_curve.png', 'grpo_code_health.png', 'grpo_error_breakdown.png']:\n",
458
+ " path = report_dir / image_name\n",
459
+ " if path.exists():\n",
460
+ " display(Markdown(f'### {image_name}'))\n",
461
+ " display(Image(filename=str(path)))\n",
462
+ " else:\n",
463
+ " print('Missing', path)"
464
+ ]
465
+ },
466
+ {
467
+ "cell_type": "code",
468
+ "execution_count": null,
469
+ "metadata": {},
470
+ "outputs": [],
471
+ "source": [
472
+ "from IPython.display import Markdown, display\n",
473
+ "from pathlib import Path\n",
474
+ "\n",
475
+ "repo = Path('/content/open-env-meta-final-hackathon')\n",
476
+ "for rel in [\n",
477
+ " 'training/RUNPOD_SMOKE_REPORT.md',\n",
478
+ " 'training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/training_curve_report.md',\n",
479
+ " 'training/eval/qwen35-9b-cadforge-grpo-strict-build-20260426-strict-build/eval_report.md',\n",
480
+ "]:\n",
481
+ " path = repo / rel\n",
482
+ " display(Markdown(f'## {rel}'))\n",
483
+ " if path.exists():\n",
484
+ " display(Markdown(path.read_text()))\n",
485
+ " else:\n",
486
+ " print('Missing', path)"
487
+ ]
488
+ },
489
+ {
490
+ "cell_type": "markdown",
491
+ "metadata": {},
492
+ "source": [
493
+ "## 9. Summary of the submitted training run\n",
494
+ "\n",
495
+ "- Qwen3.5-2B SFT: train loss `1.4480 -> 0.1658`, eval loss `0.4477 -> 0.2676`.\n",
496
+ "- Qwen3.5-9B SFT: train loss `2.6020 -> 0.1413`, eval loss `0.3650 -> 0.2398`.\n",
497
+ "- Qwen3.5-9B strict GRPO: `320` completions, `96` buildable completions, best CADForge score `0.9352`.\n",
498
+ "- Quick held-out eval after strict GRPO: `2/3` prompts built successfully.\n",
499
+ "\n",
500
+ "The 3-step Colab run above is a reproducibility smoke test. The linked reports and adapters are the full training artifacts."
501
+ ]
502
+ }
503
+ ],
504
+ "metadata": {
505
+ "accelerator": "GPU",
506
+ "colab": {
507
+ "include_colab_link": true,
508
+ "provenance": []
509
+ },
510
+ "kernelspec": {
511
+ "display_name": "Python 3",
512
+ "language": "python",
513
+ "name": "python3"
514
+ },
515
+ "language_info": {
516
+ "name": "python",
517
+ "version": "3.x"
518
+ }
519
+ },
520
+ "nbformat": 4,
521
+ "nbformat_minor": 5
522
+ }