---
title: CADForge CadQuery
emoji: 🪑
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- cadquery
- reinforcement-learning
---
# CADForge Experiment 2
CADForge is an OpenEnv environment for training LLMs to produce **editable, buildable CadQuery CAD**.
The agent receives a design request, writes a complete CadQuery Python file, and the environment runs real CAD tooling: CadQuery build, STL export, topology checks, semantic scoring, reference similarity, editability scoring, and persistent artifact logging.
## Judge-Facing Links
- **GitHub repo:** [sanjuhs/open-env-meta-final-hackathon](https://github.com/sanjuhs/open-env-meta-final-hackathon)
- **GitHub Gist: training scripts:** [CADForge OpenEnv SFT/GRPO scripts](https://gist.github.com/sanjuhs/10596f688e8b4560910a3b1b137bfeeb)
- **Raw training logs and evidence:** [sanjuhs/cadforge-training-evidence](https://huggingface.co/datasets/sanjuhs/cadforge-training-evidence)
- Training notebook on this HF Space: [training/cadforge_openenv_training_colab.ipynb](training/cadforge_openenv_training_colab.ipynb)
- Open the same notebook in Google Colab: [Colab training notebook](https://colab.research.google.com/github/sanjuhs/open-env-meta-final-hackathon/blob/main/training/cadforge_openenv_training_colab.ipynb)
- Mini-blog: [CADFORGE_BLOG.md](CADFORGE_BLOG.md)
- Detailed technical blog: [docs/detailed-blog/cadforge-detailed-blog.md](docs/detailed-blog/cadforge-detailed-blog.md)
- Full project report: [docs/cadforge-openenv-project-report.md](docs/cadforge-openenv-project-report.md)
- Self-improving RLVE design: [docs/brainstorm/21-cadforge-self-improving-rlve.md](docs/brainstorm/21-cadforge-self-improving-rlve.md)
- Strict GRPO training report: [training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/training_curve_report.md](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/blob/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/training_curve_report.md)
- Strict GRPO eval report: [training/eval/qwen35-9b-cadforge-grpo-strict-build-20260426-strict-build/eval_report.md](training/eval/qwen35-9b-cadforge-grpo-strict-build-20260426-strict-build/eval_report.md)
- Inference comparison: [inference/results/stator-qwen-vs-frontier/report.md](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/blob/main/inference/results/stator-qwen-vs-frontier/report.md)
- Training dataset: [sanjuhs/cadforge-cadquery-agentic-traces](https://huggingface.co/datasets/sanjuhs/cadforge-cadquery-agentic-traces)
- Strict 9B GRPO LoRA: [sanjuhs/qwen35-9b-cadforge-grpo-strict-build-lora](https://huggingface.co/sanjuhs/qwen35-9b-cadforge-grpo-strict-build-lora)
- Adaptive repair GRPO LoRA: [sanjuhs/qwen35-9b-cadforge-grpo-adaptive-repair-lora](https://huggingface.co/sanjuhs/qwen35-9b-cadforge-grpo-adaptive-repair-lora)
**RunPod/H200 clarification:** the full 2B/9B SFT and GRPO runs were executed on a RunPod H200 as separate production scripts. The Colab notebook is the judge-runnable smoke path: it validates OpenEnv, the public dataset, the CadQuery reward backend, and tiny SFT/GRPO launches using those same scripts.
## Results Snapshot
| Run | Result |
|---|---|
| Qwen3.5-2B SFT | train loss `1.4480 -> 0.1658`, eval loss `0.4477 -> 0.2676` |
| Qwen3.5-2B dense GRPO | mean reward `0.3387`, best `0.5303`; useful reward signal but too forgiving on broken builds |
| Qwen3.5-9B SFT | train loss `2.6020 -> 0.1413`, eval loss `0.3650 -> 0.2398` |
| Qwen3.5-9B strict GRPO | `320` completions, `96` buildable, best CADForge score `0.9352` |
| Qwen3.5-9B adaptive repair GRPO | `180` repair completions, `53` buildable, `0` clipped completions |
| Strict 9B quick eval | `2/3` held-out prompts built successfully |
| Stator inference comparison | base Qwen failed build; RL-tuned Qwen built a `0.654` stator; GPT-5.4 built a `0.709` stator |
![Strict GRPO reward curve](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_reward_curve.png)
![Strict GRPO code health](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/grpo_code_health.png)
![Base Qwen vs RL-tuned Qwen vs GPT-5.4 stator comparison](https://huggingface.co/spaces/sanjuhs/cadforge-cadquery-openenv/resolve/main/inference/results/stator-qwen-vs-frontier/comparison.png)
## Training Logs
The raw logs are backed up separately so judges can inspect the training evidence without relying on screenshots:
- Evidence dataset: [sanjuhs/cadforge-training-evidence](https://huggingface.co/datasets/sanjuhs/cadforge-training-evidence)
- Compressed archive: `archives/cadforge-training-evidence-20260426.tar.gz`
- Key JSONL traces: `training/logs/*completions.jsonl`
The logs show the core result: dense GRPO had positive-looking reward but `0%` buildability; strict build-gating produced `96/320` buildable completions; adaptive repair fixed clipped outputs and produced `53/180` buildable repairs.
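Headline numbers like `96/320` buildable can be recomputed directly from the completion JSONL files. A minimal sketch, assuming each record carries a boolean `buildable` field and a float `reward` field (the real log schema may use different keys):

```python
import json

def buildability_stats(jsonl_lines):
    """Summarize build success over GRPO completion logs.

    Assumes `buildable` (bool) and `reward` (float) fields per record;
    adjust the keys to match the actual log schema.
    """
    total = built = 0
    rewards = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        total += 1
        built += bool(record.get("buildable"))
        rewards.append(float(record.get("reward", 0.0)))
    rate = built / total if total else 0.0
    mean_reward = sum(rewards) / len(rewards) if rewards else 0.0
    return {"total": total, "buildable": built,
            "build_rate": rate, "mean_reward": mean_reward}

sample = [
    '{"buildable": true, "reward": 0.93}',
    '{"buildable": false, "reward": -0.25}',
    '{"buildable": true, "reward": 0.51}',
]
print(buildability_stats(sample))
```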
## Hackathon Theme Alignment
- **Theme 2: Long-horizon planning**: CAD improves through repeated code edits and reward feedback.
- **Theme 3.1: Professional world modeling**: the agent must use real CadQuery tools and survive compiler/export/mesh checks.
- **Theme 4: Self-improvement**: environment failures become new curriculum. The strict build-gated reward was created because the first dense reward was too forgiving.
- **Theme 5: Wild Card**: editable CAD generation is a practical, underexplored RLVE target.
## The Environment Fights Back
The first dense GRPO reward gave useful shape feedback, but it still rewarded some non-buildable CAD. CADForge responded by tightening the rules:
1. Buildability became the first gate.
2. Failed CadQuery code receives negative reward.
3. Syntax errors, missing `fixture`, undefined variables, and invented APIs are tracked separately.
4. Successful builds unlock dense rewards for topology, semantics, reference similarity, contact, editability, and efficiency.
This produced useful GRPO variance: buildable CAD separated from pretty-but-broken code.
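The gating logic above can be sketched in a few lines. The component names and weights here are illustrative, not the production values:

```python
def strict_build_gated_reward(built, dense_scores, failure_penalty=-0.5):
    """Build-gated reward sketch: dense terms only apply after a
    successful build; a failed build gets a flat negative reward.

    `dense_scores` maps component name -> score in [0, 1]. The weights
    below are hypothetical placeholders for the real CADForge scorer.
    """
    if not built:
        return failure_penalty
    weights = {
        "topology": 0.25, "semantics": 0.25, "reference": 0.2,
        "contact": 0.1, "editability": 0.1, "efficiency": 0.1,
    }
    return sum(weights[k] * dense_scores.get(k, 0.0) for k in weights)

print(strict_build_gated_reward(False, {}))  # failed build -> -0.5
print(strict_build_gated_reward(True, {"topology": 1.0, "semantics": 0.8}))
```

Because broken builds land strictly below every successful build, GRPO advantage estimates separate the two groups instead of averaging them together.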
---
Legacy prototype notes follow.
Local prototype for a multi-step CADForge environment: prompt -> CSG/CAD actions -> geometry validation -> structural scoring of household parts.
Experiment 1 focuses on prompt-to-mechanical-design plus coarse 3D FEA. Experiment 2 keeps that renderer/verifier base, but reframes the loop around reliable code-CAD behavior:
- the agent plans small CAD operations,
- the trace is treated like an AST/feature-tree construction episode,
- the verifier reports CADForge metrics such as AST nodes, connected components, watertight/manifold proxy, editability proxy, and pseudo-OpenSCAD output,
- structural MechForge feedback remains as the first physical reward suite.
## Why This Exists
LLMs can often describe a chair, hook, or bracket, but they are unreliable at producing CAD that builds, edits cleanly, exports, and stays physically coherent. CADForge turns those failure modes into reward:
- no floating parts,
- connected CSG/feature tree,
- watertight/manifold exported geometry,
- clean editable parameters,
- manufacturable features,
- structural safety under load.
The long-term target is an OpenEnv-compatible RLVE environment where an agent can take 100-300 CAD actions before committing a valid part.
## OpenEnv Space
This directory is now a deployable OpenEnv environment named `cadforge_cadquery`.
The action is a complete CadQuery Python file. The environment runs it through a constrained CadQuery runner, exports STL, scores build, topology, contact, task semantics, reference similarity, and editability, and returns reward JSON plus verifier notes.
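A minimal sketch of what such an action looks like on the wire. The payload field names (`action`, `code`) are assumptions; check the Space's OpenEnv interface for the real schema, and note the CadQuery source is sent as one string:

```python
import json

# Hypothetical action payload: the full source of a CadQuery file as a
# string. Field names are assumptions, not the confirmed wire format.
CADQUERY_ACTION = '''\
import cadquery as cq

# Parametric bracket: named parameters keep the part editable.
length, width, thickness, hole_d = 80.0, 40.0, 6.0, 5.0

result = (
    cq.Workplane("XY")
    .box(length, width, thickness)
    .faces(">Z")
    .workplane()
    .pushPoints([(-30, 0), (30, 0)])
    .hole(hole_d)
)
'''

payload = json.dumps({"action": {"code": CADQUERY_ACTION}})
decoded = json.loads(payload)
print(decoded["action"]["code"].splitlines()[0])  # import cadquery as cq
```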
Local validation:
```bash
../.venv/bin/openenv validate .
PYTHONPATH=python_tools ../.venv/bin/uvicorn server.app:app --host 0.0.0.0 --port 8000
OPENENV_BASE_URL=http://localhost:8000 ../.venv/bin/python inference.py
```
Push to Hugging Face Spaces:
```bash
set -a; source ../.env; set +a
../.venv/bin/openenv push . --repo-id sanjuhs/cadforge-cadquery-openenv --interface
```
## Setup
```bash
cp .env.example .env
# Either paste your OpenAI key into this .env, or keep it in the repo-root .env.
npm install
npm run dev
```
Open:
```text
http://localhost:5177
```
The API listens on:
```text
http://localhost:8791
```
## What To Try
Chair benchmark:
```text
Build a simple four-legged chair as editable code-CAD. It must support a 700 N seated load, include a seat panel, four connected legs, lower crossbars, and a backrest, fit inside a 500 mm x 500 mm x 900 mm envelope, and avoid floating parts.
```
Truss benchmark:
```text
Build a simple lightweight truss support as code-CAD. Use connected triangular load paths, two fixed mounting holes on the left, a load boss on the right, and enough ribs/cross-members to carry a 250 N downward load with safety factor above 2.0.
```
Wall hook benchmark:
```text
Build a wall-mounted J hook as code-CAD. It needs two screw holes, one connected curved hook arm, a rounded tip lip, and support ribs at the root. It must carry a 120 N hanging load and avoid floating or disconnected geometry.
```
## OpenSCAD Rendering
The UI includes an OpenSCAD code panel with:
- `Generate SCAD`
- `Iterate SCAD`
- `Render SCAD`
- `Load Example`
This is a real browser-side CSG renderer for a constrained OpenSCAD subset. It currently supports:
- `cube`
- `sphere`
- `cylinder`
- `translate`
- `rotate`
- `scale`
- `union`
- `difference`
- `intersection`
The renderer parses SCAD text and builds an actual Three.js mesh. Boolean operations use `three-csg-ts`.
Full OpenSCAD CLI rendering is not enabled yet because `openscad` is not installed on this machine. The UI and README should not claim full OpenSCAD compatibility until that real dependency is available.
The server endpoints are:
```text
POST /api/scad-generate
POST /api/scad-iterate
```
Both use the configured model API key. They do not return fallback or mock SCAD when the key is missing.
## Current CADForge Metrics
The current prototype adds a `cadforge` block to each analysis result:
- `ast_nodes`
- `connected_components`
- `floating_parts`
- `watertight_proxy`
- `manifold_proxy`
- `clean_feature_tree_proxy`
- `named_parameter_count`
- `editability_score`
- `chair_core_features_passed`
- `pseudo_openscad`
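To make proxies like `named_parameter_count` and `editability_score` concrete, here is one way such a proxy could be computed with Python's `ast` module. This is a hypothetical sketch, not the actual CADForge scorer, which is richer:

```python
import ast

def editability_proxy(source):
    """Rough editability proxy: count top-level `name = <number>`
    assignments, then saturate into a [0, 1] score. Illustrative only;
    the real scorer weighs more than named numeric parameters.
    """
    tree = ast.parse(source)
    named = 0
    for node in tree.body:
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target, value = node.targets[0], node.value
            if (isinstance(target, ast.Name)
                    and isinstance(value, ast.Constant)
                    and isinstance(value.value, (int, float))):
                named += 1
    # Assumed threshold: 5+ named parameters counts as fully editable.
    return named, min(1.0, named / 5.0)

code = "leg_height = 450\nseat_w = 400.0\nresult = leg_height + seat_w\n"
print(editability_proxy(code))  # (2, 0.4)
```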
These are MVP proxies, not a full OpenSCAD/trimesh compile yet. The next step is to replace the analysis proxies with:
```text
CSG AST -> OpenSCAD/CadQuery -> STL/STEP -> trimesh/solid validation -> reward
```
## OpenEnv Direction
The final environment should expose actions such as:
- `add_cube`
- `add_cylinder`
- `translate`
- `rotate`
- `union`
- `difference`
- `add_mount_hole`
- `add_rib`
- `compile_cad`
- `check_connected_components`
- `check_watertight`
- `check_editability`
- `run_structural_check`
- `commit_design`
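The action set above can be sketched as a toy episode state: parts are nodes, boolean ops connect them, and `check_connected_components` reduces to a union-find query. This is an illustrative stand-in; the real environment checks actual geometric overlap, not declared links:

```python
class DesignState:
    """Toy episode state: `add_*` actions create parts, `union` welds
    them, and connected-component count flags floating geometry."""

    def __init__(self):
        self.parts = []    # part ids in creation order
        self.parent = {}   # union-find forest over part ids

    def add_part(self, kind):
        pid = f"{kind}_{len(self.parts)}"
        self.parts.append(pid)
        self.parent[pid] = pid
        return pid

    def _find(self, p):
        while self.parent[p] != p:
            self.parent[p] = self.parent[self.parent[p]]  # path halving
            p = self.parent[p]
        return p

    def union(self, a, b):
        self.parent[self._find(a)] = self._find(b)

    def connected_components(self):
        return len({self._find(p) for p in self.parts})

state = DesignState()
seat = state.add_part("cube")
leg1 = state.add_part("cylinder")
leg2 = state.add_part("cylinder")
state.union(seat, leg1)              # weld one leg to the seat
print(state.connected_components())  # leg2 still floating -> 2
```

A `commit_design` action would then gate on `connected_components() == 1` before handing off to the watertight and structural checks.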
This gives judges the story they want:
> The agent improves on a long-horizon world-modeling task where every CAD operation changes the physical world, and rewards come from objective geometric and structural checks.
## Python Solver
This copy still includes the MechForge Python solver under `python_tools/mechforge`. Prefer the repo-level Python 3.12 virtual environment:
```bash
UV_CACHE_DIR=.uv-cache uv venv --python python3.12 .venv
UV_CACHE_DIR=.uv-cache uv pip install numpy scipy pydantic fastapi uvicorn meshio gmsh scikit-fem cadquery openmdao openenv-core openai trimesh
```
Headless smoke test:
```bash
PYTHONPATH=experiment-2-cadforge/python_tools .venv/bin/python -m mechforge.cli sample
```