---
title: CADForge CadQuery
emoji: 🪑
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- cadquery
- reinforcement-learning
---
# CADForge Experiment 2
CADForge is an OpenEnv environment for training LLMs to produce editable, buildable CadQuery CAD.
The agent receives a design request, writes a complete CadQuery Python file, and the environment runs real CAD tooling: CadQuery build, STL export, topology checks, semantic scoring, reference similarity, editability scoring, and persistent artifact logging.
## Judge-Facing Links
- GitHub repo: sanjuhs/open-env-meta-final-hackathon
- GitHub Gist with training scripts: CADForge OpenEnv SFT/GRPO scripts
- Raw training logs and evidence: sanjuhs/cadforge-training-evidence
- Training notebook on this HF Space: training/cadforge_openenv_training_colab.ipynb
- Open the same notebook in Google Colab: Colab training notebook
- Mini-blog: CADFORGE_BLOG.md
- Detailed technical blog: docs/detailed-blog/cadforge-detailed-blog.md
- Full project report: docs/cadforge-openenv-project-report.md
- Self-improving RLVE design: docs/brainstorm/21-cadforge-self-improving-rlve.md
- Strict GRPO training report: training/reports/qwen35-9b-grpo-strict-build-20260426-strict-build/training_curve_report.md
- Strict GRPO eval report: training/eval/qwen35-9b-cadforge-grpo-strict-build-20260426-strict-build/eval_report.md
- Inference comparison: inference/results/stator-qwen-vs-frontier/report.md
- Training dataset: sanjuhs/cadforge-cadquery-agentic-traces
- Training logs and evidence bundle: sanjuhs/cadforge-training-evidence
- Strict 9B GRPO LoRA: sanjuhs/qwen35-9b-cadforge-grpo-strict-build-lora
- Adaptive repair GRPO LoRA: sanjuhs/qwen35-9b-cadforge-grpo-adaptive-repair-lora
RunPod/H200 clarification: the full 2B/9B SFT and GRPO runs were executed on RunPod H200 as distinct production scripts. The Colab notebook is the judge-runnable smoke path that validates OpenEnv, the public dataset, the CadQuery reward backend, and tiny SFT/GRPO launches using those same scripts.
## Results Snapshot
| Run | Result |
|---|---|
| Qwen3.5-2B SFT | train loss 1.4480 -> 0.1658, eval loss 0.4477 -> 0.2676 |
| Qwen3.5-2B dense GRPO | mean reward 0.3387, best 0.5303; useful reward signal but too forgiving on broken builds |
| Qwen3.5-9B SFT | train loss 2.6020 -> 0.1413, eval loss 0.3650 -> 0.2398 |
| Qwen3.5-9B strict GRPO | 320 completions, 96 buildable, best CADForge score 0.9352 |
| Qwen3.5-9B adaptive repair GRPO | 180 repair completions, 53 buildable, 0 clipped completions |
| Strict 9B quick eval | 2/3 held-out prompts built successfully |
| Stator inference comparison | base Qwen failed to build; RL-tuned Qwen built a 0.654 stator; GPT-5.4 built a 0.709 stator |
## Training Logs
The raw logs are backed up separately so judges can inspect the training evidence without relying on screenshots:
- Evidence dataset: sanjuhs/cadforge-training-evidence
- Compressed archive: `archives/cadforge-training-evidence-20260426.tar.gz`
- Key JSONL traces: `training/logs/*completions.jsonl`
The logs show the core result: dense GRPO had positive-looking reward but 0% buildability; strict build-gating produced 96/320 buildable completions; adaptive repair fixed clipped outputs and produced 53/180 buildable repairs.
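For a quick look at the traces, a short script like the sketch below can tally buildability and mean reward per file. The JSONL field names used here (`buildable`, `reward`) are assumptions, so check the actual schema in the evidence bundle first.

```python
# Sketch: tally buildable completions per GRPO trace file.
# Assumed field names: "buildable" and "reward"; verify against the real schema.
import glob
import json

for path in sorted(glob.glob("training/logs/*completions.jsonl")):
    with open(path) as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    buildable = sum(1 for r in rows if r.get("buildable"))
    mean_reward = sum(r.get("reward", 0.0) for r in rows) / max(len(rows), 1)
    print(f"{path}: {buildable}/{len(rows)} buildable, mean reward {mean_reward:.4f}")
```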
## Hackathon Theme Alignment
- Theme 2: Long-horizon planning: CAD improves through repeated code edits and reward feedback.
- Theme 3.1: Professional world modeling: the agent must use real CadQuery tools and survive compiler/export/mesh checks.
- Theme 4: Self-improvement: environment failures become new curriculum. The strict build-gated reward was created because the first dense reward was too forgiving.
- Theme 5: Wild Card: editable CAD generation is a practical, underexplored RLVE target.
## The Environment Fights Back
The first dense GRPO reward gave useful shape feedback, but it still rewarded some non-buildable CAD. CADForge responded by tightening the rules:
- Buildability became the first gate.
- Failed CadQuery code receives negative reward.
- Syntax errors, missing fixture, undefined variables, and invented APIs are tracked separately.
- Successful builds unlock dense rewards for topology, semantics, reference similarity, contact, editability, and efficiency.
This produced useful GRPO variance: buildable CAD separated from pretty-but-broken code.
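A minimal sketch of that gating logic is below; the penalty value, component names, and equal weighting are illustrative assumptions rather than the environment's actual constants.

```python
# Sketch of a build-gated CADForge reward. A failed CadQuery build
# short-circuits to a negative reward; only successful builds unlock the
# dense terms. Values and weights here are assumptions, not the real backend.
def cadforge_reward(build_ok: bool, components: dict) -> float:
    if not build_ok:
        return -0.2  # broken code is penalized instead of partially credited
    dense_terms = ("topology", "semantics", "reference", "contact", "editability", "efficiency")
    return sum(components.get(term, 0.0) for term in dense_terms) / len(dense_terms)
```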
Legacy prototype notes follow.
Local prototype for a multi-step CADForge environment: prompt -> CSG/CAD actions -> geometry validation -> structural household part scoring.
Experiment 1 focuses on prompt-to-mechanical-design plus coarse 3D FEA. Experiment 2 keeps that renderer/verifier base, but reframes the loop around reliable code-CAD behavior:
- the agent plans small CAD operations,
- the trace is treated like an AST/feature-tree construction episode,
- the verifier reports CADForge metrics such as AST nodes, connected components, watertight/manifold proxy, editability proxy, and pseudo-OpenSCAD output,
- structural MechForge feedback remains as the first physical reward suite.
## Why This Exists
LLMs can often describe a chair, hook, or bracket, but they are unreliable at making CAD that builds, edits, exports, and stays physically coherent. CADForge turns those failure modes into reward:
- no floating parts,
- connected CSG/feature tree,
- watertight/manifold exported geometry,
- clean editable parameters,
- manufacturable features,
- structural safety under load.
The long-term target is an OpenEnv-compatible RLVE environment where an agent can take 100-300 CAD actions before committing a valid part.
## OpenEnv Space
This directory is now a deployable OpenEnv environment named `cadforge_cadquery`.
The action is a complete CadQuery Python file. The environment runs it through a constrained CadQuery runner, exports STL, scores build/topology/contact/task semantics/reference similarity/editability, and returns reward JSON plus verifier notes.
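For orientation, a minimal action might look like the sketch below: a parametric plate with two holes. The exact fixture and export conventions the runner enforces are defined by the environment, so treat the variable names and the export call here as illustrative.

```python
# Illustrative CadQuery action: a parametric mounting plate with two holes.
# The environment's runner defines the exact fixture/export conventions it
# expects; the names used here are assumptions for illustration only.
import cadquery as cq

plate_w, plate_d, plate_t = 60.0, 40.0, 5.0  # named parameters keep the part editable
hole_d = 5.0

result = (
    cq.Workplane("XY")
    .box(plate_w, plate_d, plate_t)
    .faces(">Z")
    .workplane()
    .pushPoints([(-20.0, 0.0), (20.0, 0.0)])
    .hole(hole_d)
)

cq.exporters.export(result, "part.stl")  # STL feeds the topology/similarity checks
```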
Local validation:

```bash
../.venv/bin/openenv validate .
PYTHONPATH=python_tools ../.venv/bin/uvicorn server.app:app --host 0.0.0.0 --port 8000
OPENENV_BASE_URL=http://localhost:8000 ../.venv/bin/python inference.py
```
Push to Hugging Face Spaces:

```bash
set -a; source ../.env; set +a
../.venv/bin/openenv push . --repo-id sanjuhs/cadforge-cadquery-openenv --interface
```
## Setup

```bash
cp .env.example .env
# Either paste your OpenAI key into this .env, or keep it in the repo-root .env.
npm install
npm run dev
```
Open `http://localhost:5177`. The API listens on `http://localhost:8791`.
## What To Try
Chair benchmark:
Build a simple four-legged chair as editable code-CAD. It must support a 700 N seated load, include a seat panel, four connected legs, lower crossbars, and a backrest, fit inside a 500 mm x 500 mm x 900 mm envelope, and avoid floating parts.
Truss benchmark:
Build a simple lightweight truss support as code-CAD. Use connected triangular load paths, two fixed mounting holes on the left, a load boss on the right, and enough ribs/cross-members to carry a 250 N downward load with safety factor above 2.0.
Wall hook benchmark:
Build a wall-mounted J hook as code-CAD. It needs two screw holes, one connected curved hook arm, a rounded tip lip, and support ribs at the root. It must carry a 120 N hanging load and avoid floating or disconnected geometry.
## OpenSCAD Rendering
The UI includes an OpenSCAD code panel with:
- Generate SCAD
- Iterate SCAD
- Render SCAD
- Load Example
This is a real browser-side CSG renderer for a constrained OpenSCAD subset. It currently supports `cube`, `sphere`, `cylinder`, `translate`, `rotate`, `scale`, `union`, `difference`, and `intersection`.
The renderer parses SCAD text and builds an actual Three.js mesh. Boolean operations use `three-csg-ts`.
Full OpenSCAD CLI rendering is not enabled yet because openscad is not installed on this machine. The UI and README should not claim full OpenSCAD compatibility until that real dependency is available.
The server endpoints are:
- `POST /api/scad-generate`
- `POST /api/scad-iterate`
Both use the configured model API key. They do not return fallback or mock SCAD when the key is missing.
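A hedged example of calling the generation endpoint from Python is below; the request and response field names (`prompt`, `scad`) are assumptions, so confirm them against the server handlers before relying on this.

```python
# Sketch: call the local SCAD generation endpoint.
# Payload/response field names ("prompt", "scad") are assumptions.
import requests

resp = requests.post(
    "http://localhost:8791/api/scad-generate",
    json={"prompt": "a simple four-legged stool, 300 mm tall"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json().get("scad", resp.text))
```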
## Current CADForge Metrics

The current prototype adds a `cadforge` block to each analysis result with: `ast_nodes`, `connected_components`, `floating_parts`, `watertight_proxy`, `manifold_proxy`, `clean_feature_tree_proxy`, `named_parameter_count`, `editability_score`, `chair_core_features_passed`, and `pseudo_openscad`.
These are MVP proxies, not a full OpenSCAD/trimesh compile yet. The next step is to replace the analysis proxies with:
CSG AST -> OpenSCAD/CadQuery -> STL/STEP -> trimesh/solid validation -> reward
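As a sketch of that validation stage, trimesh can already cover the watertight and connectivity checks on an exported STL; how these booleans map into reward is an assumption here, not the final backend.

```python
# Sketch of the planned STL validation stage using trimesh.
# The mapping from these checks into reward is an assumption.
import trimesh

mesh = trimesh.load("part.stl")
bodies = mesh.split(only_watertight=False)  # connected components of the mesh

checks = {
    "watertight": bool(mesh.is_watertight),
    "winding_consistent": bool(mesh.is_winding_consistent),
    "single_connected_body": len(bodies) == 1,
}
print(checks)
```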
## OpenEnv Direction

The final environment should expose actions such as `add_cube`, `add_cylinder`, `translate`, `rotate`, `union`, `difference`, `add_mount_hole`, `add_rib`, `compile_cad`, `check_connected_components`, `check_watertight`, `check_editability`, `run_structural_check`, and `commit_design`.
This gives judges the story they want:
The agent improves on a long-horizon world-modeling task where every CAD operation changes the physical world, and rewards come from objective geometric and structural checks.
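A hypothetical episode under that interface might look like the trace below; the action names mirror the list above, while the payload schema is an assumption since the multi-step interface is not implemented yet.

```python
# Hypothetical multi-step episode for the planned action interface.
# Action names follow the list above; the payload fields are assumptions.
episode = [
    {"action": "add_cube", "size": [400, 400, 20], "name": "seat"},
    {"action": "add_cylinder", "radius": 20, "height": 420, "name": "leg_fl"},
    {"action": "translate", "target": "leg_fl", "offset": [-170, -170, -420]},
    {"action": "union", "targets": ["seat", "leg_fl"]},
    {"action": "compile_cad"},
    {"action": "check_connected_components"},
    {"action": "check_watertight"},
    {"action": "run_structural_check", "load_newtons": 700},
    {"action": "commit_design"},
]
```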
## Python Solver

This copy still includes the MechForge Python solver under `python_tools/mechforge`. Prefer the repo-level Python 3.12 virtual environment:

```bash
UV_CACHE_DIR=.uv-cache uv venv --python python3.12 .venv
UV_CACHE_DIR=.uv-cache uv pip install numpy scipy pydantic fastapi uvicorn meshio gmsh scikit-fem cadquery openmdao openenv-core openai trimesh
```
Headless smoke test:

```bash
PYTHONPATH=experiment-2-cadforge/python_tools .venv/bin/python -m mechforge.cli sample
```


