
MechForge Rendering And Simulation Stack

Date: 2026-04-24

The Confusion To Resolve

For MechForge there are four separate jobs:

  1. Generate or modify a design.
  2. Render the design so humans can inspect it.
  3. Simulate or verify the design.
  4. Export the design to real CAD/manufacturing formats.

One tool does not need to do all four.

Recommended MVP Stack

| Layer | MVP choice | Why |
| --- | --- | --- |
| Design representation | Structured parametric JSON | Easy for LLMs, easy to validate, easy to convert. |
| Browser renderer | Three.js | Fast, visual, interactive, works inside a web demo. |
| Fast verifier | Custom beam/truss-style solver | Good enough for reward curves and RL feedback. |
| Export | STL from Three.js mesh | Immediate tangible artifact. |
| Future CAD backend | CadQuery first, OpenSCAD second | CadQuery is Python-native and more flexible for OpenEnv. |
| Future simulation backend | Simplified FEM, FEniCSx, or a specialized solver | Swap in after the environment loop works. |
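
To make the first row concrete, here is one possible shape for the design JSON, written as a TypeScript type. The field names and units are illustrative assumptions, not a committed schema.

```ts
// Hypothetical shape of the structured parametric design JSON.
// Field names and units are illustrative, not the committed schema.
interface DesignJSON {
  plate: { width_mm: number; depth_mm: number; thickness_mm: number };
  ribs: { x_mm: number; height_mm: number; thickness_mm: number }[];
  holes: { x_mm: number; y_mm: number; diameter_mm: number; fixed?: boolean }[];
  material: { name: string; youngs_modulus_mpa: number; yield_mpa: number };
  loadCase: { at: [number, number, number]; force_n: [number, number, number] };
}
```

A closed schema like this is what makes the representation easy to validate and easy to convert to CadQuery later.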

Why Not OpenSCAD First?

OpenSCAD is good for deterministic programmatic CAD. It is available on macOS and can generate real geometry, but it is not the fastest path for a live web app.

Use OpenSCAD later if we want:

  • scriptable constructive solid geometry,
  • reproducible .scad artifacts,
  • STL export through the OpenSCAD CLI,
  • simple parts made from unions/differences.

For the first experiment, Three.js is better because it gives immediate visual feedback in the browser.
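
As a sketch of how little code that path needs, the following renders the plate from the hypothetical DesignJSON above as a single box mesh, assuming a full-window canvas:

```ts
import * as THREE from 'three';

// Minimal sketch: render the plate as one box mesh under basic lighting.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(50, innerWidth / innerHeight, 0.1, 1000);
camera.position.set(120, 90, 160);
camera.lookAt(0, 0, 0);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

scene.add(new THREE.DirectionalLight(0xffffff, 1), new THREE.AmbientLight(0x404040));

// One box per solid feature; ribs and bosses become further boxes.
const plate = new THREE.Mesh(
  new THREE.BoxGeometry(100, 6, 60), // width, thickness, depth in mm
  new THREE.MeshStandardMaterial({ color: 0x8899aa }),
);
scene.add(plate);

renderer.render(scene, camera);
```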

Why Not Full FEA First?

Full FEA is the wrong first milestone. It risks spending the hackathon on meshing, solver stability, and packaging instead of the OpenEnv loop.

Better:

  1. Start with a simplified verifier that produces a reward.
  2. Show that LLM behavior improves under that reward.
  3. Add higher-fidelity simulation only after the loop is stable.

The judges care most that the environment trains meaningful behavior and shows improvement. A simple but coherent verifier is acceptable if we explain the limitations honestly.
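
As a concrete example of "simple but coherent", a beam-style verifier can be as small as the sketch below. It treats the bracket as a cantilever plate with standard beam formulas; the thresholds and the volume-as-mass proxy are assumptions chosen for reward shaping, not engineering claims.

```ts
// Crude beam-style verifier: treat the bracket as a cantilever plate of
// width w, thickness t, length L (all mm) under a tip load F (N), with
// Young's modulus E and yield strength in MPa. A reward proxy, not FEA.
function verify(w: number, t: number, L: number, F: number, E: number, yieldMPa: number): number {
  const I = (w * t ** 3) / 12;                   // second moment of area, mm^4
  const deflection = (F * L ** 3) / (3 * E * I); // tip deflection, mm
  const stress = (F * L * (t / 2)) / I;          // peak bending stress, MPa
  const safetyFactor = yieldMPa / stress;
  const volume = w * t * L;                      // mm^3, stands in for mass
  // Hypothetical feasibility gates for illustration: SF >= 1.5 and
  // deflection under L/100. Feasible designs are rewarded for lightness.
  const feasible = safetyFactor >= 1.5 && deflection <= L / 100;
  return feasible ? 1 / (1 + volume / 1e5) : 0;
}
```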

Benchmark Plan

Before committing to the full environment, run GPT-5.4 through a small prompt-to-design benchmark:

  • Prompt asks for a lightweight bracket under a load case.
  • Model returns structured design JSON.
  • Renderer shows the part.
  • Verifier scores mass, stress proxy, deflection proxy, safety factor, and manufacturability.
  • We inspect whether the model uses real design patterns (ribs, load paths, holes placed in low-stress areas) and whether it avoids invalid geometry.

This tells us whether current frontier models already solve the task or whether there is room for RL improvement.
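
A minimal sketch of the prompt-to-design call, assuming the openai Node SDK; the model id and instructions string are placeholders for whatever the benchmark actually pins:

```ts
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Sketch: natural-language prompt in, parsed design JSON out.
async function promptToDesign(prompt: string): Promise<unknown> {
  const response = await client.responses.create({
    model: 'gpt-5.4', // assumed model id
    instructions: 'Return only a JSON object matching the design schema.',
    input: prompt,
  });
  return JSON.parse(response.output_text);
}
```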

What The Experiment App Does

The app in experiment-mechanical-idea/ implements this benchmark:

  • Frontend: Vite + Three.js.
  • Backend: Express + OpenAI Responses API.
  • Input: natural-language mechanical design prompt.
  • Output: structured parametric design JSON.
  • Render: plate, ribs, holes, bosses, fixed holes, load arrow.
  • Verifier: fast beam-style estimate.
  • Export: STL from the rendered mesh.
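
The STL step can reuse Three.js's own exporter. Note that this dumps the visual meshes as-is, with no boolean union, so the artifact is tangible but not a watertight manufacturing solid; that is what the CadQuery backend is for later. A sketch:

```ts
import * as THREE from 'three';
import { STLExporter } from 'three/examples/jsm/exporters/STLExporter.js';

// Export whatever Object3D holds the design (plate, ribs, bosses) as ASCII STL.
function exportSTL(design: THREE.Object3D): Blob {
  const stl = new STLExporter().parse(design); // ASCII STL string
  return new Blob([stl], { type: 'model/stl' });
}
```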

Final Recommendation

For the OpenEnv version:

  1. Keep the agent action space constrained (one possible encoding is sketched after this list).
  2. Use Three.js for the judge-facing demo.
  3. Use Python/CadQuery later for real CAD export.
  4. Keep simulation/verifier independent from the renderer.
  5. Do not let the LLM generate arbitrary meshes in the first version.
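
One way to satisfy points 1 and 5 together is to define the action space as a closed union of parametric edits, so arbitrary meshes are unrepresentable. The action names below are illustrative:

```ts
// Hypothetical constrained action space: a closed union of parametric
// edits. Anything outside this union is rejected before rendering or
// verification, so the agent can never emit raw geometry.
type Action =
  | { kind: 'set_plate'; width_mm: number; depth_mm: number; thickness_mm: number }
  | { kind: 'add_rib'; x_mm: number; height_mm: number; thickness_mm: number }
  | { kind: 'add_hole'; x_mm: number; y_mm: number; diameter_mm: number }
  | { kind: 'remove_feature'; id: string };
```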