# CADForge Inference Comparisons
This folder contains local inference/evaluation scripts for comparing generated CadQuery outputs.
The main benchmark is:

```bash
.venv/bin/python inference/compare_cadquery_models.py --baseline-source ollama
```
It compares three candidates on the same axial_motor_stator_12_slot task:

- Base Qwen: generated live through local Ollama, default `qwen3.5:9b` (see the generation sketch after this list).
- RL-tuned Qwen: saved strict build-gated GRPO held-out stator artifact.
- GPT-5.4: saved frontier baseline artifact by default, or live OpenAI generation with `--gpt-source openai` and `OPENAI_API_KEY`.
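As a rough illustration of the live base-Qwen path, here is a minimal sketch of a generation call against Ollama's default local REST endpoint. The prompt string is a placeholder and the response handling is simplified; the real prompt and parsing live in `inference/compare_cadquery_models.py`.

```python
# Minimal sketch of a live Ollama generation call. Assumes Ollama is running
# locally on its default port; prompt and parsing here are placeholders.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def generate_candidate(prompt: str, model: str = "qwen3.5:9b") -> str:
    """Ask the local Ollama server for a single raw completion."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]  # raw model text; candidate.py is extracted from this


code = generate_candidate(
    "Write CadQuery code for a 12-slot axial motor stator."  # placeholder prompt
)
```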
Outputs are written under `inference/results/<run-id>/`:

- `report.md`
- `comparison.png`
- `results.json` (a loading sketch follows this list)
- per-model `candidate.py`, `reward.json`, STL files, and render images
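To inspect a finished run programmatically, something like the following works. Both the `results.json` schema and the per-model directory layout are assumptions here; check a real run's output for the actual structure.

```python
# Minimal sketch for inspecting a finished run. The results.json schema and
# the per-model subdirectory layout are assumptions; verify against a real run.
import json
from pathlib import Path

run_dir = Path("inference/results") / "stator-qwen-vs-frontier"  # run id from below

results = json.loads((run_dir / "results.json").read_text())
print(json.dumps(results, indent=2))  # dump whatever structure the run wrote

# Per-model artifacts are assumed to sit in one subdirectory per candidate.
for reward_file in run_dir.glob("*/reward.json"):
    print(reward_file, reward_file.read_text()[:200])
```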
Important: the default run is a reproducible local comparison using one live base-Qwen generation plus saved trained/frontier artifacts. It is not a broad benchmark. The right claim is that CADForge makes a small Qwen model competitive on buildable, editable code-CAD behavior for a medium-difficulty part family, not that it beats frontier models globally.
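The score columns in the table below follow this build-gated shape: a candidate that fails to build bottoms out at -1.0, and only buildable code earns semantic and editability credit. A minimal sketch of that shape, with hypothetical weights (the repo's actual weighting may differ, so this will not reproduce the table's totals exactly):

```python
# Sketch of a strict build-gated reward, mirroring the table below: code that
# fails to build scores -1.0 outright; buildable code earns weighted credit.
# WEIGHTS are hypothetical, not the repo's actual values.
WEIGHTS = {"build": 0.2, "semantic": 0.4, "editability": 0.4}  # assumption


def total_reward(built: bool, semantic: float, editability: float) -> float:
    if not built:
        return -1.0  # strict gate: unbuildable code gets the floor score
    return (
        WEIGHTS["build"] * 1.0
        + WEIGHTS["semantic"] * semantic
        + WEIGHTS["editability"] * editability
    )


# Illustrative only: a buildable candidate with the RL-tuned Qwen's components.
print(total_reward(True, semantic=0.300, editability=0.825))
```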
## Current Stator Result
Latest local run:
- Report: results/stator-qwen-vs-frontier/report.md
- Comparison image: results/stator-qwen-vs-frontier/comparison.png
| Model | Total | Build | Semantic | Editability |
|---|---|---|---|---|
| Base Qwen | -1.000 | 0.0 | 0.000 | 0.000 |
| RL-tuned Qwen | 0.654 | 1.0 | 0.300 | 0.825 |
| GPT-5.4 | 0.709 | 1.0 | 0.638 | 0.825 |
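For a sense of the part family, here is a minimal, illustrative CadQuery sketch of a 12-slot stator-like disc. It is not the benchmark's task spec and not any candidate's output; all dimensions are arbitrary.

```python
# Illustrative CadQuery sketch of a 12-slot stator-like disc. NOT the benchmark
# task spec or any model's output; every dimension below is arbitrary.
import cadquery as cq

OUTER_D = 100.0  # stator outer diameter
BORE_D = 30.0    # central bore diameter
HEIGHT = 15.0    # axial height
SLOTS = 12       # slot count, matching the task family name

stator = (
    cq.Workplane("XY")
    .circle(OUTER_D / 2)
    .extrude(HEIGHT)
    .faces(">Z")
    .workplane()
    .hole(BORE_D)  # central bore, cut through the full height
    .faces(">Z")
    .workplane()
    .polarArray(radius=OUTER_D / 2 - 12, startAngle=0, angle=360, count=SLOTS)
    .rect(6.0, 14.0)
    .cutThruAll()  # twelve radial slots arranged around the disc
)

cq.exporters.export(stator, "stator_sketch.stl")
```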
