Zero-to-CAD — Qwen3-VL-2B

A vision-language model fine-tuned to reconstruct executable CAD programs from multi-view images.

Figure: the Zero-to-CAD agentic synthesis pipeline.

Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data

Mohammadmehdi Ataei, Farzaneh Askari, Kamal Rahimi Malekshan, Pradeep Kumar Jayaraman

Autodesk Research

Related Resources

| Resource | Link |
| --- | --- |
| 📄 Paper | Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data |
| 📦 Zero-to-CAD 1M (full dataset) | ADSKAILab/Zero-To-CAD-1m |
| 📦 Zero-to-CAD 100K (curated subset) | ADSKAILab/Zero-To-CAD-100k |
| 🤖 Fine-tuned model | ADSKAILab/Zero-To-CAD-Qwen3-VL-2B (this model) |
| 🗂️ Collection | ADSKAILab/Zero-To-CAD |

Model Description

This model is a fully fine-tuned version of Qwen3-VL-2B-Instruct. It takes 8 rendered views of a 3D shape (4 front and 4 rear, each at 256×256) and generates executable CadQuery Python code that reproduces the geometry.

The model was trained entirely on synthetic data from Zero-to-CAD 1M (979,633 training samples) — no real-world CAD files were used.
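The eight input views can come straight from the dataset (as in the Quick Start below) or from your own renders. If your images are not already 256×256 RGB, a small normalization helper like the following can bring them into the format the model was trained on (an illustrative sketch; the helper name and file paths are placeholders, not part of the released pipeline):

```python
from PIL import Image


def prepare_views(paths, size=256):
    """Load images and normalize them to 256x256 RGB, the input
    format this model was trained on. `paths` should list the 8 views."""
    views = []
    for path in paths:
        img = Image.open(path).convert("RGB")      # drop alpha, force RGB
        if img.size != (size, size):
            img = img.resize((size, size), Image.LANCZOS)
        views.append(img)
    return views
```

Usage: `views = prepare_views([f"view_{i}.png" for i in range(8)])`, then pass `views` to the processor as shown in the Quick Start.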

Key Results

| Benchmark | Success Rate | Mean IoU | Median IoU | P90 IoU |
| --- | --- | --- | --- | --- |
| Zero-to-CAD test | 82.1% | 0.747 | 0.847 | 0.999 |
| ABC (out-of-distribution) | 61.0% | 0.377 | 0.303 | 0.854 |

Comparison with Baselines

| Model | Zero-to-CAD Success | Zero-to-CAD Mean IoU | ABC Success | ABC Mean IoU |
| --- | --- | --- | --- | --- |
| This model | 82.1% | 0.747 | 61.0% | 0.377 |
| GPT-5.2 High | 72.2% | 0.485 | 66.2% | 0.344 |
| GPT-5.2 Medium | 71.1% | 0.495 | 62.6% | 0.346 |
| Qwen3-VL-2B (base) | 6.6% | 0.184 | 5.4% | 0.131 |

Quick Start

Inference

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from datasets import load_dataset
from PIL import Image
import io


model_name = "ADSKAILab/Zero-To-CAD-Qwen3-VL-2B"
model = Qwen3VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_name)

# Load 8 rendered views from the dataset
ds = load_dataset("ADSKAILab/Zero-To-CAD-1m", split="train", streaming=True)
sample = next(iter(ds))
views = [
    Image.open(io.BytesIO(sample[f"image_{i}"])) if isinstance(sample[f"image_{i}"], bytes)
    else sample[f"image_{i}"]
    for i in range(8)
]

# Or load 8 views from local files:
# views = [Image.open(f"view_{i}.png") for i in range(8)]

messages = [
    {
        "role": "system",
        "content": "You are a CAD code assistant. Given multiple rendered views of a 3D shape, generate clean, well-structured CadQuery Python code that accurately reproduces the geometry."
    },
    {
        "role": "user",
        "content": [
            *[{"type": "image", "image": view} for view in views],
            {"type": "text", "text": "Generate CadQuery code for this shape."}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=views, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
output_text = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

print(output_text)

Execute the generated code

import cadquery as cq

# Run the generated code in an explicit namespace rather than polluting
# module globals; the generated code assigns the solid to `result`.
namespace = {"cq": cq}
exec(output_text, namespace)
result = namespace["result"]  # the reconstructed CadQuery solid

# Export
cq.exporters.export(result, "output.step")
cq.exporters.export(result, "output.stl")
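Chat-tuned models sometimes wrap their answer in a markdown code fence, which would make `exec` fail on the backticks. A small stdlib-only helper (illustrative, not part of the released pipeline) can strip any fence before execution:

```python
import re


def extract_code(text):
    """Return the body of the first fenced code block in `text`,
    or `text` unchanged if no fence is present."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text
```

Usage: `exec(extract_code(output_text))` works whether the model emitted bare code or a fenced block.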

Training Details

| Hyperparameter | Value |
| --- | --- |
| Base model | Qwen3-VL-2B-Instruct |
| Training mode | Full fine-tuning |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW |
| Learning rate | 1 × 10⁻⁴ |
| Weight decay | 0.0 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Attention dropout | 0.1 |
| GPUs | 16 × NVIDIA H100 80GB |
| Per-GPU batch size | 1 |
| Effective batch size | 16 |
| Epochs | 3 |
| Precision | bfloat16 |
| Distributed strategy | DDP |

Evaluation Protocol

  • Metric: Voxelized IoU at 64³ resolution between generated and ground-truth solids
  • Rotational alignment: Maximum IoU over 45° rotation increments
  • Success rate: Percentage of generations producing valid, executable CadQuery code
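As a concrete illustration of the metric, here is a minimal voxel-IoU sketch that represents each solid as a Python set of occupied (i, j, k) cells. Note this simplification only sweeps the four 90° yaw rotations via `rot90_z`, whereas the protocol above uses 45° increments on 64³ grids:

```python
def voxel_iou(a, b):
    """IoU between two voxel occupancies, each a set of (i, j, k) cells."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rot90_z(vox, n):
    """Rotate an occupancy set 90 degrees about the z axis of an n^3 grid."""
    return {(j, n - 1 - i, k) for i, j, k in vox}


def rotation_aligned_iou(a, b, n):
    """Max IoU of `a` against the four 90-degree yaw rotations of `b`
    (the evaluation protocol sweeps 45-degree increments instead)."""
    best = 0.0
    for _ in range(4):
        best = max(best, voxel_iou(a, b))
        b = rot90_z(b, n)
    return best
```

For example, a half-slab occupying i < 2 of a 4³ grid scores IoU 1.0 against the same slab rotated 90° about z, since the alignment search recovers the rotation.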

Intended Uses

  • Image-to-CAD reconstruction — reconstruct editable parametric CAD from rendered views
  • Research baseline — starting point for Image-to-Sequence CAD generation research
  • Integration — combine with rendering pipelines for end-to-end 3D reconstruction

Limitations

  • Trained on synthetic data only; may struggle with photorealistic or noisy inputs
  • Expects 8 clean rendered views at 256×256 — other configurations are untested
  • Outputs CadQuery code only; other CAD formats require post-processing
  • Complex multi-part assemblies may exceed the 4,096 token context window

Citation

If you use this model, please cite:

@misc{ataei2026zerotocadagenticsynthesisinterpretable,
  title={Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data}, 
  author={Mohammadmehdi Ataei and Farzaneh Askari and Kamal Rahimi Malekshan and Pradeep Kumar Jayaraman},
  year={2026},
  eprint={2604.24479},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.24479}
}

License

This model is released under the Apache License 2.0.
