# Shape Foundation Model — Small v3
A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and produces dense geometric embeddings plus a self-supervised reconstruction prior that enables per-token attribution for explainable predictions.
## Model Details

| Field | Value |
|---|---|
| Architecture | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
| Parameters | 10,913,297 |
| Training objective | Self-supervised masked token reconstruction + multi-resolution contrastive learning |
| Training data | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
| Precision | bf16 mixed precision |
| Compute | 8 × NVIDIA H100 80GB, 50 epochs |
| Val reconstruction R² | 0.729 |
| Val SmoothL1 (β=1.0) | 0.024 |
| Contrastive top-1 accuracy | 98.1% |
| Status | Self-supervised backbone only — supervised task heads are present but disabled (see Limitations) |
## Evaluation Results
Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
**Reconstruction** (pretraining objective, in normalized target space):
| Metric | Value |
|---|---|
| SmoothL1 loss (β=1.0) at masked positions | 0.024 |
| MSE at masked positions | 0.326 |
| Coefficient of determination (R²) | 0.729 |
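As a cross-check on these numbers, R² is assumed here to be the standard coefficient of determination computed over masked positions; a minimal sketch (the actual evaluation code lives in the training repo):

```python
import numpy as np

def r_squared(pred: np.ndarray, target: np.ndarray) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - np.mean(target)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Under this definition the table is internally consistent: an MSE of 0.326 with R² = 0.729 implies a target variance of roughly 0.326 / (1 − 0.729) ≈ 1.2 in normalized space.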
**Contrastive embedding quality** (Wang & Isola 2020 framework, pool size 2048):
| Metric | Value |
|---|---|
| Top-1 positive-pair retrieval accuracy | 98.1% |
| InfoNCE loss (τ=0.07) | 0.146 |
| Alignment (positive pairs) | 0.132 |
| Uniformity (random pairs) | −3.84 |
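Alignment and uniformity follow Wang & Isola (2020). A sketch of the conventional definitions on L2-normalized embeddings (α = 2 and t = 2 are the paper's defaults and assumed here; the card does not state the exact parameters used):

```python
import numpy as np

def alignment(z_a: np.ndarray, z_b: np.ndarray, alpha: float = 2.0) -> float:
    """Mean distance^alpha between positive pairs (lower is better)."""
    return float(np.mean(np.linalg.norm(z_a - z_b, axis=1) ** alpha))

def uniformity(z: np.ndarray, t: float = 2.0) -> float:
    """Log mean Gaussian potential over all distinct pairs (more negative is better)."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)  # distinct pairs only
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))
```

Note the pairwise tensor is O(N² · D) in memory; subsample pairs for pools much larger than the 2048 used here.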
**Embedding geometry:**
| Metric | Value |
|---|---|
| Random-pair cosine mean | 0.002 |
| Random-pair cosine std | 0.139 |
| Embedding L2 norm (mean) | 1.33 |
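These statistics can be reproduced from `embeddings.npy` with a sketch like the following (the pair count and seed are arbitrary choices, not values from the evaluation code):

```python
import numpy as np

def embedding_stats(emb: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> dict:
    """Cosine-similarity statistics over randomly sampled distinct pairs."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(emb), n_pairs)
    j = rng.integers(0, len(emb), n_pairs)
    keep = i != j  # drop accidental self-pairs
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)
    cos = np.sum(unit[i[keep]] * unit[j[keep]], axis=1)
    return {
        "cos_mean": float(cos.mean()),
        "cos_std": float(cos.std()),
        "l2_norm_mean": float(np.linalg.norm(emb, axis=1).mean()),
    }

# stats = embedding_stats(np.load("shape-foundation-small-v3/embeddings.npy"))
```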
## Files

| File | Size | Purpose |
|---|---|---|
| `checkpoint_final.pt` | ~45 MB | Full model state (backbone + loss_computer + optimizer + config) |
| `small.yaml` | 2 KB | Training config (required to instantiate the model) |
| `embeddings.npy` | 31 MB | Precomputed 128-dim pooled embeddings for all 61,052 training meshes |
| `point_clouds.npy` | 358 MB | 512-point samples per training mesh (for retrieval visualization) |
| `metadata.json` | 6 MB | File names and source dataset per training mesh |
## Usage

### Install dependencies

```bash
pip install torch trimesh einops numpy scipy huggingface-hub
```

You also need the `shape_foundation` package from the training repo.
### Download the model

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="bayang/shape-foundation-small-v3",
    local_dir="./shape-foundation-small-v3",
)
```

Or from the command line:

```bash
hf download bayang/shape-foundation-small-v3 --local-dir ./shape-foundation-small-v3
```
### Load and run inference

```python
import torch
import trimesh

from shape_foundation.configs.default import ShapeConfig
from shape_foundation.models.gaot_backbone import GAOTBackbone
from shape_foundation.data.preprocessing import MeshPreprocessor
from shape_foundation.data.sampling import SurfaceSampler

# Load checkpoint
ckpt = torch.load(
    "shape-foundation-small-v3/checkpoint_final.pt",
    map_location="cpu", weights_only=False,
)
cfg: ShapeConfig = ckpt["config"]

# Build model + load weights
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GAOTBackbone(cfg).to(device).eval()
model.load_state_dict(ckpt["model_state_dict"], strict=False)

# Preprocess a mesh
mesh = trimesh.load("your_mesh.stl", force="mesh")
prep = MeshPreprocessor(cfg.input)(
    torch.tensor(mesh.vertices, dtype=torch.float32),
    torch.tensor(mesh.faces, dtype=torch.int64),
    torch.tensor(mesh.vertex_normals, dtype=torch.float32),
)
sampled = SurfaceSampler(cfg.input).sample(
    prep["vertices"], prep["faces"], prep["normals"], prep.get("curvature"),
)

# Run forward pass
with torch.no_grad():
    out = model.forward_tokens(
        sampled["points"].unsqueeze(0).to(device),
        sampled["features"].unsqueeze(0).to(device),
        sampled["normals"].unsqueeze(0).to(device) if sampled.get("normals") is not None else None,
        sampled["curvature"].unsqueeze(0).to(device) if sampled.get("curvature") is not None else None,
    )

pooled = out["pooled_embedding"]   # (1, 128) — global mesh embedding
tokens = out["token_embeddings"]   # (1, 13824, 128) — per-token features
```
### Shape retrieval

Use the precomputed embedding index to find similar shapes:

```python
import json
import numpy as np

embeddings = np.load("shape-foundation-small-v3/embeddings.npy")  # (61052, 128)
with open("shape-foundation-small-v3/metadata.json") as f:
    metadata = json.load(f)

# Cosine similarity between the query embedding and the full index
query = pooled.squeeze(0).cpu().numpy()
query = query / (np.linalg.norm(query) + 1e-8)
index_norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
similarities = index_norm @ query

top_k = np.argsort(similarities)[::-1][:5]
for idx in top_k:
    print(f"{metadata[idx]['name']:30s} {metadata[idx]['source']:12s} {similarities[idx]:.3f}")
```
### Masked reconstruction heatmap (explainability)

The model was pretrained to reconstruct masked token geometry statistics from surrounding context, which gives you a built-in per-region attribution map: regions with high reconstruction error are where the model's learned geometric prior finds the input novel or surprising.
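The reconstruction head's API is not shown above, so the following is a hypothetical post-processing sketch: it assumes you have already obtained a `(T,)` array of per-token reconstruction errors and turns it into an anomaly mask plus a heatmap intensity channel:

```python
import numpy as np

def error_highlight(token_errors: np.ndarray, percentile: float = 90.0):
    """Flag tokens whose reconstruction error exceeds the given percentile.

    token_errors: (T,) per-token SmoothL1 errors (assumed to come from the
    reconstruction head). Returns a boolean mask of "surprising" tokens and
    a [0, 1] intensity usable as a heatmap channel.
    """
    thresh = np.percentile(token_errors, percentile)
    intensity = np.clip(token_errors / (thresh + 1e-8), 0.0, 1.0)
    return token_errors > thresh, intensity
```

Assuming each token maps to a surface region (as the per-region attribution above suggests), the intensity channel can then be splatted back onto the sampled surface points for visualization.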
## Training Data
| Dataset | Meshes | Share | Domain |
|---|---|---|---|
| Fusion360 | 35,681 | 58.4% | Parametric CAD designs |
| MFCAD | 15,488 | 25.4% | Manufacturing CAD parts |
| Thingi10K | 9,883 | 16.2% | Community 3D printing / misc geometries |
| Total | 61,052 | 100% | — |
All three sources are industrial / CAD-focused, aligned with the target application domain (engineering, manufacturing, simulation setup).
Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 hashing of file paths so assignments are stable across runs, ranks, and machines.
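A sketch of such a hash-based split (only the md5-of-path idea comes from the description above; the exact hash-to-bucket mapping is an assumption):

```python
import hashlib

def split_for(path: str, val_fraction: float = 0.0489) -> str:
    """Deterministic train/val assignment from an md5 hash of the file path."""
    h = int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16)
    # Map the hash to [0, 1); paths falling below the threshold go to validation.
    return "val" if (h % 10_000) / 10_000.0 < val_fraction else "train"
```

Because the assignment depends only on the path string, it is identical across runs, DDP ranks, and machines.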
## Training Details
- Masked token reconstruction (weight 1.0): 50% of latent tokens masked, SmoothL1 loss (β=1.0) on normalized geometry statistics
- Multi-resolution contrastive (weight 0.2): InfoNCE with jitter σ=0.02 and 30% point dropout
- Per-dimension target normalization: calibrated once on 56M tokens from the training split, stored as buffers on the loss computer
- Optimizer: AdamW, lr 3e-4, cosine schedule, 500 warmup steps
- Epochs: 50
- Mixed precision: bf16 + DDP + torch.compile on 8 × H100 80GB
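To make the contrastive recipe concrete, here is a minimal sketch of the two pieces named above: the view augmentation (point dropout plus coordinate jitter) and a symmetric InfoNCE over pooled embeddings. Function names and details are illustrative, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def augment(points: torch.Tensor, sigma: float = 0.02, dropout: float = 0.3) -> torch.Tensor:
    """One contrastive view: random point dropout, then Gaussian coordinate jitter."""
    keep = torch.rand(points.shape[0]) > dropout
    kept = points[keep]
    return kept + sigma * torch.randn_like(kept)

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: each anchor's positive is the same mesh's other view."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau            # (B, B); positives on the diagonal
    targets = torch.arange(z_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

With a per-rank batch of 16, each anchor sees only 15 in-batch negatives, which is why the loss saturates (see Limitations).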
## Limitations
Supervised task heads are disabled. The checkpoint contains symmetry, primitive, part, and reduction heads from the architecture, but all supervised loss weights are set to 0.0 during training. Attempting to use these heads for inference will return near-random outputs because they were never updated by gradient descent. The stock synthetic labels in the training data do not generalize across unseen meshes (train CE ~1e-4 vs val CE ~2.5 in prior runs), which is why they were disabled. Only use the backbone embeddings and the reconstruction head.
Domain is industrial CAD. The training data is 100% CAD / engineering parts. The model will transfer poorly to organic shapes (humans, animals, plants) or to reconstructed 3D scans with heavy noise. If your target domain differs, you should fine-tune or retrain.
Contrastive signal saturates. With per-rank batch size 16, the InfoNCE objective only has 15 negatives per anchor. This is too easy once the backbone has basic shape awareness. The embeddings are still useful for retrieval but the contrastive loss stops providing gradient after the first few epochs.
## Intended Use
- Dense geometric feature extraction for downstream CAD / engineering tasks
- Shape retrieval via learned embedding similarity
- Per-region anomaly detection via masked reconstruction error heatmaps
- Foundation for fine-tuning on domain-specific labels (once high-quality labels are available)
Not suitable for: reconstructing 3D geometry from scratch, generating new meshes, classifying general 3D objects outside the CAD domain, or any task that requires the supervised heads to be active.
## Citation

```bibtex
@software{notelink_shape_foundation_2026,
  author = {{Notelink LLC}},
  title  = {Shape Foundation Model},
  year   = {2026},
  url    = {https://huggingface.co/bayang/shape-foundation-small-v3}
}
```