# Shape Foundation Model — Small v3
A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and produces dense geometric embeddings plus a self-supervised reconstruction prior that enables per-token attribution for explainable predictions.
## Model Details

| Field | Value |
|---|---|
| Architecture | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
| Parameters | 10,913,297 |
| Training objective | Self-supervised masked token reconstruction + multi-resolution contrastive learning |
| Training data | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
| Precision | bf16 mixed precision |
| Compute | 8 × NVIDIA H100 80GB, 50 epochs |
| Val reconstruction R² | 0.729 |
| Val SmoothL1 (β=1.0) | 0.024 |
| Contrastive top-1 accuracy | 98.1% |
| Status | Self-supervised backbone only — supervised task heads are present but disabled (see Limitations) |
## Evaluation Results
Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
**Reconstruction** (pretraining objective, in normalized target space):
| Metric | Value |
|---|---|
| SmoothL1 loss (β=1.0) at masked positions | 0.024 |
| MSE at masked positions | 0.326 |
| Coefficient of determination (R²) | 0.729 |
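As a cross-check on these numbers, R² is assumed here to be the standard coefficient of determination computed over masked positions; a minimal sketch (the actual evaluation code lives in the training repo):

```python
import numpy as np

def r_squared(pred: np.ndarray, target: np.ndarray) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - np.mean(target)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Under this definition the table is internally consistent: an MSE of 0.326 with R² = 0.729 implies a target variance of roughly 0.326 / (1 − 0.729) ≈ 1.2 in normalized space.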
**Contrastive embedding quality** (Wang & Isola 2020 framework, pool size 2048):
| Metric | Value |
|---|---|
| Top-1 positive-pair retrieval accuracy | 98.1% |
| InfoNCE loss (τ=0.07) | 0.146 |
| Alignment (positive pairs) | 0.132 |
| Uniformity (random pairs) | −3.84 |
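Alignment and uniformity follow Wang & Isola (2020). A sketch of the conventional definitions on L2-normalized embeddings (α = 2 and t = 2 are the paper's defaults and assumed here; the card does not state the exact parameters used):

```python
import numpy as np

def alignment(z_a: np.ndarray, z_b: np.ndarray, alpha: float = 2.0) -> float:
    """Mean distance^alpha between positive pairs (lower is better)."""
    return float(np.mean(np.linalg.norm(z_a - z_b, axis=1) ** alpha))

def uniformity(z: np.ndarray, t: float = 2.0) -> float:
    """Log mean Gaussian potential over all distinct pairs (more negative is better)."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)  # distinct pairs only
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))
```

Note the pairwise tensor is O(N² · D) in memory; subsample pairs for pools much larger than the 2048 used here.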
**Embedding geometry:**
| Metric | Value |
|---|---|
| Random-pair cosine mean | 0.002 |
| Random-pair cosine std | 0.139 |
| Embedding L2 norm (mean) | 1.33 |
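These statistics can be reproduced from `embeddings.npy` with a sketch like the following (the pair count and seed are arbitrary choices, not values from the evaluation code):

```python
import numpy as np

def embedding_stats(emb: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> dict:
    """Cosine-similarity statistics over randomly sampled distinct pairs."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(emb), n_pairs)
    j = rng.integers(0, len(emb), n_pairs)
    keep = i != j  # drop accidental self-pairs
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)
    cos = np.sum(unit[i[keep]] * unit[j[keep]], axis=1)
    return {
        "cos_mean": float(cos.mean()),
        "cos_std": float(cos.std()),
        "l2_norm_mean": float(np.linalg.norm(emb, axis=1).mean()),
    }

# stats = embedding_stats(np.load("shape-foundation-small-v3/embeddings.npy"))
```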
## Files

| File | Size | Purpose |
|---|---|---|
| `checkpoint_final.pt` | ~45 MB | Full model state (backbone + loss_computer + optimizer + config) |
| `small.yaml` | 2 KB | Training config (required to instantiate the model) |
| `embeddings.npy` | 31 MB | Precomputed 128-dim pooled embeddings for all 61,052 training meshes |
| `point_clouds.npy` | 358 MB | 512-point samples per training mesh (for retrieval visualization) |
| `metadata.json` | 6 MB | File names and source dataset per training mesh |
## Usage

### Install dependencies

```bash
pip install torch trimesh einops numpy scipy huggingface-hub
```

You also need the `shape_foundation` package from the training repo.
### Download the model

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="bayang/shape-foundation-small-v3",
    local_dir="./shape-foundation-small-v3",
)
```

Or from the command line:

```bash
hf download bayang/shape-foundation-small-v3 --local-dir ./shape-foundation-small-v3
```
### Load and run inference

```python
import torch
import trimesh

from shape_foundation.configs.default import ShapeConfig
from shape_foundation.models.gaot_backbone import GAOTBackbone
from shape_foundation.data.preprocessing import MeshPreprocessor
from shape_foundation.data.sampling import SurfaceSampler

# Load checkpoint
ckpt = torch.load(
    "shape-foundation-small-v3/checkpoint_final.pt",
    map_location="cpu", weights_only=False,
)
cfg: ShapeConfig = ckpt["config"]

# Build model + load weights
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GAOTBackbone(cfg).to(device).eval()
model.load_state_dict(ckpt["model_state_dict"], strict=False)

# Preprocess a mesh
mesh = trimesh.load("your_mesh.stl", force="mesh")
prep = MeshPreprocessor(cfg.input)(
    torch.tensor(mesh.vertices, dtype=torch.float32),
    torch.tensor(mesh.faces, dtype=torch.int64),
    torch.tensor(mesh.vertex_normals, dtype=torch.float32),
)
sampled = SurfaceSampler(cfg.input).sample(
    prep["vertices"], prep["faces"], prep["normals"], prep.get("curvature"),
)

# Run forward pass
with torch.no_grad():
    out = model.forward_tokens(
        sampled["points"].unsqueeze(0).to(device),
        sampled["features"].unsqueeze(0).to(device),
        sampled["normals"].unsqueeze(0).to(device) if sampled.get("normals") is not None else None,
        sampled["curvature"].unsqueeze(0).to(device) if sampled.get("curvature") is not None else None,
    )

pooled = out["pooled_embedding"]   # (1, 128) — global mesh embedding
tokens = out["token_embeddings"]   # (1, 13824, 128) — per-token features
```
### Shape retrieval

Use the precomputed embedding index to find similar shapes:

```python
import json
import numpy as np

embeddings = np.load("shape-foundation-small-v3/embeddings.npy")  # (61052, 128)
with open("shape-foundation-small-v3/metadata.json") as f:
    metadata = json.load(f)

# Cosine similarity between the query embedding and the full index
query = pooled.squeeze(0).cpu().numpy()
query = query / (np.linalg.norm(query) + 1e-8)
index_norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
similarities = index_norm @ query

top_k = np.argsort(similarities)[::-1][:5]
for idx in top_k:
    print(f"{metadata[idx]['name']:30s} {metadata[idx]['source']:12s} {similarities[idx]:.3f}")
```
### Masked reconstruction heatmap (explainability)

The model was pretrained to reconstruct masked token geometry statistics from surrounding context, which gives you a built-in per-region attribution map: regions with high reconstruction error are where the model's learned geometric prior finds the input novel or surprising.
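The reconstruction head's API is not shown above, so the following is a hypothetical post-processing sketch: it assumes you have already obtained a `(T,)` array of per-token reconstruction errors and turns it into an anomaly mask plus a heatmap intensity channel:

```python
import numpy as np

def error_highlight(token_errors: np.ndarray, percentile: float = 90.0):
    """Flag tokens whose reconstruction error exceeds the given percentile.

    token_errors: (T,) per-token SmoothL1 errors (assumed to come from the
    reconstruction head). Returns a boolean mask of "surprising" tokens and
    a [0, 1] intensity usable as a heatmap channel.
    """
    thresh = np.percentile(token_errors, percentile)
    intensity = np.clip(token_errors / (thresh + 1e-8), 0.0, 1.0)
    return token_errors > thresh, intensity
```

Assuming each token maps to a surface region (as the per-region attribution above suggests), the intensity channel can then be splatted back onto the sampled surface points for visualization.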
## Training Data
| Dataset | Meshes | Share | Domain |
|---|---|---|---|
| Fusion360 | 35,681 | 58.4% | Parametric CAD designs |
| MFCAD | 15,488 | 25.4% | Manufacturing CAD parts |
| Thingi10K | 9,883 | 16.2% | Community 3D printing / misc geometries |
| Total | 61,052 | 100% | — |
All three sources are industrial / CAD-focused, aligned with the target application domain (engineering, manufacturing, simulation setup).
Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 hashing of file paths so assignments are stable across runs, ranks, and machines.
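A sketch of such a hash-based split (only the md5-of-path idea comes from the description above; the exact hash-to-bucket mapping is an assumption):

```python
import hashlib

def split_for(path: str, val_fraction: float = 0.0489) -> str:
    """Deterministic train/val assignment from an md5 hash of the file path."""
    h = int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16)
    # Map the hash to [0, 1); paths falling below the threshold go to validation.
    return "val" if (h % 10_000) / 10_000.0 < val_fraction else "train"
```

Because the assignment depends only on the path string, it is identical across runs, DDP ranks, and machines.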
## Training Details
- Masked token reconstruction (weight 1.0): 50% of latent tokens masked, SmoothL1 loss (β=1.0) on normalized geometry statistics
- Multi-resolution contrastive (weight 0.2): InfoNCE with jitter σ=0.02 and 30% point dropout
- Per-dimension target normalization: calibrated once on 56M tokens from the training split, stored as buffers on the loss computer
- Optimizer: AdamW, lr 3e-4, cosine schedule, 500 warmup steps
- Epochs: 50
- Mixed precision: bf16 + DDP + torch.compile on 8 × H100 80GB
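To make the contrastive recipe concrete, here is a minimal sketch of the two pieces named above: the view augmentation (point dropout plus coordinate jitter) and a symmetric InfoNCE over pooled embeddings. Function names and details are illustrative, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def augment(points: torch.Tensor, sigma: float = 0.02, dropout: float = 0.3) -> torch.Tensor:
    """One contrastive view: random point dropout, then Gaussian coordinate jitter."""
    keep = torch.rand(points.shape[0]) > dropout
    kept = points[keep]
    return kept + sigma * torch.randn_like(kept)

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: each anchor's positive is the same mesh's other view."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau            # (B, B); positives on the diagonal
    targets = torch.arange(z_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

With a per-rank batch of 16, each anchor sees only 15 in-batch negatives, which is why the loss saturates (see Limitations).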
## Limitations
Supervised task heads are disabled. The checkpoint contains symmetry, primitive, part, and reduction heads from the architecture, but all supervised loss weights are set to 0.0 during training. Attempting to use these heads for inference will return near-random outputs because they were never updated by gradient descent. The stock synthetic labels in the training data do not generalize across unseen meshes (train CE ~1e-4 vs val CE ~2.5 in prior runs), which is why they were disabled. Only use the backbone embeddings and the reconstruction head.
Domain is industrial CAD. The training data is 100% CAD / engineering parts. The model will transfer poorly to organic shapes (humans, animals, plants) or to reconstructed 3D scans with heavy noise. If your target domain differs, you should fine-tune or retrain.
Contrastive signal saturates. With per-rank batch size 16, the InfoNCE objective only has 15 negatives per anchor. This is too easy once the backbone has basic shape awareness. The embeddings are still useful for retrieval but the contrastive loss stops providing gradient after the first few epochs.
## Intended Use
- Dense geometric feature extraction for downstream CAD / engineering tasks
- Shape retrieval via learned embedding similarity
- Per-region anomaly detection via masked reconstruction error heatmaps
- Foundation for fine-tuning on domain-specific labels (once high-quality labels are available)
Not suitable for: reconstructing 3D geometry from scratch, generating new meshes, classifying general 3D objects outside the CAD domain, or any task that requires the supervised heads to be active.
## Citation

```bibtex
@software{notelink_shape_foundation_2026,
  author = {{Notelink LLC}},
  title  = {Shape Foundation Model},
  year   = {2026},
  url    = {https://huggingface.co/bayang/shape-foundation-small-v3}
}
```