EfficientTAM-Ti @ 512 (ONNX Bundle)

ONNX export of EfficientTAM (Tiny variant, 512x512 input) for use with kubrick-tracking.

EfficientTAM is a distilled variant of SAM 2 optimized for efficient video object segmentation. This bundle splits the model into five ONNX graphs, each runnable as an independent inference session, for flexible deployment across CPU, CoreML, CUDA, and TensorRT backends.

Variants

| Variant | Precision | Total Size | Notes |
|---------|-----------|------------|-------|
| fp32/ | float32 | ~77 MB | Reference quality, works everywhere |
| fp16/ | float16 | ~40 MB | 2x smaller, GPU-accelerated backends |
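
The notes above imply a simple selection rule: fp32 on CPU, fp16 on accelerated backends. A minimal sketch of that rule follows. `pick_variant` is a hypothetical helper, not part of kubrick-tracking, and the strings are ONNX Runtime execution provider names, assumed here to correspond to the backends this bundle targets.

```python
# Hypothetical helper (not part of kubrick-tracking): map an ONNX Runtime
# execution provider name to the bundle sub-directory to load.
ACCELERATED_PROVIDERS = {
    "CoreMLExecutionProvider",
    "CUDAExecutionProvider",
    "TensorrtExecutionProvider",
}


def pick_variant(provider: str) -> str:
    """Return "fp16" for accelerated providers and "fp32" for anything else."""
    return "fp16" if provider in ACCELERATED_PROVIDERS else "fp32"


if __name__ == "__main__":
    for name in ("CPUExecutionProvider", "CUDAExecutionProvider"):
        print(name, "->", pick_variant(name))
```

Falling back to fp32 for unknown providers follows the "works everywhere" note for that variant.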

Architecture

| Module | File | Input Shape | Purpose |
|--------|------|-------------|---------|
| image_encoder | image_encoder.onnx | [1, 3, 512, 512] | Frame feature extraction |
| prompt_encoder | prompt_encoder.onnx | [1, 2, 2] | Bbox/click/mask prompt encoding |
| mask_decoder | mask_decoder.onnx | [1, 256, 32, 32] | Mask prediction from features + prompt |
| memory_encoder | memory_encoder.onnx | [1, 256, 32, 32] | Encode frame into memory bank |
| memory_attention | memory_attention.onnx | dynamic | Cross-attention with memory bank |
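
To illustrate how the five modules might chain together for one frame, here is a rough sketch with stub callables in place of real ONNX sessions. The call order follows the general SAM 2 design (encode, attend to memory, decode, write memory) and is an assumption about this bundle. `run_frame`, `make_stub_modules`, the argument lists, and the memory bank capacity of 7 are all hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional

MODULE_NAMES = [
    "image_encoder",
    "prompt_encoder",
    "mask_decoder",
    "memory_encoder",
    "memory_attention",
]


def run_frame(
    modules: Dict[str, Callable[..., Any]],
    frame: Any,
    memory_bank: Deque[Any],
    prompt: Optional[Any] = None,
) -> Any:
    """Run one frame through the modules and append its memory to the bank."""
    raw = modules["image_encoder"](frame)
    # Attend to past frames only when the bank already holds something.
    conditioned = raw
    if memory_bank:
        conditioned = modules["memory_attention"](raw, list(memory_bank))
    # Assumption: the prompt encoder runs only on frames that carry a prompt.
    prompt_embedding = None
    if prompt is not None:
        prompt_embedding = modules["prompt_encoder"](prompt)
    mask = modules["mask_decoder"](conditioned, prompt_embedding)
    # Assumption (SAM 2 style): memory is built from the raw features + mask.
    memory_bank.append(modules["memory_encoder"](raw, mask))
    return mask


def make_stub_modules(log: List[str]) -> Dict[str, Callable[..., Any]]:
    """Build stand-in modules that only record the order they are called in."""
    def stub(name: str) -> Callable[..., Any]:
        def call(*args: Any) -> str:
            log.append(name)
            return name
        return call
    return {name: stub(name) for name in MODULE_NAMES}


if __name__ == "__main__":
    calls: List[str] = []
    bank: Deque[Any] = deque(maxlen=7)  # hypothetical capacity
    stubs = make_stub_modules(calls)
    run_frame(stubs, "frame0", bank, prompt="bbox")
    run_frame(stubs, "frame1", bank)
    print(calls)
```

The bounded deque drops the oldest memory once the capacity is reached, which is one simple way to keep the dynamic `memory_attention` input from growing without limit.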

Additional assets:

  • maskmem_tpos_enc.npy -- temporal positional encoding for memory frames
  • no_obj_ptr.npy -- no-object pointer embedding

Usage with kubrick-tracking

from kubrick.tracking import Tracker, MachineConfig, BBoxPrompt, BBox

# Automatically downloads and caches this bundle
config = MachineConfig.mac_m_series()  # uses fp16 by default
tracker = Tracker.from_config(config)

tracker.init(frame, prompt=BBoxPrompt(bbox=BBox(x=100, y=50, w=80, h=120)))
result = tracker.step(next_frame)
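
For a whole clip, the same two calls can be wrapped in a loop. The sketch below uses only `tracker.init(frame, prompt=...)` and `tracker.step(frame)` as they appear above. `track_clip` is a hypothetical helper, and the structure of each result is not specified here, so results are collected as-is.

```python
from typing import Any, Iterable, List

_EMPTY = object()  # sentinel so that a frame value of None is still accepted


def track_clip(tracker: Any, frames: Iterable[Any], prompt: Any) -> List[Any]:
    """Prompt on the first frame, then step through every remaining frame."""
    iterator = iter(frames)
    first = next(iterator, _EMPTY)
    if first is _EMPTY:
        raise ValueError("frames must contain at least one frame")
    tracker.init(first, prompt=prompt)
    # One result per frame after the prompted one, in playback order.
    return [tracker.step(frame) for frame in iterator]
```

Because `frames` is consumed lazily, a generator that decodes video on the fly works as well as a list.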

Manual download

from huggingface_hub import snapshot_download

# Download fp16 variant
path = snapshot_download(
    repo_id="egordm/efficienttam-ti-512",
    allow_patterns=["fp16/**"],
)
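
After a manual download it can be useful to confirm that the variant directory is complete. A minimal check using only the standard library follows. `missing_bundle_files` is a hypothetical helper, and it assumes the two .npy assets sit next to the .onnx files inside the variant directory, which may not match the actual repository layout.

```python
from pathlib import Path
from typing import List

ONNX_FILES = [
    "image_encoder.onnx",
    "prompt_encoder.onnx",
    "mask_decoder.onnx",
    "memory_encoder.onnx",
    "memory_attention.onnx",
]
# Assumption: the extra assets live in the same directory as the ONNX files.
ASSET_FILES = ["maskmem_tpos_enc.npy", "no_obj_ptr.npy"]


def missing_bundle_files(snapshot_dir: str, variant: str = "fp16") -> List[str]:
    """Return the expected file names that are absent from the variant dir."""
    variant_dir = Path(snapshot_dir) / variant
    expected = ONNX_FILES + ASSET_FILES
    return [name for name in expected if not (variant_dir / name).is_file()]
```

Calling `missing_bundle_files(path)` with the value returned by `snapshot_download` gives an empty list when nothing is missing under that assumed layout.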

Export reproduction

The bundle was exported using the script in the kubrick-tracking repository:

git clone https://github.com/egordm/kubrick-tracking.git
cd kubrick-tracking
uv run python models/efficienttam-ti-512/export.py --dtype fp16

Running the export requires the EfficientTAM checkpoint from the upstream repository.

Citation

@article{xiong2024efficienttam,
  title={Efficient Track Anything},
  author={Xiong, Yunyang and Zhou, Chong and Xiang, Xiaoyu and Wu, Lemeng and Zhu, Chenchen and Liu, Zechun and Suri, Saksham and Varadarajan, Balakrishnan and Akula, Ramya and Iandola, Forrest and Krishnamoorthi, Raghuraman and Soran, Bilge and Chandra, Vikas},
  journal={arXiv preprint arXiv:2411.18933},
  year={2024}
}

License

Apache-2.0
