VidEoMT-L on YouTube-VIS 2021

This repository contains the Hugging Face Transformers conversion of the official VidEoMT checkpoint yt_2021_vit_large_63.1.pth from tue-mps/VidEoMT.

Model details

Reported metrics

Metric    Value
AP        63.1
AR@10     68.1
FPS       160

The metrics above are the numbers reported by the authors in the official model zoo.
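On YouTube-VIS, AP and AR@10 (average recall with at most 10 predicted instances per video) are computed from a spatio-temporal mask IoU. A predicted track is compared with a ground-truth track over all frames of the video, not frame by frame. Below is a minimal NumPy sketch of that IoU. It assumes boolean masks of shape (T, H, W), and `video_mask_iou` is a hypothetical helper, not the official evaluation code:

```python
import numpy as np


def video_mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Spatio-temporal IoU between two video masks of shape (T, H, W).

    Intersections and unions are summed over all frames before dividing,
    so a track is scored as one object across the whole clip rather than
    as an average of per-frame IoUs.
    """
    if pred.shape != gt.shape or pred.ndim != 3:
        raise ValueError("expected two masks of identical shape (T, H, W)")
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Both tracks empty: return 0.0 here (an assumption; the official toolkit may differ)
    return float(intersection) / float(union) if union > 0 else 0.0


# Two-frame, 2x2 example: the prediction matches one of three covered pixels overall
pred = np.zeros((2, 2, 2), dtype=bool)
pred[0, 0, 0] = pred[0, 0, 1] = True
gt = np.zeros((2, 2, 2), dtype=bool)
gt[0, 0, 0] = gt[1, 1, 1] = True
print(video_mask_iou(pred, gt))
```

The example yields 1/3. Averaging per-frame IoUs would give 0.25, because the second frame contributes 0. Summing before dividing is what makes the score reflect a single track over the whole clip.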

Usage

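from transformers import AutoModelForUniversalSegmentation, AutoVideoProcessor

# Hub repository holding the converted checkpoint
model_id = "tue-mps/videomt-dinov2-large-ytvis2021"
# Video processor: prepares input frames and post-processes model outputs
processor = AutoVideoProcessor.from_pretrained(model_id)
# Segmentation model, loaded through the universal-segmentation auto class
model = AutoModelForUniversalSegmentation.from_pretrained(model_id)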

Use processor.post_process_instance_segmentation, processor.post_process_panoptic_segmentation, or processor.post_process_semantic_segmentation depending on the target task.
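Choosing among these post-processing methods by task name can be wrapped in a small helper. The sketch below assumes only the three method names listed above. `post_process` and `POST_PROCESSORS` are hypothetical names and are not part of transformers. Keyword arguments are forwarded unchanged because the accepted arguments depend on the installed transformers version:

```python
from typing import Any

# Method names as listed in this model card; the task-name keys are a
# hypothetical convenience, not part of transformers.
POST_PROCESSORS = {
    "instance": "post_process_instance_segmentation",
    "panoptic": "post_process_panoptic_segmentation",
    "semantic": "post_process_semantic_segmentation",
}


def post_process(processor: Any, outputs: Any, task: str, **kwargs: Any) -> Any:
    """Route model outputs to the processor's post-processing method for `task`."""
    try:
        method_name = POST_PROCESSORS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(POST_PROCESSORS)}"
        ) from None
    # Look up the bound method on the processor and forward any extra arguments
    return getattr(processor, method_name)(outputs, **kwargs)
```

YouTube-VIS is a video instance segmentation benchmark, so `task="instance"` is the natural choice for this checkpoint.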

Checkpoint format

Safetensors, 0.3B parameters, F32 tensors.