VITA-MLLM/VITA-1.5

Video-Text-to-Text


Add model card #2

Opened by nielsr (HF Staff) on Jan 7, 2025

Merges refs/pr/2 into refs/heads/main


This PR adds a model card that links to the paper VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.

It also adds the `pipeline_tag` metadata, so that people can find the model at https://huggingface.co/models?pipeline_tag=video-text-to-text, and a link to the GitHub repository.

Commit: Add model card (34ce5f2e)

lxysl changed the pull request status to merged on Jan 16, 2025.
