arxiv:2312.14125

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Published on Dec 21, 2023 · Submitted by AK on Dec 21, 2023

Abstract

VideoPoet is a language model that synthesizes high-quality video and audio from diverse inputs using a transformer architecture, showcasing superior zero-shot video generation capabilities.

AI-generated summary

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/
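One way to picture a decoder-only model trained on a mixture of multimodal generative objectives is as next-token prediction over a single flat sequence: a task marker, the conditioning tokens, then the target tokens. The sketch below is illustrative only and is not taken from the paper. The abstract does not describe tokenization or sequence layout, so the special tokens, vocabulary offsets, and the helper names `shift`, `build_example`, and `next_token_pairs` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical vocabulary layout. Every id and offset here is an
# illustrative assumption, not a value from VideoPoet.
SPECIAL = {"<bos>": 0, "<task:t2v>": 1, "<task:i2v>": 2, "<sep>": 3, "<eos>": 4}
TEXT_OFFSET = 100      # assumed start of the text id range
VISUAL_OFFSET = 1100   # assumed start of the image/video id range
AUDIO_OFFSET = 3100    # assumed start of the audio id range


@dataclass
class Example:
    tokens: List[int]     # one flat sequence fed to the decoder
    loss_mask: List[int]  # 1 where the token is a prediction target


def shift(ids: List[int], offset: int) -> List[int]:
    """Map modality-local ids into the shared vocabulary."""
    return [i + offset for i in ids]


def build_example(task: str, condition: List[int], target: List[int]) -> Example:
    """Lay out one training example as [bos, task, condition, sep, target, eos]."""
    prefix = [SPECIAL["<bos>"], SPECIAL[task]] + condition + [SPECIAL["<sep>"]]
    suffix = target + [SPECIAL["<eos>"]]
    # Only the target segment contributes to the loss.
    return Example(prefix + suffix, [0] * len(prefix) + [1] * len(suffix))


def next_token_pairs(example: Example) -> List[Tuple[List[int], int]]:
    """Return (context, next_token) pairs for positions inside the loss mask."""
    toks = example.tokens
    return [(toks[:i], toks[i]) for i in range(1, len(toks)) if example.loss_mask[i]]


# Text-to-video style example with made-up token ids.
text = shift([5, 7], TEXT_OFFSET)
video = shift([0, 3, 9], VISUAL_OFFSET)
ex = build_example("<task:t2v>", text, video)
pairs = next_token_pairs(ex)
```

Under these assumptions, switching the task marker and swapping which modality sits in the condition and target slots yields a different objective with the same model and loss, which is one reading of "a mixture of multimodal generative objectives".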

Community

Astronaut holding a game console in white plain background music


This one is worth a look, especially the zero-shot video generation setup. I found a decent breakdown here: https://arxivexplained.com/paper/videopoet-a-large-language-model-for-zero-shot-video-generation


