---
license: apache-2.0
datasets:
- KlingTeam/MultiCamVideo-Dataset
- nkp37/OpenVid-1M
base_model:
- Wan-AI/Wan2.1-T2V-14B
pipeline_tag: video-to-video
---
# Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) – Model Checkpoints
[![Project Page](https://img.shields.io/badge/Project-Page-yellow?logo=data:image/svg%2Bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgc3Ryb2tlPSJ5ZWxsb3ciIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIj48Y2lyY2xlIGN4PSIxMiIgY3k9IjEyIiByPSIxMCIvPjxsaW5lIHgxPSIyIiB5MT0iMTIiIHgyPSIyMiIgeTI9IjEyIi8+PHBhdGggZD0iTTEyIDJhMTUuMyAxNS4zIDAgMCAxIDQgMTAgMTUuMyAxNS4zIDAgMCAxLTQgMTAgMTUuMyAxNS4zIDAgMCAxLTQtMTAgMTUuMyAxNS4zIDAgMCAxIDQtMTB6Ii8+PC9zdmc+)](https://eyeline-labs.github.io/Vista4D)
[![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2604.21915)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Vista4D-blue)](https://huggingface.co/Eyeline-Labs/Vista4D)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Eval%20Data-blue)](https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data)
[Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1,3&lowast;</sup>, [Zhizheng Liu](https://bosmallear.github.io)<sup>1,4&lowast;</sup>, [Pablo Salamanca](https://pablosalaman.ca)<sup>1,2</sup>, [Yash Kant](https://yashkant.github.io)<sup>1,2</sup>, [Ryan Burgert](https://ryanndagreat.github.io)<sup>1,2,5&lowast;</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,2</sup>, [Koichi Namekata](https://kmcode1.github.io)<sup>1,2,6&lowast;</sup>, [Yiwei Zhao](https://zhaoyw007.github.io)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io)<sup>4</sup>, [Micah Goldblum](https://goldblum.github.io)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1,2</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1,2</sup> <br/>
<sup>1</sup>Eyeline Labs, <sup>2</sup>Netflix, <sup>3</sup>Columbia University, <sup>4</sup>UCLA, <sup>5</sup>Stony Brook University, <sup>6</sup>University of Oxford<br>
<sup>&lowast;</sup>*Work done during an internship at Eyeline Labs*
<div align="center">
<video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
</div>
**Vista4D** is a *video reshooting* framework that synthesizes the dynamic scene captured in an input source video from novel camera trajectories and viewpoints. By training on noisy, reconstructed multiview videos, Vista4D bridges the distribution shift between training and inference for point-cloud-grounded video reshooting and is robust to point cloud artifacts from imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (using casual video capture of a scene as a background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.
This is the Hugging Face repository containing our model weights. We provide two Vista4D checkpoints finetuned from [`Wan-AI/Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B):
| Checkpoint | Base model | Training resolution | Training steps | Notes |
|---|---|---|---|---|
| `384p49_step=30000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 672 &times; 384, 49 frames | 30000 | N/A |
| `720p49_step=3000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 1280 &times; 720, 49 frames | 3000 | Finetuned from `384p49_step=30000` |
To run Vista4D inference, first download the Wan 2.1 and Vista4D checkpoints to `./checkpoints/`. The Vista4D checkpoints are hosted at [Eyeline-Labs/Vista4D](https://huggingface.co/Eyeline-Labs/Vista4D). Download both the `384p` and `720p` checkpoints into `./checkpoints/vista4d/` with
```bash
hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d
```
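The `hf` command ships with recent versions of the `huggingface_hub` package; if it is not available on your machine, installing or upgrading the package (a standard setup step, not specific to Vista4D) should provide it:
```bash
# Install/upgrade huggingface_hub with the CLI extra to get the `hf` command
pip install -U "huggingface_hub[cli]"
```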
If you only need one resolution, pass `--include` to grab just that variant, e.g. for the `384p` checkpoint (swap in `"720p49_step=3000/*"` for the `720p` variant):
```bash
hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d --include "384p49_step=30000/*"
```
You'll also need the `Wan2.1-T2V-14B` base model. Download it from [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) into `./checkpoints/wan/Wan2.1-T2V-14B/` with
```bash
hf download Wan-AI/Wan2.1-T2V-14B --local-dir ./checkpoints/wan/Wan2.1-T2V-14B
```
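After these downloads, the checkpoint directory should look roughly like this (a sketch assuming the default `--local-dir` paths used above):
```
checkpoints/
├── vista4d/
│   ├── 384p49_step=30000/
│   └── 720p49_step=3000/
└── wan/
    └── Wan2.1-T2V-14B/
```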
Instructions for using these weights, along with more results and the paper, can be found on our [project page](https://eyeline-labs.github.io/Vista4D/) and in our [GitHub repository](https://github.com/Eyeline-Labs/Vista4D/tree/main).