Improve model card metadata and documentation

#1 opened by nielsr (HF Staff)

Files changed (1)
  1. README.md +37 -22
README.md CHANGED
@@ -1,49 +1,64 @@
  ---
- license: apache-2.0
  datasets:
  - KlingTeam/MultiCamVideo-Dataset
  - nkp37/OpenVid-1M
- base_model:
- - Wan-AI/Wan2.1-T2V-14B
- pipeline_tag: video-to-video
  ---
 
- # Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) – Model Checkpoints

  [![Project Page](https://img.shields.io/badge/Project-Page-yellow?logo=data:image/svg%2Bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgc3Ryb2tlPSJ5ZWxsb3ciIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIj48Y2lyY2xlIGN4PSIxMiIgY3k9IjEyIiByPSIxMCIvPjxsaW5lIHgxPSIyIiB5MT0iMTIiIHgyPSIyMiIgeTI9IjEyIi8+PHBhdGggZD0iTTEyIDJhMTUuMyAxNS4zIDAgMCAxIDQgMTAgMTUuMyAxNS4zIDAgMCAxLTQgMTAgMTUuMyAxNS4zIDAgMCAxLTQtMTAgMTUuMyAxNS4zIDAgMCAxIDQtMTB6Ii8+PC9zdmc+)](https://eyeline-labs.github.io/Vista4D)
- [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2604.21915)
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Vista4D-blue)](https://huggingface.co/Eyeline-Labs/Vista4D)
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Eval%20Data-blue)](https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data)
-
- [Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1,3&lowast;</sup>, [Zhizheng Liu](https://bosmallear.github.io)<sup>1,4&lowast;</sup>, [Pablo Salamanca](https://pablosalaman.ca)<sup>1,2</sup>, [Yash Kant](https://yashkant.github.io)<sup>1,2</sup>, [Ryan Burgert](https://ryanndagreat.github.io)<sup>1,2,5&lowast;</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,2</sup>, [Koichi Namekata](https://kmcode1.github.io)<sup>1,2,6&lowast;</sup>, [Yiwei Zhao](https://zhaoyw007.github.io)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io)<sup>4</sup>, [Micah Goldblum](https://goldblum.github.io)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1,2</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1,2</sup> <br/>
- <sup>1</sup>Eyeline Labs, <sup>2</sup>Netflix, <sup>3</sup>Columbia University, <sup>4</sup>UCLA, <sup>5</sup>Stony Brook University, <sup>6</sup>University of Oxford<br>
 
- <sup>&lowast;</sup>*Work done during an internship at Eyeline Labs*
 
  <div align="center">
  <video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
  </div>
 
- **Vista4D** is a *video reshooting* framework that synthesizes the dynamic scene represented by an input source video from novel camera trajectories and viewpoints. We bridge the distribution shift between training and inference for point-cloud-grounded video reshooting: by training on noisy, reconstructed multiview videos, Vista4D is robust to point cloud artifacts from imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally-persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (using casual video capture of a scene as a background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.
 
- This is the Hugging Face repository containing our model weights. We provide two Vista4D checkpoints finetuned on [`Wan-AI/Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B):
 
  | Checkpoint | Base model | Training resolution | Training steps | Notes |
  |---|---|---|---|---|
  | `384p49_step=30000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 672 &times; 384, 49 frames | 30000 | N/A |
  | `720p49_step=3000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 1280 &times; 720, 49 frames | 3000 | Finetuned from `384p49_step=30000` |
 
- To run Vista4D inference, first download the Wan 2.1 and Vista4D checkpoints to `./checkpoints/`. The Vista4D checkpoints are hosted on [Eyeline-Labs/Vista4D](https://huggingface.co/Eyeline-Labs/Vista4D). Download both the `384p` and `720p` checkpoints into `./checkpoints/vista4d/` with
- ```bash
- hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d
- ```
- If you only need one resolution, pass `--include` to grab just that variant with
  ```bash
- hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d --include "384p49_step=30000/*"  # or "720p49_step=3000/*"
  ```
- You'll also need the `Wan2.1-T2V-14B` base model. Download it from [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) into `./checkpoints/wan/Wan2.1-T2V-14B/` with
  ```bash
- hf download Wan-AI/Wan2.1-T2V-14B --local-dir ./checkpoints/wan/Wan2.1-T2V-14B
  ```
 
- Instructions on how to use these weights, more results, and the paper can be found on our [project page](https://eyeline-labs.github.io/Vista4D/) and [GitHub repository](https://github.com/Eyeline-Labs/Vista4D/tree/main).
  ---
+ base_model:
+ - Wan-AI/Wan2.1-T2V-14B
  datasets:
  - KlingTeam/MultiCamVideo-Dataset
  - nkp37/OpenVid-1M
+ license: apache-2.0
+ pipeline_tag: video-to-video
  ---
 
+ # Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight)

  [![Project Page](https://img.shields.io/badge/Project-Page-yellow?logo=data:image/svg%2Bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgc3Ryb2tlPSJ5ZWxsb3ciIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIj48Y2lyY2xlIGN4PSIxMiIgY3k9IjEyIiByPSIxMCIvPjxsaW5lIHgxPSIyIiB5MT0iMTIiIHgyPSIyMiIgeTI9IjEyIi8+PHBhdGggZD0iTTEyIDJhMTUuMyAxNS4zIDAgMCAxIDQgMTAgMTUuMyAxNS4zIDAgMCAxLTQgMTAgMTUuMyAxNS4zIDAgMCAxLTQtMTAgMTUuMyAxNS4zIDAgMCAxIDQtMTB6Ii8+PC9zdmc+)](https://eyeline-labs.github.io/Vista4D)
+ [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://huggingface.co/papers/2604.21915)
+ [![GitHub](https://img.shields.io/badge/GitHub-Repo-black?logo=github)](https://github.com/Eyeline-Labs/Vista4D)
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Vista4D-blue)](https://huggingface.co/Eyeline-Labs/Vista4D)
 
+ **Vista4D** is a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud: given a source video, it re-synthesizes the scene, with the same dynamics, from a different camera trajectory and viewpoint.
 
  <div align="center">
  <video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
  </div>
 
+ ## Model Checkpoints

+ This repository provides two Vista4D checkpoints finetuned on [`Wan-AI/Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B):

  | Checkpoint | Base model | Training resolution | Training steps | Notes |
  |---|---|---|---|---|
  | `384p49_step=30000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 672 &times; 384, 49 frames | 30000 | N/A |
  | `720p49_step=3000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 1280 &times; 720, 49 frames | 3000 | Finetuned from `384p49_step=30000` |
 
+ ## Usage
+
+ To perform Vista4D inference, you need to download both the Wan 2.1 base model and the Vista4D checkpoints.
+
+ ### Download Weights
+
  ```bash
+ # Download Vista4D checkpoints
+ huggingface-cli download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d
+
+ # Download the Wan2.1-T2V-14B base model
+ huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./checkpoints/wan/Wan2.1-T2V-14B
  ```
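+
+ If you only need one of the two resolutions, pass `--include` to download just that variant. A minimal sketch, shown for the 720p checkpoint (use `"384p49_step=30000/*"` for the 384p one):
+
+ ```bash
+ # Fetch only the 720p Vista4D checkpoint; the pattern matches the
+ # checkpoint folder names listed in the table above.
+ huggingface-cli download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d --include "720p49_step=3000/*"
+ ```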
+
+ ### Inference
+
+ After environment setup and preprocessing as described in the [official repository](https://github.com/Eyeline-Labs/Vista4D), run inference with:
+
  ```bash
+ EXAMPLE=couple-newspaper RESOLUTION=720p bash scripts/inference/example_inference_single.sh
  ```
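+
+ The `EXAMPLE` and `RESOLUTION` environment variables select the input example and the checkpoint. A small sketch, assuming the script accepts both released resolutions for the same example:
+
+ ```bash
+ # Render the same example with both the 384p and 720p checkpoints.
+ for RESOLUTION in 384p 720p; do
+     EXAMPLE=couple-newspaper RESOLUTION=$RESOLUTION bash scripts/inference/example_inference_single.sh
+ done
+ ```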
 
+ ## Citation
+
+ ```bibtex
+ @inproceedings{lin2026vista4d,
+   author    = {Lin, {Kuan Heng} and Liu, Zhizheng and Salamanca, Pablo and Kant, Yash and Burgert, Ryan and Xu, Yuancheng and Namekata, Koichi and Zhao, Yiwei and Zhou, Bolei and Goldblum, Micah and Debevec, Paul and Yu, Ning},
+   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+   title     = {{Vista4D}: Video Reshooting with 4D Point Clouds},
+   year      = {2026}
+ }
+ ```