Improve model card metadata and documentation
#1
by nielsr (HF Staff) - opened
README.md
CHANGED
@@ -1,49 +1,64 @@

Removed in this revision:

-pipeline_tag: video-to-video
-[](https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data)
-[Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1,3∗</sup>, [Zhizheng Liu](https://bosmallear.github.io)<sup>1,4∗</sup>, [Pablo Salamanca](https://pablosalaman.ca)<sup>1,2</sup>, [Yash Kant](https://yashkant.github.io)<sup>1,2</sup>, [Ryan Burgert](https://ryanndagreat.github.io)<sup>1,2,5∗</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,2</sup>, [Koichi Namekata](https://kmcode1.github.io)<sup>1,2,6∗</sup>, [Yiwei Zhao](https://zhaoyw007.github.io)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io)<sup>4</sup>, [Micah Goldblum](https://goldblum.github.io)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1,2</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1,2</sup> <br/>
-<sup>1</sup>Eyeline Labs, <sup>2</sup>Netflix, <sup>3</sup>Columbia University, <sup>4</sup>UCLA, <sup>5</sup>Stony Brook University, <sup>6</sup>University of Oxford<br>

The updated README.md:
---
base_model:
- Wan-AI/Wan2.1-T2V-14B
datasets:
- KlingTeam/MultiCamVideo-Dataset
- nkp37/OpenVid-1M
license: apache-2.0
pipeline_tag: image-to-video
---

# Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight)

[](https://eyeline-labs.github.io/Vista4D)
[](https://huggingface.co/papers/2604.21915)
[](https://github.com/Eyeline-Labs/Vista4D)
[](https://huggingface.co/Eyeline-Labs/Vista4D)

**Vista4D** is a robust and flexible video reshooting framework that grounds the input video and the target cameras in a 4D point cloud: given an input video, it re-synthesizes the scene, preserving its dynamics, from a different camera trajectory and viewpoint.

<div align="center">
<video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
</div>

## Model Checkpoints

This repository provides two Vista4D checkpoints finetuned from [`Wan-AI/Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B):

| Checkpoint | Base model | Training resolution | Training steps | Notes |
|---|---|---|---|---|
| `384p49_step=30000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 672 × 384, 49 frames | 30000 | N/A |
| `720p49_step=3000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 1280 × 720, 49 frames | 3000 | Finetuned from `384p49_step=30000` |

## Usage

To perform Vista4D inference, you need to download both the Wan 2.1 base model and the Vista4D checkpoints.
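
The inference code lives in the GitHub repository linked above, and the `./checkpoints` paths used below are assumed to be relative to its root. A minimal sketch of putting the code in place (full environment setup is described in the repository's own README and is not repeated here):

```bash
# Get the inference code; see the repository README for environment setup
git clone https://github.com/Eyeline-Labs/Vista4D
cd Vista4D
```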

### Download Weights

```bash
# Download Vista4D checkpoints
huggingface-cli download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d

# Download the Wan2.1-T2V-14B base model
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./checkpoints/wan/Wan2.1-T2V-14B
```
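
If you only need one of the two checkpoints, `huggingface-cli download` also accepts filter patterns. A sketch, assuming the weight files are organized in folders named after the checkpoints in the table above (check the repository's file listing first):

```bash
# Fetch only the 720p checkpoint; the folder pattern is an assumption about the repo layout
huggingface-cli download Eyeline-Labs/Vista4D --include "720p49_step=3000/*" --local-dir ./checkpoints/vista4d
```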

### Inference

After setting up the environment and preprocessing your inputs as described in the [official repository](https://github.com/Eyeline-Labs/Vista4D), run inference with:

```bash
EXAMPLE=couple-newspaper RESOLUTION=720p bash scripts/inference/example_inference_single.sh
```
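
The script is parameterized through environment variables, so variants are easy to script. A sketch of two hypothetical variations — the `384p` value (matching the lower-resolution checkpoint above) and the second example name are assumptions, not values confirmed by the repository:

```bash
# Assumed: RESOLUTION selects between the 384p and 720p checkpoints
EXAMPLE=couple-newspaper RESOLUTION=384p bash scripts/inference/example_inference_single.sh

# Sweep several preprocessed examples (replace the names with your own)
for ex in couple-newspaper your-own-example; do
  EXAMPLE="$ex" RESOLUTION=720p bash scripts/inference/example_inference_single.sh
done
```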

## Citation

```bibtex
@inproceedings{lin2026vista4d,
  author    = {Lin, {Kuan Heng} and Liu, Zhizheng and Salamanca, Pablo and Kant, Yash and Burgert, Ryan and Xu, Yuancheng and Namekata, Koichi and Zhao, Yiwei and Zhou, Bolei and Goldblum, Micah and Debevec, Paul and Yu, Ning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  title     = {{Vista4D}: Video Reshooting with 4D Point Clouds},
  year      = {2026}
}
```