Commit · 0cf19fe0
Parent(s):
Duplicate from Eyeline-Labs/Vista4D
Co-authored-by: Ning Yu <ningyu1991@users.noreply.huggingface.co>
- .gitattributes +35 -0
- 384p49_step=30000/config.yaml +12 -0
- 384p49_step=30000/dit.pth +3 -0
- 720p49_step=3000/config.yaml +12 -0
- 720p49_step=3000/dit.pth +3 -0
- README.md +49 -0
- config.json +3 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
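Every pattern above routes matching files through Git LFS, which is why the `dit.pth` weights below appear as small pointer files in this diff. For reference, a rule like these is normally added with `git lfs track` (a generic sketch, not a step required to use this repo):

```bash
# Generic sketch: `git lfs track` appends a matching rule to .gitattributes.
git lfs track "*.pth"
git add .gitattributes
```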
384p49_step=30000/config.yaml
ADDED
@@ -0,0 +1,12 @@
+dit:
+  positional_embedding_offset: 31
+latent_encoder:
+  source_init_mode: wan_patch_embed
+  point_cloud_init_mode: wan_patch_embed
+  mask_init_mode: zero_init
+  use_source_masks: True
+  use_point_cloud_masks: True
+augmentation:
+  source_noise_level: &source_noise_level 0.0
+  point_cloud_noise_level: &point_cloud_noise_level 0.0
+  image_noise_level: &image_noise_level 0.0
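The `&source_noise_level`-style tokens are YAML anchors, presumably referenced elsewhere in the full training config; a standard loader resolves them transparently. A quick inspection sketch, assuming PyYAML and the `./checkpoints/vista4d/` download layout from the README below (the one-liner itself is illustrative, not part of the release):

```bash
python - <<'EOF'
# Illustrative only: inspect the checkpoint config (assumes PyYAML is
# installed and the README's ./checkpoints/vista4d/ download layout).
import yaml
with open("./checkpoints/vista4d/384p49_step=30000/config.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg["dit"]["positional_embedding_offset"])  # 31
print(cfg["augmentation"]["source_noise_level"])  # 0.0 (anchor reads as a plain value)
EOF
```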
384p49_step=30000/dit.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:215d31ff294477fdcf0a136d1734ed03a8782261d6a66a8dd9561bdf1cd74a6a
+size 21069758850
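This file is a Git LFS pointer: `oid` is the SHA-256 digest of the actual weight file (~21 GB, per `size`), so it doubles as an integrity check after downloading. A quick verification sketch, assuming the `./checkpoints/vista4d/` layout from the README below:

```bash
# Verify the downloaded 384p weights against the LFS pointer's oid
# (path assumes the README's ./checkpoints/vista4d/ download layout).
sha256sum ./checkpoints/vista4d/384p49_step=30000/dit.pth
# expected digest: 215d31ff294477fdcf0a136d1734ed03a8782261d6a66a8dd9561bdf1cd74a6a
```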
720p49_step=3000/config.yaml
ADDED
@@ -0,0 +1,12 @@
+dit:
+  positional_embedding_offset: 31
+latent_encoder:
+  source_init_mode: wan_patch_embed
+  point_cloud_init_mode: wan_patch_embed
+  mask_init_mode: zero_init
+  use_source_masks: True
+  use_point_cloud_masks: True
+augmentation:
+  source_noise_level: &source_noise_level 0.0
+  point_cloud_noise_level: &point_cloud_noise_level 0.0
+  image_noise_level: &image_noise_level 0.0
720p49_step=3000/dit.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2e1bc5dac57e3a4dcd74834b3124d4754db2d403d3cd701e96581491d717664f
+size 21069748043
README.md
ADDED
@@ -0,0 +1,49 @@
+---
+license: apache-2.0
+datasets:
+- KlingTeam/MultiCamVideo-Dataset
+- nkp37/OpenVid-1M
+base_model:
+- Wan-AI/Wan2.1-T2V-14B
+pipeline_tag: video-to-video
+---
+
+# Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) – Model Checkpoints
+
+[Project Page](https://eyeline-labs.github.io/Vista4D)
+[Paper (arXiv)](https://arxiv.org/abs/2604.21915)
+[Model Weights](https://huggingface.co/Eyeline-Labs/Vista4D)
+[Evaluation Data](https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data)
+
+[Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1,3∗</sup>, [Zhizheng Liu](https://bosmallear.github.io)<sup>1,4∗</sup>, [Pablo Salamanca](https://pablosalaman.ca)<sup>1,2</sup>, [Yash Kant](https://yashkant.github.io)<sup>1,2</sup>, [Ryan Burgert](https://ryanndagreat.github.io)<sup>1,2,5∗</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,2</sup>, [Koichi Namekata](https://kmcode1.github.io)<sup>1,2,6∗</sup>, [Yiwei Zhao](https://zhaoyw007.github.io)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io)<sup>4</sup>, [Micah Goldblum](https://goldblum.github.io)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1,2</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1,2</sup> <br/>
+<sup>1</sup>Eyeline Labs, <sup>2</sup>Netflix, <sup>3</sup>Columbia University, <sup>4</sup>UCLA, <sup>5</sup>Stony Brook University, <sup>6</sup>University of Oxford<br>
+
+<sup>∗</sup>*Work done during an internship at Eyeline Labs*
+
+<div align="center">
+<video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
+</div>
+
+**Vista4D** is a *video reshooting* framework that synthesizes the dynamic scene represented by an input source video from novel camera trajectories and viewpoints. We bridge the distribution shift between training and inference for point-cloud-grounded video reshooting: by training on noisy, reconstructed multiview videos, Vista4D is robust to point cloud artifacts from imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (casual video capture of a scene as background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.
+
+This is the Hugging Face repository containing our model weights. We provide two Vista4D checkpoints finetuned from [`Wan-AI/Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B):
+
+| Checkpoint | Base model | Training resolution | Training steps | Notes |
+|---|---|---|---|---|
+| `384p49_step=30000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 672 × 384, 49 frames | 30000 | N/A |
+| `720p49_step=3000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 1280 × 720, 49 frames | 3000 | Finetuned from `384p49_step=30000` |
+
+To run Vista4D inference, first download the Wan 2.1 and Vista4D checkpoints to `./checkpoints/`. The Vista4D checkpoints are hosted on [Eyeline-Labs/Vista4D](https://huggingface.co/Eyeline-Labs/Vista4D). Download both the `384p` and `720p` checkpoints into `./checkpoints/vista4d/` with
+```bash
+hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d
+```
+If you only need one resolution, pass `--include` to download just that variant, e.g. for the `384p` checkpoint (use `"720p49_step=3000/*"` for `720p`):
+```bash
+hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d --include "384p49_step=30000/*"
+```
+You'll also need the `Wan2.1-T2V-14B` base model. Download it from [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) into `./checkpoints/wan/Wan2.1-T2V-14B/` with
+```bash
+hf download Wan-AI/Wan2.1-T2V-14B --local-dir ./checkpoints/wan/Wan2.1-T2V-14B
+```
+
+Instructions on how to use these weights, more results, and the paper can be found on our [project page](https://eyeline-labs.github.io/Vista4D/) and [GitHub repository](https://github.com/Eyeline-Labs/Vista4D/tree/main).
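After following the README's download steps, the checkpoint tree should contain both Vista4D variants plus the Wan base model. A quick way to confirm (a convenience sketch; paths taken from the README):

```bash
# Confirm the layout produced by the README's download commands.
ls ./checkpoints/vista4d/384p49_step=30000/   # expect: config.yaml  dit.pth
ls ./checkpoints/vista4d/720p49_step=3000/    # expect: config.yaml  dit.pth
ls ./checkpoints/wan/Wan2.1-T2V-14B/          # Wan 2.1 base model files
```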
config.json
ADDED
@@ -0,0 +1,3 @@
+{
+    "Why is this here?": "This file is here so we can track model download stats."
+}