Video-to-Video

Thalesailun and ningyu1991 committed 0cf19fe (0 parents)

Duplicate from Eyeline-Labs/Vista4D

Co-authored-by: Ning Yu <ningyu1991@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
384p49_step=30000/config.yaml ADDED
@@ -0,0 +1,12 @@
+ dit:
+   positional_embedding_offset: 31
+ latent_encoder:
+   source_init_mode: wan_patch_embed
+   point_cloud_init_mode: wan_patch_embed
+   mask_init_mode: zero_init
+   use_source_masks: True
+   use_point_cloud_masks: True
+ augmentation:
+   source_noise_level: &source_noise_level 0.0
+   point_cloud_noise_level: &point_cloud_noise_level 0.0
+   image_noise_level: &image_noise_level 0.0
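The checkpoint config is a small YAML file. As a minimal sketch, here is how it could be loaded with PyYAML (a third-party dependency; the nesting below is inferred from the key names, since the diff view strips indentation):

```python
import yaml

# The 384p49_step=30000 config as shipped, with inferred indentation.
config_text = """
dit:
  positional_embedding_offset: 31
latent_encoder:
  source_init_mode: wan_patch_embed
  point_cloud_init_mode: wan_patch_embed
  mask_init_mode: zero_init
  use_source_masks: True
  use_point_cloud_masks: True
augmentation:
  source_noise_level: &source_noise_level 0.0
  point_cloud_noise_level: &point_cloud_noise_level 0.0
  image_noise_level: &image_noise_level 0.0
"""

# safe_load resolves the &... YAML anchors to their plain values.
config = yaml.safe_load(config_text)
print(config["dit"]["positional_embedding_offset"])  # 31
```

Note the `&source_noise_level`-style anchors: they let the training config reuse these noise levels elsewhere via `*source_noise_level` aliases.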
384p49_step=30000/dit.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:215d31ff294477fdcf0a136d1734ed03a8782261d6a66a8dd9561bdf1cd74a6a
+ size 21069758850
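The `dit.pth` entry above is a Git LFS pointer file, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes (about 21 GB here). A small sketch of parsing such a pointer:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:215d31ff294477fdcf0a136d1734ed03a8782261d6a66a8dd9561bdf1cd74a6a
size 21069758850"""

info = parse_lfs_pointer(pointer)
print(info["oid"])   # sha256:215d31ff...
print(info["size"])  # 21069758850
```

If a download leaves you with a tiny `dit.pth` that starts with `version https://git-lfs.github.com/spec/v1`, you fetched the pointer rather than the LFS object.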
720p49_step=3000/config.yaml ADDED
@@ -0,0 +1,12 @@
+ dit:
+   positional_embedding_offset: 31
+ latent_encoder:
+   source_init_mode: wan_patch_embed
+   point_cloud_init_mode: wan_patch_embed
+   mask_init_mode: zero_init
+   use_source_masks: True
+   use_point_cloud_masks: True
+ augmentation:
+   source_noise_level: &source_noise_level 0.0
+   point_cloud_noise_level: &point_cloud_noise_level 0.0
+   image_noise_level: &image_noise_level 0.0
720p49_step=3000/dit.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e1bc5dac57e3a4dcd74834b3124d4754db2d403d3cd701e96581491d717664f
+ size 21069748043
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - KlingTeam/MultiCamVideo-Dataset
+ - nkp37/OpenVid-1M
+ base_model:
+ - Wan-AI/Wan2.1-T2V-14B
+ pipeline_tag: video-to-video
+ ---
+
+ # Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) – Model Checkpoints
+
+ [![Project Page](https://img.shields.io/badge/Project-Page-yellow?logo=data:image/svg%2Bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgc3Ryb2tlPSJ5ZWxsb3ciIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIj48Y2lyY2xlIGN4PSIxMiIgY3k9IjEyIiByPSIxMCIvPjxsaW5lIHgxPSIyIiB5MT0iMTIiIHgyPSIyMiIgeTI9IjEyIi8+PHBhdGggZD0iTTEyIDJhMTUuMyAxNS4zIDAgMCAxIDQgMTAgMTUuMyAxNS4zIDAgMCAxLTQgMTAgMTUuMyAxNS4zIDAgMCAxLTQtMTAgMTUuMyAxNS4zIDAgMCAxIDQtMTB6Ii8+PC9zdmc+)](https://eyeline-labs.github.io/Vista4D)
+ [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2604.21915)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Vista4D-blue)](https://huggingface.co/Eyeline-Labs/Vista4D)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Eval%20Data-blue)](https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data)
+
+ [Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1,3&lowast;</sup>, [Zhizheng Liu](https://bosmallear.github.io)<sup>1,4&lowast;</sup>, [Pablo Salamanca](https://pablosalaman.ca)<sup>1,2</sup>, [Yash Kant](https://yashkant.github.io)<sup>1,2</sup>, [Ryan Burgert](https://ryanndagreat.github.io)<sup>1,2,5&lowast;</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,2</sup>, [Koichi Namekata](https://kmcode1.github.io)<sup>1,2,6&lowast;</sup>, [Yiwei Zhao](https://zhaoyw007.github.io)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io)<sup>4</sup>, [Micah Goldblum](https://goldblum.github.io)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1,2</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1,2</sup> <br/>
+ <sup>1</sup>Eyeline Labs, <sup>2</sup>Netflix, <sup>3</sup>Columbia University, <sup>4</sup>UCLA, <sup>5</sup>Stony Brook University, <sup>6</sup>University of Oxford<br>
+
+ <sup>&lowast;</sup>*Work done during an internship at Eyeline Labs*
+
+ <div align="center">
+ <video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
+ </div>
+
+ **Vista4D** is a *video reshooting* framework that synthesizes the dynamic scene represented by an input source video from novel camera trajectories and viewpoints. By training on noisy, reconstructed multiview videos, Vista4D bridges the distribution shift between training and inference for point-cloud-grounded video reshooting and is robust to point cloud artifacts from imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (casual video capture of a scene as background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.
+
+ This is the Hugging Face repository containing our model weights. We provide two Vista4D checkpoints finetuned from [`Wan-AI/Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B):
+
+ | Checkpoint | Base model | Training resolution | Training steps | Notes |
+ |---|---|---|---|---|
+ | `384p49_step=30000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 672 &times; 384, 49 frames | 30000 | N/A |
+ | `720p49_step=3000` | [`Wan2.1-T2V-14B`](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | 1280 &times; 720, 49 frames | 3000 | Finetuned from `384p49_step=30000` |
+
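The checkpoint directory names encode the training resolution, frame count, and training step (e.g. `384p49_step=30000` is the 384p, 49-frame checkpoint at step 30000). A hypothetical helper, not part of the Vista4D codebase, that splits such a name into its parts:

```python
import re

def parse_checkpoint_name(name: str) -> dict:
    """Split a Vista4D checkpoint directory name like '384p49_step=30000'
    into resolution height, frame count, and training step."""
    m = re.fullmatch(r"(\d+)p(\d+)_step=(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    height, frames, step = map(int, m.groups())
    return {"height": height, "frames": frames, "step": step}

print(parse_checkpoint_name("720p49_step=3000"))
# {'height': 720, 'frames': 49, 'step': 3000}
```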
+ To run Vista4D inference, first download the Wan 2.1 and Vista4D checkpoints to `./checkpoints/`. The Vista4D checkpoints are hosted on [Eyeline-Labs/Vista4D](https://huggingface.co/Eyeline-Labs/Vista4D). Download both the `384p` and `720p` checkpoints into `./checkpoints/vista4d/` with
+ ```bash
+ hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d
+ ```
+ If you only need one resolution, pass `--include` to grab just that variant, e.g. for the `384p` checkpoint:
+ ```bash
+ hf download Eyeline-Labs/Vista4D --local-dir ./checkpoints/vista4d --include "384p49_step=30000/*"
+ ```
+ (Use `--include "720p49_step=3000/*"` for the `720p` checkpoint instead.)
+ You'll also need the `Wan2.1-T2V-14B` base model. Download it from [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) into `./checkpoints/wan/Wan2.1-T2V-14B/` with
+ ```bash
+ hf download Wan-AI/Wan2.1-T2V-14B --local-dir ./checkpoints/wan/Wan2.1-T2V-14B
+ ```
+
+ Instructions on how to use these weights, along with more results and the paper, can be found on our [project page](https://eyeline-labs.github.io/Vista4D/) and [GitHub repository](https://github.com/Eyeline-Labs/Vista4D/tree/main).
config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "Why is this here?": "This file is here so we can track model download stats."
+ }