Arrokothwhi committed (verified)
Commit 67c98ad · Parent(s): 5b60941

Update README with VAE checkpoint paths

Files changed (1): README.md (added, +110 −0)
---
license: apache-2.0
library_name: pytorch
pipeline_tag: image-to-video
tags:
- video-generation
- image-to-video
- vae
- video-vae
- video-reconstruction
- refdecoder
- wan2.1
- videovaeplus
---

# RefDecoder

Reference-conditioned video VAE decoding for high-fidelity video reconstruction and generation.

[![arXiv](https://img.shields.io/badge/arXiv-Paper-b31b1b.svg)](https://arxiv.org/abs/TODO)
[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://refdecoder.github.io/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/RefDecoder/RefDecoder)

## Overview

RefDecoder is a training and inference framework that adds reference-frame conditioning to video autoencoders. By injecting a selected reference frame into the decoder, RefDecoder preserves appearance and identity cues across the video, improving reconstruction and image-to-video generation quality compared to the original VAE decoders.

This repository hosts the released RefDecoder checkpoints for two backbones:

| Checkpoint | Backbone | File | Description |
| --- | --- | --- | --- |
| **RefDecoder-Wan** | Wan2.1 I2V VAE | `VAE/Wan2.1/wan2.1_ref.pt` | RefDecoder trained on top of the Wan2.1 image-to-video VAE decoder. |
| **RefDecoder-VideoVAEPlus** | VideoVAE+ (2+1D) | `VAE/VideoVAEPlus/videovaeplus_ref.pt` | RefDecoder trained on top of the VideoVAE+ autoencoder. |

## Download

Using `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="RefDecoder/RefDecoder",
    local_dir="ckpt/RefDecoder",
)
```

Or with the CLI:

```bash
huggingface-cli download RefDecoder/RefDecoder --local-dir ckpt/RefDecoder
```

Expected layout after download (matching the code repo's defaults):

```text
ckpt/
└── RefDecoder/
    └── VAE/
        ├── Wan2.1/
        │   └── wan2.1_ref.pt
        └── VideoVAEPlus/
            └── videovaeplus_ref.pt
```
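
As a quick sanity check after downloading, you can verify that both checkpoints landed where the configs expect them. This is a minimal standard-library sketch; the paths simply mirror the layout above:

```python
from pathlib import Path

# Expected checkpoint locations, mirroring the layout above.
root = Path("ckpt/RefDecoder/VAE")
checkpoints = {
    "RefDecoder-Wan": root / "Wan2.1" / "wan2.1_ref.pt",
    "RefDecoder-VideoVAEPlus": root / "VideoVAEPlus" / "videovaeplus_ref.pt",
}

# Report anything that did not download.
missing = [name for name, path in checkpoints.items() if not path.exists()]
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints in place.")
```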

## Usage

Clone the code repository and follow its setup instructions:

```bash
git clone https://github.com/RefDecoder/RefDecoder.git
cd RefDecoder
pip install -U uv && uv sync && source .venv/bin/activate
```

Point the corresponding inference config to the downloaded checkpoint:

- `configs/inference/eval_wan.yaml`: set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt`
- `configs/inference/eval_videovaeplus.yaml`: set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/VideoVAEPlus/videovaeplus_ref.pt`
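
If you prefer to script the config edit, a small line-rewriting helper suffices. This is a hypothetical sketch, not part of the repo: `set_ckpt_path` is illustrative, and the inline YAML is only a stand-in mirroring the `model.params.ckpt_path` key named above.

```python
# Hypothetical helper (not part of the repo): rewrite the ckpt_path line
# in a YAML config while preserving indentation. The inline text below is
# a stand-in for configs/inference/eval_wan.yaml.
example = """\
model:
  params:
    ckpt_path: /old/path.pt
"""

def set_ckpt_path(yaml_text: str, new_path: str) -> str:
    lines = []
    for line in yaml_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "ckpt_path":
            # Keep the original leading whitespace so nesting is preserved.
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}ckpt_path: {new_path}"
        lines.append(line)
    return "\n".join(lines) + "\n"

updated = set_ckpt_path(example, "ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt")
print(updated)
```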

Wan2.1 reconstruction example:

```bash
bash scripts/run_inference.sh eval_wan /path/to/input_videos outputs/wan 17 480 832 cuda:0
```

VideoVAE+ reconstruction example:

```bash
bash scripts/run_inference.sh eval_videovaeplus /path/to/input_videos outputs/videovaeplus 16 216 216 cuda:0
```
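
Reading the two examples together, the script takes seven positional arguments. The mapping below (frame count, height, width, and device for the numeric and final arguments) is inferred from the example values, not from the script itself, so treat it as an assumption:

```python
import shlex

# Assumed positional-argument order for scripts/run_inference.sh,
# inferred from the README examples (not verified against the script).
def inference_cmd(config, input_dir, output_dir, frames, height, width, device):
    args = [
        "bash", "scripts/run_inference.sh",
        config, input_dir, output_dir,
        str(frames), str(height), str(width), device,
    ]
    return shlex.join(args)

cmd = inference_cmd("eval_wan", "/path/to/input_videos", "outputs/wan", 17, 480, 832, "cuda:0")
print(cmd)
```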

### Base-model requirements

- **RefDecoder-Wan** initializes its base VAE from `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (subfolder `vae`). Make sure that model is accessible or already cached locally.
- **RefDecoder-VideoVAEPlus** requires the VideoVAE+ base checkpoint `sota-4-16z.ckpt` at `ckpt/VideoVAEPlus/sota-4-16z.ckpt`, or update the path in `src/models/VideoVAEPlus/videovaeplus_ref0conv.py`.

See the [GitHub README](https://github.com/RefDecoder/RefDecoder) for training, multi-GPU inference, and VBench image-to-video decoding workflows.

## Citation

If you find RefDecoder useful, please cite:

```bibtex
TODO
```

## License

Released under the Apache 2.0 License. See the [LICENSE](https://github.com/RefDecoder/RefDecoder/blob/main/LICENSE) in the code repository.