| --- |
| license: apache-2.0 |
| library_name: pytorch |
| pipeline_tag: image-to-video |
| tags: |
| - video-generation |
| - image-to-video |
| - vae |
| - video-vae |
| - video-reconstruction |
| - refdecoder |
| - wan2.1 |
| - videovaeplus |
| --- |
| |
| # RefDecoder |
|
|
| Reference-conditioned video VAE decoding for high-fidelity video reconstruction and generation. |
|
|
[Paper](https://arxiv.org/abs/2605.15196)
[Project Page](https://refdecoder.github.io/)
[Code](https://github.com/RefDecoder/RefDecoder)
|
|
| ## Overview |
|
|
| RefDecoder is a training and inference framework that adds reference-frame conditioning to video autoencoders. By injecting a selected reference frame into the decoder, RefDecoder preserves appearance and identity cues across the video, improving reconstruction and image-to-video generation quality compared to the original VAE decoders. |
|
|
| This repository hosts the released RefDecoder checkpoints for two backbones: |
|
|
| | Checkpoint | Backbone | File | Description | |
| | --- | --- | --- | --- | |
| | **RefDecoder-Wan** | Wan2.1 I2V VAE | `VAE/Wan2.1/wan2.1_ref.pt` | RefDecoder trained on top of the Wan2.1 image-to-video VAE decoder. | |
| | **RefDecoder-VideoVAEPlus** | VideoVAE+ (2+1D) | `VAE/VideoVAEPlus/videovaeplus_ref.pt` | RefDecoder trained on top of the VideoVAE+ autoencoder. | |
|
|
| ## Download |
|
|
| Using `huggingface_hub`: |
|
|
| ```python |
| from huggingface_hub import snapshot_download |
| |
| snapshot_download( |
| repo_id="RefDecoder/RefDecoder", |
| local_dir="ckpt/RefDecoder", |
| ) |
| ``` |
|
|
| Or with the CLI: |
|
|
| ```bash |
| huggingface-cli download RefDecoder/RefDecoder --local-dir ckpt/RefDecoder |
| ``` |
|
|
| Expected layout after download (matching the code repo's defaults): |
|
|
```text
ckpt/
└── RefDecoder/
    └── VAE/
        ├── Wan2.1/
        │   └── wan2.1_ref.pt
        └── VideoVAEPlus/
            └── videovaeplus_ref.pt
```
|
|
| ## Usage |
|
|
| Clone the code repository and follow its setup instructions: |
|
|
| ```bash |
| git clone https://github.com/RefDecoder/RefDecoder.git |
| cd RefDecoder |
| pip install -U uv && uv sync && source .venv/bin/activate |
| ``` |
|
|
| Point the corresponding inference config to the downloaded checkpoint: |
|
|
| - `configs/inference/eval_wan.yaml` β set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt` |
| - `configs/inference/eval_videovaeplus.yaml` β set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/VideoVAEPlus/videovaeplus_ref.pt` |
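For reference, the key path above corresponds to a nested YAML entry like the following (only the `model.params.ckpt_path` key is confirmed; any surrounding keys in the actual config are omitted here):

```yaml
# configs/inference/eval_wan.yaml
model:
  params:
    ckpt_path: ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt
```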
|
|
Wan2.1 reconstruction example (positional arguments: config name, input directory, output directory, frame count, height, width, device):
|
|
| ```bash |
| bash scripts/run_inference.sh eval_wan /path/to/input_videos outputs/wan 17 480 832 cuda:0 |
| ``` |
|
|
| VideoVAE+ reconstruction example: |
|
|
| ```bash |
| bash scripts/run_inference.sh eval_videovaeplus /path/to/input_videos outputs/videovaeplus 16 216 216 cuda:0 |
| ``` |
|
|
| ### Base-model requirements |
|
|
| - **RefDecoder-Wan** initializes its base VAE from `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (subfolder `vae`). Make sure that model is accessible or already cached locally. |
| - **RefDecoder-VideoVAEPlus** requires the VideoVAE+ base checkpoint `sota-4-16z.ckpt` at `ckpt/VideoVAEPlus/sota-4-16z.ckpt`, or update the path in `src/models/VideoVAEPlus/videovaeplus_ref0conv.py`. |
|
|
| See the [GitHub README](https://github.com/RefDecoder/RefDecoder) for training, multi-GPU inference, and VBench image-to-video decoding workflows. |
|
|
| ## Citation |
|
|
| If you find RefDecoder useful, please cite: |
|
|
| ```bibtex |
| @misc{fan2026refdecoderenhancingvisualgeneration, |
| title={RefDecoder: Enhancing Visual Generation with Conditional Video Decoding}, |
| author={Xiang Fan and Yuheng Wang and Bohan Fang and Zhongzheng Ren and Ranjay Krishna}, |
| year={2026}, |
| eprint={2605.15196}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2605.15196}, |
| } |
| ``` |
|
|
| ## License |
|
|
| Released under the Apache 2.0 License. See the [LICENSE](https://github.com/RefDecoder/RefDecoder/blob/main/LICENSE) in the code repository. |
|
|