---
license: apache-2.0
library_name: pytorch
pipeline_tag: image-to-video
tags:
- video-generation
- image-to-video
- vae
- video-vae
- video-reconstruction
- refdecoder
- wan2.1
- videovaeplus
---
# RefDecoder
Reference-conditioned video VAE decoding for high-fidelity video reconstruction and generation.
[Paper](https://arxiv.org/abs/2605.15196) | [Project Page](https://refdecoder.github.io/) | [Code](https://github.com/RefDecoder/RefDecoder)
## Overview
RefDecoder is a training and inference framework that adds reference-frame conditioning to video autoencoders. By injecting a selected reference frame into the decoder, RefDecoder preserves appearance and identity cues across the video, improving reconstruction and image-to-video generation quality compared to the original VAE decoders.
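To make the conditioning idea concrete, here is a minimal, purely illustrative PyTorch sketch (not the actual RefDecoder architecture; module names, channel widths, and the fusion scheme are all hypothetical): reference-frame features are projected to the latent width and fused with the per-frame latent before further decoding.

```python
# Illustrative only: a decoder block that conditions on a reference frame.
# The real RefDecoder architecture differs; see the code repository.
import torch
import torch.nn as nn

class RefConditionedBlock(nn.Module):
    def __init__(self, latent_ch: int, ref_ch: int = 3):
        super().__init__()
        # Project the RGB reference frame to the latent channel width.
        self.ref_proj = nn.Conv2d(ref_ch, latent_ch, kernel_size=1)
        # Fuse latent and reference features channel-wise.
        self.fuse = nn.Conv2d(2 * latent_ch, latent_ch, kernel_size=3, padding=1)

    def forward(self, latent: torch.Tensor, ref_frame: torch.Tensor) -> torch.Tensor:
        # latent: (B, C, H, W) per-frame latent; ref_frame: (B, 3, H, W)
        ref_feat = self.ref_proj(ref_frame)
        return self.fuse(torch.cat([latent, ref_feat], dim=1))
```

The key point is only that the decoder sees the reference frame's appearance features at every frame, which is what lets it preserve identity cues across the video.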
This repository hosts the released RefDecoder checkpoints for two backbones:
| Checkpoint | Backbone | File | Description |
| --- | --- | --- | --- |
| **RefDecoder-Wan** | Wan2.1 I2V VAE | `VAE/Wan2.1/wan2.1_ref.pt` | RefDecoder trained on top of the Wan2.1 image-to-video VAE decoder. |
| **RefDecoder-VideoVAEPlus** | VideoVAE+ (2+1D) | `VAE/VideoVAEPlus/videovaeplus_ref.pt` | RefDecoder trained on top of the VideoVAE+ autoencoder. |
## Download
Using `huggingface_hub`:
```python
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="RefDecoder/RefDecoder",
local_dir="ckpt/RefDecoder",
)
```
Or with the CLI:
```bash
huggingface-cli download RefDecoder/RefDecoder --local-dir ckpt/RefDecoder
```
Expected layout after download (matching the code repo's defaults):
```text
ckpt/
└── RefDecoder/
    └── VAE/
        ├── Wan2.1/
        │   └── wan2.1_ref.pt
        └── VideoVAEPlus/
            └── videovaeplus_ref.pt
```
## Usage
Clone the code repository and follow its setup instructions:
```bash
git clone https://github.com/RefDecoder/RefDecoder.git
cd RefDecoder
pip install -U uv && uv sync && source .venv/bin/activate
```
Point the corresponding inference config to the downloaded checkpoint:
- `configs/inference/eval_wan.yaml`: set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt`
- `configs/inference/eval_videovaeplus.yaml`: set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/VideoVAEPlus/videovaeplus_ref.pt`
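Assuming the dotted key `model.params.ckpt_path` reflects the config's nesting, the edited fragment of `eval_wan.yaml` would look like this (other keys omitted):

```yaml
# configs/inference/eval_wan.yaml (fragment; surrounding keys omitted)
model:
  params:
    ckpt_path: ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt
```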
Wan2.1 reconstruction example:
```bash
bash scripts/run_inference.sh eval_wan /path/to/input_videos outputs/wan 17 480 832 cuda:0
```
VideoVAE+ reconstruction example:
```bash
bash scripts/run_inference.sh eval_videovaeplus /path/to/input_videos outputs/videovaeplus 16 216 216 cuda:0
```
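The positional arguments to `scripts/run_inference.sh` are not documented here; the breakdown below is inferred from the two examples above and should be verified against the script itself.

```shell
# Inferred argument breakdown for scripts/run_inference.sh (verify against the script):
CONFIG=eval_wan               # 1: config name under configs/inference/
INPUT=/path/to/input_videos   # 2: directory of input videos
OUTPUT=outputs/wan            # 3: output directory
FRAMES=17                     # 4: number of frames per clip
HEIGHT=480                    # 5: frame height
WIDTH=832                     # 6: frame width
DEVICE=cuda:0                 # 7: device
echo bash scripts/run_inference.sh "$CONFIG" "$INPUT" "$OUTPUT" "$FRAMES" "$HEIGHT" "$WIDTH" "$DEVICE"
```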
### Base-model requirements
- **RefDecoder-Wan** initializes its base VAE from `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (subfolder `vae`). Make sure that model is accessible or already cached locally.
- **RefDecoder-VideoVAEPlus** requires the VideoVAE+ base checkpoint `sota-4-16z.ckpt` at `ckpt/VideoVAEPlus/sota-4-16z.ckpt`, or update the path in `src/models/VideoVAEPlus/videovaeplus_ref0conv.py`.
See the [GitHub README](https://github.com/RefDecoder/RefDecoder) for training, multi-GPU inference, and VBench image-to-video decoding workflows.
## Citation
If you find RefDecoder useful, please cite:
```bibtex
@misc{fan2026refdecoderenhancingvisualgeneration,
title={RefDecoder: Enhancing Visual Generation with Conditional Video Decoding},
author={Xiang Fan and Yuheng Wang and Bohan Fang and Zhongzheng Ren and Ranjay Krishna},
year={2026},
eprint={2605.15196},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.15196},
}
```
## License
Released under the Apache 2.0 License. See the [LICENSE](https://github.com/RefDecoder/RefDecoder/blob/main/LICENSE) in the code repository.