# RefDecoder
Reference-conditioned video VAE decoding for high-fidelity video reconstruction and generation.
## Overview
RefDecoder is a training and inference framework that adds reference-frame conditioning to video autoencoders. By injecting a selected reference frame into the decoder, RefDecoder preserves appearance and identity cues across the video, improving reconstruction and image-to-video generation quality compared to the original VAE decoders.
This repository hosts the released RefDecoder checkpoints for two backbones:
| Checkpoint | Backbone | File | Description |
|---|---|---|---|
| RefDecoder-Wan | Wan2.1 I2V VAE | `VAE/Wan2.1/wan2.1_ref.pt` | RefDecoder trained on top of the Wan2.1 image-to-video VAE decoder. |
| RefDecoder-VideoVAEPlus | VideoVAE+ (2+1D) | `VAE/VideoVAEPlus/videovaeplus_ref.pt` | RefDecoder trained on top of the VideoVAE+ autoencoder. |
## Download
Using `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="RefDecoder/RefDecoder",
    local_dir="ckpt/RefDecoder",
)
```
Or with the CLI:
```bash
huggingface-cli download RefDecoder/RefDecoder --local-dir ckpt/RefDecoder
```
Expected layout after download (matching the code repo's defaults):
```
ckpt/
└── RefDecoder/
    └── VAE/
        ├── Wan2.1/
        │   └── wan2.1_ref.pt
        └── VideoVAEPlus/
            └── videovaeplus_ref.pt
```
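A quick way to confirm the download landed in the expected place is a small path check. This helper is a sketch, not part of the repo; the relative paths come from the layout above:

```python
from pathlib import Path

# Expected checkpoint files, relative to the download root (from the layout above).
EXPECTED = [
    "VAE/Wan2.1/wan2.1_ref.pt",
    "VAE/VideoVAEPlus/videovaeplus_ref.pt",
]

def missing_checkpoints(root: str) -> list[str]:
    """Return any expected checkpoint paths that are absent under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]
```

After a complete download, `missing_checkpoints("ckpt/RefDecoder")` should return an empty list.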
## Usage
Clone the code repository and follow its setup instructions:
```bash
git clone https://github.com/RefDecoder/RefDecoder.git
cd RefDecoder
pip install -U uv && uv sync && source .venv/bin/activate
```
Point the corresponding inference config to the downloaded checkpoint:
- In `configs/inference/eval_wan.yaml`, set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt`.
- In `configs/inference/eval_videovaeplus.yaml`, set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/VideoVAEPlus/videovaeplus_ref.pt`.
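For the Wan2.1 config, the edited fragment would look roughly like this (only the `model.params.ckpt_path` key is shown; the surrounding keys in the actual config file are omitted here):

```yaml
# configs/inference/eval_wan.yaml (fragment; other keys unchanged)
model:
  params:
    ckpt_path: ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt
```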
Wan2.1 reconstruction example:
```bash
bash scripts/run_inference.sh eval_wan /path/to/input_videos outputs/wan 17 480 832 cuda:0
```
VideoVAE+ reconstruction example:
```bash
bash scripts/run_inference.sh eval_videovaeplus /path/to/input_videos outputs/videovaeplus 16 216 216 cuda:0
```
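Judging from the two examples above, the positional arguments to `scripts/run_inference.sh` appear to be: config name, input directory, output directory, frame count, height, width, and device. A tiny helper makes that ordering explicit; note that both the helper and the argument interpretation are assumptions inferred from the examples, not taken from the script itself:

```python
# Hypothetical helper: assembles the run_inference.sh command line.
# The positional-argument order (config, input dir, output dir, frames,
# height, width, device) is inferred from the README examples.
def inference_cmd(config: str, input_dir: str, output_dir: str,
                  frames: int, height: int, width: int,
                  device: str = "cuda:0") -> str:
    return (
        f"bash scripts/run_inference.sh {config} {input_dir} "
        f"{output_dir} {frames} {height} {width} {device}"
    )

# Reproduces the Wan2.1 example above:
# inference_cmd("eval_wan", "/path/to/input_videos", "outputs/wan", 17, 480, 832)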
### Base-model requirements
- RefDecoder-Wan initializes its base VAE from `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (subfolder `vae`). Make sure that model is accessible or already cached locally.
- RefDecoder-VideoVAEPlus requires the VideoVAE+ base checkpoint `sota-4-16z.ckpt` at `ckpt/VideoVAEPlus/sota-4-16z.ckpt`, or update the path in `src/models/VideoVAEPlus/videovaeplus_ref0conv.py`.
See the GitHub README for training, multi-GPU inference, and VBench image-to-video decoding workflows.
## Citation
If you find RefDecoder useful, please cite:
```bibtex
@misc{fan2026refdecoderenhancingvisualgeneration,
      title={RefDecoder: Enhancing Visual Generation with Conditional Video Decoding},
      author={Xiang Fan and Yuheng Wang and Bohan Fang and Zhongzheng Ren and Ranjay Krishna},
      year={2026},
      eprint={2605.15196},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.15196},
}
```
## License
Released under the Apache 2.0 License. See the LICENSE file in the code repository.