| --- |
| license: apache-2.0 |
| library_name: pytorch |
| pipeline_tag: image-to-video |
| tags: |
| - video-generation |
| - image-to-video |
| - vae |
| - video-vae |
| - video-reconstruction |
| - refdecoder |
| - wan2.1 |
| - videovaeplus |
| --- |
| |
| # RefDecoder |
|
|
| Reference-conditioned video VAE decoding for high-fidelity video reconstruction and generation. |
|
|
[Paper](https://arxiv.org/abs/2605.15196)
[Project Page](https://refdecoder.github.io/)
[Code](https://github.com/RefDecoder/RefDecoder)
|
|
| ## Overview |
|
|
| RefDecoder is a training and inference framework that adds reference-frame conditioning to video autoencoders. By injecting a selected reference frame into the decoder, RefDecoder preserves appearance and identity cues across the video, improving reconstruction and image-to-video generation quality compared to the original VAE decoders. |
|
|
| This repository hosts the released RefDecoder checkpoints for two backbones: |
|
|
| | Checkpoint | Backbone | File | Description | |
| | --- | --- | --- | --- | |
| | **RefDecoder-Wan** | Wan2.1 I2V VAE | `VAE/Wan2.1/wan2.1_ref.pt` | RefDecoder trained on top of the Wan2.1 image-to-video VAE decoder. | |
| | **RefDecoder-VideoVAEPlus** | VideoVAE+ (2+1D) | `VAE/VideoVAEPlus/videovaeplus_ref.pt` | RefDecoder trained on top of the VideoVAE+ autoencoder. | |
|
|
| ## Download |
|
|
| Using `huggingface_hub`: |
|
|
| ```python |
| from huggingface_hub import snapshot_download |
| |
| snapshot_download( |
| repo_id="RefDecoder/RefDecoder", |
| local_dir="ckpt/RefDecoder", |
| ) |
| ``` |
|
|
| Or with the CLI: |
|
|
| ```bash |
| huggingface-cli download RefDecoder/RefDecoder --local-dir ckpt/RefDecoder |
| ``` |
|
|
| Expected layout after download (matching the code repo's defaults): |
|
|
```text
ckpt/
└── RefDecoder/
    └── VAE/
        ├── Wan2.1/
        │   └── wan2.1_ref.pt
        └── VideoVAEPlus/
            └── videovaeplus_ref.pt
```
|
|
| ## Usage |
|
|
| Clone the code repository and follow its setup instructions: |
|
|
| ```bash |
| git clone https://github.com/RefDecoder/RefDecoder.git |
| cd RefDecoder |
| pip install -U uv && uv sync && source .venv/bin/activate |
| ``` |
|
|
| Point the corresponding inference config to the downloaded checkpoint: |
|
|
| - `configs/inference/eval_wan.yaml` β set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt` |
| - `configs/inference/eval_videovaeplus.yaml` β set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/VideoVAEPlus/videovaeplus_ref.pt` |
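For reference, the key path above corresponds to a nested YAML entry like the following (only the `model.params.ckpt_path` key is confirmed; any surrounding keys in the actual config are omitted here):

```yaml
# configs/inference/eval_wan.yaml
model:
  params:
    ckpt_path: ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt
```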
|
|
Wan2.1 reconstruction example (positional arguments: config name, input directory, output directory, frame count, height, width, device):
|
|
| ```bash |
| bash scripts/run_inference.sh eval_wan /path/to/input_videos outputs/wan 17 480 832 cuda:0 |
| ``` |
|
|
| VideoVAE+ reconstruction example: |
|
|
| ```bash |
| bash scripts/run_inference.sh eval_videovaeplus /path/to/input_videos outputs/videovaeplus 16 216 216 cuda:0 |
| ``` |
|
|
| ### Base-model requirements |
|
|
| - **RefDecoder-Wan** initializes its base VAE from `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (subfolder `vae`). Make sure that model is accessible or already cached locally. |
| - **RefDecoder-VideoVAEPlus** requires the VideoVAE+ base checkpoint `sota-4-16z.ckpt` at `ckpt/VideoVAEPlus/sota-4-16z.ckpt`, or update the path in `src/models/VideoVAEPlus/videovaeplus_ref0conv.py`. |
|
|
| See the [GitHub README](https://github.com/RefDecoder/RefDecoder) for training, multi-GPU inference, and VBench image-to-video decoding workflows. |
|
|
| ## Citation |
|
|
| If you find RefDecoder useful, please cite: |
|
|
| ```bibtex |
| @misc{fan2026refdecoderenhancingvisualgeneration, |
| title={RefDecoder: Enhancing Visual Generation with Conditional Video Decoding}, |
| author={Xiang Fan and Yuheng Wang and Bohan Fang and Zhongzheng Ren and Ranjay Krishna}, |
| year={2026}, |
| eprint={2605.15196}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2605.15196}, |
| } |
| ``` |
|
|
| ## License |
|
|
| Released under the Apache 2.0 License. See the [LICENSE](https://github.com/RefDecoder/RefDecoder/blob/main/LICENSE) in the code repository. |
|
|