Arrokothwhi committed (verified)
Commit 67c98ad · Parent(s): 5b60941

Update README with VAE checkpoint paths

Files changed (1): README.md (added, +110 −0)
---
license: apache-2.0
library_name: pytorch
pipeline_tag: image-to-video
tags:
- video-generation
- image-to-video
- vae
- video-vae
- video-reconstruction
- refdecoder
- wan2.1
- videovaeplus
---

# RefDecoder

Reference-conditioned video VAE decoding for high-fidelity video reconstruction and generation.

[![arXiv](https://img.shields.io/badge/arXiv-Paper-b31b1b.svg)](https://arxiv.org/abs/TODO)
[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://refdecoder.github.io/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/RefDecoder/RefDecoder)

## Overview

RefDecoder is a training and inference framework that adds reference-frame conditioning to video autoencoders. By injecting a selected reference frame into the decoder, RefDecoder preserves appearance and identity cues across the video, improving reconstruction and image-to-video generation quality compared to the original VAE decoders.

This repository hosts the released RefDecoder checkpoints for two backbones:

| Checkpoint | Backbone | File | Description |
| --- | --- | --- | --- |
| **RefDecoder-Wan** | Wan2.1 I2V VAE | `VAE/Wan2.1/wan2.1_ref.pt` | RefDecoder trained on top of the Wan2.1 image-to-video VAE decoder. |
| **RefDecoder-VideoVAEPlus** | VideoVAE+ (2+1D) | `VAE/VideoVAEPlus/videovaeplus_ref.pt` | RefDecoder trained on top of the VideoVAE+ autoencoder. |

## Download

Using `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="RefDecoder/RefDecoder",
    local_dir="ckpt/RefDecoder",
)
```

Or with the CLI:

```bash
huggingface-cli download RefDecoder/RefDecoder --local-dir ckpt/RefDecoder
```

Expected layout after download (matching the code repo's defaults):

```text
ckpt/
└── RefDecoder/
    └── VAE/
        ├── Wan2.1/
        │   └── wan2.1_ref.pt
        └── VideoVAEPlus/
            └── videovaeplus_ref.pt
```
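
As a quick sanity check after downloading, you can verify that both checkpoints landed where the configs expect them. This is a minimal standard-library sketch; the paths simply mirror the layout above:

```python
from pathlib import Path

# Expected checkpoint locations, mirroring the layout above.
root = Path("ckpt/RefDecoder/VAE")
checkpoints = {
    "RefDecoder-Wan": root / "Wan2.1" / "wan2.1_ref.pt",
    "RefDecoder-VideoVAEPlus": root / "VideoVAEPlus" / "videovaeplus_ref.pt",
}

# Report anything that did not download.
missing = [name for name, path in checkpoints.items() if not path.exists()]
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints in place.")
```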

## Usage

Clone the code repository and follow its setup instructions:

```bash
git clone https://github.com/RefDecoder/RefDecoder.git
cd RefDecoder
pip install -U uv && uv sync && source .venv/bin/activate
```

Point the corresponding inference config to the downloaded checkpoint:

- `configs/inference/eval_wan.yaml`: set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt`
- `configs/inference/eval_videovaeplus.yaml`: set `model.params.ckpt_path` to `ckpt/RefDecoder/VAE/VideoVAEPlus/videovaeplus_ref.pt`
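
If you prefer to script the config edit, a small line-rewriting helper suffices. This is a hypothetical sketch, not part of the repo: `set_ckpt_path` is illustrative, and the inline YAML is only a stand-in mirroring the `model.params.ckpt_path` key named above.

```python
# Hypothetical helper (not part of the repo): rewrite the ckpt_path line
# in a YAML config while preserving indentation. The inline text below is
# a stand-in for configs/inference/eval_wan.yaml.
example = """\
model:
  params:
    ckpt_path: /old/path.pt
"""

def set_ckpt_path(yaml_text: str, new_path: str) -> str:
    lines = []
    for line in yaml_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "ckpt_path":
            # Keep the original leading whitespace so nesting is preserved.
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}ckpt_path: {new_path}"
        lines.append(line)
    return "\n".join(lines) + "\n"

updated = set_ckpt_path(example, "ckpt/RefDecoder/VAE/Wan2.1/wan2.1_ref.pt")
print(updated)
```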

Wan2.1 reconstruction example:

```bash
bash scripts/run_inference.sh eval_wan /path/to/input_videos outputs/wan 17 480 832 cuda:0
```

VideoVAE+ reconstruction example:

```bash
bash scripts/run_inference.sh eval_videovaeplus /path/to/input_videos outputs/videovaeplus 16 216 216 cuda:0
```
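
Reading the two examples together, the script takes seven positional arguments. The mapping below (frame count, height, width, and device for the numeric and final arguments) is inferred from the example values, not from the script itself, so treat it as an assumption:

```python
import shlex

# Assumed positional-argument order for scripts/run_inference.sh,
# inferred from the README examples (not verified against the script).
def inference_cmd(config, input_dir, output_dir, frames, height, width, device):
    args = [
        "bash", "scripts/run_inference.sh",
        config, input_dir, output_dir,
        str(frames), str(height), str(width), device,
    ]
    return shlex.join(args)

cmd = inference_cmd("eval_wan", "/path/to/input_videos", "outputs/wan", 17, 480, 832, "cuda:0")
print(cmd)
```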

### Base-model requirements

- **RefDecoder-Wan** initializes its base VAE from `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (subfolder `vae`). Make sure that model is accessible or already cached locally.
- **RefDecoder-VideoVAEPlus** requires the VideoVAE+ base checkpoint `sota-4-16z.ckpt` at `ckpt/VideoVAEPlus/sota-4-16z.ckpt`, or update the path in `src/models/VideoVAEPlus/videovaeplus_ref0conv.py`.

See the [GitHub README](https://github.com/RefDecoder/RefDecoder) for training, multi-GPU inference, and VBench image-to-video decoding workflows.

## Citation

If you find RefDecoder useful, please cite:

```bibtex
TODO
```

## License

Released under the Apache 2.0 License. See the [LICENSE](https://github.com/RefDecoder/RefDecoder/blob/main/LICENSE) in the code repository.