Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,102 @@
<p align="center" style="border-radius: 10px">
<img src="assets/icon+name.png" width="50%" alt="logo"/>
</p>

# <div align="center">Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory</div>

<div align="center">
<p>
<a href="https://eddie0521.github.io/">Jinzhuo Liu</a><sup>1</sup>,
<a href="https://zhangzjn.github.io">Jiangning Zhang</a><sup>1<a href="mailto:186368@zju.edu.cn">✉</a></sup>,
<a href="https://github.com/Rinke02">Wencan Jiang</a><sup>1</sup>,
<a href="https://scholar.google.com/citations?user=xiK4nFUAAAAJ&hl=zh-CN">Yabiao Wang</a><sup>2</sup>,
<a href="https://dk-liang.github.io/">Dingkang Liang</a><sup>3</sup>,
<a href="https://scholar.google.com/citations?user=m3KDreEAAAAJ&hl=en">Zhucun Xue</a><sup>1</sup>,
<a href="https://yiranran.github.io/">Ran Yi</a><sup>4</sup>,
<a href="https://person.zju.edu.cn/yongliu">Yong Liu</a><sup>1</sup>
</p>
<p>
<sup>1</sup>Zhejiang University,
<sup>2</sup>Tencent Youtu Lab,
<sup>3</sup>Huazhong University of Science and Technology,<br>
<sup>4</sup>Shanghai Jiao Tong University
<sup><a href="mailto:186368@zju.edu.cn">✉</a></sup>Corresponding author
</p>
</div>
<p align="center">
<a href="https://eddie0521.github.io/projects/iamflow/"><img src="https://img.shields.io/badge/Project-Page-Green"></a>
<a href="https://arxiv.org/abs/2605.18733"><img src="https://img.shields.io/static/v1?label=arXiv&message=2605.18733&color=red&logo=arxiv"></a>
<a href="https://huggingface.co/Eddie0521/IAMFlow-FP8"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-orange"></a>
</p>

## 🔥 Updates

- __[2026.05.15]__: We release the [GitHub repo](https://github.com/Eddie0521/IAMFlow), the [project page](https://eddie0521.github.io/projects/iamflow/), the quantized [model checkpoints](https://huggingface.co/Eddie0521/IAMFlow-FP8), the [NarraStream-Bench](https://github.com/Eddie0521/NarraStream-Bench) benchmark, and the [paper](https://arxiv.org/abs/2605.18733).

## Introduction
💡 **TL;DR:**
[IAMFlow](https://arxiv.org/abs/2605.18733) uses explicit identity-aware memory to keep identities consistent across evolving narrative prompts, yielding faster inference and stronger long video generation results on [NarraStream-Bench](https://arxiv.org/abs/2605.18733).

## ✨ Highlights
1. We introduce [**IAMFlow**](https://arxiv.org/abs/2605.18733), a training-free identity-aware memory framework that explicitly organizes historical information around persistent entities and attributes, enabling reliable identity preservation across evolving prompt transitions.
2. We design a systematic inference acceleration pipeline to make the framework computationally practical, combining asynchronous visual verification, adaptive prompt transition, and model quantization to preserve long-term consistency without sacrificing generation speed.
3. We introduce [**NarraStream-Bench**](https://arxiv.org/abs/2605.18733), a modern benchmark suite for assessing long-term consistency in narrative streaming video generation. Extensive experiments and ablation studies demonstrate that IAMFlow achieves superior performance across various metrics while enabling more efficient inference.

## 🛠️ Installation
### 1. Install Requirements

```sh
# Clone over SSH (or use https://github.com/Eddie0521/IAMFlow if you have no SSH key set up).
git clone git@github.com:Eddie0521/IAMFlow.git
cd IAMFlow
conda create -n iamflow python=3.12 -y
conda activate iamflow

# Install PyTorch first according to your CUDA environment.
python -m pip install torch==2.9.1 torchvision==0.24.1
python -m pip install -r requirements.txt
python -m pip install flash-attn --no-build-isolation
```

### 2. Download Checkpoints
Download the models with the `hf` CLI:
```sh
pip install "huggingface_hub[cli]"
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir pretrained/Wan2.1-T2V-1.3B
hf download Eddie0521/IAMFlow --local-dir pretrained/iamflow_models
hf download Qwen/Qwen3-VL-2B-Instruct --local-dir pretrained/Qwen3-VL-2B-Instruct
hf download Qwen/Qwen3-4B-Instruct-2507 --local-dir pretrained/Qwen3-4B-Instruct-2507
```
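
Before running inference, it may help to confirm that all four downloads landed in the directories used by the commands above. The following is a minimal sketch and is not part of the IAMFlow repository: the function name `check_checkpoints` is hypothetical, and it only checks that each directory exists and is non-empty, not that the files inside are complete or uncorrupted.

```sh
# Hypothetical helper (not part of the IAMFlow repo): verify that each
# checkpoint directory used by the download commands exists and is non-empty.
check_checkpoints() {
    root="${1:-pretrained}"   # base directory passed to --local-dir above
    missing=0
    for name in Wan2.1-T2V-1.3B iamflow_models Qwen3-VL-2B-Instruct Qwen3-4B-Instruct-2507; do
        dir="$root/$name"
        # A directory counts as present only if it contains at least one entry.
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
            missing=1
        fi
    done
    # Exit status 0 if every directory was found, 1 otherwise.
    return "$missing"
}
```

Calling `check_checkpoints` from the repository root checks `pretrained/` by default; pass a different base directory as the first argument if the models were downloaded elsewhere.
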

## Inference
Inference uses two GPUs: the DiT, text encoder, and LLM run on one GPU, and the VAE and VLM run on the other.

```sh
bash ./scripts/run_iamflow.sh
```
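
Because the deployment described above spans two GPUs, a small pre-flight check can catch a single-GPU configuration before any model starts loading. This is only a sketch under stated assumptions and is not part of the repository: `count_visible_gpus` and `require_two_gpus` are hypothetical helpers, they inspect nothing but a comma-separated device list such as the value of the standard `CUDA_VISIBLE_DEVICES` variable, and how `run_iamflow.sh` assigns models to devices is not specified here.

```sh
# Hypothetical helper (not part of the IAMFlow repo): count the entries in a
# comma-separated device list such as the value of CUDA_VISIBLE_DEVICES.
count_visible_gpus() {
    list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    # Replace commas with newlines, count the lines, and strip padding
    # that some wc implementations add.
    printf '%s\n' "$list" | tr ',' '\n' | wc -l | tr -d ' '
}

# Hypothetical guard: succeed only if at least two devices are listed.
require_two_gpus() {
    [ "$(count_visible_gpus "$1")" -ge 2 ]
}
```

One possible use is `export CUDA_VISIBLE_DEVICES=0,1` followed by `require_two_gpus "$CUDA_VISIBLE_DEVICES" && bash ./scripts/run_iamflow.sh`. When `CUDA_VISIBLE_DEVICES` is unset, CUDA exposes every device on the machine, so the guard is meaningful only when the variable is set explicitly.
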


## Evaluation & Benchmark
See the [NarraStream-Bench](https://github.com/Eddie0521/NarraStream-Bench) repository.

## 🤝 Acknowledgement
- [MemFlow](https://github.com/KlingAIResearch/MemFlow): the codebase we built upon. Thanks for their wonderful work.
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing): the algorithm we built upon. Thanks for their wonderful work.
- [Wan](https://github.com/Wan-Video/Wan2.1): the base model we built upon. Thanks for their wonderful work.

## Citation
Please leave us a star and cite our paper if you find our work helpful.

```
@misc{liu2026advancingnarrativelongvideo,
      title={Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory},
      author={Jinzhuo Liu and Jiangning Zhang and Wencan Jiang and Yabiao Wang and Dingkang Liang and Zhucun Xue and Ran Yi and Yong Liu},
      year={2026},
      eprint={2605.18733},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.18733},
}
```