Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory

Jinzhuo Liu¹, Jiangning Zhang¹✉, Wencan Jiang¹, Yabiao Wang², Dingkang Liang³, Zhucun Xue¹, Ran Yi⁴, Yong Liu¹

¹Zhejiang University, ²Tencent Youtu Lab, ³Huazhong University of Science and Technology,
⁴Shanghai Jiao Tong University; ✉Corresponding author

## 🔥 Updates

- __[2026.05.15]__: We release the [GitHub repo](https://github.com/Eddie0521/IAMFlow), the [project page](https://eddie0521.github.io/projects/iamflow/), the quantized [model checkpoints](https://huggingface.co/Eddie0521/IAMFlow-FP8), [NarraStream-Bench](https://github.com/Eddie0521/NarraStream-Bench), and the [paper](https://arxiv.org/abs/2605.18733).

## 📷 Introduction

💡**TL;DR:** [IAMFlow](https://arxiv.org/abs/2605.18733) uses explicit identity-aware memory to keep identities consistent across evolving narrative prompts, achieving faster and higher-quality long video generation on [NarraStream-Bench](https://arxiv.org/abs/2605.18733).

## ✨ Highlights

1. We introduce [**IAMFlow**](https://arxiv.org/abs/2605.18733), a training-free identity-aware memory framework that explicitly organizes historical information around persistent entities and their attributes, enabling reliable identity preservation across evolving prompt transitions.
2. We design a systematic inference acceleration pipeline that makes the framework computationally practical. It combines asynchronous visual verification, adaptive prompt transition, and model quantization to preserve long-term consistency without sacrificing generation speed.
3. We introduce [**NarraStream-Bench**](https://arxiv.org/abs/2605.18733), a benchmark suite for assessing long-term consistency in narrative streaming video generation. Extensive experiments and ablation studies demonstrate that IAMFlow achieves superior performance across various metrics while enabling more efficient inference.

## 🛠️ Installation

### 1. Install Requirements

```sh
git clone git@github.com:Eddie0521/IAMFlow.git
cd IAMFlow
conda create -n iamflow python=3.12 -y
conda activate iamflow
# Install PyTorch first, choosing the build that matches your CUDA environment.
python -m pip install torch==2.9.1 torchvision==0.24.1
python -m pip install -r requirements.txt
python -m pip install flash-attn --no-build-isolation
```

### 2. Download Checkpoints

Download the models with the `hf` CLI:

```sh
python -m pip install "huggingface_hub[cli]"
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir pretrained/Wan2.1-T2V-1.3B
hf download Eddie0521/IAMFlow --local-dir pretrained/iamflow_models
hf download Qwen/Qwen3-VL-2B-Instruct --local-dir pretrained/Qwen3-VL-2B-Instruct
hf download Qwen/Qwen3-4B-Instruct-2507 --local-dir pretrained/Qwen3-4B-Instruct-2507
```

## 🔑 Inference

We deploy the DiT, text encoder, and LLM on one GPU, and the VAE and VLM on a second GPU.

```sh
bash ./scripts/run_iamflow.sh
```

## 📏 Evaluation & Benchmark

See [NarraStream-Bench](https://github.com/Eddie0521/NarraStream-Bench).

## 🤗 Acknowledgement

- [MemFlow](https://github.com/KlingAIResearch/MemFlow): the codebase we build upon. Thanks for their wonderful work.
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing): the algorithm we build upon. Thanks for their wonderful work.
- [Wan](https://github.com/Wan-Video/Wan2.1): the base model we build upon. Thanks for their wonderful work.

## 🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.

```bibtex
@misc{liu2026advancingnarrativelongvideo,
      title={Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory},
      author={Jinzhuo Liu and Jiangning Zhang and Wencan Jiang and Yabiao Wang and Dingkang Liang and Zhucun Xue and Ran Yi and Yong Liu},
      year={2026},
      eprint={2605.18733},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.18733},
}
```