Duplicate from huaichang/PersonaLive
Browse files
Co-authored-by: Zhiyuan Li <huaichang@users.noreply.huggingface.co>
- .gitattributes +46 -0
- README.md +249 -0
- assets/demo_1.gif +3 -0
- assets/demo_2.gif +3 -0
- assets/demo_3.gif +3 -0
- assets/guide.png +3 -0
- assets/header.svg +45 -0
- assets/highlight.svg +21 -0
- assets/overview.png +3 -0
- pretrained_weights/.DS_Store +0 -0
- pretrained_weights/onnx/.DS_Store +0 -0
- pretrained_weights/onnx/unet_opt/unet_opt.onnx +3 -0
- pretrained_weights/onnx/unet_opt/unet_opt.onnx.data +3 -0
- pretrained_weights/personalive/denoising_unet.pth +3 -0
- pretrained_weights/personalive/motion_encoder.pth +3 -0
- pretrained_weights/personalive/motion_extractor.pth +3 -0
- pretrained_weights/personalive/pose_guider.pth +3 -0
- pretrained_weights/personalive/reference_unet.pth +3 -0
- pretrained_weights/personalive/temporal_module.pth +3 -0
- pretrained_weights/tensorrt/.DS_Store +0 -0
- pretrained_weights/tensorrt/unet_work(H100).engine +3 -0
.gitattributes
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
demo/driving_video.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
demo/ref_image.png filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
pretrained_weights/onnx/unet_opt/unet_opt.onnx.data filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
pretrained_weights/tensorrt/unet_work(H100).engine filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
results/20251209--personalive_offline/concat_vid/ref_image_driving_video.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 41 |
+
results/20251209--personalive_offline/split_vid/ref_image_driving_video.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 42 |
+
assets/demo_1.gif filter=lfs diff=lfs merge=lfs -text
|
| 43 |
+
assets/demo_2.gif filter=lfs diff=lfs merge=lfs -text
|
| 44 |
+
assets/demo_3.gif filter=lfs diff=lfs merge=lfs -text
|
| 45 |
+
assets/overview.png filter=lfs diff=lfs merge=lfs -text
|
| 46 |
+
assets/guide.png filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,249 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
tags:
|
| 4 |
+
- portrait-animation
|
| 5 |
+
- real-time
|
| 6 |
+
- diffusion
|
| 7 |
+
pipeline_tag: image-to-video
|
| 8 |
+
library_name: diffusers
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
<div align="center">
|
| 12 |
+
|
| 13 |
+
<h1 align="center" style="font-weight: 900; font-size: 80px; color: #FF6B6B; margin-bottom: 20px;">
|
| 14 |
+
PersonaLive!
|
| 15 |
+
</h1>
|
| 16 |
+
|
| 17 |
+
<h2>Expressive Portrait Image Animation for Live Streaming</h2>
|
| 18 |
+
|
| 19 |
+
<a href='https://arxiv.org/abs/2512.11253'><img src='https://img.shields.io/badge/ArXiv-2512.11253-red'></a> <a href='https://huggingface.co/huaichang/PersonaLive'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-ffc107'></a> <a href='https://modelscope.cn/models/huaichang/PersonaLive'><img src='https://img.shields.io/badge/ModelScope-Model-624AFF'></a> [](https://github.com/GVCLab/PersonaLive)
|
| 20 |
+
|
| 21 |
+
[Zhiyuan Li<sup>1,2,3</sup>](https://huai-chang.github.io/) · [Chi-Man Pun<sup>1,📪</sup>](https://cmpun.github.io/) · [Chen Fang<sup>2</sup>](http://fangchen.org/) · [Jue Wang<sup>2</sup>](https://scholar.google.com/citations?user=Bt4uDWMAAAAJ&hl=en) · [Xiaodong Cun<sup>3,📪</sup>](https://vinthony.github.io/academic/)
|
| 22 |
+
|
| 23 |
+
<sup>1</sup> University of Macau <sup>2</sup> [Dzine.ai](https://www.dzine.ai/) <sup>3</sup> [GVC Lab, Great Bay University](https://gvclab.github.io/)
|
| 24 |
+
|
| 25 |
+
<h3 align="center" style="color: #ff4d4d; font-weight: 900; margin-top: 0;">
|
| 26 |
+
⚡️ Real-time, Streamable, Infinite-Length ⚡️ <br>
|
| 27 |
+
⚡️ Portrait Animation requires only ~12GB VRAM ⚡️
|
| 28 |
+
</h3>
|
| 29 |
+
|
| 30 |
+
<table width="100%" align="center" style="border: none;">
|
| 31 |
+
<tr>
|
| 32 |
+
<td width="46.5%" align="center" style="border: none;">
|
| 33 |
+
<img src="assets/demo_3.gif" style="width: 100%;">
|
| 34 |
+
</td>
|
| 35 |
+
<td width="41%" align="center" style="border: none;">
|
| 36 |
+
<img src="assets/demo_2.gif" style="width: 100%;">
|
| 37 |
+
</td>
|
| 38 |
+
</tr>
|
| 39 |
+
</table>
|
| 40 |
+
|
| 41 |
+
</div>
|
| 42 |
+
|
| 43 |
+
## 📋 TODO
|
| 44 |
+
- [ ] If you find PersonaLive useful or interesting, please give us a Star 🌟 on our [GitHub repo](https://github.com/GVCLab/PersonaLive)! Your support drives us to keep improving. 🍻
|
| 45 |
+
- [ ] Fix bugs (If you encounter any issues, please feel free to open an issue or contact me! 🙏)
|
| 46 |
+
- [ ] Enhance WebUI (Support reference image replacement)
|
| 47 |
+
- [x] **[2025.12.22]** 🔥 Supported streaming strategy in offline inference to generate long videos on 12GB VRAM!
|
| 48 |
+
- [x] **[2025.12.17]** 🔥 [ComfyUI-PersonaLive](https://github.com/okdalto/ComfyUI-PersonaLive) is now supported! (Thanks to [@okdalto](https://github.com/okdalto))
|
| 49 |
+
- [x] **[2025.12.15]** 🔥 Release `paper`!
|
| 50 |
+
- [x] **[2025.12.12]** 🔥 Release `inference code`, `config`, and `pretrained weights`!
|
| 51 |
+
|
| 52 |
+
## ⚙️ Framework
|
| 53 |
+
<img src="assets/overview.png" alt="Image 1" width="100%">
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
We present PersonaLive, a `real-time` and `streamable` diffusion framework capable of generating `infinite-length` portrait animations on a single `12GB GPU`.
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
## 🚀 Getting Started
|
| 60 |
+
### 🛠 Installation
|
| 61 |
+
```
|
| 62 |
+
# clone this repo
|
| 63 |
+
git clone https://github.com/GVCLab/PersonaLive
|
| 64 |
+
cd PersonaLive
|
| 65 |
+
|
| 66 |
+
# Create conda environment
|
| 67 |
+
conda create -n personalive python=3.10
|
| 68 |
+
conda activate personalive
|
| 69 |
+
|
| 70 |
+
# Install packages with pip
|
| 71 |
+
pip install -r requirements_base.txt
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
### ⏬ Download weights
|
| 75 |
+
Option 1: Download pre-trained weights of base models and other components ([sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) and [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)). You can run the following command to download weights automatically:
|
| 76 |
+
|
| 77 |
+
```bash
|
| 78 |
+
python tools/download_weights.py
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
Option 2: Download pre-trained weights into the `./pretrained_weights` folder from one of the below URLs:
|
| 82 |
+
|
| 83 |
+
<a href='https://drive.google.com/drive/folders/1GOhDBKIeowkMpBnKhGB8jgEhJt_--vbT?usp=drive_link'><img src='https://img.shields.io/badge/Google%20Drive-5B8DEF?style=for-the-badge&logo=googledrive&logoColor=white'></a> <a href='https://pan.baidu.com/s/1DCv4NvUy_z7Gj2xCGqRMkQ?pwd=gj64'><img src='https://img.shields.io/badge/Baidu%20Netdisk-3E4A89?style=for-the-badge&logo=baidu&logoColor=white'></a> <a href='https://modelscope.cn/models/huaichang/PersonaLive'><img src='https://img.shields.io/badge/ModelScope-624AFF?style=for-the-badge&logo=alibabacloud&logoColor=white'></a> <a href='https://huggingface.co/huaichang/PersonaLive'><img src='https://img.shields.io/badge/HuggingFace-E67E22?style=for-the-badge&logo=huggingface&logoColor=white'></a>
|
| 84 |
+
|
| 85 |
+
Finally, these weights should be organized as follows:
|
| 86 |
+
```
|
| 87 |
+
pretrained_weights
|
| 88 |
+
├── onnx
|
| 89 |
+
│ ├── unet_opt
|
| 90 |
+
│ │ ├── unet_opt.onnx
|
| 91 |
+
│ │ └── unet_opt.onnx.data
|
| 92 |
+
│ └── unet
|
| 93 |
+
├── personalive
|
| 94 |
+
│ ├── denoising_unet.pth
|
| 95 |
+
│ ├── motion_encoder.pth
|
| 96 |
+
│ ├── motion_extractor.pth
|
| 97 |
+
│ ├── pose_guider.pth
|
| 98 |
+
│ ├── reference_unet.pth
|
| 99 |
+
│ └── temporal_module.pth
|
| 100 |
+
├── sd-vae-ft-mse
|
| 101 |
+
│ ├── diffusion_pytorch_model.bin
|
| 102 |
+
│ └── config.json
|
| 103 |
+
├── sd-image-variations-diffusers
|
| 104 |
+
│ ├── image_encoder
|
| 105 |
+
│ │ ├── pytorch_model.bin
|
| 106 |
+
│ │ └── config.json
|
| 107 |
+
│ ├── unet
|
| 108 |
+
│ │ ├── diffusion_pytorch_model.bin
|
| 109 |
+
│ │ └── config.json
|
| 110 |
+
│ └── model_index.json
|
| 111 |
+
└── tensorrt
|
| 112 |
+
└── unet_work.engine
|
| 113 |
+
```
|
| 114 |
+
|
| 115 |
+
### 🎞️ Offline Inference
|
| 116 |
+
```
|
| 117 |
+
python inference_offline.py
|
| 118 |
+
```
|
| 119 |
+
⚠️ Note for RTX 50-Series (Blackwell) Users: xformers is not yet fully compatible with the new architecture. To avoid crashes, please disable it by running:
|
| 120 |
+
```
|
| 121 |
+
python inference_offline.py --use_xformers False
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
### 📸 Online Inference
|
| 125 |
+
#### 📦 Setup Web UI
|
| 126 |
+
```
|
| 127 |
+
# install Node.js 18+
|
| 128 |
+
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
|
| 129 |
+
nvm install 18
|
| 130 |
+
|
| 131 |
+
cd webcam
|
| 132 |
+
source start.sh
|
| 133 |
+
```
|
| 134 |
+
|
| 135 |
+
#### 🏎️ Acceleration (Optional)
|
| 136 |
+
Converting the model to TensorRT can significantly speed up inference (~ 2x ⚡️). Building the engine may take about `20 minutes` depending on your device. Note that TensorRT optimizations may lead to slight variations or a small drop in output quality.
|
| 137 |
+
```
|
| 138 |
+
pip install -r requirements_trt.txt
|
| 139 |
+
|
| 140 |
+
python torch2trt.py
|
| 141 |
+
```
|
| 142 |
+
*The provided TensorRT model is from an `H100`. We recommend `ALL users` (including H100 users) re-run `python torch2trt.py` locally to ensure best compatibility.*
|
| 143 |
+
|
| 144 |
+
#### ▶️ Start Streaming
|
| 145 |
+
```
|
| 146 |
+
python inference_online.py --acceleration none (for RTX 50-Series) or xformers or tensorrt
|
| 147 |
+
```
|
| 148 |
+
Then open `http://0.0.0.0:7860` in your browser. (*If `http://0.0.0.0:7860` does not work well, try `http://localhost:7860`.*)
|
| 149 |
+
|
| 150 |
+
**How to use**: Upload Image ➡️ Fuse Reference ➡️ Start Animation ➡️ Enjoy! 🎉
|
| 151 |
+
<div align="center">
|
| 152 |
+
<img src="assets/guide.png" alt="PersonaLive" width="60%">
|
| 153 |
+
</div>
|
| 154 |
+
|
| 155 |
+
**Regarding Latency**: Latency varies depending on your device's computing power. You can try the following methods to optimize it:
|
| 156 |
+
|
| 157 |
+
1. Lower the "Driving FPS" setting in the WebUI to reduce the computational workload.
|
| 158 |
+
2. You can increase the multiplier (e.g., set to `num_frames_needed * 4` or higher) to better match your device's inference speed. https://github.com/GVCLab/PersonaLive/blob/6953d1a8b409f360a3ee1d7325093622b29f1e22/webcam/util.py#L73
|
| 159 |
+
|
| 160 |
+
## 📚 Community Contribution
|
| 161 |
+
|
| 162 |
+
Special thanks to the community for providing helpful setups! 🥂
|
| 163 |
+
|
| 164 |
+
* **Windows + RTX 50-Series Guide**: Thanks to [@dknos](https://github.com/dknos) for providing a [detailed guide](https://github.com/GVCLab/PersonaLive/issues/10#issuecomment-3662785532) on running this project on Windows with Blackwell GPUs.
|
| 165 |
+
|
| 166 |
+
* **TensorRT on Windows**: If you are trying to convert TensorRT models on Windows, [this discussion](https://github.com/GVCLab/PersonaLive/issues/8) might be helpful. Special thanks to [@MaraScott](https://github.com/MaraScott) and [@Jeremy8776](https://github.com/Jeremy8776) for their insights.
|
| 167 |
+
|
| 168 |
+
* **ComfyUI**: Thanks to [@okdalto](https://github.com/okdalto) for helping implement the [ComfyUI-PersonaLive](https://github.com/okdalto/ComfyUI-PersonaLive) support.
|
| 169 |
+
|
| 170 |
+
* **Useful Scripts**: Thanks to [@suruoxi](https://github.com/suruoxi) for implementing `download_weights.py`, and to [@andchir](https://github.com/andchir) for adding audio merging functionality.
|
| 171 |
+
|
| 172 |
+
## 🎬 More Results
|
| 173 |
+
#### 👀 Visualization results
|
| 174 |
+
|
| 175 |
+
<table width="100%">
|
| 176 |
+
<tr>
|
| 177 |
+
<td width="50%">
|
| 178 |
+
<video src="https://github.com/user-attachments/assets/cdc885ef-5e1c-4139-987a-2fa50fefd6a4" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 179 |
+
</td>
|
| 180 |
+
<td width="50%">
|
| 181 |
+
<video src="https://github.com/user-attachments/assets/014f7bae-74ce-4f56-8621-24bc76f3c123" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 182 |
+
</td>
|
| 183 |
+
</tr>
|
| 184 |
+
</table>
|
| 185 |
+
<table width="100%">
|
| 186 |
+
<tr>
|
| 187 |
+
<td width="25%">
|
| 188 |
+
<video src="https://github.com/user-attachments/assets/1e6a0809-15d2-4cab-ae8f-8cf1728c6281" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 189 |
+
</td>
|
| 190 |
+
<td width="25%">
|
| 191 |
+
<video src="https://github.com/user-attachments/assets/d9cf265d-9db0-4f83-81da-be967bbd5f26" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 192 |
+
</td>
|
| 193 |
+
<td width="25%">
|
| 194 |
+
<video src="https://github.com/user-attachments/assets/86235139-b63e-4f26-b09c-d218466e8e24" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 195 |
+
</td>
|
| 196 |
+
<td width="25%">
|
| 197 |
+
<video src="https://github.com/user-attachments/assets/238785de-3b4c-484e-9ad0-9d90e7962fee" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 198 |
+
</td>
|
| 199 |
+
</tr>
|
| 200 |
+
<tr>
|
| 201 |
+
<td width="25%">
|
| 202 |
+
<video src="https://github.com/user-attachments/assets/c71c4717-d528-4a98-b132-2b0ec8cec22d" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 203 |
+
</td>
|
| 204 |
+
<td width="25%">
|
| 205 |
+
<video src="https://github.com/user-attachments/assets/7e11fe71-fd16-4011-a6b2-2dbaf7e343fb" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 206 |
+
</td>
|
| 207 |
+
<td width="25%">
|
| 208 |
+
<video src="https://github.com/user-attachments/assets/f62e2162-d239-4575-9514-34575c16301c" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 209 |
+
</td>
|
| 210 |
+
<td width="25%">
|
| 211 |
+
<video src="https://github.com/user-attachments/assets/813e7fbd-37e9-47d7-a270-59887fafeca5" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 212 |
+
</td>
|
| 213 |
+
</tr>
|
| 214 |
+
</table>
|
| 215 |
+
|
| 216 |
+
#### 🤺 Comparisons
|
| 217 |
+
|
| 218 |
+
<table width="100%">
|
| 219 |
+
<tr>
|
| 220 |
+
<td width="100%">
|
| 221 |
+
<video src="https://github.com/user-attachments/assets/36407cf9-bf82-43ff-9508-a794d223d3f7" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 222 |
+
</td>
|
| 223 |
+
</tr>
|
| 224 |
+
<tr>
|
| 225 |
+
<td width="100%">
|
| 226 |
+
<video src="https://github.com/user-attachments/assets/3be99b91-c6a1-4ca4-89e9-8fad42bb9583" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 227 |
+
</td>
|
| 228 |
+
</tr>
|
| 229 |
+
<tr>
|
| 230 |
+
<td width="100%">
|
| 231 |
+
<video src="https://github.com/user-attachments/assets/5bd21fe4-96ae-4be6-bf06-a7c476b04ec9" controls="controls" style="max-width: 100%; display: block;"></video>
|
| 232 |
+
</td>
|
| 233 |
+
</tr>
|
| 234 |
+
</table>
|
| 235 |
+
|
| 236 |
+
|
| 237 |
+
## ⭐ Citation
|
| 238 |
+
If you find PersonaLive useful for your research, you are welcome to cite our work using the following BibTeX:
|
| 239 |
+
```bibtex
|
| 240 |
+
@article{li2025personalive,
|
| 241 |
+
title={PersonaLive! Expressive Portrait Image Animation for Live Streaming},
|
| 242 |
+
author={Li, Zhiyuan and Pun, Chi-Man and Fang, Chen and Wang, Jue and Cun, Xiaodong},
|
| 243 |
+
journal={arXiv preprint arXiv:2512.11253},
|
| 244 |
+
year={2025}
|
| 245 |
+
}
|
| 246 |
+
```
|
| 247 |
+
|
| 248 |
+
## ❤️ Acknowledgement
|
| 249 |
+
This code is mainly built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), [X-NeMo](https://byteaigc.github.io/X-Portrait2/), [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion), [RAIN](https://pscgylotti.github.io/pages/RAIN/) and [LivePortrait](https://github.com/KlingTeam/LivePortrait), thanks to their invaluable contributions.
|
assets/demo_1.gif
ADDED
|
Git LFS Details
|
assets/demo_2.gif
ADDED
|
Git LFS Details
|
assets/demo_3.gif
ADDED
|
Git LFS Details
|
assets/guide.png
ADDED
|
Git LFS Details
|
assets/header.svg
ADDED
|
|
assets/highlight.svg
ADDED
|
|
assets/overview.png
ADDED
|
Git LFS Details
|
pretrained_weights/.DS_Store
ADDED
|
Binary file (8.2 kB). View file
|
|
|
pretrained_weights/onnx/.DS_Store
ADDED
|
Binary file (6.15 kB). View file
|
|
|
pretrained_weights/onnx/unet_opt/unet_opt.onnx
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:484aee7e8c45cddaac227b6ad331a88a77121dee0886f2152cc4bd0e9974b6fa
|
| 3 |
+
size 96224343
|
pretrained_weights/onnx/unet_opt/unet_opt.onnx.data
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:aa08ee8770f202be841e00f2bb94809c2ca6ca95ad8663c2917c4c6fa35d963e
|
| 3 |
+
size 3593537864
|
pretrained_weights/personalive/denoising_unet.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d0446c4d2387f259d5f3c1ac54a5aefa93400f4672f942856bff2538df046162
|
| 3 |
+
size 4927015578
|
pretrained_weights/personalive/motion_encoder.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ff7c6b0a84cd750046e7687f7a6f6bbc21317055bfcacef950ed347debae4d2c
|
| 3 |
+
size 246719031
|
pretrained_weights/personalive/motion_extractor.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:251e6a94ad667a1d0c69526d292677165110ef7f0cf0f6d199f0e414e8aa0ca5
|
| 3 |
+
size 112545506
|
pretrained_weights/personalive/pose_guider.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8b997db63343a6a5d489778172d9544bcccaf27e6756505dc6353d84e877269d
|
| 3 |
+
size 4351790
|
pretrained_weights/personalive/reference_unet.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:85eb03e6c34fab69f9246ff14b3016789232e56dc4892d0581fea21a3a8480f6
|
| 3 |
+
size 3438324340
|
pretrained_weights/personalive/temporal_module.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:295e8942a453adb48756432d99de103ecba9b840b5b8f6635a0687311cdff30e
|
| 3 |
+
size 1817903018
|
pretrained_weights/tensorrt/.DS_Store
ADDED
|
Binary file (6.15 kB). View file
|
|
|
pretrained_weights/tensorrt/unet_work(H100).engine
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:34bd6f7693300be8cf72a099f1160bfaedab7a677bcaf66f18ee33a5b871de50
|
| 3 |
+
size 3697605036
|