Add model card for PoseGen (#1)
(commit d7311ac7088f7dc3ec31f1fe8c22f927da35ee53)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md (added)
---
pipeline_tag: image-to-video
---

# PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation

[**Paper**](https://huggingface.co/papers/2508.05091) | [**Project Page**](https://jessie459.github.io/PoseGen-Page/) | [**GitHub**](https://github.com/Jessie459/PoseGen)

**PoseGen** is a framework that generates temporally coherent, long-duration human videos from a single reference image and a driving video. It uses an in-context LoRA finetuning design to preserve identity fidelity, and a segment-interleaved generation strategy to maintain consistency across extended durations.

<p align="center">
<img src="https://github.com/Jessie459/PoseGen/raw/main/assets/teaser.png" alt="PoseGen Teaser">
</p>
## Installation

```bash
conda create -n posegen python=3.10 -y
conda activate posegen

conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia -y
pip install diffsynth==1.1.7
pip install wan@git+https://github.com/Wan-Video/Wan2.1
pip install -r requirements.txt
```
## Inference

To run inference, you need a reference image and a driving video. The process consists of extracting conditions, preparing a prompt, and generating the video chunks.

### 1. Extract Conditions

Extract pose and hand conditions from the driving video (the `results/<name>/sapiens/` files referenced below are outputs produced by Sapiens):

```bash
python prepare_input_pose.py \
    --pose_path "results/video1/sapiens/pose.pkl" \
    --output_dir "results/video1/inputs" \
    --video_path "examples/video1.mp4"

python prepare_input_hand.py \
    --normal_path "results/video1/sapiens/normal.npy" \
    --seg_path "results/video1/sapiens/seg.npy" \
    --output_dir "results/video1/inputs" \
    --video_path "examples/video1.mp4"
```
### 2. Generate Video Chunks

Generate the anchor base chunk (which stabilizes the background for the rest of the sequence):

```bash
python inference.py \
    --mode anch \
    --image_path "examples/image1.png" \
    --prompt_path "results/image1/prompt.txt" \
    --hand_path "results/video1/inputs/hand.mp4" \
    --pose_path "results/video1/inputs/pose.mp4" \
    --output_dir "results/generated" \
    --seed 42 \
    --anch_chunk_idx 0 \
    -s 0 2 \
    -b 34 40 \
    -p "4*10**9"
```
Refer to the [GitHub README](https://github.com/Jessie459/PoseGen) for full details on generating subsequent base chunks and merging them.
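PoseGen documents its own merging procedure; purely as a generic illustration, chunk files can be concatenated losslessly with ffmpeg's concat demuxer. The `chunk_*.mp4` naming below is an assumption for this sketch, not PoseGen's actual output naming.

```shell
# Illustrative only: build a file list from the generated chunks (assumed
# chunk_*.mp4 naming) and concatenate them without re-encoding.
printf "file '%s'\n" results/generated/chunk_*.mp4 > concat_list.txt
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy results/generated/full.mp4
```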
## Acknowledgement

This project is built upon [Sapiens](https://github.com/facebookresearch/sapiens), [Wan2.1](https://github.com/Wan-Video/Wan2.1), and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio).

## Citation

```bibtex
@article{he2025posegen,
  title={PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation},
  author={He, Jingxuan and Su, Busheng and Wong, Finn},
  journal={arXiv preprint arXiv:2508.05091},
  year={2025}
}
```