# Multi View

Uses multi-view face images as the condition for human-centric video generation.

## ✔️ TODO List

- [x] Base code for single-shot video generation
- [x] Splitting RoPE for video and reference images
- [ ] Face-selection router
- [ ] Support for multi-shot video generation
  - [ ] Four-level shot RoPE
  - [ ] Inter-shot self-attention and frame-pack-based intra-shot attention

## 🚀 Training

```bash
bash train.sh
```

## 🚀 Inference

```bash
bash test.sh
```

## ⚙️ Configuration
```yaml
train_args:
  max_checkpoints_to_keep: 3
  resume_from_checkpoint: True
  seed: 42
  save_steps: 150
  save_epoches: 1
  batch_size: 8

  visual_log_project_name: Wan2.2_5B-Multi_view-normal_rope_384_640-3ref
  output_path: /root/paddlejob/workspace/qizipeng/baidu/personal-code/Multi-view/multi_view/ckpts
  local_model_path: /root/paddlejob/workspace/qizipeng/wanx_pretrainedmodels

  zero_face_ratio: 0.1
  split_rope: False
  split1: False
  split2: False
  split3: False

infer_args:
  infer_step: 1350
  epoch_id: 17

dataset_args:
  base_path: /root/paddlejob/workspace/qizipeng/baidu/personal-code/Multi-view/multi_view/datasets/merged_wangpan_artgrid_taobao_visionchina_123rf_nasuyun_xinpianchang_disk.json

  height: 384
  width: 640

  num_frames: 81
  ref_num: 3

```
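
The checkpoint-related keys above (`save_steps`, `max_checkpoints_to_keep`, `infer_step`) appear to work together. The following is a minimal sketch of that relationship, assuming a checkpoint is written every `save_steps` steps and only the newest `max_checkpoints_to_keep` are retained. The function name `retained_checkpoint_steps` is hypothetical and not part of this repository; the actual retention logic in the training code may differ.

```python
from typing import List


def retained_checkpoint_steps(global_step: int, save_steps: int, max_keep: int) -> List[int]:
    """Return the step numbers whose checkpoints would still exist at `global_step`.

    Assumption (not confirmed by the repository): one checkpoint is saved every
    `save_steps` steps, and older checkpoints are pruned so that at most
    `max_keep` remain.
    """
    if save_steps <= 0 or max_keep <= 0:
        raise ValueError("save_steps and max_keep must be positive")
    # Every step at which a checkpoint would have been written so far.
    saved = list(range(save_steps, global_step + 1, save_steps))
    # Keep only the newest `max_keep` of them.
    return saved[-max_keep:]


if __name__ == "__main__":
    # Values taken from the config above: save_steps=150, max_checkpoints_to_keep=3.
    print(retained_checkpoint_steps(1350, save_steps=150, max_keep=3))
    # prints [1050, 1200, 1350]
```

Under these assumptions, the configured `infer_step: 1350` is a multiple of `save_steps: 150`, so it would correspond to a saved checkpoint, provided that checkpoint has not yet been pruned.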
---