---
license: apache-2.0
pipeline_tag: image-to-video
tags:
- video-editing
- diffusion
- wan
---
## Overview
NOVA is a pair-free video editing model built on WAN 1.3B Fun InP. It uses sparse keyframe control (e.g., a single edited first frame) to guide dense video synthesis, trained without requiring paired before/after video data.
- Pair-free training via degradation simulation
- Sparse keyframe control: provide one or more edited keyframes
- Optional coarse mask for improved editing accuracy
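Pair-free training means the model never sees real before/after editing pairs; instead, a pseudo "source" video is synthesized from a single clip by degrading it. The exact degradations NOVA uses are described in the paper; the snippet below is only an illustrative sketch, and the specific operations (box blur plus Gaussian noise) and their parameters are assumptions, not the released pipeline:

```python
import numpy as np

def degrade_frame(frame: np.ndarray, rng: np.random.Generator,
                  noise_std: float = 10.0) -> np.ndarray:
    """Simulate a degraded 'source' frame from a clean target frame.

    A crude stand-in for degradation simulation: 3x3 box blur plus
    Gaussian noise. NOVA's actual degradations are not shown here.
    """
    # 3x3 box blur via shifted averages (wrap-around padding for brevity).
    blurred = sum(
        np.roll(frame.astype(np.float32), (dy, dx), axis=(0, 1))
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    noisy = blurred + rng.normal(0.0, noise_std, frame.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
degraded = degrade_frame(clean, rng)
```

The degraded clip plays the role of the "original" input during training, while the clean clip serves as the editing target.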
The framework consists of a sparse branch, which provides semantic guidance from the user-edited keyframes, and a dense branch, which incorporates motion and texture information from the original video to maintain fidelity and temporal coherence.
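The two-branch idea can be pictured as two per-frame feature streams merged before synthesis. The toy sketch below is purely conceptual: the per-frame convex weighting, the `fuse_branches` helper, and the feature shapes are invented for illustration and are not NOVA's actual architecture:

```python
import numpy as np

def fuse_branches(sparse_feats: np.ndarray, dense_feats: np.ndarray,
                  keyframe_mask: np.ndarray) -> np.ndarray:
    """Toy fusion of sparse (edited-keyframe) and dense (original-video)
    features: frames with a user-provided keyframe lean on the sparse
    branch; the rest lean on dense motion/texture features.

    sparse_feats, dense_feats: (num_frames, dim)
    keyframe_mask: (num_frames,), 1.0 where an edited keyframe exists.
    """
    w = keyframe_mask[:, None]  # broadcast per-frame weight over feature dim
    return w * sparse_feats + (1.0 - w) * dense_feats

T, D = 81, 16
sparse = np.ones((T, D))           # stand-in semantic guidance
dense = np.zeros((T, D))           # stand-in motion/texture features
mask = np.zeros(T); mask[0] = 1.0  # only the first frame is edited
fused = fuse_branches(sparse, dense, mask)
```

In the sketch, only the edited first frame draws on the sparse branch; all remaining frames fall back to dense features from the original video.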
## Usage
For full installation and training instructions, please visit the GitHub repository.
### Inference via CLI

You can run inference with the `infer_nova.py` script. Below is an example for single-GPU inference:
```shell
python infer_nova.py \
  --dataset_path ./example_videos \
  --metadata_file_name metadata.csv \
  --ckpt_path /path/to/checkpoints/stepXXX.ckpt \
  --output_path ./inference_results \
  --text_encoder_path /path/to/models_t5_umt5-xxl-enc-bf16.pth \
  --image_encoder_path /path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --vae_path /path/to/Wan2.1_VAE.pth \
  --dit_path /path/to/diffusion_pytorch_model.safetensors \
  --num_samples 5 \
  --num_inference_steps 50 \
  --num_frames 81 \
  --height 480 \
  --width 832 \
  --first_only
```
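The script reads sample definitions from the CSV named by `--metadata_file_name` under `--dataset_path`. The snippet below shows one way such a file could be built; the column names (`video_path`, `keyframe_path`, `prompt`) are hypothetical placeholders, so consult the GitHub repository for the actual schema:

```python
import csv
import io

# Hypothetical metadata.csv layout; the real column names are defined
# in the NOVA repository, not here.
rows = [
    {"video_path": "videos/cat.mp4",
     "keyframe_path": "keyframes/cat_edited_frame0.png",
     "prompt": "a cat wearing a red hat"},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["video_path", "keyframe_path", "prompt"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```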
## Citation

```bibtex
@article{pan2026nova,
  title={NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing},
  author={Tianlin Pan and Jiayi Dai and Chenpu Yuan and Zhengyao Lv and Binxin Yang and Hubery Yin and Chen Li and Jing Lyu and Caifeng Shan and Chenyang Si},
  journal={arXiv preprint arXiv:2603.02802},
  year={2026}
}
```
