Create README.md
README.md
ADDED
@@ -0,0 +1,34 @@
+<h1 align='center'>HighSync: High-Quality Lip Synchronization via
+Latent Diffusion Models</h1>
+
+<div align='center'>
+<a href='https://github.com/saeed5959' target='_blank'>Saeed Firouzi</a><sup>1</sup>
+</div>
+
+<br>
+
+<div align='center'>
+<a href='https://github.com/saeed5959/high_sync'><img src='https://img.shields.io/badge/github-8da0cb?style=for-the-badge&labelColor=555555&logo=github'></a>
+<a href='https://arxiv.org/abs/2605.16918'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
+<a href='https://huggingface.co/datasets/saeed-5959/vfhq'><img src='https://img.shields.io/badge/Dataset-Hugging_Face-CFAFD4'></a>
+</div>
+
+
+## Abstract
+We present HighSync, an end-to-end diffusion-based
+framework for high-fidelity lip synchronization that generates
+photorealistic talking-face videos aligned with arbitrary input
+audio. Existing approaches consistently struggle to reconcile
+image quality with synchronization accuracy, producing either
+visually degraded outputs or temporally inconsistent lip
+movements. HighSync addresses both challenges simultaneously and,
+to our knowledge, is the first lip sync model to operate natively
+at 512×512 resolution, positioning it as a viable solution for
+professional production environments such as the film and
+broadcast industries. Central to our approach is the identification and
+systematic elimination of a data leakage phenomenon that has
+silently undermined temporal modeling in prior work, preventing
+models from developing a genuine dependence on the audio
+signal. Comprehensive evaluations across both perceptual quality
+and synchronization accuracy metrics confirm that HighSync
+achieves state-of-the-art performance on both fronts.