saeed-5959/high_sync


saeed-5959 committed 8 days ago

Commit 7835199 · verified · 1 Parent(s): bb2e90f

Create README.md

Files changed (1)

README.md +34 -0

README.md ADDED

	@@ -0,0 +1,34 @@

+<h1 align='center'>HighSync: High-Quality Lip Synchronization via
+Latent Diffusion Models</h1>
+<div align='center'>
+    <a href='https://github.com/saeed5959' target='_blank'>Saeed Firouzi</a><sup>1</sup>&emsp;
+</div>
+<br>
+<div align='center'>
+    <a href='https://github.com/saeed5959/high_sync'><img src='https://img.shields.io/badge/github-8da0cb?style=for-the-badge&labelColor=555555&logo=github'></a>
+    <a href='https://arxiv.org/abs/2605.16918'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
+    <a href='https://huggingface.co/datasets/saeed-5959/vfhq'><img src='https://img.shields.io/badge/Dataset-Hugging_Face-CFAFD4'></a>
+</div>
+## Abstract
+We present HighSync, an end-to-end diffusion-based
+framework for high-fidelity lip synchronization that generates
+photorealistic talking-face videos aligned with arbitrary input
+audio. Existing approaches consistently struggle to reconcile
+image quality with synchronization accuracy, producing either
+visually degraded outputs or temporally inconsistent lip movements.
+HighSync addresses both challenges simultaneously and,
+to our knowledge, is the first lip-sync model to operate natively
+at 512×512 resolution, positioning it as a viable solution for
+professional production environments such as the film and broadcast
+industries. Central to our approach is the identification and
+systematic elimination of a data leakage phenomenon that has
+silently undermined temporal modeling in prior work, preventing
+models from developing a genuine dependence on the audio
+signal. Comprehensive evaluations across both perceptual quality
+and synchronization accuracy metrics confirm that HighSync
+achieves state-of-the-art performance on both fronts.