arxiv:2503.19462

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

Published on Mar 25, 2025

Submitted by Haiyu Zhang on Mar 27, 2025

Authors:

Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu

Abstract

AI-generated summary: AccVideo accelerates video diffusion models by using a synthetic dataset and trajectory-based guidance, significantly reducing inference steps while maintaining or improving video quality and resolution.

Diffusion models have achieved remarkable progress in video generation. However, their iterative denoising requires a large number of inference steps to generate a video, which is slow and computationally expensive. In this paper, we begin with a detailed analysis of the challenges present in existing diffusion distillation methods and propose a novel, efficient method, AccVideo, that reduces the number of inference steps to accelerate video diffusion models using a synthetic dataset. We leverage the pretrained video diffusion model to generate multiple valid denoising trajectories as our synthetic dataset, which eliminates useless data points from distillation. Based on the synthetic dataset, we design a trajectory-based few-step guidance that uses key data points from the denoising trajectories to learn the noise-to-video mapping, enabling video generation in fewer steps. Furthermore, since the synthetic dataset captures the data distribution at each diffusion timestep, we introduce an adversarial training strategy to align the output distribution of the student model with that of our synthetic dataset, thereby enhancing video quality. Extensive experiments demonstrate that our model achieves an 8.5x improvement in generation speed compared to the teacher model while maintaining comparable performance. Compared to previous acceleration methods, our approach can generate videos with higher quality and resolution, i.e., 5-second videos at 720x1280 and 24fps.
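The trajectory idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `teacher_velocity` is a hypothetical stand-in for a pretrained model, the "latent" is a small vector rather than a video, and the step counts (50 teacher steps, 5 key segments) and all function names are assumptions chosen only for illustration. The sketch records a teacher's multi-step denoising trajectory, keeps a few evenly spaced key points, and pairs consecutive key points as the kind of targets a few-step student could be trained on.

```python
import numpy as np


def teacher_velocity(x, t):
    # Hypothetical stand-in for a pretrained video diffusion model.
    # Any smooth vector field works for the purpose of this sketch.
    return np.tanh(x) - x


def generate_trajectory(noise, num_steps=50):
    """Integrate the toy teacher from t=1 (noise) to t=0 with Euler steps,
    recording every intermediate state as one synthetic trajectory."""
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = noise.copy()
    states = [x.copy()]
    for i in range(num_steps):
        dt = ts[i + 1] - ts[i]  # negative: time runs from 1 down to 0
        x = x + dt * teacher_velocity(x, ts[i])
        states.append(x.copy())
    return ts, np.stack(states)


def select_key_points(ts, states, num_key_steps=5):
    """Keep num_key_steps + 1 evenly spaced states, including both endpoints."""
    idx = np.linspace(0, len(ts) - 1, num_key_steps + 1).round().astype(int)
    return ts[idx], states[idx]


def make_student_pairs(key_states):
    """Pair each key state with the next one: (input, target) for a student
    that is meant to cover each segment in a single step."""
    return list(zip(key_states[:-1], key_states[1:]))


rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
ts, states = generate_trajectory(noise, num_steps=50)
key_ts, key_states = select_key_points(ts, states, num_key_steps=5)
pairs = make_student_pairs(key_states)
```

In this toy setup the student would need 5 evaluations where the teacher used 50. The real method additionally uses an adversarial objective, which the sketch leaves out.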

Community

aejion

Paper author Paper submitter Mar 27, 2025

In this paper, we begin with a detailed analysis of the challenges present in existing diffusion distillation methods and propose a novel, efficient method, AccVideo, that reduces the number of inference steps to accelerate video diffusion models using a synthetic dataset.

ayhmshawysh

15 days ago

Detailed description (fifth version, the iconic one)

1. Nivrak (right)
Age: 12 years.
Skin: clear, velvety white, with fine freckles that give his face cinematic detail under the light.
Hair: jet black, thick and unruly (curly), falling casually over part of his forehead and reflecting a personality that accepts no restraints.
Gaze: wide almond-shaped eyes with exceptionally sharp intelligence; a piercing, "commanding" look that clearly announces the presence of the leader.

2. Majhool (left) [the version of awe and silent strength]
Age: 11 years.
Hair: dark brown, with a "professional fade" (Deep Fade Cut) on the sides and back and a sharp finish, which highlights the breadth of his forehead and the sculpted shape of his face.
Mouth and presence (the decisive revision): Majhool is distinguished by a "wide and imposing" mouth, extended with dignity, which gives his face a measure of grandeur. The lips are precisely drawn and firmly closed, suggesting a wise silence, but the width of the mouth gives his features "breadth and pride", so that his face appears filled with an awe that makes anyone who sees him flee from the force of his presence. He does not smile, but the shape of his wide mouth suggests absolute confidence and the ability to lead from behind the scenes.
Eyes: bright light-brown eyes with a very "cold and dignified" gaze, a source of reassurance for friends and of deadly dread for enemies.
Skin: porcelain white, so clear that it reflects light, which highlights the breadth of his features and the strength of his sculpted jaw.

3. Overall framing of the scene
Physical aura: Majhool and Nivrak represent balance; Nivrak through the sharpness of his eyes and the movement of his hair, Majhool through the breadth of his features, the dignity of his mouth, and his icy calm.
Impression: the two characters look as if they stepped out of an ancient legend or the venerable "Twilight Council" organization, where dignity, beauty, and dread come together in the bodies of two children.
Background: a gloomy haze (Cinematic Fog) that makes the whiteness of their skin and the gleam of their eyes stand out even more.

Scene 4 (Duration: 5 Seconds)
Visual: Close-up on Nivrak's face. His dark, controlling eyes are piercing.
Dialogue (Fusha): "I have received the gift, O Majhool..."
Action: His lips move slowly; his expression is regal and unwavering.
Demeanor: an imposing calm and serene features full of dignity and acceptance. Both wear imposing, all-black mafia suits that fit them well.

Models citing this paper 3

Datasets citing this paper 0

Spaces citing this paper 2

Collections including this paper 2