Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

Paper on Hugging Face | Project Page | SVI on GitHub | SVI Dataset | SVI Models

🎯 About This Repository

Stable Video Infinity (SVI) generates videos of ANY length with high temporal consistency, plausible scene transitions, and controllable streaming storylines in ANY domain. This repository contains the model weights of the SVI family.

🌟 Key Highlights

  • OpenSVI: Everything is open-sourced: training & evaluation scripts, datasets, and more.
  • Infinite Length: No inherent limit on video duration; generate arbitrarily long stories (see the 10‑minute “Tom and Jerry” demo).
  • Versatile: Supports diverse in-the-wild generation tasks: multi-scene short films, single‑scene animations, skeleton-/audio-conditioned generation, cartoons, and more.
  • Efficient: Only LoRA adapters are tuned, requiring very little training data: anyone can make their own SVI easily.
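To give a feel for why tuning only LoRA adapters is lightweight, here is a minimal sketch that counts trainable parameters for a single linear layer under the standard LoRA formulation (a frozen weight W plus a trainable low-rank update B·A). The layer size and rank below are hypothetical values chosen for illustration; they are not taken from SVI.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one linear layer.

    Full fine-tuning updates the whole d_out x d_in weight matrix W.
    LoRA freezes W and trains two low-rank factors instead:
    B (d_out x rank) and A (rank x d_in), so the learned update is B @ A.
    """
    if min(d_in, d_out, rank) <= 0:
        raise ValueError("dimensions and rank must be positive")
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical layer size and rank, for illustration only (not SVI's settings).
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}  fraction trained: {lora / full:.2%}")
```

With these assumed numbers the adapter trains well under 1% of the layer's weights, which is the general reason adapter-only tuning needs less data and compute than full fine-tuning.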

📦 Resources

| Model | Task | Input | Output | Hugging Face Link | Comments |
| --- | --- | --- | --- | --- | --- |
| ALL | Infinite possibility | Image + X | X video | 🤗 Folder | Family bucket! I want to play with all! |
| SVI-Shot | Single-scene generation | Image + Text prompt | Long video | 🤗 Model | Generate a consistent long video from 1 text prompt. (This will never drift) |
| SVI-Film | Multi-scene generation | Image + Text prompt stream | Film-style video | 🤗 Model | Generate a creative long video from 1 text prompt stream (5 seconds per prompt). |
| SVI-Film (Transition) | Multi-scene generation | Image + Text prompt stream | Film-style video | 🤗 Model | Generate a creative long video from 1 text prompt stream. (More scene transitions due to the training data) |
| SVI-Tom&Jerry | Cartoon animation | Image | Cartoon video | 🤗 Model | Generate creative long cartoon videos from 1 text prompt stream. (This never drifted in our 20-minute test) |
| SVI-Talk | Talking head | Image + Audio | Talking video | 🤗 Model | Generate long videos of audio-conditioned human speaking. |
| SVI-Dance | Dancing animation | Image + Skeleton | Dance video | 🤗 Model | Generate long videos of skeleton-conditioned human dancing. |
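The "text prompt stream" used by SVI-Film is described as one prompt per 5 seconds of video. As a rough illustration of how such a stream could be laid out on a timeline, the sketch below assigns each prompt a consecutive 5-second window. The function names, the back-to-back windowing, and the example prompts are all assumptions made for illustration; this is not the SVI inference API.

```python
import math


def schedule_prompts(prompts, seconds_per_prompt=5.0):
    """Give each prompt in the stream a consecutive (start, end) window in seconds.

    Assumption: windows are laid back to back with no overlap.
    """
    if seconds_per_prompt <= 0:
        raise ValueError("seconds_per_prompt must be positive")
    schedule = []
    for i, prompt in enumerate(prompts):
        start = i * seconds_per_prompt
        schedule.append((start, start + seconds_per_prompt, prompt))
    return schedule


def prompts_needed(duration_seconds, seconds_per_prompt=5.0):
    """Number of prompts required to cover a target duration."""
    return math.ceil(duration_seconds / seconds_per_prompt)


# Hypothetical prompt stream, for illustration only.
stream = [
    "A cat wakes up on a windowsill",
    "The cat jumps down and walks to the kitchen",
    "The cat finds an empty bowl and looks at the camera",
]
for start, end, prompt in schedule_prompts(stream):
    print(f"{start:5.1f}s - {end:5.1f}s  {prompt}")
```

Under the same assumption, `prompts_needed(600)` gives the number of 5-second prompts a 10-minute film would require.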

Note: If you want to play with T2V, you can directly use SVI with an image generated by any T2I model!

πŸ“ Citation

If you find our work helpful for your research, please consider citing our paper. Thank you so much!

@article{li2025stable,
      title={Stable Video Infinity: Infinite-Length Video Generation with Error Recycling}, 
      author={Wuyang Li and Wentao Pan and Po-Chien Luan and Yang Gao and Alexandre Alahi},
      journal={arXiv preprint arXiv:2510.09212},
      year={2025},
      url={https://huggingface.co/papers/2510.09212},
}