---
license: other
license_name: qualcomm-ai-hub-proprietary-license
license_link: >-
  https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf
pipeline_tag: text-to-video
tags:
- efficient
- mobile video generation
- dit
- pyramidal diffusion
language:
- en
base_model:
- qualcomm/Neodragon
---
We introduce Neodragon, a text-to-video system that generates 2s videos (49 frames at 24 fps)
at a resolution of 640×1024 directly on a Qualcomm Hexagon NPU in a
record ~6.7s (~7 FPS). Unlike existing transformer-based offline text-to-video
generation models, Neodragon is the first to be specifically optimized for mobile
hardware, achieving efficient, low-cost, and high-fidelity video synthesis.
When paired with an optimized SSD1B first-frame image generator and QuickSRNet for 2× super-resolution, the end-to-end Neodragon system is mobile-friendly and efficient in parameters (4.945B for the full model), memory (3.5GB peak RAM usage), and runtime (6.7s end-to-end latency), while achieving a VBench total score of 81.61 and producing high-fidelity videos.
By enabling low-cost, private, and on-device text-to-video synthesis, Neodragon democratizes AI-based video content creation, empowering creators to generate high-quality videos without reliance on cloud services.
Inference code is available at: https://github.com/qualcomm-ai-research/neodragon
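
As a quick sanity check on the throughput figures quoted above, the short sketch below recomputes the clip duration and generation rate from the stated frame count, playback rate, and end-to-end latency. It assumes that the quoted ~7 FPS is simply the number of frames divided by the end-to-end latency; the variable names are illustrative and are not taken from the Neodragon codebase.

```python
# Figures quoted in the text above.
NUM_FRAMES = 49        # frames per generated clip
PLAYBACK_FPS = 24      # playback frame rate of the clip
E2E_LATENCY_S = 6.7    # reported end-to-end generation latency, in seconds

# Length of the generated clip when played back at 24 fps.
clip_duration_s = NUM_FRAMES / PLAYBACK_FPS

# Assumption: generation rate = frames produced / end-to-end latency.
generation_fps = NUM_FRAMES / E2E_LATENCY_S

# How much longer generation takes than the clip itself lasts.
latency_to_duration_ratio = E2E_LATENCY_S / clip_duration_s

print(f"clip duration:           {clip_duration_s:.2f} s")
print(f"generation rate:         {generation_fps:.1f} FPS")
print(f"latency / clip duration: {latency_to_duration_ratio:.1f}x")
```

Under these assumptions, the clip lasts slightly over 2 seconds and the generation rate comes out slightly above 7 frames per second, consistent with the rounded figures in the text.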