Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22, 2024 • 21
TextCraftor: Your Text Encoder Can be Image Quality Controller Paper • 2403.18978 • Published Mar 27, 2024 • 15
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published Jun 6, 2024 • 38
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation Paper • 2411.04967 • Published Nov 7, 2024 • 1
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published Dec 12, 2024 • 31
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device Paper • 2412.10494 • Published Dec 13, 2024 • 2
Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers Paper • 2510.21986 • Published Oct 24, 2025 • 6
H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models Paper • 2504.10567 • Published Apr 14, 2025 • 2
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization Paper • 2512.10955 • Published Dec 11, 2025 • 7
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published Jan 13, 2026 • 19
S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation Paper • 2601.12719 • Published Jan 19, 2026 • 1
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers Paper • 2603.12245 • Published Mar 12, 2026 • 18