-
  - TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
    Paper • 2512.16093 • Published • 97
  - Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
    Paper • 2511.22699 • Published • 245
  - DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
    Paper • 2512.16676 • Published • 222
  - Sharp Monocular View Synthesis in Less Than a Second
    Paper • 2512.10685 • Published • 29
Collections including paper arxiv:2511.22677
-
  - Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
    Paper • 2511.22677 • Published • 35
  - DiP: Taming Diffusion Models in Pixel Space
    Paper • 2511.18822 • Published • 29
  - What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
    Paper • 2512.00425 • Published • 53
  - Learning Eigenstructures of Unstructured Data Manifolds
    Paper • 2512.01103 • Published • 6
-
  - SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
    Paper • 2503.09641 • Published • 42
  - Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
    Paper • 2511.22677 • Published • 35
  - One-step Diffusion with Distribution Matching Distillation
    Paper • 2311.18828 • Published • 3
  - Improved Distribution Matching Distillation for Fast Image Synthesis
    Paper • 2405.14867 • Published • 15
-
  - Hot Or Not
    🏢 9 • Evaluate hotness, beauty, and attractiveness of an image
  - Audioldm Text To Audio Generation
    🔊 816 • Generate audio from text descriptions
  - openai/whisper-large-v3
    Automatic Speech Recognition • 2B • Updated • 4.86M • 5.6k
  - Whisper Large V3
    🤫 827 • Transcribe or translate audio and YouTube videos to text
-
  - Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
    Paper • 2511.22699 • Published • 245
  - REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
    Paper • 2511.22625 • Published • 48
  - Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
    Paper • 2511.22677 • Published • 35
  - Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
    Paper • 2512.04677 • Published • 177
-
  - FastVLM: Efficient Vision Encoding for Vision Language Models
    Paper • 2412.13303 • Published • 75
  - rStar2-Agent: Agentic Reasoning Technical Report
    Paper • 2508.20722 • Published • 118
  - AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
    Paper • 2508.16279 • Published • 61
  - OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
    Paper • 2509.12201 • Published • 107
-
  - Drivable 3D Gaussian Avatars
    Paper • 2311.08581 • Published • 47
  - Single-Image 3D Human Digitization with Shape-Guided Diffusion
    Paper • 2311.09221 • Published • 22
  - One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
    Paper • 2311.07885 • Published • 40
  - Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
    Paper • 2511.22677 • Published • 35