-
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper • 2512.24618 • Published • 154 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66 -
Yume-1.5: A Text-Controlled Interactive World Generation Model
Paper • 2512.22096 • Published • 61 -
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Paper • 2512.23709 • Published • 51
Collections
Discover the best community collections!
Collections including paper arxiv:2512.23576
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper • 2512.16093 • Published • 97 -
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper • 2511.22699 • Published • 245 -
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 222 -
Sharp Monocular View Synthesis in Less Than a Second
Paper • 2512.10685 • Published • 29
-
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
Paper • 2412.05355 • Published • 8 -
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
Paper • 2412.04301 • Published • 40 -
PanoDreamer: 3D Panorama Synthesis from a Single Image
Paper • 2412.04827 • Published • 10 -
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper • 2412.06781 • Published • 23
-
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper • 2512.24618 • Published • 154 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66 -
Yume-1.5: A Text-Controlled Interactive World Generation Model
Paper • 2512.22096 • Published • 61 -
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Paper • 2512.23709 • Published • 51
-
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper • 2512.16093 • Published • 97 -
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper • 2511.22699 • Published • 245 -
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 222 -
Sharp Monocular View Synthesis in Less Than a Second
Paper • 2512.10685 • Published • 29
-
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 106 -
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121 -
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99 -
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 66
-
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
Paper • 2412.05355 • Published • 8 -
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
Paper • 2412.04301 • Published • 40 -
PanoDreamer: 3D Panorama Synthesis from a Single Image
Paper • 2412.04827 • Published • 10 -
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper • 2412.06781 • Published • 23