Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 5 days ago
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 6 days ago
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization Paper • 2604.04787 • Published 9 days ago
Token Warping Helps MLLMs Look from Nearby Viewpoints Paper • 2604.02870 • Published 12 days ago
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published 19 days ago
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 19 days ago
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published 19 days ago
AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published 20 days ago
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 20 days ago
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 22 days ago
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering Paper • 2603.15616 • Published 29 days ago
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 29 days ago
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation Paper • 2603.11647 • Published Mar 12
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation Paper • 2603.11421 • Published Mar 12
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA Paper • 2603.10256 • Published Mar 10