-
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Paper • 2511.08521 • Published • 39 -
Black-Box On-Policy Distillation of Large Language Models
Paper • 2511.10643 • Published • 52 -
Depth Anything 3: Recovering the Visual Space from Any Views
Paper • 2511.10647 • Published • 101 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 37
Collections
Discover the best community collections!
Collections including paper arxiv:2511.08521
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 20 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Paper • 2412.11100 • Published • 7 -
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Paper • 2412.09856 • Published • 11 -
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Paper • 2412.09349 • Published • 8 -
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 10
-
Qwen Image Edit Next Scene
🎥162Fast 4 step inference with Qwen Image Edit 2509
-
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Paper • 2506.18899 • Published • 6 -
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 30 -
Shakker-Labs/FilmPortrait
Text-to-Image • Updated • 113 • 224
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 31 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 34 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
-
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Paper • 2511.08521 • Published • 39 -
Black-Box On-Policy Distillation of Large Language Models
Paper • 2511.10643 • Published • 52 -
Depth Anything 3: Recovering the Visual Space from Any Views
Paper • 2511.10647 • Published • 101 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 37
-
Qwen Image Edit Next Scene
🎥162Fast 4 step inference with Qwen Image Edit 2509
-
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Paper • 2506.18899 • Published • 6 -
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 30 -
Shakker-Labs/FilmPortrait
Text-to-Image • Updated • 113 • 224
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 20 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 31 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Paper • 2412.11100 • Published • 7 -
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Paper • 2412.09856 • Published • 11 -
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Paper • 2412.09349 • Published • 8 -
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 10
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 34 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7