Collections

Discover the best community collections!

Collections including paper arxiv:2511.08521

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 39
Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published Nov 13, 2025 • 52
Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13, 2025 • 101
VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published Mar 14, 2025 • 37

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 20
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

Video Generation

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Paper • 2412.11100 • Published Dec 15, 2024 • 7
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Paper • 2412.09856 • Published Dec 13, 2024 • 11
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Paper • 2412.09349 • Published Dec 12, 2024 • 8
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

Paper • 2412.04448 • Published Dec 5, 2024 • 10

media-generated

Qwen Image Edit Next Scene

Fast 4 step inference with Qwen Image Edit 2509 • 162
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation

Paper • 2506.18899 • Published Jun 23, 2025 • 6
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 30
Shakker-Labs/FilmPortrait

Text-to-Image • Updated Sep 9, 2024 • 113 • 224

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 31
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23, 2024 • 18
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Paper • 2404.03648 • Published Apr 4, 2024 • 29
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30, 2024 • 34
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Paper • 2405.19888 • Published May 30, 2024 • 7
