Collections

Discover the best community collections!

Collections including paper arxiv:2507.07966

Video Generation

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Paper • 2510.02283 • Published Oct 2, 2025 • 98
Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 120
LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 189
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 130

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 162

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30, 2025 • 90
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22, 2025 • 37
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30, 2025 • 82
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 254

Images are Worth Variable Length of Representations

Paper • 2506.03643 • Published Jun 4, 2025 • 4
PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10, 2025 • 33
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 162

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21, 2025 • 13
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22, 2025 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22, 2025 • 12

Boost AI's long-context ability while keeping it efficient. Models in this collection include LongVILA, LongVILA-R1, and LongLive.

Efficient-Large-Model/LongVILA-R1-7B

Updated Jul 31, 2025 • 235 • 15
Efficient-Large-Model/qwen2-7b-longvila-256f

Updated Nov 28, 2024 • 1
Efficient-Large-Model/qwen2-1.5b-longvila-256f

Updated Nov 28, 2024 • 10
Efficient-Large-Model/qwen2-7b-longvila-1M

Updated Jan 14, 2025 • 2 • 2

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 162
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18, 2025 • 25
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 15

vrgamedevgirl84/Wan14BT2VFusioniX

Text-to-Video • Updated Jun 21, 2025 • 605
TheStageAI/Elastic-mochi-1-preview

Text-to-Video • Updated Oct 12, 2025 • 23 • 2
nesaorg/animatediff-base

Text-to-Video • Updated Jun 22, 2025 • 132
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Paper • 2506.18839 • Published Jun 18, 2025 • 13

reinforcement-learning

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4, 2025 • 48
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10, 2025 • 30
RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23, 2025 • 33
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7, 2025 • 75

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18, 2025 • 7
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 158
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89
