Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.21539

Continual Learning - VLA

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 254
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2, 2025 • 12
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 238
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published Nov 12, 2025 • 20

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5, 2025 • 11
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1, 2025 • 67
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 53

Vision Language Models for Robotics

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24, 2025 • 28
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 158
3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 12
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Paper • 2312.14457 • Published Dec 22, 2023 • 1

SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning

Paper • 2504.09081 • Published Apr 12, 2025 • 16
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17, 2025 • 20
Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18, 2025 • 66
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 31
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2, 2025 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1, 2025 • 24

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Dynamic View Synthesis as an Inverse Problem

Paper • 2506.08004 • Published Jun 9, 2025 • 5
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Paper • 2506.17201 • Published Jun 20, 2025 • 55
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

Paper • 2504.07830 • Published Apr 10, 2025 • 18
WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published Apr 16, 2025 • 35
Towards a Unified Copernicus Foundation Model for Earth Vision

Paper • 2503.11849 • Published Mar 14, 2025 • 5
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Paper • 2506.18903 • Published Jun 23, 2025 • 22

FM4 EmbodiedAI/Robotics/DecisionMaking

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23, 2025 • 14
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Continual Learning - VLA

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 254
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2, 2025 • 12
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 238
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published Nov 12, 2025 • 20

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2, 2025 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1, 2025 • 24

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5, 2025 • 11
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1, 2025 • 67
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 53

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Vision Language Models for Robotics

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24, 2025 • 28
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 158
3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 12
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Paper • 2312.14457 • Published Dec 22, 2023 • 1

Dynamic View Synthesis as an Inverse Problem

Paper • 2506.08004 • Published Jun 9, 2025 • 5
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Paper • 2506.17201 • Published Jun 20, 2025 • 55
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning

Paper • 2504.09081 • Published Apr 12, 2025 • 16
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17, 2025 • 20
Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18, 2025 • 66
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

Paper • 2504.07830 • Published Apr 10, 2025 • 18
WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published Apr 16, 2025 • 35
Towards a Unified Copernicus Foundation Model for Earth Vision

Paper • 2503.11849 • Published Mar 14, 2025 • 5
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Paper • 2506.18903 • Published Jun 23, 2025 • 22

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 31
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

FM4 EmbodiedAI/Robotics/DecisionMaking

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23, 2025 • 14
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs