Collections
Discover the best community collections!
Collections including paper arxiv:2601.15282
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Paper • 2509.05263 • Published • 11 -
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Paper • 2510.00406 • Published • 67 -
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Paper • 2510.19430 • Published • 53
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published • 1 -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Paper • 2508.21112 • Published • 78 -
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 254 -
Robot Learning: A Tutorial
Paper • 2510.12403 • Published • 129 -
Rethinking Video Generation Model for the Embodied World
Paper • 2601.15282 • Published • 45
-
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Paper • 2506.06205 • Published • 30 -
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
Paper • 2506.07530 • Published • 20 -
Ark: An Open-source Python-based Framework for Robot Learning
Paper • 2506.21628 • Published • 16 -
RoboBrain 2.0 Technical Report
Paper • 2507.02029 • Published • 35
-
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 32 -
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45 -
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14
-
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Paper • 2508.21112 • Published • 78 -
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 254 -
Robot Learning: A Tutorial
Paper • 2510.12403 • Published • 129 -
Rethinking Video Generation Model for the Embodied World
Paper • 2601.15282 • Published • 45
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Paper • 2509.05263 • Published • 11 -
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Paper • 2510.00406 • Published • 67 -
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Paper • 2510.19430 • Published • 53
-
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Paper • 2506.06205 • Published • 30 -
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
Paper • 2506.07530 • Published • 20 -
Ark: An Open-source Python-based Framework for Robot Learning
Paper • 2506.21628 • Published • 16 -
RoboBrain 2.0 Technical Report
Paper • 2507.02029 • Published • 35
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published • 1 -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 32 -
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45 -
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14