Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2601.15282

A Pragmatic VLA Foundation Model

Paper • 2601.18692 • Published Jan 26 • 49
Rethinking Video Generation Model for the Embodied World

Paper • 2601.15282 • Published Jan 21 • 45
Advancing Open-source World Models

Paper • 2601.20540 • Published Jan 28 • 135

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5, 2025 • 11
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1, 2025 • 67
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 53

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021 • 1
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28, 2025 • 78
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 254
Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 129
Rethinking Video Generation Model for the Embodied World

Paper • 2601.15282 • Published Jan 21 • 45

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6, 2025 • 30
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9, 2025 • 20
Ark: An Open-source Python-based Framework for Robot Learning

Paper • 2506.21628 • Published Jun 24, 2025 • 16
RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2, 2025 • 35

Video understanding

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37
TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12, 2025 • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9, 2025 • 14

A Pragmatic VLA Foundation Model

Paper • 2601.18692 • Published Jan 26 • 49
Rethinking Video Generation Model for the Embodied World

Paper • 2601.15282 • Published Jan 21 • 45
Advancing Open-source World Models

Paper • 2601.20540 • Published Jan 28 • 135

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28, 2025 • 78
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 254
Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 129
Rethinking Video Generation Model for the Embodied World

Paper • 2601.15282 • Published Jan 21 • 45

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5, 2025 • 11
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1, 2025 • 67
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 53

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6, 2025 • 30
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9, 2025 • 20
Ark: An Open-source Python-based Framework for Robot Learning

Paper • 2506.21628 • Published Jun 24, 2025 • 16
RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2, 2025 • 35

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021 • 1
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

Video understanding

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37
TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12, 2025 • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9, 2025 • 14

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs