- Scaling Embeddings Outperforms Scaling Experts in Language Models
  Paper • 2601.21204 • Published • 102
- Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
  Paper • 2601.19325 • Published • 81
- TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
  Paper • 2601.14133 • Published • 61
- MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
  Paper • 2601.21821 • Published • 62
Collections
Collections including paper arxiv:2601.14133
- World-in-World: World Models in a Closed-Loop World
  Paper • 2510.18135 • Published • 78
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 53
- World Simulation with Video Foundation Models for Physical AI
  Paper • 2511.00062 • Published • 46
- TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
  Paper • 2601.14133 • Published • 61

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 60
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 64

- PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
  Paper • 2512.16793 • Published • 76
- TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
  Paper • 2601.14133 • Published • 61
- BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
  Paper • 2601.15197 • Published • 55

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 31
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49