-
Visual Spatial Tuning
Paper • 2511.05491 • Published • 53 -
Adam's Law: Textual Frequency Law on Large Language Models
Paper • 2604.02176 • Published • 481 -
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
Paper • 2604.10098 • Published • 74 -
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Paper • 2604.13016 • Published • 79
Collections
Discover the best community collections!
Collections including paper arxiv:2511.05491
-
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 193 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 138 -
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper • 2510.23607 • Published • 181 -
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper • 2510.08673 • Published • 127
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published • 1 -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
V-Thinker: Interactive Thinking with Images
Paper • 2511.04460 • Published • 98 -
Visual Spatial Tuning
Paper • 2511.05491 • Published • 53 -
BabyVision: Visual Reasoning Beyond Language
Paper • 2601.06521 • Published • 201 -
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
Paper • 2601.22975 • Published • 110
-
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Paper • 2510.15110 • Published • 18 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 124 -
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Paper • 2510.13795 • Published • 59 -
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper • 2510.13515 • Published • 12
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 20 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 31 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
Visual Spatial Tuning
Paper • 2511.05491 • Published • 53 -
Adam's Law: Textual Frequency Law on Large Language Models
Paper • 2604.02176 • Published • 481 -
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
Paper • 2604.10098 • Published • 74 -
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Paper • 2604.13016 • Published • 79
-
V-Thinker: Interactive Thinking with Images
Paper • 2511.04460 • Published • 98 -
Visual Spatial Tuning
Paper • 2511.05491 • Published • 53 -
BabyVision: Visual Reasoning Beyond Language
Paper • 2601.06521 • Published • 201 -
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
Paper • 2601.22975 • Published • 110
-
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Paper • 2510.15110 • Published • 18 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 124 -
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Paper • 2510.13795 • Published • 59 -
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper • 2510.13515 • Published • 12
-
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 193 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 138 -
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper • 2510.23607 • Published • 181 -
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper • 2510.08673 • Published • 127
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 20 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published • 1 -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 31 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49