- MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
  Paper • 2506.22434 • Published • 10
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
  Paper • 2507.13348 • Published • 79
- RewardDance: Reward Scaling in Visual Generation
  Paper • 2509.08826 • Published • 73
- Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
  Paper • 2510.18876 • Published • 37

Collections
Collections including paper arxiv:2511.13720
- Diffusion Transformers with Representation Autoencoders
  Paper • 2510.11690 • Published • 170
- Back to Basics: Let Denoising Generative Models Denoise
  Paper • 2511.13720 • Published • 70
- Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
  Paper • 2512.04926 • Published • 42

- General Agentic Memory Via Deep Research
  Paper • 2511.18423 • Published • 170
- Diffusion Language Models are Super Data Learners
  Paper • 2511.03276 • Published • 132
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 135
- Back to Basics: Let Denoising Generative Models Denoise
  Paper • 2511.13720 • Published • 70

- Back to Basics: Let Denoising Generative Models Denoise
  Paper • 2511.13720 • Published • 70
- Virtual Width Networks
  Paper • 2511.11238 • Published • 39
- Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
  Paper • 2511.07419 • Published • 27
- When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
  Paper • 2511.02243 • Published • 25

- TiDAR: Think in Diffusion, Talk in Autoregression
  Paper • 2511.08923 • Published • 128
- Diffusion Language Models are Super Data Learners
  Paper • 2511.03276 • Published • 132
- What Makes Diffusion Language Models Super Data Learners?
  Paper • 2510.04071 • Published
- LLaDA2.0: Scaling Up Diffusion Language Models to 100B
  Paper • 2512.15745 • Published • 88

- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 75
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 118
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 61
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 107