Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.10291

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16, 2025 • 27
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36
OpenGVLab/VisualPRM-8B

Image-Text-to-Text • 8B • Updated May 6, 2025 • 58 • 18
OpenGVLab/VisualPRM-8B-v1_1

Image-Text-to-Text • 8B • Updated May 29, 2025 • 177 • 9
OpenGVLab/VisualPRM400K

Preview • Updated Apr 15, 2025 • 149 • 15

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 32
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 110
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107

Multimodal Alignment

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 35
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published Nov 27, 2024 • 40
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7, 2025 • 26
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Paper • 2504.04718 • Published Apr 7, 2025 • 43
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

Paper • 2504.03561 • Published Apr 4, 2025 • 18
Concept Lancet: Image Editing with Compositional Representation Transplant

Paper • 2504.02828 • Published Apr 3, 2025 • 16

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13, 2025 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13, 2025 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10, 2025 • 88

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

Multimodal Benchmarks

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 47
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16, 2024 • 16
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5, 2024 • 62

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7, 2025 • 26
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Paper • 2504.04718 • Published Apr 7, 2025 • 43
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

Paper • 2504.03561 • Published Apr 4, 2025 • 18
Concept Lancet: Image Editing with Compositional Representation Transplant

Paper • 2504.02828 • Published Apr 3, 2025 • 16

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16, 2025 • 27
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13, 2025 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13, 2025 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10, 2025 • 88

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36
OpenGVLab/VisualPRM-8B

Image-Text-to-Text • 8B • Updated May 6, 2025 • 58 • 18
OpenGVLab/VisualPRM-8B-v1_1

Image-Text-to-Text • 8B • Updated May 29, 2025 • 177 • 9
OpenGVLab/VisualPRM400K

Preview • Updated Apr 15, 2025 • 149 • 15

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 32
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 110
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

Multimodal Alignment

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 35
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published Nov 27, 2024 • 40
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Multimodal Benchmarks

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 47
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16, 2024 • 16
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5, 2024 • 62

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs