- Depth Anything V2
  Paper • 2406.09414 • Published • 103
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 51
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 39
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 122
Collections including paper arxiv:2512.05145
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 154
- Self-Improving VLM Judges Without Human Annotations
  Paper • 2512.05145 • Published • 20
- FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing
  Paper • 2601.01720 • Published • 6
- MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
  Paper • 2511.09067 • Published • 2

- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 55
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 148
- Video Reasoning without Training
  Paper • 2510.17045 • Published • 8
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 277

- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
  Paper • 2510.08540 • Published • 110
- Diffusion Transformers with Representation Autoencoders
  Paper • 2510.11690 • Published • 170
- Spotlight on Token Perception for Multimodal Reinforcement Learning
  Paper • 2510.09285 • Published • 37
- Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
  Paper • 2510.17354 • Published • 35

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 208 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
  Paper • 2602.12036 • Published • 93
- Reinforcement Learning for Self-Improving Agent with Skill Library
  Paper • 2512.17102 • Published • 42
- Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
  Paper • 2512.23705 • Published • 45
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
  Paper • 2512.19995 • Published • 16

- MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
  Paper • 2512.04763 • Published • 5
- VisPlay: Self-Evolving Vision-Language Models from Images
  Paper • 2511.15661 • Published • 44
- VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
  Paper • 2512.14531 • Published • 15
- Improving Recursive Transformers with Mixture of LoRAs
  Paper • 2512.12880 • Published • 6

- One Token to Fool LLM-as-a-Judge
  Paper • 2507.08794 • Published • 32
- Self-Improving VLM Judges Without Human Annotations
  Paper • 2512.05145 • Published • 20
- RubricBench: Aligning Model-Generated Rubrics with Human Standards
  Paper • 2603.01562 • Published • 63
- Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
  Paper • 2604.02368 • Published • 11

- MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
  Paper • 2410.17637 • Published • 35
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
  Paper • 2411.10442 • Published • 87
- Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
  Paper • 2411.18203 • Published • 40
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
  Paper • 2411.14432 • Published • 25