Depth Anything V2
Paper • 2406.09414 • Published • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 122
Collections
Collections including paper arxiv:2601.19897

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
Paper • 2601.23143 • Published • 39
PaperBanana: Automating Academic Illustration for AI Scientists
Paper • 2601.23265 • Published • 224
Agentic Reasoning for Large Language Models
Paper • 2601.12538 • Published • 204
BabyVision: Visual Reasoning Beyond Language
Paper • 2601.06521 • Published • 201

Self-Distillation Enables Continual Learning
Paper • 2601.19897 • Published • 29
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
Paper • 2601.21468 • Published • 25
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Paper • 2509.23040 • Published • 12

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 93
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16

Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper • 2601.21204 • Published • 102
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
Paper • 2601.19325 • Published • 81
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
Paper • 2601.14133 • Published • 61
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
Paper • 2601.21821 • Published • 62

Self-Distillation Enables Continual Learning
Paper • 2601.19897 • Published • 29
XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Paper • 2603.12056 • Published • 33
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Paper • 2604.02268 • Published • 94

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper • 2601.14249 • Published • 13
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Paper • 2402.07033 • Published • 19
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
Paper • 2601.07251 • Published • 11
GameTalk: Training LLMs for Strategic Conversation
Paper • 2601.16276 • Published • 13