Collections
Collections including paper arxiv:2602.03143
-
Behavior Knowledge Merge in Reinforced Agentic Models
Paper • 2601.13572 • Published • 27 -
Language of Thought Shapes Output Diversity in Large Language Models
Paper • 2601.11227 • Published • 9 -
Agentic-R: Learning to Retrieve for Agentic Search
Paper • 2601.11888 • Published • 19 -
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Paper • 2602.02488 • Published • 36
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 76 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 110 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 107 -
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 80 -
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43 -
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 45
-
Beyond Imitation: Reinforcement Learning for Active Latent Planning
Paper • 2601.21598 • Published • 10 -
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 42 -
Self-Hinting Language Models Enhance Reinforcement Learning
Paper • 2602.03143 • Published • 31 -
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
Paper • 2602.12099 • Published • 61
-
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper • 2509.07980 • Published • 105 -
Tree Search for LLM Agent Reinforcement Learning
Paper • 2509.21240 • Published • 92 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 106 -
How Far Are We from Genuinely Useful Deep Research Agents?
Paper • 2512.01948 • Published • 57
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40 -
Fast and Simplex: 2-Simplicial Attention in Triton
Paper • 2507.02754 • Published • 25 -
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
Paper • 2507.02025 • Published • 35 -
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Paper • 2507.00951 • Published • 24
-
facebook/w2v-bert-2.0
Feature Extraction • 0.6B • Updated • 3.1M • 213 -
facebook/metaclip-h14-fullcc2.5b
Zero-Shot Image Classification • 1.0B • Updated • 10.8k • 49 -
openai/clip-vit-large-patch14
Zero-Shot Image Classification • 0.4B • Updated • 21.1M • 1.99k -
Salesforce/blip-image-captioning-large
Image-to-Text • 0.5B • Updated • 1.4M • 1.47k