-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 93
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 42
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
Collections
Collections including paper arxiv:2512.15560
-
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper • 2507.13158 • Published • 24
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
Paper • 2507.11527 • Published • 35
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models
Paper • 2507.14241 • Published • 18
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper • 2406.06608 • Published • 68
-
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 52
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 31
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
Zero Bubble Pipeline Parallelism
Paper • 2401.10241 • Published • 25
-
openai/gpt-oss-120b
Text Generation • 120B • Updated • 3.49M • 4.71k
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper • 2512.20605 • Published • 62
Nested Browser-Use Learning for Agentic Information Seeking
Paper • 2512.23647 • Published • 19
TimeBill: Time-Budgeted Inference for Large Language Models
Paper • 2512.21859 • Published • 25
-
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Paper • 2505.13227 • Published • 45
facebook/natural_reasoning
Viewer • Updated • 1.15M • 1.42k • 562
nvidia/OpenMathReasoning
Viewer • Updated • 5.68M • 17.6k • 453
Search Arena: Analyzing Search-Augmented LLMs
Paper • 2506.05334 • Published • 18