CocoaBench: Evaluating Unified Digital Agents in the Wild Paper • 2604.11201 • Published 1 day ago • 23
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Paper • 2506.14965 • Published Jun 17, 2025 • 50
Linear Correlation in LM's Compositional Generalization and Hallucination Paper • 2502.04520 • Published Feb 6, 2025 • 10
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 94
Decentralized Arena Leaderboard 🥇 • 26 • View and compare LLM evaluations across various domains
Gpt2 Multiplication Predictor 📈 • 31 • Multiply large numbers using different reasoning methods
Reasoning with Language Model is Planning with World Model Paper • 2305.14992 • Published May 24, 2023 • 4
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings Paper • 2305.11554 • Published May 19, 2023 • 2
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking Paper • 2406.05673 • Published Jun 9, 2024 • 3