-
CUGA Agent
🤖98Configurable Generalist Agent, leader in AppWorld Benchmark
-
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
Paper • 2603.28407 • Published • 68 -
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper • 2604.04323 • Published • 40
Collections
Discover the best community collections!
Collections including paper arxiv:2604.04323
-
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Paper • 2603.25158 • Published • 50 -
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Paper • 2604.08377 • Published • 278 -
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper • 2604.04323 • Published • 40 -
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Paper • 2604.04804 • Published • 32
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 76 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 110 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 38 -
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 195 -
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 100
-
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Paper • 2603.03276 • Published • 103 -
Qwen3-Coder-Next Technical Report
Paper • 2603.00729 • Published • 64 -
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper • 2603.03205 • Published • 13 -
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
Paper • 2602.23166 • Published • 45
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 208 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 57 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 37 -
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25
-
CUGA Agent
🤖98Configurable Generalist Agent, leader in AppWorld Benchmark
-
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
Paper • 2603.28407 • Published • 68 -
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper • 2604.04323 • Published • 40
-
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Paper • 2603.25158 • Published • 50 -
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Paper • 2604.08377 • Published • 278 -
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper • 2604.04323 • Published • 40 -
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Paper • 2604.04804 • Published • 32
-
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Paper • 2603.03276 • Published • 103 -
Qwen3-Coder-Next Technical Report
Paper • 2603.00729 • Published • 64 -
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper • 2603.03205 • Published • 13 -
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
Paper • 2602.23166 • Published • 45
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 76 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 110 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 208 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 38 -
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 195 -
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 100
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 57 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 37 -
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25