Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2604.04323

Running

Featured

98

CUGA Agent

🤖

98

Configurable Generalist Agent, leader in AppWorld Benchmark
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Paper • 2603.28407 • Published 20 days ago • 68
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 14 days ago • 40

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Paper • 2603.25158 • Published 24 days ago • 50
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published 11 days ago • 278
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 14 days ago • 40
SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Paper • 2604.04804 • Published 14 days ago • 32

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 76
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 110
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9, 2025 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20, 2025 • 100

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 14 days ago • 40
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published 11 days ago • 278

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 103
Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published Feb 28 • 64
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Paper • 2603.03205 • Published Mar 3 • 13
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Paper • 2602.23166 • Published Feb 26 • 45

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 208 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 40
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Running

Featured

98

CUGA Agent

🤖

98

Configurable Generalist Agent, leader in AppWorld Benchmark
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Paper • 2603.28407 • Published 20 days ago • 68
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 14 days ago • 40

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 14 days ago • 40
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published 11 days ago • 278

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Paper • 2603.25158 • Published 24 days ago • 50
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published 11 days ago • 278
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 14 days ago • 40
SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Paper • 2604.04804 • Published 14 days ago • 32

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 103
Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published Feb 28 • 64
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Paper • 2603.03205 • Published Mar 3 • 13
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Paper • 2602.23166 • Published Feb 26 • 45

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 76
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 110
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 208 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9, 2025 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20, 2025 • 100

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 40
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs