Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2512.01374

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Paper • 2511.16043 • Published Nov 20, 2025 • 111
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 108

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

reasoning_model

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Paper • 2509.04475 • Published Aug 30, 2025 • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

Paper • 2601.03252 • Published Jan 6 • 104
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 186
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published Dec 3, 2025 • 49
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

Paper • 2512.03383 • Published Dec 3, 2025 • 5
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 126
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Paper • 2511.18890 • Published Nov 24, 2025 • 35

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
How Far Are We from Genuinely Useful Deep Research Agents?

Paper • 2512.01948 • Published Dec 1, 2025 • 57

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published Nov 20, 2025 • 29
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

Paper • 2601.03252 • Published Jan 6 • 104
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 186
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Paper • 2511.16043 • Published Nov 20, 2025 • 111
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 108

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published Dec 3, 2025 • 49
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

Paper • 2512.03383 • Published Dec 3, 2025 • 5
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 126
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Paper • 2511.18890 • Published Nov 24, 2025 • 35

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
How Far Are We from Genuinely Useful Deep Research Agents?

Paper • 2512.01948 • Published Dec 1, 2025 • 57

reasoning_model

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Paper • 2509.04475 • Published Aug 30, 2025 • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published Nov 20, 2025 • 29
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs