
Collections

Discover the best community collections!

Collections including paper arxiv:2603.28407

CUGA Agent 🤖

Running • Featured • 98
Configurable Generalist Agent, leader in AppWorld Benchmark
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Paper • 2603.28407 • Published 20 days ago • 68
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 13 days ago • 40

about 18 hours ago

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published 24 days ago • 155
TAPS: Task Aware Proposal Distributions for Speculative Sampling

Paper • 2603.27027 • Published 22 days ago • 142
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published 24 days ago • 154
LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published 21 days ago • 143

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.46k • 561
nvidia/OpenMathReasoning

Viewer • Updated May 27, 2025 • 5.68M • 17.6k • 453
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 18

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Paper • 2603.28407 • Published 20 days ago • 68

Coding Benchmarks

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 304
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Paper • 2511.05459 • Published Nov 7, 2025 • 4
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Paper • 2512.18470 • Published Dec 20, 2025 • 12
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Paper • 2601.09688 • Published Jan 14 • 127

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

