Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.12572

LMEB: Long-horizon Memory Embedding Benchmark

Paper • 2603.12572 • Published Mar 13 • 73

Agentic Reasoning for Large Language Models

Paper • 2601.12538 • Published Jan 18 • 204
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 304
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 277
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Paper • 2602.08222 • Published Feb 9 • 290

Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17, 2025 • 41
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Paper • 2506.14435 • Published Jun 17, 2025 • 7
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Paper • 2504.19413 • Published Apr 28, 2025 • 52
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 166

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Paper • 2410.10819 • Published Oct 14, 2024 • 7
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models

Paper • 2410.09342 • Published Oct 12, 2024 • 39
PDFTriage: Question Answering over Long, Structured Documents

Paper • 2309.08872 • Published Sep 16, 2023 • 55

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

🔥Hot Benchmarks

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Paper • 2602.12783 • Published Feb 13 • 216
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Paper • 2602.22638 • Published Feb 26 • 107
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Paper • 2601.22027 • Published Jan 29 • 85
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Paper • 2601.11077 • Published Jan 16 • 67

Lychee-KaLM-LMEB

KaLM-Embedding/LMEB

Viewer • Updated Mar 19 • 840 • 2.25k • 26
LMEB: Long-horizon Memory Embedding Benchmark

Paper • 2603.12572 • Published Mar 13 • 73

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.42k • 562
nvidia/OpenMathReasoning

Viewer • Updated May 27, 2025 • 5.68M • 17.6k • 453
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 18

Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs

Paper • 2404.15676 • Published Apr 24, 2024
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior

Paper • 2404.10198 • Published Apr 16, 2024 • 8
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 72
FaaF: Facts as a Function for the evaluation of RAG systems

Paper • 2403.03888 • Published Mar 6, 2024

LMEB: Long-horizon Memory Embedding Benchmark

Paper • 2603.12572 • Published Mar 13 • 73

🔥Hot Benchmarks

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Paper • 2602.12783 • Published Feb 13 • 216
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Paper • 2602.22638 • Published Feb 26 • 107
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Paper • 2601.22027 • Published Jan 29 • 85
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Paper • 2601.11077 • Published Jan 16 • 67

Agentic Reasoning for Large Language Models

Paper • 2601.12538 • Published Jan 18 • 204
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 304
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 277
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Paper • 2602.08222 • Published Feb 9 • 290

Lychee-KaLM-LMEB

KaLM-Embedding/LMEB

Viewer • Updated Mar 19 • 840 • 2.25k • 26
LMEB: Long-horizon Memory Embedding Benchmark

Paper • 2603.12572 • Published Mar 13 • 73

Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17, 2025 • 41
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Paper • 2506.14435 • Published Jun 17, 2025 • 7
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Paper • 2504.19413 • Published Apr 28, 2025 • 52
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 166

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.42k • 562
nvidia/OpenMathReasoning

Viewer • Updated May 27, 2025 • 5.68M • 17.6k • 453
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 18

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Paper • 2410.10819 • Published Oct 14, 2024 • 7
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models

Paper • 2410.09342 • Published Oct 12, 2024 • 39
PDFTriage: Question Answering over Long, Structured Documents

Paper • 2309.08872 • Published Sep 16, 2023 • 55

Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs

Paper • 2404.15676 • Published Apr 24, 2024
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior

Paper • 2404.10198 • Published Apr 16, 2024 • 8
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 72
FaaF: Facts as a Function for the evaluation of RAG systems

Paper • 2403.03888 • Published Mar 6, 2024

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs