- Attention Is All You Need
  Paper • 1706.03762 • Published • 120
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
  Paper • 2210.04186 • Published
Collections including paper arxiv:2511.18538
- Agentic Reasoning for Large Language Models
  Paper • 2601.12538 • Published • 204
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
  Paper • 2511.18538 • Published • 304
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 277
- Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
  Paper • 2602.08222 • Published • 290
- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
  Paper • 2309.06497 • Published • 7
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 628
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 251
- TradingAgents: Multi-Agents LLM Financial Trading Framework
  Paper • 2412.20138 • Published • 47
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 665
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
  Paper • 2511.18538 • Published • 304
- Memory in the Age of AI Agents
  Paper • 2512.13564 • Published • 157
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
  Paper • 2511.18538 • Published • 304
- Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
  Paper • 2511.14993 • Published • 233
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513