-
Attention Is All You Need
Paper • 1706.03762 • Published • 121 -
Scaling Laws for Neural Language Models
Paper • 2001.08361 • Published • 10 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
Paper • 2210.04186 • Published
Collections
Collections including paper arxiv:2509.26507
-
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
Paper • 2602.07085 • Published • 190 -
Seriki/FastHTML
Updated • 3 • 1 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
AI Can Learn Scientific Taste
Paper • 2603.14473 • Published • 424
-
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 304 -
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 233 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 513
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 63 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Paper • 2309.06497 • Published • 7 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 628 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 251
-
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 322 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 133 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 176
-
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 304 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
Paper2Video: Automatic Video Generation from Scientific Papers
Paper • 2510.05096 • Published • 120 -
TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance
Paper • 2309.03736 • Published
-
DoPE: Denoising Rotary Position Embedding
Paper • 2511.09146 • Published • 98 -
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
Latent Collaboration in Multi-Agent Systems
Paper • 2511.20639 • Published • 127 -
Video Generation Models Are Good Latent Reward Models
Paper • 2511.21541 • Published • 47