- Attention Is All You Need
  Paper • 1706.03762 • Published • 121
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
  Paper • 2210.04186 • Published

Collections
Collections including paper arxiv:2508.15763
- internlm/Intern-S1
  Image-Text-to-Text • 241B • Updated • 48.3k • 258
- Intern-S1: A Scientific Multimodal Foundation Model
  Paper • 2508.15763 • Published • 273
- MiniCPM-V: A GPT-4V Level MLLM on Your Phone
  Paper • 2408.01800 • Published • 94
- openbmb/MiniCPM-V-4_5
  Image-Text-to-Text • 9B • Updated • 133k • 1.08k

- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
  Paper • 2508.16949 • Published • 24
- Diffusion Language Models Know the Answer Before Decoding
  Paper • 2508.19982 • Published • 27
- ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
  Paper • 2508.18773 • Published • 16
- Intern-S1: A Scientific Multimodal Foundation Model
  Paper • 2508.15763 • Published • 273

- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
  Paper • 2309.06497 • Published • 7
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 628
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 251

- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 350
- Intern-S1: A Scientific Multimodal Foundation Model
  Paper • 2508.15763 • Published • 273

- Intern-S1: A Scientific Multimodal Foundation Model
  Paper • 2508.15763 • Published • 273
- MemMamba: Rethinking Memory Patterns in State Space Model
  Paper • 2510.03279 • Published • 74
- Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
  Paper • 2510.05034 • Published • 51
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
  Paper • 2510.11696 • Published • 182

- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 20
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48