Collections
Discover the best community collections!
Collections including paper arxiv:2204.02311
-
Neural Machine Translation by Jointly Learning to Align and Translate
Paper • 1409.0473 • Published • 7
Attention Is All You Need
Paper • 1706.03762 • Published • 121
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26
Hierarchical Reasoning Model
Paper • 2506.21734 • Published • 50
-
PaLM: Scaling Language Modeling with Pathways
Paper • 2204.02311 • Published • 3
meta-llama/Llama-2-13b
Text Generation • Updated • 39 • 352
deepseek-ai/DeepSeek-V3-0324
Text Generation • 685B • Updated • 612k • 3.1k
meta-llama/Llama-3.3-70B-Instruct
Text Generation • 71B • Updated • 494k • 2.73k
-
Distributed Representations of Sentences and Documents
Paper • 1405.4053 • Published
Sequence to Sequence Learning with Neural Networks
Paper • 1409.3215 • Published • 3
PaLM: Scaling Language Modeling with Pathways
Paper • 2204.02311 • Published • 3
Recent Trends in Deep Learning Based Natural Language Processing
Paper • 1708.02709 • Published
-
Attention Is All You Need
Paper • 1706.03762 • Published • 121
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper • 2201.11903 • Published • 15
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 77
-
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Paper • 2006.03654 • Published • 3
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 10
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2
-
Attention Is All You Need
Paper • 1706.03762 • Published • 121
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 22
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20
-
Attention Is All You Need
Paper • 1706.03762 • Published • 121
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 10
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 22
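Collections like the ones above can also be retrieved programmatically. Below is a minimal sketch, not part of the original page, using the `huggingface_hub` client library's collections API to list Hub collections that include the PaLM paper (arXiv 2204.02311); the `"paper/2204.02311"` item format and the `limit` value are assumptions based on that library's documented API.

```python
# Minimal sketch (assumes huggingface_hub with the collections API,
# roughly >= 0.19): list Hub collections that include the PaLM paper.
from huggingface_hub import get_collection, list_collections

# Collection items are addressed as "<type>/<id>"; papers use
# "paper/<arxiv-id>" (assumed format, per the library docs).
for preview in list_collections(item="paper/2204.02311", limit=10):
    # list_collections returns only a truncated preview of each
    # collection's items, so fetch the full collection by slug.
    collection = get_collection(preview.slug)
    print(collection.title)
    for item in collection.items:
        # item_type is e.g. "paper", "model", "dataset", or "space".
        print(f"  {item.item_type}: {item.item_id}")
```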