Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2307.09288

Foundational & Modern AI Research (Curated)

A curated selection of foundational and modern AI research papers that meaningfully influence how real-world AI systems are designed, evaluated, and g

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 10
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT

Paper • 2210.04186 • Published Oct 9, 2022

Open-Source Foundations for Modern AI Systems

open-source libraries that form the infrastructure layer of modern AI systems, spanning model dev, retrieval, orchestration, evaluation, and MLOPS.

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Paper • 2309.06497 • Published Sep 12, 2023 • 7
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 628
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 950

Language Models - Essential Research Papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 23
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

Milestone Papers in LLM

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

Qwen/Qwen3-8B

Text Generation • 8B • Updated Jul 26, 2025 • 8.64M • • 1.05k
Qwen/Qwen3-4B

Text Generation • Updated Jul 26, 2025 • 7.56M • 600
Qwen/Qwen3-0.6B

Text Generation • 0.8B • Updated Jul 26, 2025 • 15.6M • 1.2k
google/gemma-3-4b-it

Image-Text-to-Text • Updated Mar 21, 2025 • 1.82M • 1.31k

Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
Evaluating Large Language Models Trained on Code

Paper • 2107.03374 • Published Jul 7, 2021 • 10
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 7

2026 - Reading AI Research Papers with Ajinkya

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
A Survey on Diffusion Language Models

Paper • 2508.10875 • Published Aug 14, 2025 • 34
Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 17
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5, 2024 • 71

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 118

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 44
Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Foundational & Modern AI Research (Curated)

A curated selection of foundational and modern AI research papers that meaningfully influence how real-world AI systems are designed, evaluated, and g

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 10
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 11
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT

Paper • 2210.04186 • Published Oct 9, 2022

Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
Evaluating Large Language Models Trained on Code

Paper • 2107.03374 • Published Jul 7, 2021 • 10
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 7

Open-Source Foundations for Modern AI Systems

open-source libraries that form the infrastructure layer of modern AI systems, spanning model dev, retrieval, orchestration, evaluation, and MLOPS.

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Paper • 2309.06497 • Published Sep 12, 2023 • 7
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 628
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 950

2026 - Reading AI Research Papers with Ajinkya

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
A Survey on Diffusion Language Models

Paper • 2508.10875 • Published Aug 14, 2025 • 34
Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 17
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5, 2024 • 71

Language Models - Essential Research Papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 23
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 118

Milestone Papers in LLM

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 44
Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

Qwen/Qwen3-8B

Text Generation • 8B • Updated Jul 26, 2025 • 8.64M • • 1.05k
Qwen/Qwen3-4B

Text Generation • Updated Jul 26, 2025 • 7.56M • 600
Qwen/Qwen3-0.6B

Text Generation • 0.8B • Updated Jul 26, 2025 • 15.6M • 1.2k
google/gemma-3-4b-it

Image-Text-to-Text • Updated Mar 21, 2025 • 1.82M • 1.31k

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Previous
1
2
3
...
9
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs