Collections

Collections including paper arxiv:2306.11644

synthetic-data-generation

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published Dec 2, 2025 • 20
FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Paper • 2601.01720 • Published Jan 5 • 6
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique

Paper • 2511.09067 • Published Nov 12, 2025 • 2

Language Models - Essential Research Papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 121
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 20
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 23
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154

A collection of arXiv papers from Chip Huyen's AI Engineering, organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Synthetic Data papers

Papers and important approaches for generation of synthetic data

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3, 2024 • 51
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 72
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 259
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 31

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447
Muon is Scalable for LLM Training

Paper • 2502.16982 • Published Feb 24, 2025 • 12
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 102

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 44
Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251

🏆 AnyCoder

Space • Running • 3.21k
Generate code snippets with AI
🐢 Qwen2.5 Coder Artifacts

Space • Running • Agents • Featured • 272
Generate and preview web app code from a text description
🔍 QwQ-32B-Preview

Space • Running • Agents • Featured • 921
🏆 Open LLM Leaderboard

Space • Running on CPU Upgrade • 14k
Track, rank and evaluate open LLMs and chatbots

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
