Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2305.07759

Tiny LLMs and Datasets

A Benchmark for Learning to Translate a New Language from One Grammar Book

Paper • 2309.16575 • Published Sep 28, 2023 • 1
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Paper • 2409.19151 • Published Sep 27, 2024
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4, 2024 • 95

vincentkoc/tiny_qa_benchmark

Viewer • Updated May 20, 2025 • 52 • 41 • 1
vincentkoc/tiny_qa_benchmark_pp

Viewer • Updated Aug 25, 2025 • 662 • 383 • 2
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation

Paper • 2505.12058 • Published May 17, 2025 • 6
roneneldan/TinyStories

Viewer • Updated Aug 12, 2024 • 2.14M • 97.5k • 956

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

Synthetic Data papers

Papers and important approraches for generation of synthetic data

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3, 2024 • 51
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 72
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 259
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 31

Synthetic Data Generation

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 90
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 107

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
roneneldan/TinyStories-33M

Text Generation • Updated Aug 12, 2024 • 29.2k • 109

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

Smol, but feisty

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
roneneldan/TinyStories

Viewer • Updated Aug 12, 2024 • 2.14M • 97.5k • 956

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

Tiny LLMs and Datasets

A Benchmark for Learning to Translate a New Language from One Grammar Book

Paper • 2309.16575 • Published Sep 28, 2023 • 1
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Paper • 2409.19151 • Published Sep 27, 2024
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4, 2024 • 95

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
roneneldan/TinyStories-33M

Text Generation • Updated Aug 12, 2024 • 29.2k • 109

vincentkoc/tiny_qa_benchmark

Viewer • Updated May 20, 2025 • 52 • 41 • 1
vincentkoc/tiny_qa_benchmark_pp

Viewer • Updated Aug 25, 2025 • 662 • 383 • 2
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation

Paper • 2505.12058 • Published May 17, 2025 • 6
roneneldan/TinyStories

Viewer • Updated Aug 12, 2024 • 2.14M • 97.5k • 956

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

Synthetic Data papers

Papers and important approraches for generation of synthetic data

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3, 2024 • 51
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 72
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 259
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 31

Smol, but feisty

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
roneneldan/TinyStories

Viewer • Updated Aug 12, 2024 • 2.14M • 97.5k • 956

Synthetic Data Generation

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 154
Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 90
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 107

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 45

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs