Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2306.05087

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Paper • 2306.05087 • Published Jun 8, 2023 • 7

Hyperparameters

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 22
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Paper • 2306.05087 • Published Jun 8, 2023 • 7

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

Paper • 2310.15511 • Published Oct 24, 2023 • 5
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SmartPlay : A Benchmark for LLMs as Intelligent Agents

Paper • 2310.01557 • Published Oct 2, 2023 • 13
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Paper • 2310.03214 • Published Oct 5, 2023 • 20

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Paper • 2310.13961 • Published Oct 21, 2023 • 5
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Paper • 2309.09582 • Published Sep 18, 2023 • 4
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Paper • 2310.13127 • Published Oct 19, 2023 • 12
Evaluating the Robustness to Instructions of Large Language Models

Paper • 2308.14306 • Published Aug 28, 2023 • 1

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Paper • 2306.05087 • Published Jun 8, 2023 • 7

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Hyperparameters

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 22
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Paper • 2306.05087 • Published Jun 8, 2023 • 7

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Paper • 2310.13961 • Published Oct 21, 2023 • 5
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Paper • 2309.09582 • Published Sep 18, 2023 • 4
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Paper • 2310.13127 • Published Oct 19, 2023 • 12
Evaluating the Robustness to Instructions of Large Language Models

Paper • 2308.14306 • Published Aug 28, 2023 • 1

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

Paper • 2310.15511 • Published Oct 24, 2023 • 5
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SmartPlay : A Benchmark for LLMs as Intelligent Agents

Paper • 2310.01557 • Published Oct 2, 2023 • 13
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Paper • 2310.03214 • Published Oct 5, 2023 • 20

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs