# Lost in the Middle - Reproduction Code
This repository contains self-contained code to reproduce the key-value retrieval experiment from the paper "Lost in the Middle: How Language Models Use Long Contexts" (Liu et al., 2023, arXiv:2307.03172).
## Paper Summary
The paper demonstrates that LLMs exhibit a U-shaped performance curve when asked to retrieve information from long contexts: they perform best when the relevant information is at the beginning (primacy bias) or end (recency bias) of the context, and worst when it is in the middle.
## Tasks Implemented

### 1. Key-Value Retrieval (Synthetic)
- Generate random UUID key-value pairs
- Place the "gold" key-value pair at controlled positions
- Prompt the model to extract the value for a given key
- Measure exact-match accuracy
- Plot the U-shaped accuracy curve
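The generation step above can be sketched as follows. This is an illustrative helper, not the repo's actual `gpu_experiment.py`; the field names are assumptions:

```python
import uuid

def make_example(num_keys: int, gold_position: int) -> dict:
    """Build one retrieval example with the gold pair at a fixed index."""
    # Random UUIDs serve as both keys and values, as in the paper's synthetic task.
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(num_keys)]
    gold_key, gold_value = pairs[gold_position]
    return {
        "ordered_kv_pairs": pairs,
        "key": gold_key,        # the key the model is asked about
        "value": gold_value,    # the expected answer
        "gold_position": gold_position,
    }

# One example per controlled gold position, e.g. for a 50-key context:
examples = [make_example(50, pos) for pos in range(50)]
```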
## Quick Start

### 1. Generate Data

```bash
python gpu_experiment.py /app/litm/results_gpu Qwen/Qwen2.5-0.5B-Instruct 50 50
```
### 2. Run Experiment (Local Model - GPU)

```bash
python run_kv_gpu.py \
  --data-path kv_data_50.jsonl \
  --num-keys 50 \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --output-dir results_gpu \
  --max-examples 50
```
### 3. Run Experiment (OpenAI API)

```bash
export OPENAI_API_KEY=sk-...
python run_openai.py \
  --data-path kv_data_50.jsonl \
  --num-keys 50 \
  --model gpt-3.5-turbo \
  --max-examples 50
```
## Example Result
From a CPU run with Qwen2.5-0.5B-Instruct (10 keys, 5 examples per position):
| Gold Position | Accuracy |
|---|---|
| 0 (start) | 1.00 |
| 2 | 0.60 |
| 5 (middle) | 1.00 |
| 7 | 0.80 |
| 9 (end) | 0.60 |
With more examples and longer contexts, the U-shaped pattern becomes more pronounced (per the paper).
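Per-position accuracy is plain exact match aggregated by gold position. A minimal scorer, assuming a light normalization of the model output (names are illustrative, not the repo's API):

```python
def exact_match(prediction: str, gold_value: str) -> bool:
    """Case-insensitive exact match after stripping whitespace and quotes."""
    return prediction.strip().strip('"').lower() == gold_value.strip().lower()

def accuracy_by_position(records):
    """records: iterable of (gold_position, prediction, gold_value) tuples.

    Returns {gold_position: accuracy} for plotting the U-shaped curve.
    """
    hits, totals = {}, {}
    for pos, pred, gold in records:
        totals[pos] = totals.get(pos, 0) + 1
        hits[pos] = hits.get(pos, 0) + int(exact_match(pred, gold))
    return {pos: hits[pos] / totals[pos] for pos in totals}
```

Plotting the returned dict (position on the x-axis, accuracy on the y-axis) reproduces the curve described above.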
## Prompt Template

The exact prompt from the paper:

```
Extract the value corresponding to the specified key in the JSON object below.

JSON data:
{"<key1>": "<value1>",
 "<key2>": "<value2>",
 ...}

Key: "<query_key>"
Corresponding value:
```
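Rendering this template from the ordered key-value pairs can be sketched as below; `build_prompt` is a hypothetical helper, not a function from the repo:

```python
def build_prompt(pairs, query_key: str) -> str:
    """Render the paper's KV-retrieval prompt from ordered (key, value) pairs."""
    # One '"key": "value"' entry per line, wrapped in a JSON-object literal.
    lines = [f'"{k}": "{v}"' for k, v in pairs]
    json_block = "{" + ",\n ".join(lines) + "}"
    return (
        "Extract the value corresponding to the specified key in the "
        "JSON object below.\n\n"
        f"JSON data:\n{json_block}\n\n"
        f'Key: "{query_key}"\n'
        "Corresponding value:"
    )
```

Because the pairs are passed in order, moving the gold pair within the list is all it takes to control its position in the context.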
## Citation

```bibtex
@article{liu2023lost,
  title={Lost in the Middle: How Language Models Use Long Contexts},
  author={Liu, Nelson F and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy},
  journal={arXiv preprint arXiv:2307.03172},
  year={2023}
}
```
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abhshkp/lost-in-the-middle-reproduction"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.