# Lost in the Middle - Reproduction Code
This repository contains self-contained code to reproduce the key-value retrieval experiment from the paper "Lost in the Middle: How Language Models Use Long Contexts" (Liu et al., 2023, arXiv:2307.03172).
## Paper Summary
The paper demonstrates that LLMs exhibit a U-shaped performance curve when asked to retrieve information from long contexts: they perform best when the relevant information is at the beginning (primacy bias) or end (recency bias) of the context, and worst when it is in the middle.
## Tasks Implemented

### 1. Key-Value Retrieval (Synthetic)
- Generate random UUID key-value pairs
- Place the "gold" key-value pair at controlled positions
- Prompt the model to extract the value for a given key
- Measure exact-match accuracy
- Plot the U-shaped accuracy curve
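The generation step above can be sketched as follows. This is an illustrative helper, not the repo's actual `gpu_experiment.py`; the field names are assumptions:

```python
import uuid

def make_example(num_keys: int, gold_position: int) -> dict:
    """Build one retrieval example with the gold pair at a fixed index."""
    # Random UUIDs serve as both keys and values, as in the paper's synthetic task.
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(num_keys)]
    gold_key, gold_value = pairs[gold_position]
    return {
        "ordered_kv_pairs": pairs,
        "key": gold_key,        # the key the model is asked about
        "value": gold_value,    # the expected answer
        "gold_position": gold_position,
    }

# One example per controlled gold position, e.g. for a 50-key context:
examples = [make_example(50, pos) for pos in range(50)]
```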
## Quick Start

### 1. Generate Data

```bash
python gpu_experiment.py /app/litm/results_gpu Qwen/Qwen2.5-0.5B-Instruct 50 50
```
### 2. Run Experiment (Local Model - GPU)

```bash
python run_kv_gpu.py \
  --data-path kv_data_50.jsonl \
  --num-keys 50 \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --output-dir results_gpu \
  --max-examples 50
```
### 3. Run Experiment (OpenAI API)

```bash
export OPENAI_API_KEY=sk-...
python run_openai.py \
  --data-path kv_data_50.jsonl \
  --num-keys 50 \
  --model gpt-3.5-turbo \
  --max-examples 50
```
## Example Result
From a CPU run with Qwen2.5-0.5B-Instruct (10 keys, 5 examples per position):
| Gold Position | Accuracy |
|---|---|
| 0 (start) | 1.00 |
| 2 | 0.60 |
| 5 (middle) | 1.00 |
| 7 | 0.80 |
| 9 (end) | 0.60 |
With more examples and longer contexts, the U-shaped pattern becomes more pronounced (per the paper).
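Per-position accuracy is plain exact match aggregated by gold position. A minimal scorer, assuming a light normalization of the model output (names are illustrative, not the repo's API):

```python
def exact_match(prediction: str, gold_value: str) -> bool:
    """Case-insensitive exact match after stripping whitespace and quotes."""
    return prediction.strip().strip('"').lower() == gold_value.strip().lower()

def accuracy_by_position(records):
    """records: iterable of (gold_position, prediction, gold_value) tuples.

    Returns {gold_position: accuracy} for plotting the U-shaped curve.
    """
    hits, totals = {}, {}
    for pos, pred, gold in records:
        totals[pos] = totals.get(pos, 0) + 1
        hits[pos] = hits.get(pos, 0) + int(exact_match(pred, gold))
    return {pos: hits[pos] / totals[pos] for pos in totals}
```

Plotting the returned dict (position on the x-axis, accuracy on the y-axis) reproduces the curve described above.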
## Prompt Template

The exact prompt from the paper:

```
Extract the value corresponding to the specified key in the JSON object below.

JSON data:
{"<key1>": "<value1>",
 "<key2>": "<value2>",
 ...}

Key: "<query_key>"
Corresponding value:
```
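Rendering this template from the ordered key-value pairs can be sketched as below; `build_prompt` is a hypothetical helper, not a function from the repo:

```python
def build_prompt(pairs, query_key: str) -> str:
    """Render the paper's KV-retrieval prompt from ordered (key, value) pairs."""
    # One '"key": "value"' entry per line, wrapped in a JSON-object literal.
    lines = [f'"{k}": "{v}"' for k, v in pairs]
    json_block = "{" + ",\n ".join(lines) + "}"
    return (
        "Extract the value corresponding to the specified key in the "
        "JSON object below.\n\n"
        f"JSON data:\n{json_block}\n\n"
        f'Key: "{query_key}"\n'
        "Corresponding value:"
    )
```

Because the pairs are passed in order, moving the gold pair within the list is all it takes to control its position in the context.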
## Citation

```bibtex
@article{liu2023lost,
  title={Lost in the Middle: How Language Models Use Long Contexts},
  author={Liu, Nelson F and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy},
  journal={arXiv preprint arXiv:2307.03172},
  year={2023}
}
```
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abhshkp/lost-in-the-middle-reproduction"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.