Position Bias Taxonomy: Cross-Task Framework
A unified framework that evaluates and classifies position bias across different cognitive task types, revealing that position bias is not a single scalar but a task-dependent phenomenon.
Research Question
Does a model's position bias on retrieval tasks predict its position bias on reasoning, summarization, or translation tasks? Or are these independent dimensions of model behavior?
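One way to frame this question numerically is to score each model's position bias per task (for example with the PBI defined later in this README) and then correlate the scores across models. The sketch below is a plain Pearson correlation; the function name `pearson_r` is illustrative and is not taken from the repository's code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy sanity check only; these are not measured bias scores.
    print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Here `xs` would hold one bias score per model on one task and `ys` the same models' scores on another task. With only a handful of models the estimate is noisy, so it is best read as indicative rather than conclusive.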
Position Bias Taxonomy
| Type | Pattern | Description |
|---|---|---|
| Primacy | High → Low | Best at start |
| Recency | Low → High | Best at end |
| U-Shaped | High → Low → High | Classic LITM |
| Middle-Sag | Flat → Low → Flat | Only middle suffers |
| Flat | ~Constant | No position effect |
| Inverted-U | Low → High → Low | Middle best (rare) |
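The table above can be turned into a rule-based classifier. The following is a minimal sketch, not the repository's implementation: the function name `classify_curve`, the tolerance `tol`, and the choice of "shoulder" points a quarter of the way in from each edge are all assumptions made for illustration.

```python
def classify_curve(accs, tol=0.05):
    """Label an accuracy-by-position curve with one of the taxonomy types.

    accs: accuracies at evenly spaced positions, start to end (at least 5).
    tol:  assumed threshold below which two accuracies count as equal.
    """
    if len(accs) < 5:
        raise ValueError("need at least 5 positions")
    start, end = accs[0], accs[-1]
    mid = accs[len(accs) // 2]
    # Shoulder points sit between each edge and the middle.
    left = accs[len(accs) // 4]
    right = accs[-1 - len(accs) // 4]

    if max(accs) - min(accs) <= tol:
        return "Flat"
    if start - mid > tol and end - mid > tol:
        # Both edges beat the middle. If the shoulders still match the
        # edges, only the middle drops; otherwise the decline is gradual.
        if abs(start - left) <= tol and abs(end - right) <= tol:
            return "Middle-Sag"
        return "U-Shaped"
    if mid - start > tol and mid - end > tol:
        return "Inverted-U"
    if start - end > tol:
        return "Primacy"
    if end - start > tol:
        return "Recency"
    return "Unclassified"  # irregular curves that fit none of the patterns


if __name__ == "__main__":
    print(classify_curve([0.9, 0.6, 0.3, 0.6, 0.9]))
```

Separating Middle-Sag from U-Shaped needs more than three positions, which is why the sketch requires at least five points.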
Position Bias Index (PBI)
PBI = (acc_start + acc_end) / 2 - acc_middle
- PBI > 0.3: Strong U-shape
- PBI ≈ 0: Flat (good)
- PBI < 0: Inverted-U (rare)
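The formula and thresholds above can be sketched in a few lines. The names `pbi` and `describe_pbi` are illustrative; the `flat_tol` band used for "≈ 0" and the label for values between that band and 0.3 are assumptions, since the list above does not define them.

```python
def pbi(acc_start, acc_middle, acc_end):
    """Position Bias Index: mean edge accuracy minus middle accuracy."""
    return (acc_start + acc_end) / 2 - acc_middle


def describe_pbi(value, flat_tol=0.05):
    """Map a PBI value to the labels listed above.

    flat_tol is an assumed width for the "PBI ≈ 0" band.
    """
    if value > 0.3:
        return "strong U-shape"
    if abs(value) <= flat_tol:
        return "flat"
    if value < 0:
        return "inverted-U"
    # Assumed label: the list above does not name this band.
    return "weak or moderate U-shape"


if __name__ == "__main__":
    score = pbi(acc_start=0.9, acc_middle=0.4, acc_end=0.9)
    print(score, describe_pbi(score))
```

A steadily falling or rising curve can also score near zero, for example `pbi(0.9, 0.6, 0.3)`, so a PBI near zero alone does not separate a flat curve from a primacy or recency curve.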
Weighted PBI accounts for the full curve shape: it uses Simpson's rule to compute the difference in area under the accuracy curve (AUC) between the edge regions and the middle region.
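A hedged sketch of how such a weighted PBI could be computed follows. It is not the repository's implementation: the region boundaries (outer thirds versus the central third), the linear interpolation between measured positions, and the names `simpson`, `interpolate`, and `weighted_pbi` are all assumptions. Each region's AUC is divided by its width so that the three regions are compared as mean accuracies.

```python
def simpson(f, a, b, n=20):
    """Composite Simpson's rule for f on [a, b] using n (even) subintervals."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def interpolate(accs):
    """Piecewise-linear curve through accs at evenly spaced depths in [0, 1]."""
    if len(accs) < 2:
        raise ValueError("need at least two positions")
    last = len(accs) - 1

    def curve(x):
        pos = min(max(x, 0.0), 1.0) * last
        lo = min(int(pos), last - 1)
        frac = pos - lo
        return accs[lo] * (1 - frac) + accs[lo + 1] * frac

    return curve


def weighted_pbi(accs, edge=1 / 3):
    """Mean accuracy over both edge regions minus mean over the middle region.

    edge is the assumed fraction of the context counted as "start" and "end".
    """
    if not 0 < edge < 0.5:
        raise ValueError("edge must be between 0 and 0.5")
    curve = interpolate(accs)
    start_mean = simpson(curve, 0.0, edge) / edge
    end_mean = simpson(curve, 1.0 - edge, 1.0) / edge
    middle_mean = simpson(curve, edge, 1.0 - edge) / (1.0 - 2 * edge)
    return (start_mean + end_mean) / 2 - middle_mean


if __name__ == "__main__":
    print(weighted_pbi([0.9, 0.5, 0.3, 0.5, 0.9]))
```

Like the three-point PBI, this stays at zero for a flat curve, is positive when the edges outperform the middle, and is negative in the inverted-U case.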
5 Task Types
| Task | Cognitive Demand | Expected Bias |
|---|---|---|
| KV Retrieval | Simple lookup | Strong U-shape |
| Needle in Haystack | Text search | Strong U-shape |
| Fact-Dependent Reasoning | Reasoning + retrieval | Moderate U-shape |
| Summarization | Comprehension + compression | Weak/moderate |
| Translation | Understanding + generation | Task-dependent |
Usage
pip install -r requirements.txt
# Single model, all tasks
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct
# Multiple models
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct meta-llama/Llama-3.2-1B-Instruct
# Specific tasks only
python run_all.py --tasks kv_retrieval reasoning --num-examples 50
Expected Headline Result
"PBI on retrieval tasks correlates poorly with PBI on reasoning tasks (r=0.12), suggesting position bias is not a unified model property but a task-dependent phenomenon."
Output
results/
├── taxonomy_Qwen_Qwen2.5-1.5B-Instruct.json
├── taxonomy_meta-llama_Llama-3.2-1B-Instruct.json
└── ...
Each JSON contains task-level accuracies, PBI, classification, and cross-task statistics.
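To inspect the output files, a small loader like the one below may help. It assumes only the `results/` directory and the `taxonomy_*.json` naming shown above; it makes no assumption about the field names inside each file and simply reports the top-level keys it finds. The function name `load_results` is illustrative.

```python
import json
from pathlib import Path


def load_results(results_dir="results"):
    """Return a dict mapping each taxonomy_*.json file name to its parsed content."""
    loaded = {}
    for path in sorted(Path(results_dir).glob("taxonomy_*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded


if __name__ == "__main__":
    for name, payload in load_results().items():
        # Show the top-level structure without assuming a schema.
        if isinstance(payload, dict):
            print(name, sorted(payload))
        else:
            print(name, type(payload).__name__)
```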
Citation
@software{position_bias_taxonomy,
title={Position Bias Taxonomy: A Cross-Task Framework for Long-Context Evaluation},
author={abhshkp},
year={2026},
url={https://huggingface.co/abhshkp/position-bias-taxonomy}
}
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern