---
tags:
- ml-intern
---
# Position Bias Taxonomy: Cross-Task Framework

A unified framework for evaluating and classifying position bias across **different cognitive task types**. It tests the hypothesis that position bias is not a single scalar but a task-dependent phenomenon.

## Research Question

> Does a model's position bias on retrieval tasks predict its position bias on reasoning, summarization, or translation tasks? Or are these independent dimensions of model behavior?

## Position Bias Taxonomy

| Type | Pattern (start → end of context) | Description |
|------|---------|-------------|
| **Primacy** | High → Low | Accuracy is highest when the relevant content is at the start |
| **Recency** | Low → High | Accuracy is highest when the relevant content is at the end |
| **U-Shaped** | High → Low → High | Classic lost-in-the-middle (LITM) curve |
| **Middle-Sag** | Flat → Low → Flat | Only the middle suffers |
| **Flat** | ~Constant | No position effect |
| **Inverted-U** | Low → High → Low | Accuracy is highest in the middle (rare) |
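
The taxonomy above can be illustrated with a small heuristic classifier. This is a minimal sketch, not the repository's classifier: the function name `classify_curve`, the three-point input, and the tolerance `tol=0.05` are all assumptions made for illustration. With only three points, Middle-Sag cannot be told apart from U-Shaped, so the sketch reports both as `U-Shaped`.

```python
def classify_curve(acc_start, acc_middle, acc_end, tol=0.05):
    """Map three accuracies onto a taxonomy label (illustrative heuristic).

    tol is an assumed tolerance: differences at or below it count as noise.
    """
    accs = (acc_start, acc_middle, acc_end)
    # All three accuracies within tolerance of each other: no position effect.
    if max(accs) - min(accs) <= tol:
        return "Flat"
    # Middle clearly below both edges: lost-in-the-middle shape.
    if acc_middle < min(acc_start, acc_end) - tol:
        return "U-Shaped"
    # Middle clearly above both edges.
    if acc_middle > max(acc_start, acc_end) + tol:
        return "Inverted-U"
    # Otherwise the curve is dominated by one edge.
    return "Primacy" if acc_start > acc_end else "Recency"


print(classify_curve(0.9, 0.5, 0.85))  # U-Shaped
print(classify_curve(0.9, 0.7, 0.5))   # Primacy
```

Separating Middle-Sag from U-Shaped would need more sampled positions, so that the flat shoulders of the curve become visible.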

## Position Bias Index (PBI)

```
PBI = (acc_start + acc_end) / 2 - acc_middle
```

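Here `acc_start`, `acc_middle`, and `acc_end` are the accuracies measured with the relevant content placed at the start, middle, and end of the context.

- **PBI > 0.3**: Strong U-shape (the edges clearly outperform the middle)
- **PBI ≈ 0**: Flat, no edge-versus-middle gap (the desirable outcome)
- **PBI < 0**: Inverted-U, the middle outperforms the edges (rare)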
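
As a worked example, the formula can be written as a one-line function. The name `position_bias_index` and the sample accuracies are illustrative assumptions, not values from any experiment.

```python
def position_bias_index(acc_start, acc_middle, acc_end):
    """PBI = mean of the two edge accuracies minus the middle accuracy."""
    return (acc_start + acc_end) / 2 - acc_middle


# Made-up U-shaped curve: strong edges, weak middle.
print(round(position_bias_index(0.9, 0.5, 0.9), 3))  # 0.4

# Made-up monotone (primacy) curve: the edges average out to the middle,
# so the PBI is approximately 0 even though accuracy depends on position.
print(position_bias_index(0.9, 0.7, 0.5))
```

The second call shows why PBI is best read together with the taxonomy label: a value near zero rules out a U-shape, but it does not by itself rule out primacy or recency.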

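**Weighted PBI** uses the full accuracy-versus-position curve rather than three points: it integrates the curve with Simpson's rule and takes the difference in area under the curve (AUC) between the edge regions and the middle region.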
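
One way such a weighted index could be computed is sketched below. The details are assumptions, since the text does not specify them: accuracies are sampled at evenly spaced positions, the edge regions are the first and last quarter of the context, the middle region is the central half, and each area is divided by its width so that the result stays on the same scale as PBI. The function names `simpson_area` and `weighted_pbi` are hypothetical.

```python
def simpson_area(ys, dx):
    """Composite Simpson's rule for evenly spaced samples (odd count >= 3)."""
    if len(ys) < 3 or len(ys) % 2 == 0:
        raise ValueError("need an odd number of samples (at least 3)")
    odd_terms = sum(ys[1:-1:2])   # interior samples at odd indices
    even_terms = sum(ys[2:-1:2])  # interior samples at even indices
    return dx / 3 * (ys[0] + 4 * odd_terms + 2 * even_terms + ys[-1])


def weighted_pbi(accs):
    """Mean edge accuracy minus mean middle accuracy, via Simpson's rule.

    accs holds accuracies at evenly spaced positions from start to end.
    Assumed split: first quarter and last quarter are edges, central half
    is the middle. Requires 8k + 1 samples so each region has an even
    number of intervals.
    """
    n = len(accs) - 1  # number of intervals
    if n < 8 or n % 8 != 0:
        raise ValueError("need 8k + 1 evenly spaced samples (9, 17, ...)")
    dx = 1.0 / n
    q = n // 4
    start, middle, end = accs[: q + 1], accs[q : n - q + 1], accs[n - q :]
    # Area divided by region width gives the mean accuracy in that region.
    edge_mean = (simpson_area(start, dx) + simpson_area(end, dx)) / (2 * q * dx)
    middle_mean = simpson_area(middle, dx) / ((n - 2 * q) * dx)
    return edge_mean - middle_mean


# Made-up U-shaped curve sampled at 9 positions.
print(weighted_pbi([0.9, 0.8, 0.6, 0.5, 0.4, 0.5, 0.6, 0.8, 0.9]))
```

Normalizing by region width is a design choice: without it, the wider middle region would dominate the difference regardless of the curve's shape.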

## 5 Task Types

| Task | Cognitive Demand | Expected Bias |
|------|----------------|---------------|
| **KV Retrieval** | Simple lookup | Strong U-shape |
| **Needle in Haystack** | Text search | Strong U-shape |
| **Fact-Dependent Reasoning** | Reasoning + retrieval | Moderate U-shape |
| **Summarization** | Comprehension + compression | Weak/moderate |
| **Translation** | Understanding + generation | Task-dependent |

## Usage

```bash
pip install -r requirements.txt

# Single model, all tasks
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct

# Multiple models
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct meta-llama/Llama-3.2-1B-Instruct

# Specific tasks only
python run_all.py --tasks kv_retrieval reasoning --num-examples 50
```

## Expected Headline Result

> "PBI on retrieval tasks correlates poorly with PBI on reasoning tasks (r=0.12), suggesting position bias is not a unified model property but a task-dependent phenomenon."

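The correlation in the headline can be computed as a Pearson coefficient over per-model PBI values, one pair per model. The sketch below assumes exactly that setup; the function name `pearson_r` and the numbers in the example are placeholders invented for illustration, not measured results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder PBI values, one entry per model (NOT experimental data).
retrieval_pbi = [0.35, 0.10, 0.42, 0.22]
reasoning_pbi = [0.12, 0.15, 0.08, 0.20]
print(pearson_r(retrieval_pbi, reasoning_pbi))
```

With only a handful of models the coefficient is a noisy estimate, so it is worth reporting the number of models alongside `r`.
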
## Output

```
results/
├── taxonomy_Qwen_Qwen2.5-1.5B-Instruct.json
├── taxonomy_meta-llama_Llama-3.2-1B-Instruct.json
└── ...
```

Each JSON file contains per-task accuracies, PBI values, the taxonomy classification, and cross-task statistics.
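
A minimal sketch for loading these files is shown below. It relies only on the `results/` directory and the `taxonomy_*.json` naming shown above; the function name `load_results` is hypothetical, and no particular JSON schema is assumed.

```python
import json
from pathlib import Path


def load_results(results_dir="results"):
    """Return {file stem: parsed JSON} for every taxonomy_*.json file."""
    results = {}
    for path in sorted(Path(results_dir).glob("taxonomy_*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.stem] = json.load(fh)
    return results
```

Calling `load_results()` from the repository root after a run returns one entry per model; inspecting the top-level keys of each entry is a safe first step before relying on any specific field.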

## Citation

```bibtex
@software{position_bias_taxonomy,
  title={Position Bias Taxonomy: A Cross-Task Framework for Long-Context Evaluation},
  author={abhshkp},
  year={2026},
  url={https://huggingface.co/abhshkp/position-bias-taxonomy}
}
```

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern