---
tags:
- ml-intern
---
# Position Bias Taxonomy: Cross-Task Framework
A unified framework that evaluates and classifies position bias across **different cognitive task types**, revealing that position bias is not a single scalar but a task-dependent phenomenon.
## Research Question
> Does a model's position bias on retrieval tasks predict its position bias on reasoning, summarization, or translation tasks? Or are these independent dimensions of model behavior?
## Position Bias Taxonomy
| Type | Pattern | Description |
|------|---------|-------------|
| **Primacy** | High → Low | Best at start |
| **Recency** | Low → High | Best at end |
| **U-Shaped** | High → Low → High | Classic lost-in-the-middle (LITM) |
| **Middle-Sag** | Flat → Low → Flat | Only middle suffers |
| **Flat** | ~Constant | No position effect |
| **Inverted-U** | Low → High → Low | Middle best (rare) |
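The taxonomy above can be illustrated with a minimal sketch that maps three accuracies (start, middle, end) to a label. The function name `classify_curve` and the noise tolerance `tol=0.05` are assumptions made for illustration and are not taken from this repository. With only three points, Middle-Sag cannot be told apart from U-Shaped (that needs the full curve), so this sketch folds both into `"U-Shaped"`.

```python
def classify_curve(acc_start, acc_middle, acc_end, tol=0.05):
    """Map three accuracies to a taxonomy label.

    `tol` is an assumed noise threshold: differences smaller than it
    are treated as "no position effect".
    """
    start_vs_mid = acc_start - acc_middle
    end_vs_mid = acc_end - acc_middle

    # Both edges clearly above the middle: High -> Low -> High.
    if start_vs_mid > tol and end_vs_mid > tol:
        return "U-Shaped"
    # Both edges clearly below the middle: Low -> High -> Low.
    if start_vs_mid < -tol and end_vs_mid < -tol:
        return "Inverted-U"
    # Otherwise compare the two edges directly.
    if acc_start - acc_end > tol:
        return "Primacy"
    if acc_end - acc_start > tol:
        return "Recency"
    return "Flat"


# Illustrative inputs only, not measured results.
labels = [
    classify_curve(0.9, 0.5, 0.9),
    classify_curve(0.9, 0.7, 0.5),
    classify_curve(0.8, 0.8, 0.8),
]
```

A fuller classifier would inspect every position on the curve, which is what separates a narrow middle dip (Middle-Sag) from a gradual U.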
## Position Bias Index (PBI)
```
PBI = (acc_start + acc_end) / 2 - acc_middle
```
- **PBI > 0.3**: Strong U-shape
- **PBI ≈ 0**: Flat (good)
- **PBI < 0**: Inverted-U (rare)
**Weighted PBI** accounts for the full curve shape: it uses Simpson's rule to compute the difference in area under the accuracy curve (AUC) between the edge regions and the middle region.
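Both indices can be sketched in a few lines. The simple PBI follows the formula above directly. For the weighted variant the README does not say how the curve is divided, so this sketch assumes equally spaced positions split into three equal regions (first third, middle third, last third) and compares the Simpson-rule mean accuracy of the two edge regions with that of the middle. The names `simple_pbi`, `simpson`, and `weighted_pbi` are hypothetical, not this repository's API.

```python
def simple_pbi(acc_start, acc_middle, acc_end):
    """PBI = (acc_start + acc_end) / 2 - acc_middle."""
    return (acc_start + acc_end) / 2 - acc_middle


def simpson(ys, h=1.0):
    """Composite Simpson's rule for equally spaced samples.

    Requires an odd number of points (an even number of intervals).
    """
    n = len(ys)
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points >= 3")
    odd_terms = sum(ys[1:-1:2])   # indices 1, 3, ..., n-2
    even_terms = sum(ys[2:-1:2])  # indices 2, 4, ..., n-3
    return h / 3 * (ys[0] + ys[-1] + 4 * odd_terms + 2 * even_terms)


def weighted_pbi(curve):
    """Mean edge accuracy minus mean middle accuracy, using Simpson's rule.

    Assumption: the curve is split into three equal regions, so the number
    of intervals (len(curve) - 1) must be a positive multiple of 6.
    """
    n_intervals = len(curve) - 1
    if n_intervals < 6 or n_intervals % 6 != 0:
        raise ValueError("len(curve) - 1 must be a positive multiple of 6")
    k = n_intervals // 3  # intervals per region (always even here)

    def region_mean(region):
        # Area under the region divided by its width gives the mean height.
        return simpson(region) / k

    first = curve[: k + 1]
    middle = curve[k : 2 * k + 1]
    last = curve[2 * k :]
    return (region_mean(first) + region_mean(last)) / 2 - region_mean(middle)
```

Under these assumptions a flat curve gives a weighted PBI of zero, and a curve that dips in the middle gives a positive value, matching the sign convention of the simple PBI.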
## 5 Task Types
| Task | Cognitive Demand | Expected Bias |
|------|----------------|---------------|
| **KV Retrieval** | Simple lookup | Strong U-shape |
| **Needle in Haystack** | Text search | Strong U-shape |
| **Fact-Dependent Reasoning** | Reasoning + retrieval | Moderate U-shape |
| **Summarization** | Comprehension + compression | Weak/moderate |
| **Translation** | Understanding + generation | Task-dependent |
## Usage
```bash
pip install -r requirements.txt
# Single model, all tasks
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct
# Multiple models
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct meta-llama/Llama-3.2-1B-Instruct
# Specific tasks only
python run_all.py --tasks kv_retrieval reasoning --num-examples 50
```
## Expected Headline Result
> "PBI on retrieval tasks correlates poorly with PBI on reasoning tasks (r=0.12), suggesting position bias is not a unified model property but a task-dependent phenomenon."
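A cross-task comparison of this kind could be computed as a Pearson correlation between per-model PBI values on two tasks. The sketch below is a plain-Python version under that assumption; `pearson_r` is a hypothetical helper, and the lists in the usage lines are placeholders chosen for illustration, not measurements from any model.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one PBI per model for each task (not real results).
retrieval_pbi = [0.10, 0.20, 0.30, 0.40]
reasoning_pbi = [0.05, 0.10, 0.15, 0.20]
r = pearson_r(retrieval_pbi, reasoning_pbi)
```

With only a handful of models the estimate of r is noisy, so a value near zero is weak evidence on its own and is worth reporting alongside the number of models compared.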
## Output
```
results/
├── taxonomy_Qwen_Qwen2.5-1.5B-Instruct.json
├── taxonomy_meta-llama_Llama-3.2-1B-Instruct.json
└── ...
```
Each JSON contains task-level accuracies, PBI, classification, and cross-task statistics.
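To inspect these files, a small loader along the following lines may be convenient. It assumes only the `results/taxonomy_*.json` naming shown above; the function name `load_taxonomy_results` is hypothetical, and the sketch deliberately makes no assumption about the key names inside each JSON file.

```python
import json
from pathlib import Path


def load_taxonomy_results(results_dir="results"):
    """Return {model_tag: parsed_json} for every taxonomy_*.json file."""
    prefix = "taxonomy_"
    results = {}
    for path in sorted(Path(results_dir).glob(prefix + "*.json")):
        # Strip the "taxonomy_" prefix to get a short per-model tag.
        model_tag = path.stem[len(prefix):]
        with path.open(encoding="utf-8") as f:
            results[model_tag] = json.load(f)
    return results
```

Calling `load_taxonomy_results()` after a run and listing the top-level keys of each entry is a quick way to discover the actual schema before writing analysis code against it.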
## Citation
```bibtex
@software{position_bias_taxonomy,
title={Position Bias Taxonomy: A Cross-Task Framework for Long-Context Evaluation},
author={abhshkp},
year={2026},
url={https://huggingface.co/abhshkp/position-bias-taxonomy}
}
```
<!-- ml-intern-provenance -->
## Generated by ML Intern
This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern