| --- |
| tags: |
| - ml-intern |
| --- |
| # Position Bias Taxonomy: Cross-Task Framework |
|
|
| This is a unified framework for evaluating and classifying position bias across **different cognitive task types**. It treats position bias as a task-dependent phenomenon rather than a single scalar. |
|
|
| ## Research Question |
|
|
| > Does a model's position bias on retrieval tasks predict its position bias on reasoning, summarization, or translation tasks? Or are these independent dimensions of model behavior? |
|
|
| ## Position Bias Taxonomy |
|
|
| | Type | Pattern | Description | |
| |------|---------|-------------| |
| **Primacy** | High → Low | Best at start |
| **Recency** | Low → High | Best at end |
| **U-Shaped** | High → Low → High | Classic LITM |
| **Middle-Sag** | Flat → Low → Flat | Only middle suffers |
| **Flat** | ~Constant | No position effect |
| **Inverted-U** | Low → High → Low | Middle best (rare) |
|
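The taxonomy above can be turned into a rough labelling rule. The sketch below is an illustration under stated assumptions, not the repository's classifier: the function name `classify_curve`, the three-point summary (start, middle, end accuracy), and the tolerance `eps` are all hypothetical. With only three points, Middle-Sag cannot be told apart from U-Shaped, so the sketch never returns it.

```python
def classify_curve(acc_start, acc_middle, acc_end, eps=0.05):
    """Label a three-point accuracy curve with a taxonomy type.

    eps is a hypothetical tolerance: gaps of at most eps count as "no change".
    """
    values = (acc_start, acc_middle, acc_end)
    if max(values) - min(values) <= eps:
        return "Flat"
    if acc_middle < acc_start - eps and acc_middle < acc_end - eps:
        return "U-Shaped"      # both edges clearly beat the middle
    if acc_middle > acc_start + eps and acc_middle > acc_end + eps:
        return "Inverted-U"    # the middle clearly beats both edges
    # Otherwise accuracy mostly drifts from one edge to the other.
    return "Primacy" if acc_start > acc_end else "Recency"
```

Separating Middle-Sag from U-Shaped would need more sampled positions, for example checking whether accuracy stays flat near the edges before dropping.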
|
| ## Position Bias Index (PBI) |
|
|
| ``` |
| PBI = (acc_start + acc_end) / 2 - acc_middle |
| ``` |
|
|
| - **PBI > 0.3**: Strong U-shape |
| - **PBI ≈ 0**: Flat (good) |
| - **PBI < 0**: Inverted-U (rare) |
|
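As a worked example of the formula above, here is a minimal sketch. The function name `position_bias_index` is hypothetical, and the accuracies are illustrative values, not measured results.

```python
def position_bias_index(acc_start, acc_middle, acc_end):
    """PBI = (acc_start + acc_end) / 2 - acc_middle."""
    return (acc_start + acc_end) / 2 - acc_middle


# Illustrative accuracies only, not measurements.
pbi = position_bias_index(acc_start=0.9, acc_middle=0.5, acc_end=0.8)
print(f"PBI = {pbi:.2f}")  # (0.9 + 0.8) / 2 - 0.5 = 0.35
```

A value of 0.35 exceeds the 0.3 threshold, so this curve would count as a strong U-shape.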
|
| **Weighted PBI** accounts for the full curve shape: it uses Simpson's rule to compute the difference in area under the accuracy curve (AUC) between the edge and middle regions. |
|
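One way to read the weighted variant is sketched below. Several choices here are assumptions rather than the repository's definition: the curve is linearly interpolated between sampled positions, the context is split into three equal regions, and each region's AUC is divided by its width so the result stays on the same scale as PBI. The names `simpson`, `make_curve`, and `weighted_pbi` are hypothetical.

```python
def simpson(f, a, b, n=60):
    """Composite Simpson's rule for f on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def make_curve(positions, accuracies):
    """Piecewise-linear curve through the samples; positions must be strictly increasing."""
    points = list(zip(positions, accuracies))

    def curve(x):
        if x <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return points[-1][1]

    return curve


def weighted_pbi(positions, accuracies):
    """Mean edge accuracy minus mean middle accuracy, using Simpson's rule per region."""
    curve = make_curve(positions, accuracies)
    lo, hi = positions[0], positions[-1]
    third = (hi - lo) / 3  # assumed region width: equal thirds
    start = simpson(curve, lo, lo + third) / third
    middle = simpson(curve, lo + third, hi - third) / third
    end = simpson(curve, hi - third, hi) / third
    return (start + end) / 2 - middle
```

If SciPy is available, `scipy.integrate.simpson` could replace the hand-written rule when the samples are already on a grid.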
|
| ## 5 Task Types |
|
|
| | Task | Cognitive Demand | Expected Bias | |
| |------|----------------|---------------| |
| | **KV Retrieval** | Simple lookup | Strong U-shape | |
| | **Needle in Haystack** | Text search | Strong U-shape | |
| | **Fact-Dependent Reasoning** | Reasoning + retrieval | Moderate U-shape | |
| | **Summarization** | Comprehension + compression | Weak/moderate | |
| | **Translation** | Understanding + generation | Task-dependent | |
|
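To measure position effects, the relevant item has to be placed at a controlled position in the context. The sketch below shows one way this could look for a KV-retrieval example. It is an assumption for illustration and not the repository's data generator: `build_kv_example`, the UUID-style keys, and the prompt wording are all hypothetical.

```python
import json
import random
import uuid


def build_kv_example(num_pairs, target_index, seed=0):
    """Build a KV-retrieval prompt whose queried pair sits at target_index."""
    rng = random.Random(seed)

    def random_id():
        # UUID-formatted random string, seeded for reproducibility.
        return str(uuid.UUID(int=rng.getrandbits(128)))

    pairs = [(random_id(), random_id()) for _ in range(num_pairs)]
    target_key, target_value = pairs[target_index]
    context = json.dumps(dict(pairs), indent=0)  # dict keeps insertion order
    prompt = f'{context}\n\nWhat is the value for key "{target_key}"?'
    return prompt, target_value
```

Sweeping `target_index` from `0` to `num_pairs - 1` while keeping the seed fixed changes only the position of the queried pair, which is what an accuracy-by-position curve needs.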
|
| ## Usage |
|
|
| ```bash |
| pip install -r requirements.txt |
| |
| # Single model, all tasks |
| python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct |
| |
| # Multiple models |
| python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct meta-llama/Llama-3.2-1B-Instruct |
| |
| # Specific tasks only |
| python run_all.py --tasks kv_retrieval reasoning --num-examples 50 |
| ``` |
|
|
| ## Expected Headline Result |
|
|
| > "PBI on retrieval tasks correlates poorly with PBI on reasoning tasks (r=0.12), suggesting position bias is not a unified model property but a task-dependent phenomenon." |
|
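The cross-task comparison comes down to a Pearson correlation between two lists of PBI values, one entry per model. A minimal sketch follows, with `pearson_r` as a hypothetical helper name; how the repository actually computes its statistic is not shown here. The usage line uses toy numbers chosen to be perfectly linear, not measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Toy inputs only: the second list is exactly twice the first, so r is close to 1.0.
print(pearson_r([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]))
```

With only a handful of models, r is a noisy estimate, so it is worth reporting the number of models alongside it.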
|
| ## Output |
|
|
| ``` |
| results/ |
| ├── taxonomy_Qwen_Qwen2.5-1.5B-Instruct.json |
| ├── taxonomy_meta-llama_Llama-3.2-1B-Instruct.json |
| └── ... |
| ``` |
|
|
| Each JSON contains task-level accuracies, PBI, classification, and cross-task statistics. |
|
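Since the exact JSON schema is not documented here, a cautious first step is to list each result file's top-level keys rather than assume field names. The helper below is a hypothetical sketch. It relies only on the `results/` directory and the `taxonomy_*.json` naming shown above.

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each taxonomy_*.json file name to its sorted top-level keys."""
    summary = {}
    for path in sorted(Path(results_dir).glob("taxonomy_*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # No schema is assumed: only the top-level keys are reported.
        summary[path.name] = sorted(data) if isinstance(data, dict) else type(data).__name__
    return summary


if __name__ == "__main__":
    for name, keys in summarize_results().items():
        print(name, keys)
```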
|
| ## Citation |
|
|
| ```bibtex |
| @software{position_bias_taxonomy, |
| title={Position Bias Taxonomy: A Cross-Task Framework for Long-Context Evaluation}, |
| author={abhshkp}, |
| year={2026}, |
| url={https://huggingface.co/abhshkp/position-bias-taxonomy} |
| } |
| ``` |
|
|
| <!-- ml-intern-provenance --> |
| ## Generated by ML Intern |
|
|
| This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub. |
|
|
| - Try ML Intern: https://smolagents-ml-intern.hf.space |
| - Source code: https://github.com/huggingface/ml-intern |
|
|