---
tags:
- ml-intern
---

# Position Bias Taxonomy: Cross-Task Framework

A unified framework that evaluates and classifies position bias across **different cognitive task types**, revealing that position bias is not a single scalar but a task-dependent phenomenon.

## Research Question

> Does a model's position bias on retrieval tasks predict its position bias on reasoning, summarization, or translation tasks? Or are these independent dimensions of model behavior?

## Position Bias Taxonomy

| Type | Pattern | Description |
|------|---------|-------------|
| **Primacy** | High → Low | Best at start |
| **Recency** | Low → High | Best at end |
| **U-Shaped** | High → Low → High | Classic lost-in-the-middle (LITM) |
| **Middle-Sag** | Flat → Low → Flat | Only the middle suffers |
| **Flat** | ~Constant | No position effect |
| **Inverted-U** | Low → High → Low | Middle best (rare) |

## Position Bias Index (PBI)

```
PBI = (acc_start + acc_end) / 2 - acc_middle
```

Here `acc_start`, `acc_middle`, and `acc_end` are the accuracies measured at the start, middle, and end positions of the context.

- **PBI > 0.3**: Strong U-shape
- **PBI ≈ 0**: Flat (good)
- **PBI < 0**: Inverted-U (rare)

**Weighted PBI** accounts for the full curve shape: it uses Simpson's rule to compute the area-under-curve (AUC) difference between the edge regions and the middle region.
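The PBI formula and thresholds above can be sketched in Python as follows. This is a minimal, self-contained illustration, not the repository's implementation. Several details are assumptions: the weighted variant splits an evenly spaced accuracy curve into equal start, middle, and end thirds; `tol` is a hypothetical noise margin; and the `classify` rules for Primacy and Recency are illustrative. Middle-Sag is not returned, because three points cannot separate it from U-Shaped.

```python
from typing import Sequence


def pbi(acc_start: float, acc_middle: float, acc_end: float) -> float:
    """Position Bias Index: mean edge accuracy minus middle accuracy."""
    return (acc_start + acc_end) / 2 - acc_middle


def simpson(y: Sequence[float], dx: float = 1.0) -> float:
    """Composite Simpson's rule for evenly spaced samples (odd number of points)."""
    n = len(y)
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points >= 3")
    interior_odd = sum(y[1:-1:2])   # indices 1, 3, ..., n-2 get weight 4
    interior_even = sum(y[2:-1:2])  # indices 2, 4, ..., n-3 get weight 2
    return dx / 3 * (y[0] + y[-1] + 4 * interior_odd + 2 * interior_even)


def weighted_pbi(curve: Sequence[float]) -> float:
    """ASSUMED weighted PBI: split an evenly spaced accuracy-vs-position curve
    into equal start / middle / end thirds, average each third with Simpson's
    rule, and apply the PBI formula to those region means."""
    n = len(curve)
    if n < 7 or (n - 1) % 6 != 0:
        raise ValueError("expects 6m + 1 evenly spaced points so each third has an odd count")
    seg = (n - 1) // 3  # intervals per third (always even here)

    def region_mean(lo: int, hi: int) -> float:
        # Area over [lo, hi] divided by its width gives the mean accuracy.
        return simpson(curve[lo:hi + 1]) / (hi - lo)

    return pbi(region_mean(0, seg), region_mean(seg, 2 * seg), region_mean(2 * seg, n - 1))


def classify(acc_start: float, acc_middle: float, acc_end: float, tol: float = 0.05) -> str:
    """HYPOTHETICAL three-point classifier; `tol` is an assumed noise margin."""
    if acc_start - acc_middle > tol and acc_middle - acc_end > tol:
        return "Primacy"
    if acc_end - acc_middle > tol and acc_middle - acc_start > tol:
        return "Recency"
    score = pbi(acc_start, acc_middle, acc_end)
    if score > 0.3:  # threshold taken from the PBI table above
        return "U-Shaped (strong)"
    if score > tol:
        return "U-Shaped"
    if score < -tol:
        return "Inverted-U"
    return "Flat"


if __name__ == "__main__":
    # Made-up accuracies at 7 evenly spaced positions, for illustration only.
    curve = [0.9, 0.7, 0.5, 0.4, 0.5, 0.7, 0.9]
    print(pbi(curve[0], curve[3], curve[-1]))
    print(weighted_pbi(curve))
    print(classify(curve[0], curve[3], curve[-1]))
```

Averaging each third with Simpson's rule uses every sampled position, so in this sketch a single noisy position moves the weighted score less than it moves the three-point PBI.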
## 5 Task Types

| Task | Cognitive Demand | Expected Bias |
|------|------------------|---------------|
| **KV Retrieval** | Simple lookup | Strong U-shape |
| **Needle in Haystack** | Text search | Strong U-shape |
| **Fact-Dependent Reasoning** | Reasoning + retrieval | Moderate U-shape |
| **Summarization** | Comprehension + compression | Weak/moderate |
| **Translation** | Understanding + generation | Task-dependent |

## Usage

```bash
pip install -r requirements.txt

# Single model, all tasks
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct

# Multiple models
python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct meta-llama/Llama-3.2-1B-Instruct

# Specific tasks only
python run_all.py --tasks kv_retrieval reasoning --num-examples 50
```

## Expected Headline Result

> "PBI on retrieval tasks correlates poorly with PBI on reasoning tasks (r=0.12), suggesting position bias is not a unified model property but a task-dependent phenomenon."

## Output

```
results/
├── taxonomy_Qwen_Qwen2.5-1.5B-Instruct.json
├── taxonomy_meta-llama_Llama-3.2-1B-Instruct.json
└── ...
```

Each JSON contains task-level accuracies, PBI, classification, and cross-task statistics.

## Citation

```bibtex
@software{position_bias_taxonomy,
  title={Position Bias Taxonomy: A Cross-Task Framework for Long-Context Evaluation},
  author={abhshkp},
  year={2026},
  url={https://huggingface.co/abhshkp/position-bias-taxonomy}
}
```

## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern