	---
	tags:
	- ml-intern
	---
	# Position Bias Taxonomy: Cross-Task Framework

	A unified framework that evaluates and classifies position bias across different cognitive task types, revealing that position bias is not a single scalar but a task-dependent phenomenon.

	## Research Question

	> Does a model's position bias on retrieval tasks predict its position bias on reasoning, summarization, or translation tasks? Or are these independent dimensions of model behavior?

	## Position Bias Taxonomy

	| Type | Accuracy pattern (start → end) | Description |
	|------|--------------------------------|-------------|
	| Primacy | High → Low | Best at start |
	| Recency | Low → High | Best at end |
	| U-Shaped | High → Low → High | Classic lost-in-the-middle (LITM) |
	| Middle-Sag | Flat → Low → Flat | Only middle suffers |
	| Flat | ~Constant | No position effect |
	| Inverted-U | Low → High → Low | Middle best (rare) |
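
As a rough illustration of how a curve could be mapped onto these labels, here is a minimal sketch. It is not the classifier shipped in this repository: the function name `classify_curve`, the three-region split, and the `tol` threshold are all assumptions. Because it only compares three region means, it cannot separate Middle-Sag from U-Shaped and reports both as `U-Shaped`.

```python
from typing import Sequence


def classify_curve(acc: Sequence[float], tol: float = 0.1) -> str:
    """Label an accuracy-by-position curve using three region means.

    `acc` holds accuracies ordered from the start to the end of the context.
    `tol` is a hypothetical threshold for "meaningfully different".
    """
    n = len(acc)
    if n < 3:
        raise ValueError("need at least three positions")

    third = max(1, n // 3)
    start = sum(acc[:third]) / third
    end = sum(acc[-third:]) / third
    mid_vals = acc[third:n - third]
    middle = sum(mid_vals) / len(mid_vals)

    # Both edges above the middle: U-Shaped (Middle-Sag is not separated here).
    if start - middle > tol and end - middle > tol:
        return "U-Shaped"
    # Both edges below the middle.
    if middle - start > tol and middle - end > tol:
        return "Inverted-U"
    # Otherwise compare the two edges directly.
    if start - end > tol:
        return "Primacy"
    if end - start > tol:
        return "Recency"
    return "Flat"


if __name__ == "__main__":
    for curve in ([0.9, 0.5, 0.9], [0.9, 0.6, 0.3], [0.8, 0.8, 0.8]):
        print(curve, classify_curve(curve))
```

Separating Middle-Sag from U-Shaped would need more positions per curve and a check that the edge regions are themselves flat.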

	## Position Bias Index (PBI)

	```
	PBI = (acc_start + acc_end) / 2 - acc_middle
	```

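	- PBI > 0.3: Strong U-shape (edges far more accurate than the middle)
	- PBI ≈ 0: No edge-versus-middle gap. This is the desirable flat case, although a steady primacy or recency slope also averages out to roughly zero
	- PBI < 0: Inverted-U, middle more accurate than the edges (rare)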
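
The formula and thresholds above can be sketched in a few lines of Python. The function names, the `flat_tol` band used for "≈ 0", and the "moderate U-shape" label for values between that band and 0.3 are assumptions made for illustration; the list above does not name that range.

```python
def position_bias_index(acc_start: float, acc_middle: float, acc_end: float) -> float:
    """PBI = (acc_start + acc_end) / 2 - acc_middle."""
    return (acc_start + acc_end) / 2 - acc_middle


def describe_pbi(pbi: float, flat_tol: float = 0.05) -> str:
    """Turn a PBI value into a coarse label.

    `flat_tol` (assumed) defines how close to zero counts as "flat".
    """
    if pbi > 0.3:
        return "strong U-shape"
    if abs(pbi) <= flat_tol:
        return "flat"
    if pbi < 0:
        return "inverted-U"
    return "moderate U-shape"  # assumed label for flat_tol < PBI <= 0.3


if __name__ == "__main__":
    pbi = position_bias_index(acc_start=0.9, acc_middle=0.4, acc_end=0.9)
    print(round(pbi, 3), describe_pbi(pbi))
```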

	The weighted PBI accounts for the full curve shape: it integrates the accuracy curve with Simpson's rule and takes the difference in area under the curve (AUC) between the edge regions and the middle region.
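
One way such a weighted PBI could be computed is sketched below. This is not taken from the repository's code: the choice of the outer 25% of the context as "edge" (`edge=0.25`), the piecewise-linear interpolation between measured positions, and all function names are assumptions. Each region's area is divided by its width so that the result is on the same scale as the plain PBI.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 20) -> float:
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def make_curve(positions: Sequence[float], acc: Sequence[float]) -> Callable[[float], float]:
    """Piecewise-linear interpolant through (position, accuracy) samples.

    `positions` must be sorted and lie in [0, 1] (relative depth in the context).
    """
    def curve(x: float) -> float:
        if x <= positions[0]:
            return acc[0]
        if x >= positions[-1]:
            return acc[-1]
        j = bisect_right(positions, x)
        x0, x1 = positions[j - 1], positions[j]
        t = (x - x0) / (x1 - x0)
        return acc[j - 1] + t * (acc[j] - acc[j - 1])
    return curve


def weighted_pbi(positions: Sequence[float], acc: Sequence[float], edge: float = 0.25) -> float:
    """Mean edge accuracy minus mean middle accuracy, via Simpson's rule."""
    curve = make_curve(positions, acc)
    left = simpson(curve, 0.0, edge) / edge
    right = simpson(curve, 1.0 - edge, 1.0) / edge
    middle = simpson(curve, edge, 1.0 - edge) / (1.0 - 2 * edge)
    return (left + right) / 2 - middle


if __name__ == "__main__":
    depths = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(weighted_pbi(depths, [1.0, 0.5, 0.0, 0.5, 1.0]))
```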

	## 5 Task Types

	| Task | Cognitive Demand | Expected Bias |
	|------|------------------|---------------|
	| KV Retrieval | Simple lookup | Strong U-shape |
	| Needle in Haystack | Text search | Strong U-shape |
	| Fact-Dependent Reasoning | Reasoning + retrieval | Moderate U-shape |
	| Summarization | Comprehension + compression | Weak/moderate |
	| Translation | Understanding + generation | Task-dependent |
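
To make the idea of "position" concrete for the simplest task, the sketch below builds a key-value retrieval prompt with the queried pair placed at a chosen index. It is only an illustration: the prompt wording, the key and value formats, and the helper names `build_kv_prompt` and `sweep_positions` are hypothetical and may differ from what `run_all.py` does.

```python
import random
from typing import List, Tuple


def build_kv_prompt(num_pairs: int, target_index: int, seed: int = 0) -> Tuple[str, str]:
    """Return (prompt, expected_answer) with the target pair at `target_index`."""
    if not 0 <= target_index < num_pairs:
        raise ValueError("target_index out of range")
    rng = random.Random(seed)  # seeded so the same context is reused across positions
    pairs = [
        (f"key-{rng.getrandbits(32):08x}", f"val-{rng.getrandbits(32):08x}")
        for _ in range(num_pairs)
    ]
    target_key, target_value = pairs[target_index]
    context = "\n".join(f"{k}: {v}" for k, v in pairs)
    prompt = f"{context}\n\nWhat is the value for {target_key}?"
    return prompt, target_value


def sweep_positions(num_pairs: int, num_positions: int) -> List[int]:
    """Evenly spaced target indices from the first pair to the last."""
    if num_positions < 2:
        return [0]
    return [round(i * (num_pairs - 1) / (num_positions - 1)) for i in range(num_positions)]


if __name__ == "__main__":
    for idx in sweep_positions(num_pairs=21, num_positions=5):
        prompt, answer = build_kv_prompt(21, idx)
        print(idx, answer)
```

Scoring the model's answer at each index yields the accuracy-by-position curve that the taxonomy and PBI are computed from.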

	## Usage

	```bash
	pip install -r requirements.txt

	# Single model, all tasks
	python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct

	# Multiple models
	python run_all.py --models Qwen/Qwen2.5-1.5B-Instruct meta-llama/Llama-3.2-1B-Instruct

	# Specific tasks only
	python run_all.py --tasks kv_retrieval reasoning --num-examples 50
	```

	## Expected Headline Result

	> "PBI on retrieval tasks correlates poorly with PBI on reasoning tasks (r=0.12), suggesting position bias is not a unified model property but a task-dependent phenomenon."
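
The correlation in that statement would be computed across models, pairing each model's retrieval PBI with its reasoning PBI. A minimal sketch using a hand-written Pearson coefficient follows; the function name is an assumption, and the numbers in the demo are placeholders chosen only to exercise the code, not measured results.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values, one entry per model; NOT real measurements.
    retrieval_pbi = [0.10, 0.20, 0.30, 0.40]
    reasoning_pbi = [0.25, 0.05, 0.30, 0.10]
    print(f"r = {pearson_r(retrieval_pbi, reasoning_pbi):.2f}")
```

With only a handful of models, r is a noisy estimate, so it is worth reporting the number of models alongside it.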

	## Output

	```
	results/
	├── taxonomy_Qwen_Qwen2.5-1.5B-Instruct.json
	├── taxonomy_meta-llama_Llama-3.2-1B-Instruct.json
	└── ...
	```

	Each JSON file contains per-task accuracies, the PBI, the taxonomy classification, and cross-task statistics for one model.
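
A small sketch for loading these files is shown below. The file-name rule (replace `/` in the model ID with `_`) is inferred from the listing above rather than confirmed by the code, the helper names are hypothetical, and no assumption is made about the keys inside each JSON file.

```python
import json
from pathlib import Path
from typing import Any, Dict


def result_path(model_id: str, results_dir: str = "results") -> Path:
    """Guess the result file for a model ID (naming rule inferred from the listing)."""
    return Path(results_dir) / f"taxonomy_{model_id.replace('/', '_')}.json"


def load_results(results_dir: str = "results") -> Dict[str, Any]:
    """Load every taxonomy_*.json file, keyed by file stem."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(results_dir).glob("taxonomy_*.json"))
    }


if __name__ == "__main__":
    for name, payload in load_results().items():
        # Print only the top-level keys, since the schema is not documented here.
        keys = sorted(payload) if isinstance(payload, dict) else type(payload).__name__
        print(name, keys)
```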

	## Citation

	```bibtex
	@software{position_bias_taxonomy,
	title={Position Bias Taxonomy: A Cross-Task Framework for Long-Context Evaluation},
	author={abhshkp},
	year={2026},
	url={https://huggingface.co/abhshkp/position-bias-taxonomy}
	}
	```

	<!-- ml-intern-provenance -->
	## Generated by ML Intern

	This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

	- Try ML Intern: https://smolagents-ml-intern.hf.space
	- Source code: https://github.com/huggingface/ml-intern