Structured Data Position Bias Benchmark
This benchmark tests position bias in structured formats (JSON, tables, logs), where formatting may either mitigate or exacerbate the "Lost in the Middle" effect.
Research Question
Does structured formatting (JSON, tables, logs) reduce position bias compared to unstructured prose? Or does the visual/structural regularity make middle-position items harder to find?
Experiments
| # | Format | Target | Hypothesis |
|---|---|---|---|
| 1 | JSON Array | Key-value pair | Structured nesting may reduce bias |
| 2 | Markdown Table | Row value | Tabular structure provides visual anchors |
| 3 | Log File | Error code | Timestamp ordering may create temporal bias |
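To make the three setups concrete, here is a minimal sketch of how a target could be planted at a chosen position in each format. It is not taken from this repository: the function names, filler text, `secret_code` key, `X-7421` value, and `E4042` error code are all hypothetical placeholders, and the actual generators in `run_all.py` may differ.

```python
import json
from datetime import datetime, timedelta


def build_json_array(num_items, target_pos, target_key="secret_code", target_value="X-7421"):
    """Return a JSON array string whose item at target_pos carries the target key-value pair."""
    items = [{"id": i, "value": f"filler-{i}"} for i in range(num_items)]
    items[target_pos][target_key] = target_value  # hypothetical target, not from the source
    return json.dumps(items, indent=2)


def build_markdown_table(num_items, target_pos, target_value="X-7421"):
    """Return a Markdown table whose row at target_pos holds the target value."""
    rows = ["| id | value |", "|---|---|"]
    for i in range(num_items):
        value = target_value if i == target_pos else f"filler-{i}"
        rows.append(f"| {i} | {value} |")
    return "\n".join(rows)


def build_log_file(num_items, target_pos, error_code="E4042"):
    """Return timestamp-ordered log lines with a single ERROR line at target_pos."""
    start = datetime(2026, 1, 1)  # arbitrary start time, assumed for illustration
    lines = []
    for i in range(num_items):
        ts = (start + timedelta(seconds=i)).isoformat()
        if i == target_pos:
            lines.append(f"{ts} ERROR code={error_code} request failed")
        else:
            lines.append(f"{ts} INFO request {i} ok")
    return "\n".join(lines)
```

Sweeping `target_pos` from `0` to `num_items - 1` and asking the model to retrieve the target at each position would give one accuracy value per position for each format.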
Usage
pip install -r requirements.txt
python run_all.py --model Qwen/Qwen2.5-1.5B-Instruct --num-items 100 --num-examples 50
Expected Finding
The anticipated result is stated here as a hypothesis rather than a measured outcome: "Position Bias Index (PBI) is significantly lower in tabular formats (PBI=0.18) than in JSON arrays (PBI=0.35) or prose (PBI=0.42), suggesting visual structure mitigates positional bias."
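This README does not define how the Position Bias Index is computed. As an illustration only, the sketch below assumes one plausible definition: mean retrieval accuracy at the two edge positions minus mean accuracy over the middle positions. The function name and the formula are assumptions, not the benchmark's confirmed metric.

```python
def position_bias_index(accuracy_by_position):
    """Hypothetical PBI: mean edge accuracy minus mean middle accuracy.

    accuracy_by_position is a list of accuracies in [0, 1], ordered from the
    first position bin to the last. A value near 0 means little position
    bias; a larger positive value means middle positions are retrieved worse.
    """
    if len(accuracy_by_position) < 3:
        raise ValueError("need at least three position bins")
    # Average of the first and last bins (the "edges").
    edges = (accuracy_by_position[0] + accuracy_by_position[-1]) / 2
    # Everything between the edges counts as "middle".
    middle = accuracy_by_position[1:-1]
    return edges - sum(middle) / len(middle)
```

Under this assumed definition, a flat accuracy curve yields a PBI of 0, and a U-shaped curve yields a positive value.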
Citation
@software{structured_data_position_bias,
title={Structured Data Position Bias: How Format Affects Long-Context Retrieval},
author={abhshkp},
year={2026},
url={https://huggingface.co/abhshkp/structured-data-position-bias}
}
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern