---
tags:
- ml-intern
---
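# Structured Data Position Bias Benchmark

This benchmark tests position bias in **structured formats** (JSON arrays, Markdown tables, log files), where formatting may mitigate or exacerbate the "Lost in the Middle" effect: the tendency of language models to retrieve information less reliably from the middle of a long context than from its beginning or end.
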
## Research Question

> Does structured formatting (JSON, tables, logs) reduce position bias compared to unstructured prose? Or does the visual/structural regularity make middle-position items harder to find?

## Experiments

| # | Format | Target | Hypothesis |
|---|--------|--------|-----------|
| 1 | **JSON Array** | Key-value pair | Structured nesting may reduce bias |
| 2 | **Markdown Table** | Row value | Tabular structure provides visual anchors |
| 3 | **Log File** | Error code | Timestamp ordering may create temporal bias |

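
As an illustration of how the three formats in the table could embed the same target at a controlled position, here is a minimal Python sketch. It is not taken from this repository's `run_all.py`; the function names (`make_items`, `as_json_array`, `as_markdown_table`, `as_log_file`), the `target_key` needle, and the log line layout are all hypothetical.

```python
import json
import random
from datetime import datetime, timedelta


def make_items(num_items, needle_pos, seed=0):
    """Return num_items (key, value) records with a known target at needle_pos."""
    rng = random.Random(seed)
    # Distractor values stay in 1000..9999 so the needle value below is unique.
    items = [(f"item_{i:03d}", rng.randint(1000, 9999)) for i in range(num_items)]
    items[needle_pos] = ("target_key", 424242)  # hypothetical needle
    return items


def as_json_array(items):
    """Render the records as a JSON array of key-value objects."""
    return json.dumps([{"key": k, "value": v} for k, v in items], indent=2)


def as_markdown_table(items):
    """Render the records as a two-column Markdown table."""
    rows = ["| key | value |", "|---|---|"]
    rows += [f"| {k} | {v} |" for k, v in items]
    return "\n".join(rows)


def as_log_file(items):
    """Render the records as log lines, one per second, value used as error code."""
    start = datetime(2026, 1, 1)
    lines = []
    for i, (k, v) in enumerate(items):
        stamp = (start + timedelta(seconds=i)).isoformat()
        lines.append(f"{stamp} ERROR {k} code={v}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Place the needle in the middle of a small 10-item context.
    demo = make_items(num_items=10, needle_pos=5)
    print(as_markdown_table(demo))
```

Sweeping `needle_pos` from `0` to `num_items - 1` while holding the rest of the context fixed is one way to separate the effect of position from the effect of format.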
## Usage

```bash
pip install -r requirements.txt
python run_all.py --model Qwen/Qwen2.5-1.5B-Instruct --num-items 100 --num-examples 50
```

## Expected Finding

> "Position Bias Index is significantly lower in tabular formats (PBI=0.18) than in JSON arrays (PBI=0.35) or prose (PBI=0.42), suggesting visual structure mitigates positional bias."

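
This README does not define the Position Bias Index. The sketch below assumes one simple possibility, the gap between the best and worst per-position accuracy, purely for illustration; the benchmark's actual metric may differ. The function names and the toy results are hypothetical and are not measured data.

```python
def accuracy_by_position(results):
    """Aggregate (position, correct) pairs into a {position: accuracy} dict."""
    totals, hits = {}, {}
    for pos, correct in results:
        totals[pos] = totals.get(pos, 0) + 1
        hits[pos] = hits.get(pos, 0) + int(correct)
    return {pos: hits[pos] / totals[pos] for pos in totals}


def position_bias_index(acc):
    """ASSUMED definition: max minus min accuracy across needle positions.

    0.0 means retrieval accuracy is the same at every position;
    1.0 means some position is always right and another always wrong.
    """
    if not acc:
        raise ValueError("need accuracy for at least one position")
    values = list(acc.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Made-up outcomes: needle at the start, middle, and end of a 100-item context.
    toy_results = [
        (0, True), (0, True),
        (50, True), (50, False),
        (99, True), (99, True),
    ]
    acc = accuracy_by_position(toy_results)
    print(acc, position_bias_index(acc))
```

Under this assumed definition, a lower value for one format than another would indicate that retrieval accuracy in that format depends less on where the target sits.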
## Citation

```bibtex
@software{structured_data_position_bias,
  title={Structured Data Position Bias: How Format Affects Long-Context Retrieval},
  author={abhshkp},
  year={2026},
  url={https://huggingface.co/abhshkp/structured-data-position-bias}
}
```

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern