# Contradictory Evidence Position Bias Benchmark

A novel benchmark testing how LLMs resolve conflicting information placed at different positions in long contexts.
## Core Research Question
When documents contain contradictory evidence at different positions, does the classic "Lost in the Middle" U-shape still hold? Or do models exhibit a "first-answer dominance" bias instead?
## Experiments
| # | Experiment | Setup | What It Measures |
|---|---|---|---|
| 1 | Two-Document Contradiction | Fact A at position X, Fact B at position Y | Which position wins? |
| 2 | Three-Document Contradiction | Three variants at start/middle/end | Multi-way position dominance |
| 3 | Temporal Authority | Same fact with timestamps (2020 vs 2024) | Recency bias under contradiction |
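A minimal sketch of how a two-document contradiction example (Experiment 1) could be constructed. The function name, prompt template, and filler documents below are illustrative assumptions, not the benchmark's actual implementation:

```python
# Hypothetical construction of a two-document contradiction prompt.
# Two contradictory facts are planted at chosen slots among filler
# (distractor) documents; the question can be answered by either fact.
def build_two_doc_example(question, fact_a, fact_b, fillers, pos_a, pos_b):
    docs = list(fillers)
    docs.insert(pos_a, f"Document: {fact_a}")
    # The first insertion shifts indices at or after pos_a by one.
    docs.insert(pos_b if pos_b <= pos_a else pos_b + 1, f"Document: {fact_b}")
    context = "\n\n".join(docs)
    return f"{context}\n\nQuestion: {question}\nAnswer:"

fillers = [f"Filler document {i} about an unrelated topic." for i in range(8)]
prompt = build_two_doc_example(
    question="What year was the bridge completed?",
    fact_a="The bridge was completed in 1998.",
    fact_b="The bridge was completed in 2003.",
    fillers=fillers,
    pos_a=0,  # start of context
    pos_b=8,  # near the end of context
)
```

Sweeping `pos_a` and `pos_b` over start/middle/end slots yields the position grid the experiment measures.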
## Metrics
- Selection Accuracy: Does the model pick a valid answer (A/B/C)?
- Position Dominance: Which position's fact is selected most often?
- Uncertainty Rate: Does the model refuse to choose or express doubt?
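The three metrics above can be sketched over a list of per-example records. The record schema (dicts with `"choice"` and `"position_of_choice"` keys, and `"refuse"` as the refusal marker) is an assumption for illustration, not the benchmark's actual results format:

```python
from collections import Counter

def score(records):
    """Compute selection accuracy, dominant position, and uncertainty rate.
    Assumes each record has a "choice" (A/B/C or "refuse") and the context
    position of the chosen fact. Schema is hypothetical."""
    valid = {"A", "B", "C"}
    n = len(records)
    picks = [r["choice"] for r in records]
    # Selection Accuracy: fraction of responses that pick a valid answer.
    selection_accuracy = sum(p in valid for p in picks) / n
    # Position Dominance: which position's fact is selected most often.
    positions = Counter(
        r["position_of_choice"] for r in records if r["choice"] in valid
    )
    dominant = positions.most_common(1)[0][0] if positions else None
    # Uncertainty Rate: fraction of refusals / expressed doubt.
    uncertainty_rate = sum(p == "refuse" for p in picks) / n
    return selection_accuracy, dominant, uncertainty_rate

records = [
    {"choice": "A", "position_of_choice": "start"},
    {"choice": "B", "position_of_choice": "end"},
    {"choice": "A", "position_of_choice": "start"},
    {"choice": "refuse", "position_of_choice": None},
]
acc, dom, unc = score(records)  # → (0.75, "start", 0.25)
```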
## Usage

```bash
pip install -r requirements.txt

# Run all experiments
python run_all.py --model Qwen/Qwen2.5-1.5B-Instruct --output ./results

# Run specific experiment
python run_all.py --experiments 2doc --num-examples 100
```
## Expected Findings
| Hypothesis | Prediction |
|---|---|
| Classic LITM | Middle-position facts are least likely to be selected |
| First-Answer Dominance | Start-position facts dominate regardless of correctness |
| Recency Bias | Newer timestamps override position effects |
| Middle-Vanishing | When contradictions are close (both middle), accuracy drops |
## Citation

```bibtex
@software{contradiction_position_bias,
  title={Contradictory Evidence Position Bias Benchmark},
  author={abhshkp},
  year={2026},
  url={https://huggingface.co/abhshkp/contradiction-position-bias}
}
```
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern