Contradictory Evidence Position Bias Benchmark

A novel benchmark testing how LLMs resolve conflicting information placed at different positions in long contexts.

Core Research Question

When documents contain contradictory evidence at different positions, does the classic "Lost in the Middle" U-shape still hold? Or do models exhibit a "first-answer dominance" bias instead?

Experiments

  1. Two-Document Contradiction: Fact A at position X, Fact B at position Y. Measures which position wins.
  2. Three-Document Contradiction: three conflicting variants at start/middle/end. Measures multi-way position dominance.
  3. Temporal Authority: the same fact with different timestamps (2020 vs. 2024). Measures recency bias under contradiction.
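The two-document setup above can be sketched as follows. This is a minimal illustration of how a contradiction example might be assembled, not the benchmark's actual generation code; `build_two_doc_prompt` and its parameters are hypothetical.

```python
def build_two_doc_prompt(question, fact_a, fact_b, fillers, pos_a, pos_b):
    """Place two contradictory facts at the requested document slots;
    filler passages occupy the remaining positions in order."""
    docs = [None] * (len(fillers) + 2)
    docs[pos_a], docs[pos_b] = fact_a, fact_b
    filler_iter = iter(fillers)
    docs = [d if d is not None else next(filler_iter) for d in docs]
    context = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {question}\nAnswer A or B only."

prompt = build_two_doc_prompt(
    question="In what year was the bridge built?",
    fact_a="Source A: the bridge was built in 1920.",
    fact_b="Source B: the bridge was built in 1935.",
    fillers=["Unrelated passage one.",
             "Unrelated passage two.",
             "Unrelated passage three."],
    pos_a=0,  # start of context
    pos_b=4,  # end of context (5 documents total)
)
```

Sweeping `pos_a` and `pos_b` over all slot pairs yields the position grid the benchmark evaluates.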

Metrics

  • Selection Accuracy: Does the model pick a valid answer (A/B/C)?
  • Position Dominance: Which position's fact is selected most often?
  • Uncertainty Rate: Does the model refuse to choose or express doubt?
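The three metrics can be computed from parsed model responses roughly as below. The record schema (`choice`, `positions`) is illustrative, not the benchmark's actual output format.

```python
from collections import Counter

def score(responses):
    """responses: list of dicts with
    'choice': 'A'/'B'/'C', or None if the model refused or expressed doubt;
    'positions': dict mapping each choice letter to where its fact appeared.
    """
    n = len(responses)
    valid = [r for r in responses if r["choice"] in ("A", "B", "C")]
    # Tally which context position each selected fact came from.
    counts = Counter(r["positions"][r["choice"]] for r in valid)
    return {
        "selection_accuracy": len(valid) / n,
        "uncertainty_rate": sum(r["choice"] is None for r in responses) / n,
        "position_dominance": dict(counts),
    }

results = score([
    {"choice": "A", "positions": {"A": "start", "B": "end"}},
    {"choice": "A", "positions": {"A": "start", "B": "end"}},
    {"choice": "B", "positions": {"A": "start", "B": "end"}},
    {"choice": None, "positions": {"A": "start", "B": "end"}},
])
```

Here `position_dominance` is reported as per-position selection counts, from which the most-selected position can be read off directly.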

Usage

pip install -r requirements.txt

# Run all experiments
python run_all.py --model Qwen/Qwen2.5-1.5B-Instruct --output ./results

# Run specific experiment
python run_all.py --experiments 2doc --num-examples 100

Expected Findings

  • Classic LITM: middle-position facts are least likely to be selected.
  • First-Answer Dominance: start-position facts dominate regardless of correctness.
  • Recency Bias: newer timestamps override position effects.
  • Middle-Vanishing: when both contradictory facts sit near the middle, accuracy drops.
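One way to check the first-answer-dominance hypothesis against the results is a simple one-proportion z-test: does the start position win more often than chance? A minimal sketch (the function and its threshold are illustrative, not part of the benchmark):

```python
import math

def first_answer_dominance(wins_start, total, num_positions=2):
    """Normal-approximation z-test of the start position's win rate
    against the chance rate of 1/num_positions."""
    p0 = 1.0 / num_positions                      # chance rate
    p_hat = wins_start / total                    # observed start-win rate
    se = math.sqrt(p0 * (1 - p0) / total)         # std. error under H0
    z = (p_hat - p0) / se
    return p_hat, z                               # z > 1.96 ~ significant at 5%

rate, z = first_answer_dominance(wins_start=70, total=100)
```

With 70 start-position wins out of 100 two-document trials, the observed rate is 0.70 against a chance rate of 0.50, giving z = 4.0, which would support the dominance hypothesis.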

Citation

@software{contradiction_position_bias,
  title={Contradictory Evidence Position Bias Benchmark},
  author={abhshkp},
  year={2026},
  url={https://huggingface.co/abhshkp/contradiction-position-bias}
}

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
