---
license: mit
task_categories:
- visual-question-answering
language:
- en
tags:
- hallucination-detection
- object-hallucination
- pope
- coco
- benchmark
size_categories:
- 1K<n<10K
---
# RePOPE: Revisiting Partial Object Hallucination Evaluation
RePOPE is a re-annotated version of the POPE (Polling-based Object Probing Evaluation) benchmark with corrected ground-truth labels. It evaluates object hallucination in multimodal large language models (MLLMs) by asking yes/no questions about object existence in MSCOCO images.
## Dataset Details
- Original Paper: RePOPE: Revisiting Partial Object Hallucination Evaluation
- Original Repository: https://github.com/YanNeu/RePOPE
- Images: MSCOCO 2014 (subset of 500 images)
## Dataset Structure

Each row contains:

- `image`: the MSCOCO image (struct with `bytes` and `path`)
- `image_id`: COCO image identifier (e.g., `000000310196`)
- `question`: a yes/no question about object presence (e.g., "Is there a snowboard in the image?")
- `answer`: ground-truth label (`yes` or `no`)
- `category`: sampling strategy used to select the queried object (`random`, `popular`, or `adversarial`)
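Once loaded, each example is a plain dictionary with the fields above. A minimal sketch of working with one row (the field values below are illustrative stand-ins, not actual dataset contents):

```python
# Illustrative row mirroring the RePOPE schema described above.
# All values are made up for demonstration; real rows come from
# load_dataset("MM-Hallu/RePOPE").
row = {
    "image": {"bytes": b"...", "path": "000000310196.jpg"},
    "image_id": "000000310196",
    "question": "Is there a snowboard in the image?",
    "answer": "yes",
    "category": "random",
}

# A model's yes/no response is scored against the ground-truth label.
prediction = "yes"
is_correct = prediction == row["answer"]
print(is_correct)  # → True
```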
## Splits
This dataset contains all three POPE sampling categories in a single split:
| Category | Count |
|---|---|
| random | 2,774 |
| popular | 2,727 |
| adversarial | 2,684 |
| Total | 8,185 |
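Since all three categories live in one split, results are typically reported per category by grouping on the `category` field. A sketch of that grouping, using mock rows in place of the loaded dataset:

```python
# Group RePOPE-style rows by sampling category so metrics can be
# reported separately for random / popular / adversarial.
# These rows are mock stand-ins that only mirror the schema.
rows = [
    {"question": "Is there a dog in the image?", "answer": "no", "category": "random"},
    {"question": "Is there a person in the image?", "answer": "yes", "category": "popular"},
    {"question": "Is there a fork in the image?", "answer": "no", "category": "adversarial"},
    {"question": "Is there a car in the image?", "answer": "yes", "category": "random"},
]

by_category = {}
for row in rows:
    by_category.setdefault(row["category"], []).append(row)

for name, subset in sorted(by_category.items()):
    print(name, len(subset))
# adversarial 1
# popular 1
# random 2
```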
## Label Distribution
| Answer | Count |
|---|---|
| yes | 3,539 |
| no | 4,646 |
## How to Use

```python
from datasets import load_dataset

ds = load_dataset("MM-Hallu/RePOPE")
```
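With binary yes/no labels, POPE-style evaluation conventionally reports accuracy plus precision, recall, and F1 with "yes" as the positive class. A minimal sketch over mock labels and predictions (not actual model outputs):

```python
# POPE-style metrics: accuracy plus precision/recall/F1, treating
# "yes" as the positive class. Labels and predictions are mock data.
labels = ["yes", "no", "no", "yes", "no", "yes"]
preds = ["yes", "no", "yes", "yes", "no", "no"]

tp = sum(p == "yes" and l == "yes" for p, l in zip(preds, labels))
fp = sum(p == "yes" and l == "no" for p, l in zip(preds, labels))
fn = sum(p == "no" and l == "yes" for p, l in zip(preds, labels))

accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"acc={accuracy:.2f} p={precision:.2f} r={recall:.2f} f1={f1:.2f}")
# acc=0.67 p=0.67 r=0.67 f1=0.67
```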
## Citation

```bibtex
@misc{neuhaus2024repope,
  title={RePOPE: Revisiting Partial Object Hallucination Evaluation},
  author={Yannik Neuschwander and Selen Yu and Jordy Van Landeghem and Jan Van Loock and Lilian Ngweta and Rukiye Savran Kizildag and Desmond Elliott and Matthew B. Blaschko},
  year={2024},
  eprint={2405.14571},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```