---
license: cc-by-nc-sa-4.0
---

# ChronoQA

ChronoQA is a **passage-grounded** benchmark that tests whether retrieval-augmented generation (RAG) systems can keep **temporal** and **causal** facts straight when reading long-form narratives (novels, scripts, etc.).
Instead of giving the entire book to the model, ChronoQA forces a RAG pipeline to *retrieve the right snippets* and reason about evolving characters and event sequences.
| | |
|-------------------------------|------------------------------------|
| **Instances** | 1,028 question–answer pairs |
| **Narratives** | 18 stories |
| **Reasoning facets** | 8 (causal, character, setting, …) |
| **Evidence** | Exact byte offsets for each answer |
| **Language** | English |
| **Intended use** | Evaluate/train RAG systems that need chronology & causality |
| **License (annotations)** | CC-BY-NC-SA-4.0 |

---

## Dataset Description

### Motivation

Standard RAG pipelines often lose chronological order and collapse every mention of an entity into a single node. ChronoQA highlights the failures that follow. For example:

*"Who was jinxing Harry's broom during his **first** Quidditch match?"* – a system that only retrieves early chapters may wrongly answer *Snape* instead of *Quirrell*.

### Source Stories

Most texts come from Project Gutenberg (public domain in the US).

| ID | Title | # Q |
|----|-------|-----|
| 1 | *A Study in Scarlet* | 67 |
| 2 | *The Hound of the Baskervilles* | 55 |
| 3 | *Harry Potter and the Chamber of Secrets* | 30 |
| 4 | *Harry Potter and the Sorcerer's Stone* | 25 |
| 5 | *Les Misérables* | 72 |
| 6 | *The Phantom of the Opera* | 70 |
| 7 | *The Sign of the Four* | 62 |
| 8 | *The Wonderful Wizard of Oz* | 82 |
| 9 | *The Adventures of Sherlock Holmes* | 34 |
| 10 | *Lady Susan* | 88 |
| 11 | *Dangerous Connections* | 111 |
| 12 | *The Picture of Dorian Gray* | 27 |
| 13 | *The Diary of a Nobody* | 39 |
| 14 | *The Sorrows of Young Werther* | 58 |
| 15 | *The Mysterious Affair at Styles* | 69 |
| 16 | *Pride and Prejudice* | 54 |
| 17 | *The Secret Garden* | 61 |
| 18 | *Anne of Green Gables* | 24 |

### Reasoning Facets

1. **Causal Consistency**
2. **Character & Behavioural Consistency**
3. **Setting, Environment & Atmosphere**
4. **Symbolism, Imagery & Motifs**
5. **Thematic, Philosophical & Moral**
6. **Narrative & Plot Structure**
7. **Social, Cultural & Political**
8. **Emotional & Psychological**
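
Per-facet breakdowns are a natural way to evaluate against this taxonomy. A minimal sketch of counting and filtering rows by `category`, using synthetic stand-in rows rather than the hosted dataset (the exact category strings below are assumptions; inspect `set(ds["category"])` on the loaded data first):

```python
from collections import Counter

# Synthetic rows standing in for ChronoQA examples; with the hosted dataset
# you would iterate over load_dataset("your-org/chronoqa", split="all").
rows = [
    {"question_id": 0, "category": "Causal Consistency"},
    {"question_id": 1, "category": "Narrative & Plot Structure"},
    {"question_id": 2, "category": "Causal Consistency"},
]

# How many questions fall under each reasoning facet.
per_facet = Counter(r["category"] for r in rows)

# Restrict evaluation to a single facet.
causal_only = [r for r in rows if r["category"] == "Causal Consistency"]

print(per_facet)
print(len(causal_only))  # → 2
```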

---

## Dataset Structure

| Field | Type | Description |
|-------|------|-------------|
| `story_id` | `string` | ID of the narrative |
| `question_id` | `int32` | QA index within that story |
| `category` | `string` | One of the 8 reasoning facets |
| `query` | `string` | Natural-language question |
| `ground_truth` | `string` | Gold answer |
| `passages` | **`sequence` of objects** | Each object contains: <br> • `start_sentence` `string` <br> • `end_sentence` `string` <br> • `start_byte` `int32` <br> • `end_byte` `int32` <br> • `excerpt` `string` |
| `story_title`\* | `string` | Human-readable title (optional, present in processed splits) |

\*The raw JSONL released with the paper does **not** include `story_title`; it is added automatically in the hosted HF dataset for convenience.
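
The `passages` offsets index into the raw bytes of the story file, so byte and character positions diverge as soon as the text contains multi-byte UTF-8 characters. A sketch of the slicing convention on a synthetic string (the offsets and the half-open `[start_byte, end_byte)` interpretation here are illustrative assumptions; verify against the shipped `excerpt` field):

```python
# Stand-in story text with multi-byte characters (the curly quotes are
# 3-byte UTF-8 sequences); real stories are the raw narrative files.
story = "Holmes rose. “Elementary,” he said."
raw = story.encode("utf-8")

# Hypothetical passage record, offsets valid for this synthetic text only.
passage = {
    "start_byte": 13,
    "end_byte": 30,
    "excerpt": "“Elementary,”",
}

# Slice the *bytes*, not the string, then decode.
recovered = raw[passage["start_byte"]:passage["end_byte"]].decode("utf-8")
assert recovered == passage["excerpt"]
```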

There is a single **all** split (1,028 rows). Create your own train/validation/test splits if needed (e.g. by story or by reasoning facet).
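
Holding out whole stories, for instance, keeps a narrative from leaking across train and test. A sketch over synthetic rows (with the hosted dataset you would group the rows of the **all** split the same way; the story IDs here are hypothetical):

```python
import random

# 20 synthetic rows spread over 4 hypothetical stories, 5 questions each.
rows = [{"story_id": f"story_{i % 4}", "question_id": i} for i in range(20)]

def split_by_story(rows, test_fraction=0.25, seed=0):
    """Hold out entire stories so no narrative appears in both splits."""
    stories = sorted({r["story_id"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(stories)
    n_test = max(1, int(len(stories) * test_fraction))
    test_stories = set(stories[:n_test])
    train = [r for r in rows if r["story_id"] not in test_stories]
    test = [r for r in rows if r["story_id"] in test_stories]
    return train, test

train, test = split_by_story(rows)
print(len(train), len(test))  # → 15 5
```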

---

## Usage Example

```python
from datasets import load_dataset

ds = load_dataset("your-org/chronoqa", split="all")
example = ds[0]

print("Question:", example["query"])
print("Answer :", example["ground_truth"])
print("Evidence:", example["passages"][0]["excerpt"][:300], "…")
```

## Citation Information

```bibtex
@article{zhang2025respecting,
  title={Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graphs for Retrieval-Augmented Generation},
  author={Zhang, Ze Yu and Li, Zitao and Li, Yaliang and Ding, Bolin and Low, Bryan Kian Hsiang},
  journal={arXiv preprint arXiv:2506.05939},
  year={2025}
}
```