---
annotations_creators:
- expert-generated
language:
- en
language_creators:
- expert-generated
license: cc-by-sa-4.0
multimodality:
- video
- text
pretty_name: 'SynWTS: Synthetic Woven Traffic Safety Dataset'
size_categories:
- n<1K
source_datasets:
- WTS (Woven Traffic Safety)
tags:
- traffic-safety
- sim2real
- video-captioning
- vqa
- autonomous-driving
- ai-city-challenge
- vlm
- multimodal
- description
task_categories:
- visual-question-answering
- video-classification
- question-answering
- text-generation
- video-text-to-text
task_ids:
- visual-question-answering
- natural-language-inference
- closed-domain-qa
- multiple-choice-qa
contact: David C. Anastasiu danastasiu@scu.edu
---
# SynWTS: Synthetic Woven Traffic Safety Dataset
SynWTS is a high-fidelity synthetic dataset built as a **Digital Twin** of the [Woven Traffic Safety (WTS) dataset](https://woven-visionai.github.io/wts-dataset-homepage/). It is developed for the [**2026 AI City Challenge (Track 2)**](https://www.aicitychallenge.org/2026-track2/) to advance Sim2Real research in transportation safety understanding.
## Dataset Summary
Participants in the Sim2Real challenge must train models exclusively on this synthetic data and evaluate performance on real-world video. SynWTS provides a geometric match to real-world test locations, focusing on pedestrian-involved incidents with multi-view 1080p video, structured temporal captions, and complex Visual Question Answering (VQA) pairs.
### Key Features
- **Sim2Real Benchmark:** Specifically designed to bridge the gap between NVIDIA Isaac Sim environments and real-world traffic scenarios.
- **Multi-View Perception:** Synchronized views from overhead infrastructure cameras and vehicle-ego perspectives.
- **Temporal Segmentation:** Scenarios are partitioned into five safety-critical phases: *Pre-recognition, Recognition, Judgment, Action, and Avoidance* (see the sketch after this list).
- **Structured Annotations:** Descriptions cover four pillars: **Location, Attention, Behavior, and Context.**
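To make the phase labels concrete, here is a minimal Python sketch of the phase vocabulary. The mapping from the numeric `labels` values used in the caption files to phase names is an assumption, not documented on this card: in the caption sample further below, label `"4"` describes a pedestrian who has not yet noticed the vehicle, which suggests the labels count down from Pre-recognition to Avoidance. Verify against the WTS annotation conventions before relying on it.

```python
# Hypothetical mapping, not documented on this card: the numeric "labels"
# in the caption files are assumed to count down from Pre-recognition ("4")
# to Avoidance ("0"). Verify against the WTS annotation conventions.
PHASE_NAMES = {
    "4": "Pre-recognition",
    "3": "Recognition",
    "2": "Judgment",
    "1": "Action",
    "0": "Avoidance",
}

# The four description pillars covered by every caption.
PILLARS = ("Location", "Attention", "Behavior", "Context")
```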
---
## Dataset Structure
### Directory Layout
```text
data/
├── videos/
│   └── {split}/{scenario}/{view}/*.mp4
└── annotations/
    ├── caption/
    │   └── {split}/{scenario}/{view}/{scenario}_caption.json
    ├── bbox_annotated/
    │   ├── pedestrian/{split}/{scenario}/{view}/{scenario}_{camera_id}_bbox.json
    │   └── vehicle/{split}/{scenario}/overhead_view/{scenario}_{camera_id}_bbox.json
    └── vqa/
        └── {split}/{scenario}/{view}/{scenario}.json
```
*{split} = train | val | test*
*{view} = overhead_view | vehicle_view | environment*
*{camera_id} = {camera_ip_address}_{direction_id} | vehicle_view*
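As a quick orientation, here is a minimal Python sketch that resolves these placeholders into concrete paths. The `root` argument and both helper names are hypothetical; only the layout shown above is assumed.

```python
from pathlib import Path

def caption_path(root: str, split: str, scenario: str, view: str) -> Path:
    """Expected caption file for one scenario/view, per the layout above."""
    return (Path(root) / "annotations" / "caption" / split / scenario / view
            / f"{scenario}_caption.json")

def video_paths(root: str, split: str, scenario: str, view: str):
    """Yield every .mp4 clip for one scenario/view, in sorted order."""
    yield from sorted((Path(root) / "videos" / split / scenario / view).glob("*.mp4"))
```

For example, `caption_path("data", "train", scenario, "overhead_view")` returns the file the layout predicts for a given scenario; actual scenario names come from the directory listing itself.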
### Data Fields & Samples
#### 1. Fine-Grained Captions
Captions are generated from a checklist of 170+ traffic items. Each event phase contains a distinct caption for the pedestrian and for the vehicle. We reuse the WTS annotations, updating only the details that could not be reproduced in the current version of the simulation.
**Sample (from overhead_view_caption.json):**
```json
{
"id": 765,
"event_phase": [
{
"labels": ["4"],
"caption_pedestrian": "The pedestrian was a male in his 30s walking slowly... He was standing close behind a vehicle... Although he almost noticed the vehicle, he seemed unaware of it.",
"caption_vehicle": "The vehicle was on the left side of the pedestrian and was close to them... The vehicle slightly collided with the pedestrian while moving at a speed of 0 km/h.",
"start_time": "8.993",
"end_time": "14.903"
}
]
}
```
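A minimal sketch of reading one caption file, assuming each file holds a single JSON object shaped like the sample above; note that the timestamps are stored as strings, so they are parsed to floats here.

```python
import json

def load_event_phases(caption_file: str):
    """Yield (label, start_s, end_s, pedestrian_caption, vehicle_caption)."""
    with open(caption_file) as f:
        record = json.load(f)
    for phase in record["event_phase"]:
        yield (
            phase["labels"][0],          # phase label, e.g. "4"
            float(phase["start_time"]),  # seconds, stored as a string
            float(phase["end_time"]),
            phase["caption_pedestrian"],
            phase["caption_vehicle"],
        )
```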
#### 2. Visual Question Answering (VQA)
Includes multiple-choice questions covering position, distance, visibility, and actions.
**Sample (from vqa-vehicle_view.json):**
```json
{
"question": "What is the action taken by vehicle?",
"a": "Swerved to the left to avoid",
"b": "Swerved to the right, but could not avoid",
"c": "Tried sudden braking but could not avoid",
"d": "Collided with the pedestrian",
"correct": "d"
}
```
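And a minimal scoring sketch for the VQA entries, assuming each file holds a JSON list of objects shaped like the sample above; `predict` stands in for any model callable and is hypothetical.

```python
import json

CHOICE_KEYS = ("a", "b", "c", "d")

def vqa_accuracy(vqa_file: str, predict) -> float:
    """Fraction of questions where the prediction matches "correct".

    `predict` maps (question_text, {"a": ..., "b": ..., "c": ..., "d": ...})
    to one of the keys in CHOICE_KEYS.
    """
    with open(vqa_file) as f:
        items = json.load(f)  # assumed: a list of question objects
    hits = sum(
        predict(item["question"], {k: item[k] for k in CHOICE_KEYS})
        == item["correct"]
        for item in items
    )
    return hits / len(items)
```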
---
## Technical Specifications & Limitations
### Digital Twin Characteristics
- **Environmental Fidelity:** Roads and buildings are a close geometric match to real-world WTS locations.
- **No 3D Gaze:** Unlike the original WTS, 3D gaze and head bounding boxes are not included due to simulation constraints.
- **Character Dynamics:** Poses are simulated and may not perfectly replicate real-world physics.
- **Object Limitations:** Characters do not hold hand-held objects (umbrellas, phones) that may appear in the real-world test set. Labels/VQA have been adjusted accordingly.
---
## Test Set
This dataset includes only the `train` and `val` splits. The test set will be the "internal" (main) subset of the [WTS Dataset](https://github.com/woven-visionai/wts-dataset). Note that WTS also contains a BDD_PC_5K subset in its train/val/test splits; it will not be used in this challenge, since synthetic versions of those scenarios are not included in our training and validation sets.
---
## Release Schedule
- **Initial Release:** 80 scenarios (May 1, 2026)
- **Mid-May Update:** 144 scenarios (May 11, 2026)
- **Final Dataset:** ~249 scenarios total (expected May 25, 2026)
---
## Team & Credits
### Santa Clara University
Dhanishtha Patil, Ridham Kachhadiya, Andrew Vattuone, and David C. Anastasiu
### NVIDIA
Haoquan Liang, Jiajun Li, Yuxing Wang, and Thomas Tang
### Woven by Toyota
Ashutosh Kumar and Quan Kong
**Point of Contact:**
For questions regarding the SynWTS dataset or the AI City Challenge Track 2, please contact:
> David C. Anastasiu
>
> Email: danastasiu@scu.edu
---
## Citation
Please cite the original WTS paper and the 2026 AI City Challenge:
```bibtex
@article{kong2024wts,
title={WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding},
author={Kong, Quan and Kumar, Ashutosh and others},
journal={arXiv preprint arXiv:2407.15350},
year={2024}
}
```
Stay tuned for an updated citation to our dataset paper. |