---
annotations_creators:
- expert-generated
language:
- en
language_creators:
- expert-generated
license: cc-by-sa-4.0
multimodality:
- video
- text
pretty_name: 'SynWTS: Synthetic Woven Traffic Safety Dataset'
size_categories:
- 100<n<1K
source_datasets:
- WTS (Woven Traffic Safety)
tags:
- traffic-safety
- sim2real
- video-captioning
- vqa
- autonomous-driving
- ai-city-challenge
- vlm
- multimodal
- description
task_categories:
- visual-question-answering
- video-classification
- question-answering
- text-generation
- video-text-to-text
task_ids:
- visual-question-answering
- natural-language-inference
- closed-domain-qa
- multiple-choice-qa
contact: David C. Anastasiu danastasiu@scu.edu
---

# SynWTS: Synthetic Woven Traffic Safety Dataset

SynWTS is a high-fidelity synthetic dataset built as a **Digital Twin** of the [Woven Traffic Safety (WTS) dataset](https://woven-visionai.github.io/wts-dataset-homepage/). It is developed for the [**2026 AI City Challenge (Track 2)**](https://www.aicitychallenge.org/2026-track2/) to advance Sim2Real research in transportation safety understanding.

## Dataset Summary
Participants in the Sim2Real challenge must train models exclusively on this synthetic data and evaluate performance on real-world video. SynWTS provides a geometric match to real-world test locations, focusing on pedestrian-involved incidents with multi-view 1080p video, structured temporal captions, and complex Visual Question Answering (VQA) pairs.

### Key Features
- **Sim2Real Benchmark:** Specifically designed to bridge the gap between NVIDIA Isaac Sim environments and real-world traffic scenarios.
- **Multi-View Perception:** Synchronized views from overhead infrastructure cameras and vehicle-ego perspectives.
- **Temporal Segmentation:** Scenarios are partitioned into five safety-critical phases: *Pre-recognition, Recognition, Judgment, Action, and Avoidance.*
- **Structured Annotations:** Descriptions cover four pillars: **Location, Attention, Behavior, and Context.**

---

## Dataset Structure

### Directory Layout
```text
data/
├── videos/
│   └── {split}/{scenario}/{view}/*.mp4
├── annotations/
│   ├── caption/
│   │   └── {split}/{scenario}/{view}/{scenario}_caption.json
│   ├── bbox_annotated/
│   │   ├── pedestrian/{split}/{scenario}/{view}/{scenario}_{camera_id}_bbox.json
│   │   └── vehicle/{split}/{scenario}/overhead_view/{scenario}_{camera_id}_bbox.json
│   └── vqa/
│       └── {split}/{scenario}/{view}/{scenario}.json
```
*{split} = train | val | test*

*{view} = overhead_view | vehicle_view | environment*

*{camera_id} = {camera_ip_address}_{direction_id} | vehicle_view*

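The layout above can be mapped to small path helpers. The sketch below is illustrative only (the helper names are ours, not part of any official loader) and assumes the tree is rooted at a `data/` directory exactly as shown:

```python
from pathlib import Path

def caption_path(root: str, split: str, scenario: str, view: str) -> Path:
    # annotations/caption/{split}/{scenario}/{view}/{scenario}_caption.json
    return (Path(root) / "annotations" / "caption" / split / scenario
            / view / f"{scenario}_caption.json")

def vqa_path(root: str, split: str, scenario: str, view: str) -> Path:
    # annotations/vqa/{split}/{scenario}/{view}/{scenario}.json
    return Path(root) / "annotations" / "vqa" / split / scenario / view / f"{scenario}.json"

def video_paths(root: str, split: str, scenario: str, view: str) -> list:
    # videos/{split}/{scenario}/{view}/*.mp4 (empty list if the directory is absent)
    return sorted(Path(root, "videos", split, scenario, view).glob("*.mp4"))
```

A caption file for a given scenario and view can then be located with `caption_path("data", "train", scenario_name, "overhead_view")`.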
### Data Fields & Samples

#### 1. Fine-Grained Captions
Captions are generated from a checklist of 170+ traffic items. Each event phase contains a distinct caption for the pedestrian and the vehicle. We reuse the annotations from the WTS dataset, updating only those details that could not be simulated in the current version.

**Sample (from overhead_view_caption.json):**
```json
{
  "id": 765,
  "event_phase": [
    {
      "labels": ["4"],
      "caption_pedestrian": "The pedestrian was a male in his 30s walking slowly... He was standing close behind a vehicle... Although he almost noticed the vehicle, he seemed unaware of it.",
      "caption_vehicle": "The vehicle was on the left side of the pedestrian and was close to them... The vehicle slightly collided with the pedestrian while moving at a speed of 0 km/h.",
      "start_time": "8.993",
      "end_time": "14.903"
    }
  ]
}
```

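These records can be read with standard JSON tooling. A minimal illustrative loader follows; the function name and the conversion of the string timestamps to floats are our choices, not part of the dataset specification:

```python
import json

def load_phases(caption_json: str) -> list:
    """Parse one caption record and return its event phases in time order."""
    record = json.loads(caption_json)
    phases = []
    for phase in record["event_phase"]:
        phases.append({
            "labels": phase["labels"],
            # timestamps are stored as strings (seconds); convert for sorting
            "start": float(phase["start_time"]),
            "end": float(phase["end_time"]),
            "pedestrian": phase["caption_pedestrian"],
            "vehicle": phase["caption_vehicle"],
        })
    return sorted(phases, key=lambda p: p["start"])
```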
#### 2. Visual Question Answering (VQA)
Includes multiple-choice questions covering position, distance, visibility, and actions.

**Sample (from vqa-vehicle_view.json):**
```json
{
  "question": "What is the action taken by vehicle?",
  "a": "Swerved to the left to avoid",
  "b": "Swerved to the right, but could not avoid",
  "c": "Tried sudden braking but could not avoid",
  "d": "Collided with the pedestrian",
  "correct": "d"
}
```

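Since each VQA item stores its answer key in `correct`, option-level accuracy is straightforward to compute. The sketch below is illustrative; the prediction format (a mapping from question index to chosen option key) is our assumption and not the official challenge evaluation protocol:

```python
def vqa_accuracy(questions: list, predictions: dict) -> float:
    """Fraction of multiple-choice questions answered correctly.

    questions: list of dicts in the sample format above (each with a "correct" key).
    predictions: maps question index -> chosen option key ("a".."d").
    """
    if not questions:
        return 0.0
    n_correct = sum(
        1 for i, q in enumerate(questions)
        if predictions.get(i) == q["correct"]
    )
    return n_correct / len(questions)
```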
---

## Technical Specifications & Limitations

### Digital Twin Characteristics
- **Environmental Fidelity:** Roads and buildings are a close geometric match to real-world WTS locations.
- **No 3D Gaze:** Unlike the original WTS, 3D gaze and head bounding boxes are not included due to simulation constraints.
- **Character Dynamics:** Poses are simulated and may not perfectly replicate real-world physics.
- **Object Limitations:** Characters do not hold hand-held objects (umbrellas, phones) that may appear in the real-world test set. Labels/VQA have been adjusted accordingly.

---

## Test Set

This dataset includes only the `train` and `val` splits. The test set will be the "internal" or "main" subset of the [WTS Dataset](https://github.com/woven-visionai/wts-dataset). Note that the WTS dataset also contains a BDD_PC_5K subset in its train/val/test splits that will not be used for this challenge, since synthetic versions of those scenarios are not included in our training and validation sets.

---

## Release Schedule
- **Initial Release:** 80 scenarios (May 1, 2026)
- **Mid-May Update:** 144 scenarios (May 11, 2026)
- **Final Dataset:** ~249 scenarios total (expected May 25, 2026)

---

## Team & Credits

### Santa Clara University
Dhanishtha Patil, Ridham Kachhadiya, Andrew Vattuone, and David C. Anastasiu

### NVIDIA
Haoquan Liang, Jiajun Li, Yuxing Wang, and Thomas Tang

### Woven by Toyota
Ashutosh Kumar and Quan Kong

**Point of Contact:**

For questions regarding the SynWTS dataset or the AI City Challenge Track 2, please contact:
> David C. Anastasiu
>
> Email: danastasiu@scu.edu

---

## Citation
Please cite the original WTS paper and the 2026 AI City Challenge:

```bibtex
@article{kong2024wts,
  title={WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding},
  author={Kong, Quan and Kumar, Ashutosh and others},
  journal={arXiv preprint arXiv:2407.15350},
  year={2024}
}
```
Stay tuned for an updated citation to our dataset paper.