---
title: CUD Traffic AI
emoji: 🚦
colorFrom: yellow
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---

# CUD - AAI - Midterm Project - Traffic Incident Summarization

This repo compares extractive and abstractive summarization methods for traffic incident reports and ships with a polished React + FastAPI demo.

## What is included

- U.S. Accidents ingestion with automatic Kaggle download support
- GCC regional track with bundled Dubai, Abu Dhabi, and UAE federal sample datasets
- Rule-based GCC narrative generation so structured GCC records become natural-language incident reports
- Baselines: Lead-1 and TextRank
- Abstractive models: BART, Flan-T5, optional PEGASUS
- Evaluation pipeline, notebooks, LaTeX paper draft, poster content, and a demo UI
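
As a point of reference for the baselines listed above, Lead-1 simply returns the first sentence of a report. The sketch below is a minimal illustration, not the repo's implementation: the function name `lead_1`, the regex-based sentence splitter, and the example report are all assumptions made up for this example.

```python
import re


def lead_1(report: str) -> str:
    """Return the first sentence of a report (the Lead-1 baseline)."""
    text = report.strip()
    if not text:
        return ""
    # Naive splitter (an assumption): break after '.', '!' or '?' followed
    # by whitespace. A real pipeline may use a proper sentence tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return sentences[0]


# Made-up example report, for illustration only.
example = (
    "Multi-vehicle collision on I-95 northbound near Exit 12. "
    "Two lanes are blocked. Expect delays."
)
print(lead_1(example))
# prints: Multi-vehicle collision on I-95 northbound near Exit 12.
```

Lead-1 is a useful floor for comparison because incident reports often put the key facts in the opening sentence.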

## GCC data note

The repo includes **official source references** for Dubai Pulse, Abu Dhabi Open Data, and UAE federal traffic statistics, along with **normalized bundled sample files** so the project runs immediately offline. This is a practical compromise: public GCC portals often expose structured records, JavaScript-only dashboards, or gated exports rather than ready-to-bundle narrative text.

In the paper, describe the GCC track like this:

> Structured GCC traffic records were normalized into a common schema and converted into operator-style narrative incident descriptions using a rule-based text generator. Official source references were retained for provenance, while bundled sample extracts were used to make the demo reproducible offline.
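
To make the idea of rule-based narrative generation concrete, here is a minimal template-filling sketch. It is not the generator shipped in this repo: the function name `record_to_narrative`, the field names (`emirate`, `road`, `incident_type`, `vehicles`, `severity`), and the sample record are hypothetical and chosen only for illustration.

```python
def record_to_narrative(record: dict) -> str:
    """Turn one structured record into a short operator-style narrative.

    All field names here are hypothetical; the real schema may differ.
    """
    emirate = record.get("emirate", "an unspecified emirate")
    road = record.get("road", "an unspecified road")
    incident = record.get("incident_type", "traffic incident")

    # The opening sentence is always emitted; later sentences are added
    # only when the corresponding field is present in the record.
    parts = [f"A {incident} was reported on {road} in {emirate}."]

    vehicles = record.get("vehicles")
    if vehicles:
        noun, verb = ("vehicle", "was") if vehicles == 1 else ("vehicles", "were")
        parts.append(f"{vehicles} {noun} {verb} involved.")

    severity = record.get("severity")
    if severity:
        parts.append(f"Severity was recorded as {severity}.")

    return " ".join(parts)


# Made-up sample record, for illustration only.
sample = {
    "emirate": "Dubai",
    "road": "Sheikh Zayed Road",
    "incident_type": "rear-end collision",
    "vehicles": 2,
    "severity": "minor",
}
print(record_to_narrative(sample))
```

Template rules like these keep the generated text traceable to the underlying fields, which matters when the narratives are later used as summarization inputs.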

## Quick start

### 1. Python environment

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 2. Prepare data

```bash
python -m src.cli.run_prepare --source both
```

Behavior:

- If `data/raw/US_Accidents_March23.csv` is missing, the script attempts a Kaggle download.
- GCC sample sources are already bundled.
- A combined corpus is written to `data/interim/combined_incident_corpus.csv`.
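
The behavior above can be read as a small decision sequence. The sketch below only models that sequence as a list of planned steps; it does not download or write anything, and it is not the code behind `src.cli.run_prepare`. The function name `plan_prepare` and the step descriptions are assumptions for illustration; the two file paths are the ones named above.

```python
from pathlib import Path

# Paths taken from the behavior list above.
RAW_US = Path("data/raw/US_Accidents_March23.csv")
COMBINED = Path("data/interim/combined_incident_corpus.csv")


def plan_prepare(root: Path) -> list[str]:
    """Return the steps a prepare run would take (hypothetical helper)."""
    steps = []
    # Only fall back to Kaggle when the raw U.S. file is not on disk.
    if not (root / RAW_US).exists():
        steps.append("download US Accidents from Kaggle")
    # GCC samples ship with the repo, so no download step is needed.
    steps.append("load bundled GCC samples")
    steps.append(f"write {COMBINED.as_posix()}")
    return steps


for step in plan_prepare(Path(".")):
    print(step)
```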

### 3. Start backend

```bash
uvicorn backend.main:app --reload --port 8000
```

### 4. Start frontend

```bash
cd frontend
npm install
npm run dev
```

## Demo features

- Beautiful hero dashboard for screenshots
- Dataset track toggle: US or GCC
- Sample incident picker
- Summarize and compare endpoints
- Copy and download summary cards
- Batch CSV upload preview

## Important paths

- `data/raw/gcc/source_manifest.csv`
- `data/interim/gcc_narratives.csv`
- `docs/paper/main.tex`
- `docs/poster/poster_content.md`