Spaces:

rajvivan
/

CUD-Traffic-AI

Running

App Files Files Community

CUD-Traffic-AI / README.md

rajvivan

Update README.md

cbc3062 verified 7 days ago

preview code

raw

history blame contribute delete

2.5 kB

	---
	title: CUD Traffic AI
	emoji: 🚦
	colorFrom: yellow
	colorTo: red
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# CUD - AAI - Midterm Project - Traffic Incident Summarization

	This repo compares extractive and abstractive summarization methods for traffic incident reports and ships with a polished React + FastAPI demo.

	## What is included

	- U.S. Accidents ingestion with automatic Kaggle download support
	- GCC regional track with bundled Dubai, Abu Dhabi, and UAE federal sample datasets
	- Rule-based GCC narrative generation so structured GCC records become natural-language incident reports
	- Baselines: Lead-1 and TextRank
	- Abstractive models: BART, Flan-T5, optional PEGASUS
	- Evaluation pipeline, notebooks, LaTeX paper draft, poster content, and a demo UI

	## GCC data note

	The repo now includes official source references for Dubai Pulse, Abu Dhabi Open Data, and UAE federal traffic statistics, along with normalized bundled sample files so the project runs immediately offline. This is the practical compromise because public GCC portals often expose structured records, JavaScript-only dashboards, or gated exports rather than ready-to-bundle narrative text.

	In the paper, describe the GCC track like this:

	> Structured GCC traffic records were normalized into a common schema and converted into operator-style narrative incident descriptions using a rule-based text generator. Official source references were retained for provenance, while bundled sample extracts were used to make the demo reproducible offline.

	## Quick start

	### 1. Python environment

	```bash
	python -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	```

	### 2. Prepare data

	```bash
	python -m src.cli.run_prepare --source both
	```

	Behavior:

	- If `data/raw/US_Accidents_March23.csv` is missing, the script attempts Kaggle download.
	- GCC sample sources are already bundled.
	- A combined corpus is written to `data/interim/combined_incident_corpus.csv`.

	### 3. Start backend

	```bash
	uvicorn backend.main:app --reload --port 8000
	```

	### 4. Start frontend

	```bash
	cd frontend
	npm install
	npm run dev
	```

	## Demo features

	- Beautiful hero dashboard for screenshots
	- Dataset track toggle: US or GCC
	- Sample incident picker
	- Summarize and compare endpoints
	- Copy and download summary cards
	- Batch CSV upload preview

	## Important paths

	- `data/raw/gcc/source_manifest.csv`
	- `data/interim/gcc_narratives.csv`
	- `docs/paper/main.tex`
	- `docs/poster/poster_content.md`