Spaces:
Running
Running
| title: CUD Traffic AI | |
| emoji: 🚦 | |
| colorFrom: yellow | |
| colorTo: red | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| # CUD - AAI - Midterm Project - Traffic Incident Summarization | |
| This repo compares extractive and abstractive summarization methods for traffic incident reports and ships with a polished React + FastAPI demo. | |
| ## What is included | |
| - U.S. Accidents ingestion with automatic Kaggle download support | |
| - GCC regional track with bundled Dubai, Abu Dhabi, and UAE federal sample datasets | |
| - Rule-based GCC narrative generation so structured GCC records become natural-language incident reports | |
| - Baselines: Lead-1 and TextRank | |
| - Abstractive models: BART, Flan-T5, optional PEGASUS | |
| - Evaluation pipeline, notebooks, LaTeX paper draft, poster content, and a demo UI | |
| ## GCC data note | |
| The repo now includes **official source references** for Dubai Pulse, Abu Dhabi Open Data, and UAE federal traffic statistics, along with **normalized bundled sample files** so the project runs immediately offline. This is the practical compromise because public GCC portals often expose structured records, JavaScript-only dashboards, or gated exports rather than ready-to-bundle narrative text. | |
| In the paper, describe the GCC track like this: | |
| > Structured GCC traffic records were normalized into a common schema and converted into operator-style narrative incident descriptions using a rule-based text generator. Official source references were retained for provenance, while bundled sample extracts were used to make the demo reproducible offline. | |
| ## Quick start | |
| ### 1. Python environment | |
| ```bash | |
| python -m venv .venv | |
| source .venv/bin/activate | |
| pip install -r requirements.txt | |
| ``` | |
| ### 2. Prepare data | |
| ```bash | |
| python -m src.cli.run_prepare --source both | |
| ``` | |
| Behavior: | |
| - If `data/raw/US_Accidents_March23.csv` is missing, the script attempts Kaggle download. | |
| - GCC sample sources are already bundled. | |
| - A combined corpus is written to `data/interim/combined_incident_corpus.csv`. | |
| ### 3. Start backend | |
| ```bash | |
| uvicorn backend.main:app --reload --port 8000 | |
| ``` | |
| ### 4. Start frontend | |
| ```bash | |
| cd frontend | |
| npm install | |
| npm run dev | |
| ``` | |
| ## Demo features | |
| - Beautiful hero dashboard for screenshots | |
| - Dataset track toggle: US or GCC | |
| - Sample incident picker | |
| - Summarize and compare endpoints | |
| - Copy and download summary cards | |
| - Batch CSV upload preview | |
| ## Important paths | |
| - `data/raw/gcc/source_manifest.csv` | |
| - `data/interim/gcc_narratives.csv` | |
| - `docs/paper/main.tex` | |
| - `docs/poster/poster_content.md` | |