Debayan Mandal committed
Commit acda8b7 · 1 Parent(s): 139c8bc

Initial Dashboard Upload

Files changed (8)
  1. .gitignore +4 -0
  2. Dockerfile +24 -0
  3. README.md +57 -7
  4. app.py +790 -0
  5. assets/styles.css +42 -0
  6. dashboard_helpers.py +471 -0
  7. data_pipeline.py +274 -0
  8. requirements.txt +10 -0
.gitignore ADDED
@@ -0,0 +1,4 @@
+ *.db
+ *.csv
+ __pycache__/
+ *.pyc
Dockerfile ADDED
@@ -0,0 +1,24 @@
+ FROM python:3.11-slim
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     libgdal-dev \
+     gdal-bin \
+     libgeos-dev \
+     libproj-dev \
+     && rm -rf /var/lib/apt/lists/*
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY data_pipeline.py .
+ COPY dashboard_helpers.py .
+ COPY app.py .
+ COPY assets/ assets/
+
+ RUN python data_pipeline.py
+
+ EXPOSE 7860
+
+ CMD ["gunicorn", "app:server", "--bind", "0.0.0.0:7860", "--workers", "2", "--timeout", "120"]
README.md CHANGED
@@ -1,12 +1,62 @@
  ---
- title: Sf Taxi Equity Dashboard Plotlydash
- emoji: 🦀
+ title: SF Taxi Mobility Equity Dashboard
+ emoji: 🚕
  colorFrom: blue
- colorTo: green
+ colorTo: yellow
  sdk: docker
- pinned: false
- license: mit
- short_description: A Plotly Dash Web App for SF Taxi Neighborhoods
+ app_port: 7860
+ pinned: True
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # SF Taxi Mobility Equity Dashboard
+
+ An interactive Plotly Dash dashboard analyzing spatial equity in San Francisco taxi services. It compares **Street-Hail** vs **App-Based** trip patterns across SF's 41 Analysis Neighborhoods and evaluates service representation relative to demographic baselines using Representative Ratios.
+
+ _Debayan Mandal_
+
+ ## Features
+
+ - **Interactive choropleth maps**: click any neighborhood to cross-filter all views
+ - **Representative Ratio visualizations**: bar chart and heatmap showing the central equity metric (overrepresentation vs underrepresentation by demographic group)
+ - **Neighborhood detail panel**: click to see the full profile (trips, demographics, deviations, and trends)
+ - **Monthly comparison**: side-by-side difference maps revealing temporal trends
+ - **Dynamic narrative**: auto-generated equity insights that update with your selections
+ - **CSV data export**: download filtered trip + demographic data
+ - **Publication-ready image export**: high-resolution PNG via the camera icon on each map
+ - **Guided tour**: step-by-step walkthrough for non-technical audiences
+ - **Colorblind-safe palettes**: Viridis, Cividis, and RdBu scales
+
+ ## Data Sources
+
+ - **SF Taxi Trips**: [DataSF Taxi Trips (m8hk-2ipk)](https://data.sfgov.org/Transportation/Taxi-Trips/m8hk-2ipk/)
+ - **SF Analysis Neighborhoods**: [DataSF Analysis Neighborhoods (j2bu-swwd)](https://data.sfgov.org/resource/j2bu-swwd.geojson)
+ - **Census Demographics**: [ACS 5-Year 2022, Table B02001](https://api.census.gov/data/2022/acs/acs5.html), Block Groups for SF County (FIPS 06075)
+ - **Block Group Geometries**: [TIGER/Line 2022](https://www2.census.gov/geo/tiger/TIGER2022/BG/tl_2022_06_bg.zip)
+
+ ## Local Setup
+
+ ```bash
+ pip install -r requirements.txt
+ python data_pipeline.py # builds sf_dashboard.db
+ python app.py # opens dashboard at http://localhost:7860
+ ```
+
+ ## Docker
+
+ ```bash
+ docker build -t sf-taxi-dashboard .
+ docker run -p 7860:7860 sf-taxi-dashboard
+ ```
+
+ Then open http://localhost:7860.
+
+ ## Architecture
+
+ | File | Purpose |
+ |------|---------|
+ | `data_pipeline.py` | Downloads taxi trips, neighborhoods, and census data; builds `sf_dashboard.db` |
+ | `dashboard_helpers.py` | Plotly figure builders and data query helpers |
+ | `app.py` | Plotly Dash application layout and callbacks |
+ | `assets/styles.css` | Custom CSS |
+ | `requirements.txt` | Python dependencies |
+ | `Dockerfile` | Containerization for Hugging Face Spaces |
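Review note: the README names the Representative Ratio as the central equity metric but does not give its formula, and `data_pipeline.py` is not visible in this part of the diff. The sketch below is only one plausible reading, assumed for illustration: the trip-weighted share of a demographic group across neighborhoods, divided by that group's citywide baseline share. The function name and arguments are hypothetical, not taken from the commit.

```python
def representative_ratio(trips, group_pct, baseline_pct):
    """Trip-weighted group share divided by the citywide baseline share.

    trips        -- trip counts per neighborhood
    group_pct    -- the group's population share (%) in each neighborhood
    baseline_pct -- the group's citywide population share (%)
    Returns None when the ratio is undefined.
    """
    total_trips = sum(trips)
    if total_trips == 0 or baseline_pct == 0:
        return None
    weighted_share = sum(t * p for t, p in zip(trips, group_pct)) / total_trips
    return weighted_share / baseline_pct


# Made-up numbers: most trips happen where the group is a small share.
example_rr = representative_ratio([100, 300], [60.0, 20.0], 40.0)
```

Under this assumed definition, a value above 1 would mean trips concentrate where the group is overrepresented relative to the city, and a value below 1 the opposite.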
app.py ADDED
@@ -0,0 +1,790 @@
+ import dash
+ from dash import dcc, html, Input, Output, State, callback, ctx, dash_table
+ import dash_bootstrap_components as dbc
+ import duckdb
+
+ from dashboard_helpers import (
+     get_neighborhood_geojson,
+     get_all_neighborhoods,
+     build_trip_choropleth,
+     build_demo_choropleth,
+     build_rr_bar_chart,
+     build_rr_heatmap,
+     build_neighborhood_profile,
+     build_comparison_map,
+     get_trip_stats_df,
+     get_download_csv,
+     _GRAPH_CONFIG,
+ )
+
+ DB_PATH = "sf_dashboard.db"
+
+ def get_con():
+     c = duckdb.connect(DB_PATH, read_only=True)
+     c.install_extension("spatial")
+     c.load_extension("spatial")
+     return c
+
+ _init_con = get_con()
+ GEOJSON = get_neighborhood_geojson(_init_con)
+ NEIGHBORHOODS = get_all_neighborhoods(_init_con)
+ MONTHS = _init_con.sql(
+     "SELECT DISTINCT month FROM trip_counts_pu ORDER BY month"
+ ).df()["month"].tolist()
+
+ baseline_df = _init_con.sql("SELECT * FROM city_baselines").df()
+ BASELINE_WHITE = float(baseline_df["baseline_white_pct"].iloc[0])
+ BASELINE_ASIAN = float(baseline_df["baseline_asian_pct"].iloc[0])
+ _init_con.close()
+
+ app = dash.Dash(
+     __name__,
+     external_stylesheets=[dbc.themes.DARKLY],
+     meta_tags=[
+         {"name": "viewport", "content": "width=device-width, initial-scale=1"}
+     ],
+     title="SF Taxi Mobility Equity Dashboard",
+ )
+ server = app.server
+
+ sidebar = dbc.Card(
+     [
+         html.H4("Controls", className="text-center mb-3"),
+         html.Hr(),
+         dbc.Label("Month"),
+         dcc.Dropdown(
+             id="month-selector",
+             options=[{"label": m, "value": m} for m in MONTHS],
+             value="Jan2024",
+             clearable=False,
+             className="mb-3",
+         ),
+         dbc.Label("Hail Type"),
+         dbc.Checklist(
+             id="hail-type-filter",
+             options=[
+                 {"label": " Street-Hail", "value": "Street"},
+                 {"label": " App-Based", "value": "App"},
+             ],
+             value=["Street", "App"],
+             className="mb-3",
+         ),
+         html.Hr(),
+         html.Div(id="selected-nhood-display", className="mb-3"),
+         dbc.Button(
+             "Reset Selection",
+             id="reset-selection-btn",
+             color="secondary",
+             size="sm",
+             className="w-100 mb-2",
+         ),
+         dbc.Button(
+             "Download CSV",
+             id="download-btn",
+             color="info",
+             size="sm",
+             className="w-100 mb-2",
+         ),
+         dbc.Button(
+             "Download GeoJSON",
+             id="download-geojson-btn",
+             color="success",
+             size="sm",
+             className="w-100 mb-2",
+         ),
+         dcc.Download(id="csv-download"),
+         dcc.Download(id="geojson-download"),
+     ],
+     body=True,
+     id="sidebar",
+     className="bg-dark",
+     style={
+         "position": "sticky",
+         "top": "10px",
+         "height": "calc(100vh - 20px)",
+         "overflowY": "auto",
+     },
+ )
+
+ _STAT_CARD_STYLE = {"height": "100%"}
+
+ insights_banner = dbc.Row(
+     [
+         dbc.Col(
+             dbc.Card(
+                 [
+                     html.H3(id="stat-total-trips", className="text-center mb-0"),
+                     html.P("Total Trips", className="text-center text-muted small"),
+                 ],
+                 body=True,
+                 className="bg-dark border-secondary",
+                 style=_STAT_CARD_STYLE,
+             ),
+             md=3,
+         ),
+         dbc.Col(
+             dbc.Card(
+                 [
+                     html.H3(id="stat-top-nhood", className="text-center mb-0",
+                             style={"fontSize": "1.6rem"}),
+                     html.P("Most Served", className="text-center text-muted small"),
+                 ],
+                 body=True,
+                 className="bg-dark border-secondary",
+                 style=_STAT_CARD_STYLE,
+             ),
+             md=3,
+         ),
+         dbc.Col(
+             dbc.Card(
+                 [
+                     html.H3(id="stat-rr-white", className="text-center mb-0"),
+                     html.P("White RR", className="text-center text-muted small"),
+                 ],
+                 body=True,
+                 className="bg-dark border-secondary",
+                 style=_STAT_CARD_STYLE,
+             ),
+             md=3,
+         ),
+         dbc.Col(
+             dbc.Card(
+                 [
+                     html.H3(id="stat-rr-asian", className="text-center mb-0"),
+                     html.P("Asian RR", className="text-center text-muted small"),
+                 ],
+                 body=True,
+                 className="bg-dark border-secondary",
+                 style=_STAT_CARD_STYLE,
+             ),
+             md=3,
+         ),
+     ],
+     id="insights-banner",
+     className="mb-2 g-2",
+ )
+
+ narrative_row = dbc.Row(
+     dbc.Col(
+         html.P(
+             id="narrative-text",
+             className="text-center fst-italic",
+             style={"color": "#adb5bd", "fontSize": "0.95rem"},
+         ),
+     ),
+     className="mb-3",
+ )
+
+ trip_maps = dbc.Row(
+     [
+         dbc.Col(
+             dcc.Graph(id="street-pu-map", config=_GRAPH_CONFIG),
+             md=6,
+         ),
+         dbc.Col(
+             dcc.Graph(id="app-pu-map", config=_GRAPH_CONFIG),
+             md=6,
+         ),
+         dbc.Col(
+             dcc.Graph(id="street-do-map", config=_GRAPH_CONFIG),
+             md=6,
+             className="mt-2",
+         ),
+         dbc.Col(
+             dcc.Graph(id="app-do-map", config=_GRAPH_CONFIG),
+             md=6,
+             className="mt-2",
+         ),
+     ],
+     className="mb-3",
+ )
+
+ demographics_tab = dbc.Row(
+     [
+         dbc.Col(dcc.Graph(id="white-deviation-map", config=_GRAPH_CONFIG), md=6),
+         dbc.Col(dcc.Graph(id="asian-deviation-map", config=_GRAPH_CONFIG), md=6),
+     ]
+ )
+
+ rr_tab = html.Div(
+     [
+         dbc.Row(
+             [
+                 dbc.Col(dcc.Graph(id="rr-bar-chart", config=_GRAPH_CONFIG), md=7),
+                 dbc.Col(dcc.Graph(id="rr-heatmap", config=_GRAPH_CONFIG), md=5),
+             ]
+         ),
+     ],
+     id="rr-section",
+ )
+
+ comparison_tab = html.Div(
+     [
+         dbc.Row(
+             [
+                 dbc.Col(
+                     [
+                         dbc.Label("Month A"),
+                         dcc.Dropdown(
+                             id="comp-month-a",
+                             options=[{"label": m, "value": m} for m in MONTHS],
+                             value="Jan2024",
+                             clearable=False,
+                         ),
+                     ],
+                     md=3,
+                 ),
+                 dbc.Col(
+                     [
+                         dbc.Label("Month B"),
+                         dcc.Dropdown(
+                             id="comp-month-b",
+                             options=[{"label": m, "value": m} for m in MONTHS],
+                             value="Mar2024",
+                             clearable=False,
+                         ),
+                     ],
+                     md=3,
+                 ),
+                 dbc.Col(
+                     [
+                         dbc.Label("Hail Type"),
+                         dcc.Dropdown(
+                             id="comp-hail",
+                             options=[
+                                 {"label": "Street-Hail", "value": "Street"},
+                                 {"label": "App-Based", "value": "App"},
+                             ],
+                             value="Street",
+                             clearable=False,
+                         ),
+                     ],
+                     md=3,
+                 ),
+                 dbc.Col(
+                     [
+                         dbc.Label("Metric"),
+                         dcc.Dropdown(
+                             id="comp-metric",
+                             options=[
+                                 {"label": "Pickups", "value": "pu"},
+                                 {"label": "Drop-offs", "value": "do"},
+                             ],
+                             value="pu",
+                             clearable=False,
+                         ),
+                     ],
+                     md=3,
+                 ),
+             ],
+             className="mb-3",
+         ),
+         dbc.Row(dbc.Col(dcc.Graph(id="comparison-map", config=_GRAPH_CONFIG))),
+     ]
+ )
+
+ top10_section = dbc.Row(
+     [
+         dbc.Col(
+             dbc.Card(
+                 [
+                     html.H5(
+                         id="street-stats-title",
+                         className="text-center",
+                     ),
+                     dbc.Row(
+                         [
+                             dbc.Col(
+                                 [
+                                     html.H6("Pickups", className="text-center text-muted"),
+                                     html.Div(id="street-pu-table"),
+                                 ],
+                                 md=6,
+                             ),
+                             dbc.Col(
+                                 [
+                                     html.H6("Drop-offs", className="text-center text-muted"),
+                                     html.Div(id="street-do-table"),
+                                 ],
+                                 md=6,
+                             ),
+                         ]
+                     ),
+                 ],
+                 body=True,
+                 className="bg-dark border-secondary",
+             ),
+             md=6,
+         ),
+         dbc.Col(
+             dbc.Card(
+                 [
+                     html.H5(
+                         id="app-stats-title",
+                         className="text-center",
+                     ),
+                     dbc.Row(
+                         [
+                             dbc.Col(
+                                 [
+                                     html.H6("Pickups", className="text-center text-muted"),
+                                     html.Div(id="app-pu-table"),
+                                 ],
+                                 md=6,
+                             ),
+                             dbc.Col(
+                                 [
+                                     html.H6("Drop-offs", className="text-center text-muted"),
+                                     html.Div(id="app-do-table"),
+                                 ],
+                                 md=6,
+                             ),
+                         ]
+                     ),
+                 ],
+                 body=True,
+                 className="bg-dark border-secondary",
+             ),
+             md=6,
+         ),
+     ],
+     className="mt-3",
+ )
+
+ analysis_tabs = dbc.Tabs(
+     [
+         dbc.Tab(demographics_tab, label="Demographics", tab_id="tab-demo"),
+         dbc.Tab(rr_tab, label="Representative Ratios", tab_id="tab-rr"),
+         dbc.Tab(comparison_tab, label="Monthly Comparison", tab_id="tab-comp"),
+     ],
+     id="analysis-tabs",
+     active_tab="tab-demo",
+     className="mb-3",
+ )
+
+ nhood_offcanvas = dbc.Offcanvas(
+     html.Div(id="nhood-detail-content"),
+     id="nhood-offcanvas",
+     title="Neighborhood Profile",
+     placement="end",
+     is_open=False,
+     style={"width": "400px", "backgroundColor": "#303030"},
+ )
+
+ app.layout = dbc.Container(
+     [
+         dcc.Store(id="selected-neighborhood", data=None),
+         nhood_offcanvas,
+         # Header
+         dbc.Row(
+             dbc.Col(
+                 [
+                     html.H2(
+                         "SF Taxi Mobility Equity Dashboard",
+                         className="text-center mt-3 mb-1",
+                     ),
+                     html.P(
+                         "Analyzing whether Street-Hail and App-Based taxi services "
+                         "in San Francisco are equitably distributed across "
+                         "neighborhoods with different demographic compositions.",
+                         className="text-center text-muted mb-3",
+                         style={"maxWidth": "700px", "margin": "0 auto"},
+                     ),
+                 ]
+             )
+         ),
+         # Body: sidebar + main
+         dbc.Row(
+             [
+                 dbc.Col(sidebar, md=2, className="pe-1"),
+                 dbc.Col(
+                     [
+                         insights_banner,
+                         narrative_row,
+                         html.H5("Trip Distribution Maps", className="mb-2"),
+                         trip_maps,
+                         html.H5("Analysis", className="mb-2"),
+                         analysis_tabs,
+                         html.H5("Top 10 Neighborhoods", className="mb-2"),
+                         top10_section,
+                     ],
+                     md=10,
+                 ),
+             ]
+         ),
+         # Footer
+         dbc.Row(
+             dbc.Col(
+                 html.P(
+                     [
+                         "Data: ",
+                         html.A("DataSF Taxi Trips", href="https://data.sfgov.org/Transportation/Taxi-Trips/m8hk-2ipk/", target="_blank"),
+                         " | ",
+                         html.A("ACS 2022", href="https://api.census.gov/data/2022/acs/acs5.html", target="_blank"),
+                         " | Debayan Mandal",
+                     ],
+                     className="text-center text-muted small mt-4 mb-3",
+                 )
+             )
+         ),
+     ],
+     fluid=True,
+ )
+
+ @callback(
+     Output("street-pu-map", "figure"),
+     Output("app-pu-map", "figure"),
+     Output("street-do-map", "figure"),
+     Output("app-do-map", "figure"),
+     Input("month-selector", "value"),
+     Input("hail-type-filter", "value"),
+     Input("selected-neighborhood", "data"),
+ )
+ def update_trip_maps(month, hail_types, sel_nhood):
+     hail_types = hail_types or ["Street", "App"]
+     con = get_con()
+     figs = []
+     for ht, metric in [
+         ("Street", "pu"),
+         ("App", "pu"),
+         ("Street", "do"),
+         ("App", "do"),
+     ]:
+         if ht in hail_types:
+             figs.append(
+                 build_trip_choropleth(con, GEOJSON, ht, month, metric, sel_nhood)
+             )
+         else:
+             import plotly.graph_objects as go
+             fig = go.Figure()
+             fig.update_layout(
+                 title=f"{ht} {'Pickups' if metric == 'pu' else 'Drop-offs'} (filtered out)",
+                 paper_bgcolor="rgba(0,0,0,0)",
+                 font_color="#e0e0e0",
+                 height=420,
+             )
+             figs.append(fig)
+     return figs[0], figs[1], figs[2], figs[3]
+
+ @callback(
+     Output("selected-neighborhood", "data"),
+     Input("street-pu-map", "clickData"),
+     Input("app-pu-map", "clickData"),
+     Input("street-do-map", "clickData"),
+     Input("app-do-map", "clickData"),
+     Input("white-deviation-map", "clickData"),
+     Input("asian-deviation-map", "clickData"),
+     Input("reset-selection-btn", "n_clicks"),
+     prevent_initial_call=True,
+ )
+ def sync_map_selection(c1, c2, c3, c4, c5, c6, reset):
+     trigger = ctx.triggered_id
+     if trigger == "reset-selection-btn":
+         return None
+     # Read the click payload of the map that fired. Looping over every
+     # clickData would return a stale click left on an earlier map.
+     clicks = {
+         "street-pu-map": c1,
+         "app-pu-map": c2,
+         "street-do-map": c3,
+         "app-do-map": c4,
+         "white-deviation-map": c5,
+         "asian-deviation-map": c6,
+     }
+     click = clicks.get(trigger)
+     if click:
+         try:
+             return click["points"][0]["customdata"][0]
+         except (KeyError, IndexError, TypeError):
+             try:
+                 return click["points"][0]["location"]
+             except (KeyError, IndexError, TypeError):
+                 pass
+     return dash.no_update
+
+ @callback(
+     Output("selected-nhood-display", "children"),
+     Input("selected-neighborhood", "data"),
+ )
+ def display_selected_nhood(nhood):
+     if nhood:
+         return dbc.Alert(
+             [html.Strong("Selected: "), nhood],
+             color="info",
+             className="py-2 mb-0",
+         )
+     return html.P("Click a neighborhood on any map", className="text-muted small")
+
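Review note: the selection callback has to use the click payload of whichever map fired, because `clickData` on the other maps keeps its last value. The lookup can be checked without a running Dash app by writing it as a pure function. This is a minimal sketch; `resolve_selection` and `NO_UPDATE` are hypothetical names, with `NO_UPDATE` standing in for `dash.no_update`.

```python
NO_UPDATE = object()  # stand-in for dash.no_update in this sketch


def resolve_selection(trigger_id, clicks_by_id, reset_id="reset-selection-btn"):
    """Return the neighborhood chosen by the component that fired.

    clicks_by_id maps each map's component id to its current clickData.
    """
    if trigger_id == reset_id:
        return None  # clears the selection
    click = clicks_by_id.get(trigger_id)
    if not click:
        return NO_UPDATE
    try:
        point = click["points"][0]
    except (KeyError, IndexError, TypeError):
        return NO_UPDATE
    if not isinstance(point, dict):
        return NO_UPDATE
    custom = point.get("customdata")
    if custom:
        return custom[0]
    # Fall back to the choropleth location id when customdata is absent.
    return point.get("location", NO_UPDATE)


# A stale click on the first map must not win over the map that just fired.
clicks = {
    "street-pu-map": {"points": [{"customdata": ["Mission"]}]},
    "app-pu-map": {"points": [{"location": "Marina"}]},
}
selected = resolve_selection("app-pu-map", clicks)
```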
+ @callback(
+     Output("stat-total-trips", "children"),
+     Output("stat-top-nhood", "children"),
+     Output("stat-rr-white", "children"),
+     Output("stat-rr-asian", "children"),
+     Output("narrative-text", "children"),
+     Input("month-selector", "value"),
+     Input("hail-type-filter", "value"),
+ )
+ def update_insights(month, hail_types):
+     hail_types = hail_types or ["Street", "App"]
+     # Bind user-controlled values as parameters instead of formatting them into SQL.
+     placeholders = ", ".join("?" for _ in hail_types)
+     params = [month, *hail_types]
+     con = get_con()
+
+     # COALESCE keeps an empty result at 0; a NULL would reach pandas as NaN,
+     # which is truthy and cannot be converted with int().
+     total = con.execute(f"""
+         SELECT COALESCE(SUM(trips_pu), 0) AS total
+         FROM trip_counts_pu
+         WHERE month = ? AND hail_type IN ({placeholders})
+     """, params).df()["total"].iloc[0]
+     total = int(total)
+
+     top = con.execute(f"""
+         SELECT nhood, SUM(trips_pu) AS t
+         FROM trip_counts_pu
+         WHERE month = ? AND hail_type IN ({placeholders})
+         GROUP BY nhood ORDER BY t DESC LIMIT 1
+     """, params).df()
+     top_nhood = top["nhood"].iloc[0] if not top.empty else "N/A"
+
+     rr = con.execute(f"""
+         SELECT COALESCE(AVG(RR_white_PU), 0) AS rr_w,
+                COALESCE(AVG(RR_asian_PU), 0) AS rr_a
+         FROM representative_ratios
+         WHERE month = ? AND hail_type IN ({placeholders})
+     """, params).df()
+     rr_w = round(float(rr["rr_w"].iloc[0]), 2)
+     rr_a = round(float(rr["rr_a"].iloc[0]), 2)
+
+     # Dynamic narrative: each clause is a full sentence, so any combination reads cleanly.
+     parts = [f"In {month}, {total:,} taxi trips were recorded across SF."]
+     parts.append(f"{top_nhood} was the most served neighborhood.")
+     if rr_w > 1.0:
+         parts.append(
+             f"White-majority neighborhoods received {rr_w:.2f}x their expected "
+             f"share of service."
+         )
+     if rr_a < 1.0:
+         parts.append(
+             f"Asian-majority neighborhoods received {rr_a:.2f}x their expected "
+             f"share of service."
+         )
+     narrative = " ".join(parts)
+
+     return f"{total:,}", top_nhood, f"{rr_w:.2f}x", f"{rr_a:.2f}x", narrative
+
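Review note: the month and hail-type values come from the browser, so they are safer bound as query parameters than interpolated into the SQL string. The sketch below shows the pattern for a variable-length `IN (...)` list. It uses `sqlite3` from the standard library only so that it runs without extra packages; DuckDB's `execute` takes `?` placeholders in the same way. The helper names and the sample rows are made up for illustration.

```python
import sqlite3


def total_pickups(con, month, hail_types):
    """Sum pickups for one month and a set of hail types using bound parameters."""
    hail_types = list(hail_types)
    if not hail_types:
        return 0  # an empty IN () list is not valid SQL
    # One "?" per hail type; only the placeholders are formatted into the SQL.
    placeholders = ", ".join("?" for _ in hail_types)
    sql = (
        "SELECT COALESCE(SUM(trips_pu), 0) FROM trip_counts_pu "
        f"WHERE month = ? AND hail_type IN ({placeholders})"
    )
    return con.execute(sql, [month, *hail_types]).fetchone()[0]


def make_demo_db():
    """Build a tiny in-memory table with invented rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE trip_counts_pu "
        "(nhood TEXT, month TEXT, hail_type TEXT, trips_pu INTEGER)"
    )
    con.executemany(
        "INSERT INTO trip_counts_pu VALUES (?, ?, ?, ?)",
        [
            ("Mission", "Jan2024", "Street", 120),
            ("Mission", "Jan2024", "App", 80),
            ("Marina", "Jan2024", "Street", 40),
            ("Mission", "Feb2024", "Street", 999),
        ],
    )
    return con


demo_con = make_demo_db()
both = total_pickups(demo_con, "Jan2024", ["Street", "App"])
# A crafted month string is treated as a literal value and matches nothing.
crafted = total_pickups(demo_con, "Jan2024' OR '1'='1", ["Street"])
```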
+ @callback(
+     Output("white-deviation-map", "figure"),
+     Output("asian-deviation-map", "figure"),
+     Input("selected-neighborhood", "data"),
+ )
+ def update_demo_maps(sel_nhood):
+     con = get_con()
+     w = build_demo_choropleth(
+         con, GEOJSON, "white_pct", BASELINE_WHITE,
+         f"White Pop. Deviation ({BASELINE_WHITE:.1f}% baseline)",
+         sel_nhood,
+     )
+     a = build_demo_choropleth(
+         con, GEOJSON, "asian_pct", BASELINE_ASIAN,
+         f"Asian Pop. Deviation ({BASELINE_ASIAN:.1f}% baseline)",
+         sel_nhood,
+     )
+     return w, a
+
+ @callback(
+     Output("rr-bar-chart", "figure"),
+     Output("rr-heatmap", "figure"),
+     Input("month-selector", "value"),
+ )
+ def update_rr(month):
+     con = get_con()
+     return build_rr_bar_chart(con, month), build_rr_heatmap(con)
+
+ @callback(
+     Output("comparison-map", "figure"),
+     Input("comp-month-a", "value"),
+     Input("comp-month-b", "value"),
+     Input("comp-hail", "value"),
+     Input("comp-metric", "value"),
+ )
+ def update_comparison(month_a, month_b, hail, metric):
+     con = get_con()
+     return build_comparison_map(con, GEOJSON, hail, metric, month_a, month_b)
+
+ @callback(
+     Output("nhood-offcanvas", "is_open"),
+     Output("nhood-detail-content", "children"),
+     Input("selected-neighborhood", "data"),
+     State("month-selector", "value"),
+ )
+ def update_nhood_panel(nhood, month):
+     if not nhood:
+         return False, []
+
+     con = get_con()
+     profile = build_neighborhood_profile(con, nhood, month)
+     demo = profile["demographics"]
+
+     children = [
+         html.H4(profile["name"]),
+         html.Hr(),
+     ]
+
+     if demo:
+         children.extend(
+             [
+                 html.H6("Demographics"),
+                 html.P(f"Population: {demo['total_pop']:,}"),
+                 html.P(f"White: {demo['white_pct']}% (deviation: {demo['white_dev']:+.1f} pp)"),
+                 html.P(f"Black: {demo['black_pct']}%"),
+                 html.P(f"Asian: {demo['asian_pct']}% (deviation: {demo['asian_dev']:+.1f} pp)"),
+                 html.Hr(),
+             ]
+         )
+
+     if profile["trips"]:
+         children.append(html.H6(f"Trips ({month})"))
+         for key, val in sorted(profile["trips"].items()):
+             if month in key:
+                 label = key.replace(f"_{month}", "").replace("_", " ")
+                 children.append(html.P(f"{label}: {val:,}"))
+         children.append(html.Hr())
+
+     if profile["trend_fig"]:
+         children.append(
+             dcc.Graph(
+                 figure=profile["trend_fig"],
+                 config={"displayModeBar": False},
+                 style={"height": "280px"},
+             )
+         )
+
+     return True, children
+
+ def _make_table(df):
+     if df.empty:
+         return html.P("No data", className="text-muted")
+     return dash_table.DataTable(
+         data=df.to_dict("records"),
+         columns=[{"name": c, "id": c} for c in df.columns],
+         style_table={"overflowX": "auto"},
+         style_header={
+             "backgroundColor": "#375a7f",
+             "color": "white",
+             "fontWeight": "bold",
+             "textAlign": "center",
+         },
+         style_cell={
+             "backgroundColor": "#303030",
+             "color": "#e0e0e0",
+             "textAlign": "center",
+             "padding": "6px",
+             "fontSize": "0.85rem",
+         },
+         style_data_conditional=[
+             {
+                 "if": {"row_index": 0},
+                 "backgroundColor": "#3a506b",
+                 "fontWeight": "bold",
+             }
+         ],
+         page_size=10,
+     )
+
+
+ @callback(
+     Output("street-stats-title", "children"),
+     Output("street-pu-table", "children"),
+     Output("street-do-table", "children"),
+     Output("app-stats-title", "children"),
+     Output("app-pu-table", "children"),
+     Output("app-do-table", "children"),
+     Input("month-selector", "value"),
+ )
+ def update_top10(month):
+     con = get_con()
+     s_pu = get_trip_stats_df(con, "Street", month, "pu")
+     s_do = get_trip_stats_df(con, "Street", month, "do")
+     a_pu = get_trip_stats_df(con, "App", month, "pu")
+     a_do = get_trip_stats_df(con, "App", month, "do")
+
+     return (
+         f"Street-Hail Top 10 ({month})",
+         _make_table(s_pu),
+         _make_table(s_do),
+         f"App-Based Top 10 ({month})",
+         _make_table(a_pu),
+         _make_table(a_do),
+     )
+
+ @callback(
+     Output("csv-download", "data"),
+     Input("download-btn", "n_clicks"),
+     State("month-selector", "value"),
+     State("hail-type-filter", "value"),
+     State("selected-neighborhood", "data"),
+     prevent_initial_call=True,
+ )
+ def trigger_download(n_clicks, month, hail_types, nhood):
+     con = get_con()
+     df = get_download_csv(con, month, hail_types, nhood)
+     filename = f"sf_taxi_data_{month}"
+     if nhood:
+         filename += f"_{nhood.replace(' ', '_').replace('/', '_')}"
+     return dcc.send_data_frame(df.to_csv, f"{filename}.csv", index=False)
+
+ def _num(value):
+     # NULLs arrive from pandas as NaN, which is truthy; NaN is the only value != itself.
+     return float(value) if value is not None and value == value else 0.0
+
+ @callback(
+     Output("geojson-download", "data"),
+     Input("download-geojson-btn", "n_clicks"),
+     State("month-selector", "value"),
+     State("hail-type-filter", "value"),
+     State("selected-neighborhood", "data"),
+     prevent_initial_call=True,
+ )
+ def trigger_geojson_download(n_clicks, month, hail_types, nhood):
+     import json
+     import copy
+     con = get_con()
+     hail_types = hail_types or ["Street", "App"]
+     placeholders = ", ".join("?" for _ in hail_types)
+     nhood_clause = "WHERE n.nhood = ?" if nhood else ""
+     # Parameters follow the order of the "?" placeholders in the query.
+     params = [month, *hail_types, month, *hail_types]
+     if nhood:
+         params.append(nhood)
+
+     # Get trip + demographic data per neighborhood. Each trip table is
+     # aggregated before the join; joining both tables row by row would pair
+     # every pickup row with every drop-off row and inflate the sums.
+     data_df = con.execute(f"""
+         SELECT n.nhood,
+                COALESCE(pu.total_pickups, 0) AS total_pickups,
+                COALESCE(td.total_dropoffs, 0) AS total_dropoffs,
+                nd.total_pop, nd.white_pct, nd.black_pct, nd.asian_pct
+         FROM neighborhoods n
+         LEFT JOIN (
+             SELECT nhood, SUM(trips_pu) AS total_pickups
+             FROM trip_counts_pu
+             WHERE month = ? AND hail_type IN ({placeholders})
+             GROUP BY nhood
+         ) pu ON pu.nhood = n.nhood
+         LEFT JOIN (
+             SELECT nhood, SUM(trips_do) AS total_dropoffs
+             FROM trip_counts_do
+             WHERE month = ? AND hail_type IN ({placeholders})
+             GROUP BY nhood
+         ) td ON td.nhood = n.nhood
+         LEFT JOIN nhood_demographics nd ON nd.nhood = n.nhood
+         {nhood_clause}
+     """, params).df()
+     data_map = {row["nhood"]: row.to_dict() for _, row in data_df.iterrows()}
+
+     geojson = copy.deepcopy(GEOJSON)
+     # Filter to selected neighborhood if one is chosen
+     if nhood:
+         geojson["features"] = [
+             f for f in geojson["features"]
+             if f["properties"]["nhood"] == nhood
+         ]
+     # Enrich features with data
+     for feat in geojson["features"]:
+         name = feat["properties"]["nhood"]
+         if name in data_map:
+             d = data_map[name]
+             props = feat["properties"]
+             props["month"] = month
+             props["total_pickups"] = int(d["total_pickups"])
+             props["total_dropoffs"] = int(d["total_dropoffs"])
+             props["total_pop"] = int(_num(d["total_pop"]))
+             props["white_pct"] = round(_num(d["white_pct"]), 1)
+             props["black_pct"] = round(_num(d["black_pct"]), 1)
+             props["asian_pct"] = round(_num(d["asian_pct"]), 1)
+
+     filename = f"sf_taxi_{month}"
+     if nhood:
+         filename += f"_{nhood.replace(' ', '_').replace('/', '_')}"
+     return dict(
+         content=json.dumps(geojson, indent=2),
+         filename=f"{filename}.geojson",
+         type="application/geo+json",
+     )
+
+ if __name__ == "__main__":
+     app.run(host="0.0.0.0", port=7860, debug=False)
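Review note: joining two per-hail-type count tables to the same neighborhood multiplies rows whenever more than one hail type is selected, so a `SUM` over the joined rows counts each trip more than once. The sketch below reproduces the effect with `sqlite3` and invented numbers, then shows the usual remedy of aggregating each table before the join. Table and function names here are illustrative only.

```python
import sqlite3


def make_counts_db():
    """Two tiny count tables: one neighborhood, two hail types each."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE pickups (nhood TEXT, hail_type TEXT, trips_pu INTEGER);
        CREATE TABLE dropoffs (nhood TEXT, hail_type TEXT, trips_do INTEGER);
        INSERT INTO pickups VALUES ('Mission', 'Street', 10), ('Mission', 'App', 5);
        INSERT INTO dropoffs VALUES ('Mission', 'Street', 7), ('Mission', 'App', 3);
    """)
    return con


def naive_totals(con):
    # Row-by-row join: 2 pickup rows x 2 drop-off rows = 4 joined rows.
    return con.execute("""
        SELECT SUM(p.trips_pu), SUM(d.trips_do)
        FROM pickups p
        JOIN dropoffs d ON d.nhood = p.nhood
    """).fetchone()


def preaggregated_totals(con):
    # Collapse each table to one row per neighborhood, then join.
    return con.execute("""
        SELECT p.total_pu, d.total_do
        FROM (SELECT nhood, SUM(trips_pu) AS total_pu
              FROM pickups GROUP BY nhood) p
        JOIN (SELECT nhood, SUM(trips_do) AS total_do
              FROM dropoffs GROUP BY nhood) d
          ON d.nhood = p.nhood
    """).fetchone()


counts_con = make_counts_db()
inflated = naive_totals(counts_con)        # each side counted twice
correct = preaggregated_totals(counts_con) # true totals: 15 pickups, 10 drop-offs
```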
assets/styles.css ADDED
@@ -0,0 +1,42 @@
+ .card {
+     transition: box-shadow 0.2s ease;
+ }
+ .card:hover {
+     box-shadow: 0 0 12px rgba(55, 90, 127, 0.3);
+ }
+
+ .js-plotly-plot .plotly .modebar {
+     top: 4px !important;
+     right: 4px !important;
+ }
+
+ .offcanvas {
+     background-color: #303030 !important;
+     color: #e0e0e0 !important;
+ }
+ .offcanvas-header .btn-close {
+     filter: invert(1);
+ }
+
+ .dash-dropdown-value,
+ .dash-dropdown-value-item,
+ .dash-dropdown-trigger {
+     color: #000 !important;
+ }
+
+ body {
+     overflow-y: auto;
+ }
+
+ .dash-table-container .dash-spreadsheet-container {
+     border-radius: 4px;
+     overflow: hidden;
+ }
+
+ @media (max-width: 768px) {
+     #sidebar {
+         position: relative !important;
+         height: auto !important;
+         margin-bottom: 1rem;
+     }
+ }
dashboard_helpers.py ADDED
@@ -0,0 +1,471 @@
+ import json
+ import functools
+ import pandas as pd
+ import geopandas as gpd
+ import plotly.express as px
+ import plotly.graph_objects as go
+
+ _geojson_cache = {}
+
+ def get_neighborhood_geojson(con) -> dict:
+     if "geojson" not in _geojson_cache:
+         df = con.sql(
+             "SELECT nhood, ST_AsText(geometry) AS geometry FROM neighborhoods"
+         ).df()
+         gdf = gpd.GeoDataFrame(
+             df,
+             geometry=gpd.GeoSeries.from_wkt(df["geometry"]),
+             crs="EPSG:4326",
+         )
+         _geojson_cache["geojson"] = json.loads(gdf.to_json())
+     return _geojson_cache["geojson"]
+
+ def get_all_neighborhoods(con) -> list[str]:
+     return (
+         con.sql("SELECT DISTINCT nhood FROM neighborhoods ORDER BY nhood")
+         .df()["nhood"]
+         .tolist()
+     )
+
+ _MAP_CENTER = {"lat": 37.76, "lon": -122.44}
+ _MAP_ZOOM = 11
+ _MAP_STYLE = "carto-darkmatter"
+ _MAP_MARGIN = dict(l=0, r=0, t=32, b=0)
+ _MAP_HEIGHT = 420
+ _GRAPH_CONFIG = {
+     "toImageButtonOptions": {
+         "format": "png",
+         "width": 1200,
+         "height": 800,
+         "scale": 2,
+     },
+     "displayModeBar": True,
+ }
+
+ def _highlight_trace(con, geojson, nhood):
+     for feat in geojson["features"]:
+         if feat["properties"]["nhood"] == nhood:
+             geom = feat["geometry"]
+             coords = (
+                 geom["coordinates"][0]
+                 if geom["type"] == "Polygon"
+                 else geom["coordinates"][0][0]
+             )
+             lons = [c[0] for c in coords] + [None]
+             lats = [c[1] for c in coords] + [None]
+             return go.Scattermapbox(
+                 lon=lons,
+                 lat=lats,
+                 mode="lines",
+                 line=dict(width=3, color="#ff4444"),
+                 hoverinfo="skip",
+                 showlegend=False,
+             )
+     return None
+
+def build_trip_choropleth(
+    con, geojson, hail_type, month, metric, selected_nhood=None
+):
+    if metric == "pu":
+        table, col = "trip_counts_pu", "trips_pu"
+        color_scale = "Viridis"
+        title = f"{'Street-Hail' if hail_type == 'Street' else 'App-Based'} Pickups"
+    else:
+        table, col = "trip_counts_do", "trips_do"
+        color_scale = "Cividis"
+        title = f"{'Street-Hail' if hail_type == 'Street' else 'App-Based'} Drop-offs"
+
+    df = con.sql(f"""
+        SELECT n.nhood, COALESCE(t.{col}, 0) AS trips
+        FROM neighborhoods n
+        LEFT JOIN {table} t
+          ON t.nhood = n.nhood
+          AND t.hail_type = '{hail_type}'
+          AND t.month = '{month}'
+    """).df()
+
+    fig = px.choropleth_mapbox(
+        df,
+        geojson=geojson,
+        locations="nhood",
+        featureidkey="properties.nhood",
+        color="trips",
+        color_continuous_scale=color_scale,
+        mapbox_style=_MAP_STYLE,
+        center=_MAP_CENTER,
+        zoom=_MAP_ZOOM,
+        opacity=0.75,
+        hover_data={"nhood": True, "trips": True},
+        title=f"{title} ({month})",
+    )
+    fig.update_traces(
+        customdata=df[["nhood"]].values,
+        hovertemplate="<b>%{customdata[0]}</b><br>Trips: %{z:,}<extra></extra>",
+    )
+
+    if selected_nhood:
+        trace = _highlight_trace(con, geojson, selected_nhood)
+        if trace:
+            fig.add_trace(trace)
+
+    fig.update_layout(
+        margin=_MAP_MARGIN,
+        height=_MAP_HEIGHT,
+        coloraxis_colorbar=dict(title="Trips", thickness=15, len=0.6),
+        paper_bgcolor="rgba(0,0,0,0)",
+        plot_bgcolor="rgba(0,0,0,0)",
+        font_color="#e0e0e0",
+    )
+    return fig
+
+def build_demo_choropleth(
+    con, geojson, column, baseline_val, legend_title, selected_nhood=None
+):
+    df = con.sql(f"""
+        SELECT nd.nhood,
+               ROUND(nd.{column} - {baseline_val}, 1) AS deviation
+        FROM nhood_demographics nd
+        JOIN neighborhoods n ON nd.nhood = n.nhood
+        WHERE nd.total_pop > 0
+    """).df()
+
+    max_abs = max(abs(df["deviation"].min()), abs(df["deviation"].max()), 1)
+
+    fig = px.choropleth_mapbox(
+        df,
+        geojson=geojson,
+        locations="nhood",
+        featureidkey="properties.nhood",
+        color="deviation",
+        color_continuous_scale="RdBu",
+        range_color=[-max_abs, max_abs],
+        color_continuous_midpoint=0,
+        mapbox_style=_MAP_STYLE,
+        center=_MAP_CENTER,
+        zoom=_MAP_ZOOM,
+        opacity=0.75,
+        title=legend_title,
+    )
+    fig.update_traces(
+        customdata=df[["nhood", "deviation"]].values,
+        hovertemplate=(
+            "<b>%{customdata[0]}</b><br>"
+            "Deviation: %{customdata[1]:+.1f} pp<extra></extra>"
+        ),
+    )
+
+    if selected_nhood:
+        trace = _highlight_trace(con, geojson, selected_nhood)
+        if trace:
+            fig.add_trace(trace)
+
+    fig.update_layout(
+        margin=_MAP_MARGIN,
+        height=_MAP_HEIGHT,
+        coloraxis_colorbar=dict(title="Dev (pp)", thickness=15, len=0.6),
+        paper_bgcolor="rgba(0,0,0,0)",
+        plot_bgcolor="rgba(0,0,0,0)",
+        font_color="#e0e0e0",
+    )
+    return fig
+
+def build_rr_bar_chart(con, month):
+    df = con.sql(f"""
+        SELECT hail_type, month,
+               RR_white_PU, RR_asian_PU, RR_white_DO, RR_asian_DO
+        FROM representative_ratios
+        WHERE month = '{month}'
+    """).df()
+
+    if df.empty:
+        return go.Figure().update_layout(
+            title="No data", paper_bgcolor="rgba(0,0,0,0)"
+        )
+
+    rows = []
+    for _, r in df.iterrows():
+        for metric_col, label in [
+            ("RR_white_PU", "White: Pickups"),
+            ("RR_white_DO", "White: Drop-offs"),
+            ("RR_asian_PU", "Asian: Pickups"),
+            ("RR_asian_DO", "Asian: Drop-offs"),
+        ]:
+            rows.append(
+                {
+                    "Hail Type": r["hail_type"],
+                    "Metric": label,
+                    "RR": round(float(r[metric_col]), 3),
+                }
+            )
+    plot_df = pd.DataFrame(rows)
+
+    fig = px.bar(
+        plot_df,
+        x="Metric",
+        y="RR",
+        color="Hail Type",
+        barmode="group",
+        color_discrete_map={"Street": "#636EFA", "App": "#EF553B"},
+        title=f"Representative Ratios: {month}",
+    )
+    fig.add_hline(
+        y=1.0,
+        line_dash="dash",
+        line_color="#ffd700",
+        annotation_text="Perfect Representation (1.0)",
+        annotation_position="top left",
+        annotation_font_color="#ffd700",
+    )
+    fig.update_layout(
+        yaxis_title="Representative Ratio",
+        xaxis_title="",
+        height=420,
+        paper_bgcolor="rgba(0,0,0,0)",
+        plot_bgcolor="rgba(0,0,0,0)",
+        font_color="#e0e0e0",
+        legend=dict(orientation="h", yanchor="bottom", y=1.02, x=0.5, xanchor="center"),
+    )
+    fig.update_yaxes(gridcolor="rgba(255,255,255,0.1)")
+    return fig
+
+def build_rr_heatmap(con):
+    df = con.sql(
+        "SELECT * FROM representative_ratios ORDER BY hail_type, month"
+    ).df()
+
+    if df.empty:
+        return go.Figure().update_layout(
+            title="No data", paper_bgcolor="rgba(0,0,0,0)"
+        )
+
+    labels = []
+    z_vals = []
+    for _, r in df.iterrows():
+        row_label = f"{r['hail_type']}: {r['month']}"
+        labels.append(row_label)
+        z_vals.append(
+            [
+                round(float(r["RR_white_PU"]), 3),
+                round(float(r["RR_asian_PU"]), 3),
+                round(float(r["RR_white_DO"]), 3),
+                round(float(r["RR_asian_DO"]), 3),
+            ]
+        )
+
+    col_labels = ["White PU", "Asian PU", "White DO", "Asian DO"]
+
+    fig = go.Figure(
+        data=go.Heatmap(
+            z=z_vals,
+            x=col_labels,
+            y=labels,
+            colorscale="RdBu",
+            zmid=1.0,
+            text=z_vals,
+            texttemplate="%{text:.3f}",
+            textfont=dict(size=12),
+            hovertemplate=(
+                "<b>%{y}</b><br>%{x}: %{z:.3f}<extra></extra>"
+            ),
+            colorbar=dict(title="RR", thickness=15),
+        )
+    )
+    fig.update_layout(
+        title="Representative Ratios: All Months",
+        height=350,
+        margin=dict(l=0, r=0, t=40, b=0),
+        paper_bgcolor="rgba(0,0,0,0)",
+        plot_bgcolor="rgba(0,0,0,0)",
+        font_color="#e0e0e0",
+        xaxis=dict(side="top"),
+    )
+    return fig
+
+def build_neighborhood_profile(con, nhood, month):
+    # `nhood` can originate from user interaction, so it is bound as a query
+    # parameter rather than interpolated into the SQL text.
+    demo = con.execute("""
+        SELECT total_pop, white_pop, black_pop, asian_pop,
+               white_pct, black_pct, asian_pct
+        FROM nhood_demographics
+        WHERE nhood = ?
+    """, [nhood]).df()
+
+    baselines = con.sql("SELECT * FROM city_baselines").df()
+
+    trips_pu = con.execute("""
+        SELECT hail_type, month, trips_pu
+        FROM trip_counts_pu
+        WHERE nhood = ?
+        ORDER BY month, hail_type
+    """, [nhood]).df()
+
+    trips_do = con.execute("""
+        SELECT hail_type, month, trips_do
+        FROM trip_counts_do
+        WHERE nhood = ?
+        ORDER BY month, hail_type
+    """, [nhood]).df()
+
+    profile = {"name": nhood, "demographics": {}, "trips": {}, "trend_fig": None}
+
+    if not demo.empty:
+        d = demo.iloc[0]
+        bw = float(baselines["baseline_white_pct"].iloc[0])
+        ba = float(baselines["baseline_asian_pct"].iloc[0])
+        profile["demographics"] = {
+            "total_pop": int(d["total_pop"]),
+            "white_pct": round(float(d["white_pct"]), 1),
+            "black_pct": round(float(d["black_pct"]), 1),
+            "asian_pct": round(float(d["asian_pct"]), 1),
+            "white_dev": round(float(d["white_pct"]) - bw, 1),
+            "asian_dev": round(float(d["asian_pct"]) - ba, 1),
+        }
+
+    for _, r in trips_pu.iterrows():
+        key = f"{r['hail_type']}_PU_{r['month']}"
+        profile["trips"][key] = int(r["trips_pu"])
+
+    for _, r in trips_do.iterrows():
+        key = f"{r['hail_type']}_DO_{r['month']}"
+        profile["trips"][key] = int(r["trips_do"])
+
+    # Mini trend chart
+    trend_rows = []
+    for _, r in trips_pu.iterrows():
+        trend_rows.append(
+            {"Month": r["month"], "Type": f"{r['hail_type']} PU", "Trips": int(r["trips_pu"])}
+        )
+    for _, r in trips_do.iterrows():
+        trend_rows.append(
+            {"Month": r["month"], "Type": f"{r['hail_type']} DO", "Trips": int(r["trips_do"])}
+        )
+
+    if trend_rows:
+        trend_df = pd.DataFrame(trend_rows)
+        trend_fig = px.bar(
+            trend_df,
+            x="Month",
+            y="Trips",
+            color="Type",
+            barmode="group",
+            title=f"Trip Trends: {nhood}",
+            height=280,
+        )
+        trend_fig.update_layout(
+            paper_bgcolor="rgba(0,0,0,0)",
+            plot_bgcolor="rgba(0,0,0,0)",
+            font_color="#e0e0e0",
+            margin=dict(l=0, r=0, t=40, b=0),
+            legend=dict(orientation="h", y=-0.2),
+        )
+        trend_fig.update_yaxes(gridcolor="rgba(255,255,255,0.1)")
+        profile["trend_fig"] = trend_fig
+
+    return profile
+
+def get_trip_stats_df(con, hail_type, month, metric):
+    if metric == "pu":
+        table, col, alias = "trip_counts_pu", "trips_pu", "Pickups"
+    else:
+        table, col, alias = "trip_counts_do", "trips_do", "Drop-offs"
+
+    return con.sql(f"""
+        SELECT n.nhood AS Neighborhood, t.{col} AS "{alias}"
+        FROM {table} t
+        JOIN neighborhoods n ON t.nhood = n.nhood
+        WHERE t.hail_type = '{hail_type}' AND t.month = '{month}'
+        ORDER BY t.{col} DESC
+        LIMIT 10
+    """).df()
+
+def get_download_csv(con, month, hail_types, nhood=None):
+    # Bind every value as a query parameter; only the "?" placeholder list
+    # and the optional clause are formatted into the SQL text.
+    hail_types = list(hail_types) if hail_types else ["Street", "App"]
+    ht_filter = ", ".join("?" for _ in hail_types)
+    nhood_clause = "AND n.nhood = ?" if nhood else ""
+    # Parameter order matches placeholder order: month, hail types, nhood.
+    params = [month, *hail_types] + ([nhood] if nhood else [])
+
+    pu = con.execute(f"""
+        SELECT n.nhood, t.hail_type, t.month, t.trips_pu,
+               nd.total_pop, nd.white_pct, nd.black_pct, nd.asian_pct
+        FROM trip_counts_pu t
+        JOIN neighborhoods n ON t.nhood = n.nhood
+        JOIN nhood_demographics nd ON n.nhood = nd.nhood
+        WHERE t.month = ?
+          AND t.hail_type IN ({ht_filter})
+          {nhood_clause}
+        ORDER BY t.trips_pu DESC
+    """, params).df()
+
+    do = con.execute(f"""
+        SELECT n.nhood, t.hail_type, t.month, t.trips_do
+        FROM trip_counts_do t
+        JOIN neighborhoods n ON t.nhood = n.nhood
+        WHERE t.month = ?
+          AND t.hail_type IN ({ht_filter})
+          {nhood_clause}
+    """, params).df()
+
+    merged = pd.merge(
+        pu,
+        do[["nhood", "hail_type", "month", "trips_do"]],
+        on=["nhood", "hail_type", "month"],
+        how="outer",
+    )
+    return merged.fillna(0)
+
+def build_comparison_map(con, geojson, hail_type, metric, month_a, month_b):
+    if metric == "pu":
+        table, col = "trip_counts_pu", "trips_pu"
+        label = "Pickups"
+    else:
+        table, col = "trip_counts_do", "trips_do"
+        label = "Drop-offs"
+
+    df = con.sql(f"""
+        WITH a AS (
+            SELECT nhood, {col} AS trips_a
+            FROM {table}
+            WHERE hail_type = '{hail_type}' AND month = '{month_a}'
+        ),
+        b AS (
+            SELECT nhood, {col} AS trips_b
+            FROM {table}
+            WHERE hail_type = '{hail_type}' AND month = '{month_b}'
+        )
+        SELECT n.nhood,
+               COALESCE(b.trips_b, 0) - COALESCE(a.trips_a, 0) AS diff
+        FROM neighborhoods n
+        LEFT JOIN a ON n.nhood = a.nhood
+        LEFT JOIN b ON n.nhood = b.nhood
+    """).df()
+
+    max_abs = max(abs(df["diff"].min()), abs(df["diff"].max()), 1)
+
+    fig = px.choropleth_mapbox(
+        df,
+        geojson=geojson,
+        locations="nhood",
+        featureidkey="properties.nhood",
+        color="diff",
+        color_continuous_scale="RdBu",
+        range_color=[-max_abs, max_abs],
+        color_continuous_midpoint=0,
+        mapbox_style=_MAP_STYLE,
+        center=_MAP_CENTER,
+        zoom=_MAP_ZOOM,
+        opacity=0.75,
+        title=f"{hail_type} {label}: {month_b} vs {month_a}",
+    )
+    fig.update_traces(
+        customdata=df[["nhood", "diff"]].values,
+        hovertemplate=(
+            "<b>%{customdata[0]}</b><br>"
+            "Change: %{customdata[1]:+d} trips<extra></extra>"
+        ),
+    )
+    fig.update_layout(
+        margin=_MAP_MARGIN,
+        height=_MAP_HEIGHT,
+        coloraxis_colorbar=dict(title="Change", thickness=15, len=0.6),
+        paper_bgcolor="rgba(0,0,0,0)",
+        plot_bgcolor="rgba(0,0,0,0)",
+        font_color="#e0e0e0",
+    )
+    return fig
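Review note: several helpers above build SQL by formatting values such as `hail_type` and `month` into f-strings. The sketch below shows one possible alternative, in which values are bound with `?` placeholders and table or column names come from a fixed whitelist. It uses Python's built-in `sqlite3` only so that it runs without the dashboard database; DuckDB's `con.execute(sql, params)` accepts the same placeholder style. The names `_METRIC_TABLES` and `top_neighborhoods`, and the toy rows, are hypothetical and not part of this commit.

```python
import sqlite3

# Hypothetical whitelist: identifiers cannot be bound as parameters, so they
# are looked up from a fixed mapping instead of being taken from user input.
_METRIC_TABLES = {
    "pu": ("trip_counts_pu", "trips_pu"),
    "do": ("trip_counts_do", "trips_do"),
}

def top_neighborhoods(con, hail_type, month, metric, limit=10):
    """Return (nhood, trips) rows, binding all values as parameters."""
    table, col = _METRIC_TABLES[metric]  # raises KeyError for unknown metrics
    sql = (
        f"SELECT nhood, {col} AS trips FROM {table} "
        "WHERE hail_type = ? AND month = ? "
        "ORDER BY trips DESC LIMIT ?"
    )
    return con.execute(sql, [hail_type, month, limit]).fetchall()

# Toy in-memory table standing in for the real trip counts.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE trip_counts_pu "
    "(hail_type TEXT, month TEXT, nhood TEXT, trips_pu INTEGER)"
)
con.executemany(
    "INSERT INTO trip_counts_pu VALUES (?, ?, ?, ?)",
    [
        ("Street", "Jan2024", "Mission", 120),
        ("Street", "Jan2024", "Marina", 80),
        ("App", "Jan2024", "Mission", 50),
    ],
)

rows = top_neighborhoods(con, "Street", "Jan2024", "pu")
# A hostile value is compared as a plain string and matches nothing.
hostile = top_neighborhoods(con, "Street' OR '1'='1", "Jan2024", "pu")
print(rows)
print(hostile)
```

Because the hostile string is bound rather than spliced into the query, it is treated as an ordinary `hail_type` value and returns no rows.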
data_pipeline.py ADDED
@@ -0,0 +1,274 @@
+import io
+import os
+import pathlib
+import duckdb
+import pandas as pd
+import geopandas as gpd
+import requests
+
+con = duckdb.connect("sf_dashboard.db")
+con.install_extension("httpfs")
+con.load_extension("httpfs")
+con.install_extension("spatial")
+con.load_extension("spatial")
+
+NHOOD_URL = "https://data.sfgov.org/resource/j2bu-swwd.geojson"
+print("[1/6] Loading SF Analysis Neighborhoods ...")
+nhoods = gpd.read_file(NHOOD_URL)
+nhoods_df = nhoods[["nhood", "geometry"]].copy()
+nhoods_df = nhoods_df.to_crs("EPSG:4326")
+nhoods_utm = nhoods_df.to_crs("EPSG:26910")
+print(f" Loaded {len(nhoods_df)} neighborhoods")
+
+# Register as DuckDB table
+nhoods_df["geometry"] = nhoods_df["geometry"].apply(lambda g: bytes(g.wkb))
+con.sql("""
+    CREATE OR REPLACE TABLE neighborhoods AS
+    SELECT nhood, ST_GeomFromWKB(geometry)::GEOMETRY AS geometry
+    FROM nhoods_df
+""")
+
+# Register UTM version
+nhoods_utm["geometry"] = nhoods_utm["geometry"].apply(lambda g: bytes(g.wkb))
+con.sql("""
+    CREATE OR REPLACE TABLE neighborhoods_utm AS
+    SELECT nhood, ST_GeomFromWKB(geometry)::GEOMETRY AS geometry
+    FROM nhoods_utm
+""")
+
+SOCRATA_BASE = "https://data.sfgov.org/resource/m8hk-2ipk.csv"
+MONTHS = [
+    ("Jan2024", "2024-01-01T00:00:00", "2024-01-31T23:59:59"),
+    ("Feb2024", "2024-02-01T00:00:00", "2024-02-29T23:59:59"),
+    ("Mar2024", "2024-03-01T00:00:00", "2024-03-31T23:59:59"),
+    ("Apr2024", "2024-04-01T00:00:00", "2024-04-30T23:59:59"),
+    ("May2024", "2024-05-01T00:00:00", "2024-05-31T23:59:59"),
+    ("Jun2024", "2024-06-01T00:00:00", "2024-06-30T23:59:59"),
+    ("Jul2024", "2024-07-01T00:00:00", "2024-07-31T23:59:59"),
+    ("Aug2024", "2024-08-01T00:00:00", "2024-08-31T23:59:59"),
+    ("Sep2024", "2024-09-01T00:00:00", "2024-09-30T23:59:59"),
+    ("Oct2024", "2024-10-01T00:00:00", "2024-10-31T23:59:59"),
+    ("Nov2024", "2024-11-01T00:00:00", "2024-11-30T23:59:59"),
+    ("Dec2024", "2024-12-01T00:00:00", "2024-12-31T23:59:59"),
+]
+LIMIT = 1000
+print("[2/6] Downloading SF taxi trips ...")
+if not os.path.exists("raw_trips.csv"):
+    all_trips = []
+    for month_label, start, end in MONTHS:
+        OFFSET = 0
+        # Page through the month until a short page signals the end.
+        while True:
+            params = {
+                "$where": f"start_time_local between '{start}' and '{end}'",
+                "$limit": LIMIT,
+                "$offset": OFFSET,
+                "$order": "start_time_local"
+            }
+            response = requests.get(SOCRATA_BASE, params=params, timeout=30)
+            # Fail fast rather than parsing an HTTP error body as CSV.
+            response.raise_for_status()
+            df = pd.read_csv(io.StringIO(response.text))
+            df["month"] = month_label
+            all_trips.append(df)
+            if len(df) < LIMIT:
+                break
+            OFFSET += LIMIT
+        # OFFSET counts rows from the full pages; df is the final, short page.
+        print(f" {month_label}: {OFFSET + len(df)} rows")
+
+    trips_df = pd.concat(all_trips, ignore_index=True)
+    trips_df.to_csv("raw_trips.csv", index=False)
+else:
+    trips_df = pd.read_csv("raw_trips.csv")
+
+# Drop rows with missing or zero coordinates
+trips_df = trips_df.dropna(
+    subset=[
+        "pickup_location_latitude", "pickup_location_longitude",
+        "dropoff_location_latitude", "dropoff_location_longitude",
+    ]
+)
+trips_df = trips_df[
+    (trips_df["pickup_location_latitude"] != 0)
+    & (trips_df["pickup_location_longitude"] != 0)
+    & (trips_df["dropoff_location_latitude"] != 0)
+    & (trips_df["dropoff_location_longitude"] != 0)
+]
+
+# Normalise hail_type to two categories
+def normalise_hail_type(hail_type):
+    if hail_type in ["street","dispatch"]:
+        return "Street"
+    else:
+        return "App"
+trips_df["hail_type"] = trips_df["hail_type"].apply(normalise_hail_type)
+bad_flags = ['DR', 'FTR', 'ST', 'ET']
+trips_df = trips_df[
+    ~trips_df['qa_flags'].fillna('').apply(
+        lambda flags: any(f in flags.split('-') for f in bad_flags)
+    )
+]
+
+con.sql("CREATE OR REPLACE TABLE raw_trips AS SELECT * FROM trips_df")
+
+print("[3/6] Spatial join: pickup points to neighborhoods ...")
+con.sql("""
+    CREATE OR REPLACE TABLE trip_counts_pu AS
+    SELECT
+        t.hail_type,
+        t.month,
+        n.nhood,
+        COUNT(*) AS trips_pu
+    FROM raw_trips AS t
+    JOIN neighborhoods AS n
+        ON ST_Intersects(
+            n.geometry,
+            ST_Point(t.pickup_location_longitude, t.pickup_location_latitude)::GEOMETRY
+        )
+    GROUP BY t.hail_type, t.month, n.nhood
+""")
+
+print("[3/6] Spatial join: dropoff points to neighborhoods ...")
+con.sql("""
+    CREATE OR REPLACE TABLE trip_counts_do AS
+    SELECT
+        t.hail_type,
+        t.month,
+        n.nhood,
+        COUNT(*) AS trips_do
+    FROM raw_trips AS t
+    JOIN neighborhoods AS n
+        ON ST_Intersects(
+            n.geometry,
+            ST_Point(t.dropoff_location_longitude, t.dropoff_location_latitude)::GEOMETRY
+        )
+    GROUP BY t.hail_type, t.month, n.nhood
+""")
+
+top5 = con.sql("""
+    SELECT nhood, SUM(trips_pu) AS total
+    FROM trip_counts_pu GROUP BY nhood ORDER BY total DESC LIMIT 5
+""").df()
+print(" Top 5 pickup neighborhoods:")
+print(top5.to_string(index=False))
+
+print("[4/6] Computing neighborhood demographics ...")
+
+# ACS 5-Year 2022, block groups in SF County (state=06, county=075)
+response = requests.get(
+    "https://api.census.gov/data/2022/acs/acs5",
+    params={
+        "get": "B02001_001E,B02001_002E,B02001_003E,B02001_005E",
+        "ucgid": "pseudo(0500000US06075$1500000)"
+    },
+    timeout=30
+)
+data = response.json()
+
+census_df = pd.DataFrame(data[1:], columns=data[0])
+
+# Convert to numeric
+for col in ["B02001_001E", "B02001_002E", "B02001_003E", "B02001_005E"]:
+    census_df[col] = pd.to_numeric(census_df[col], errors="coerce")
+
+census_df = census_df.rename(columns={
+    "B02001_001E": "total_pop",
+    "B02001_002E": "white_pop",
+    "B02001_003E": "black_pop",
+    "B02001_005E": "asian_pop",
+})
+
+census_df["GEOID"] = census_df["ucgid"].str[-12:]
+BG_URL = "https://www2.census.gov/geo/tiger/TIGER2022/BG/tl_2022_06_bg.zip"
+bg_gdf = gpd.read_file(BG_URL)
+bg_gdf = bg_gdf[bg_gdf["COUNTYFP"] == "075"]  # SF county only
+bg_gdf = bg_gdf[["GEOID", "geometry"]].copy()
+bg_gdf = bg_gdf.to_crs("EPSG:4326")
+
+# Merge census data with geometries
+census_gdf = bg_gdf.merge(
+    census_df[["GEOID", "total_pop", "white_pop", "black_pop", "asian_pop"]],
+    on="GEOID",
+    how="inner"
+)
+census_db = pd.DataFrame(census_gdf)
+census_db["geometry"] = census_gdf["geometry"].apply(lambda g: bytes(g.wkb))
+con.register("census_raw", census_db)
+con.sql("""
+    CREATE OR REPLACE TABLE census_blocks AS
+    SELECT GEOID, total_pop, white_pop, black_pop, asian_pop,
+           ST_GeomFromWKB(geometry)::GEOMETRY AS geometry
+    FROM census_raw
+""")
+con.sql("""
+    CREATE OR REPLACE TABLE nhood_demographics AS
+    SELECT
+        n.nhood,
+        SUM(cb.total_pop) AS total_pop,
+        SUM(cb.white_pop) AS white_pop,
+        SUM(cb.black_pop) AS black_pop,
+        SUM(cb.asian_pop) AS asian_pop,
+        CASE WHEN SUM(cb.total_pop) > 0
+             THEN 100.0 * SUM(cb.white_pop) / SUM(cb.total_pop)
+             ELSE 0 END AS white_pct,
+        CASE WHEN SUM(cb.total_pop) > 0
+             THEN 100.0 * SUM(cb.black_pop) / SUM(cb.total_pop)
+             ELSE 0 END AS black_pct,
+        CASE WHEN SUM(cb.total_pop) > 0
+             THEN 100.0 * SUM(cb.asian_pop) / SUM(cb.total_pop)
+             ELSE 0 END AS asian_pct
+    FROM census_blocks AS cb
+    -- Assign each block group to a single neighborhood by its centroid.
+    -- Matching polygon against polygon would count a block group that merely
+    -- touches a shared boundary in every adjacent neighborhood.
+    JOIN neighborhoods AS n
+        ON ST_Intersects(n.geometry, ST_Centroid(cb.geometry))
+    GROUP BY n.nhood
+""")
+# Print the preview; a bare .df() call has no visible effect in a script.
+print(con.sql("SELECT * FROM nhood_demographics ORDER BY total_pop DESC LIMIT 10").df().to_string(index=False))
+
+print("[5/6] Computing city-wide baselines ...")
+baseline_df = con.sql("""
+    SELECT
+        ROUND(100.0 * SUM(white_pop) / SUM(total_pop), 2) AS baseline_white_pct,
+        ROUND(100.0 * SUM(black_pop) / SUM(total_pop), 2) AS baseline_black_pct,
+        ROUND(100.0 * SUM(asian_pop) / SUM(total_pop), 2) AS baseline_asian_pct
+    FROM nhood_demographics
+    WHERE total_pop > 0
+""").df()
+con.sql("CREATE OR REPLACE TABLE city_baselines AS SELECT * FROM baseline_df")
+print(f" Baselines: {baseline_df.to_dict('records')[0]}")
+
+bw = float(baseline_df["baseline_white_pct"].iloc[0])
+bb = float(baseline_df["baseline_black_pct"].iloc[0])
+ba = float(baseline_df["baseline_asian_pct"].iloc[0])
+
+print("[6/6] Computing representative ratios ...")
+rr_pu_df = con.sql(f"""
+    SELECT tp.hail_type, tp.month,
+           SUM(tp.trips_pu * nd.white_pct) * 1.0
+               / SUM(tp.trips_pu) / {bw} AS RR_white_PU,
+           SUM(tp.trips_pu * nd.asian_pct) * 1.0
+               / SUM(tp.trips_pu) / {ba} AS RR_asian_PU
+    FROM trip_counts_pu AS tp
+    JOIN nhood_demographics AS nd ON tp.nhood = nd.nhood
+    WHERE nd.total_pop > 0
+    GROUP BY tp.hail_type, tp.month
+""").df()
+
+rr_do_df = con.sql(f"""
+    SELECT td.hail_type, td.month,
+           SUM(td.trips_do * nd.white_pct) * 1.0
+               / SUM(td.trips_do) / {bw} AS RR_white_DO,
+           SUM(td.trips_do * nd.asian_pct) * 1.0
+               / SUM(td.trips_do) / {ba} AS RR_asian_DO
+    FROM trip_counts_do AS td
+    JOIN nhood_demographics AS nd ON td.nhood = nd.nhood
+    WHERE nd.total_pop > 0
+    GROUP BY td.hail_type, td.month
+""").df()
+
+rr_combined = pd.merge(
+    rr_pu_df, rr_do_df, on=["hail_type", "month"], how="outer"
+)
+con.sql("CREATE OR REPLACE TABLE representative_ratios AS SELECT * FROM rr_combined")
+
+print("\nPipeline complete. Database: sf_dashboard.db")
+print("Representative ratios:")
+print(rr_combined.to_string(index=False))
+
+con.close()
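Review note: step 6 above computes each Representative Ratio as the trip-weighted average of a group's neighborhood share, divided by the city-wide baseline for that group. The sketch below restates that arithmetic in plain Python with toy numbers so the formula can be checked by hand. The function name `representative_ratio`, the neighborhood labels `A` and `B`, and all values are hypothetical, not pipeline output.

```python
def representative_ratio(trips, group_pct, baseline_pct):
    """Trip-weighted mean of a group's neighborhood share over the baseline.

    trips:        {neighborhood: trip count}
    group_pct:    {neighborhood: group share of population, in percent}
    baseline_pct: city-wide share of the same group, in percent
    """
    total_trips = sum(trips.values())
    if total_trips == 0 or baseline_pct == 0:
        return float("nan")  # ratio is undefined without trips or a baseline
    weighted = sum(count * group_pct[nhood] for nhood, count in trips.items())
    return weighted / total_trips / baseline_pct

# Toy inputs: most trips start in neighborhood A, where the group's share
# (60%) is above the city-wide baseline (40%).
trips = {"A": 300, "B": 100}
white_pct = {"A": 60.0, "B": 20.0}
baseline = 40.0

rr = representative_ratio(trips, white_pct, baseline)
print(rr)  # (300*60 + 100*20) / 400 / 40 = 1.25
```

A value above 1.0 means trips are concentrated in neighborhoods where the group's share exceeds the baseline; a value below 1.0 means the opposite.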
requirements.txt ADDED
@@ -0,0 +1,10 @@
+dash>=2.14.0
+dash-bootstrap-components>=1.5.0
+plotly>=5.18.0
+duckdb>=1.0.0
+pandas>=2.0.0
+geopandas>=0.14.0
+shapely>=2.0.0
+pyproj>=3.6.0
+requests>=2.28.0
+gunicorn>=21.2.0