DoorDash Delivery Time Predictor
Project Overview
This project analyzes and predicts DoorDash delivery times using historical order data. We tackle the problem as both a regression task (predict exact duration) and a classification task (predict Fast / Normal / Slow delivery).
Dataset
| Property | Value |
|---|---|
| Source | Kaggle: DoorDash Historical Delivery Data |
| Rows | 197,428 orders |
| Features | 16 columns (numeric + categorical) |
| Target | `delivery_duration_seconds` |
Key Features:
- `total_onshift_dashers`: how many dashers are available
- `total_busy_dashers`: how many dashers are currently occupied
- `total_outstanding_orders`: current order backlog
- `store_primary_category`: type of restaurant
- `estimated_store_to_consumer_driving_duration`: estimated driving time
Research Question
Can we predict how long a DoorDash delivery will take based on order details, restaurant category, and real-time dasher availability?
Feature Engineering
We created 9 new features from the raw data:
- `dasher_util_ratio`: how busy the available dashers are (busy / on-shift)
- `order_pressure`: outstanding orders per available dasher
- `is_peak_lunch` / `is_peak_dinner`: peak-hour flags
- `is_weekend`: weekend flag
- `log_subtotal`: log-transformed order value
- `price_spread`: range between the cheapest and most expensive item
- `dasher_idle`: dashers available but not working
- `cluster`: KMeans cluster ID (5 clusters)
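
The feature definitions above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `created_at`, `subtotal`, `min_item_price`, and `max_item_price`, the peak-hour windows, and the use of `log1p` are all assumptions, and the KMeans `cluster` feature is left out.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the engineered columns added (illustrative only)."""
    out = df.copy()

    # Treat zero on-shift dashers as missing so the ratios become NaN, not inf.
    onshift = out["total_onshift_dashers"].where(out["total_onshift_dashers"] > 0)

    out["dasher_util_ratio"] = out["total_busy_dashers"] / onshift
    out["order_pressure"] = out["total_outstanding_orders"] / onshift
    out["dasher_idle"] = out["total_onshift_dashers"] - out["total_busy_dashers"]

    # ASSUMED column name: created_at (order creation timestamp).
    created = pd.to_datetime(out["created_at"])
    hour = created.dt.hour
    # ASSUMED peak windows: 11:00-13:59 for lunch, 17:00-20:59 for dinner.
    out["is_peak_lunch"] = hour.between(11, 13).astype(int)
    out["is_peak_dinner"] = hour.between(17, 20).astype(int)
    out["is_weekend"] = (created.dt.dayofweek >= 5).astype(int)  # Sat=5, Sun=6

    # ASSUMED column names: subtotal, min_item_price, max_item_price.
    out["log_subtotal"] = np.log1p(out["subtotal"])  # log1p is an assumption
    out["price_spread"] = out["max_item_price"] - out["min_item_price"]
    return out


# Two made-up orders, for illustration only.
sample = pd.DataFrame({
    "created_at": ["2015-02-06 12:30:00", "2015-02-07 19:00:00"],
    "total_onshift_dashers": [10, 0],
    "total_busy_dashers": [8, 0],
    "total_outstanding_orders": [15, 3],
    "subtotal": [2500, 1200],
    "min_item_price": [500, 300],
    "max_item_price": [1500, 900],
})
features = add_engineered_features(sample)
```

Masking zero on-shift dashers before dividing is one possible design choice; it keeps the ratios from becoming infinite, at the cost of introducing missing values that need handling before training.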
Models
Regression
| Model | MAE | RMSE | R² |
|---|---|---|---|
| Baseline Linear Regression | 11.2 min | 14.7 min | 0.233 |
| Linear Regression (Engineered) | 11.2 min | 14.7 min | 0.240 |
| Random Forest | 10.8 min | 14.1 min | 0.294 |
| Gradient Boosting (best) | 10.5 min | 13.9 min | 0.319 |
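
For reference, the three regression metrics in the table can be computed as in the sketch below. It assumes predictions and targets are in seconds (as in `delivery_duration_seconds`) and converts them to minutes before scoring. The function name `regression_report` and the sample numbers are hypothetical.

```python
import numpy as np


def regression_report(y_true_sec, y_pred_sec):
    """Return MAE and RMSE in minutes, plus R^2, for durations given in seconds."""
    y_true = np.asarray(y_true_sec, dtype=float) / 60.0
    y_pred = np.asarray(y_pred_sec, dtype=float) / 60.0
    err = y_true - y_pred

    mae = np.mean(np.abs(err))                       # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))                # root mean squared error
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae_min": float(mae), "rmse_min": float(rmse), "r2": float(r2)}


# Made-up durations in seconds, for illustration only.
report = regression_report([1800, 2400, 3000, 3600], [2100, 2400, 2700, 3900])
```

R² is unitless, so only MAE and RMSE depend on the seconds-to-minutes conversion.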
Classification (Fast / Normal / Slow)
| Model | Macro F1 |
|---|---|
| Logistic Regression | 0.502 |
| Random Forest | 0.518 |
| Gradient Boosting (best) | 0.533 |
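
Macro F1 is the unweighted mean of the per-class F1 scores, so Fast, Normal, and Slow count equally regardless of class size. A minimal pure-Python sketch of the metric follows. The helper `macro_f1` and the sample labels are hypothetical, and scoring an empty class as 0 is a convention chosen here.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 (a convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels: 0 = Fast, 1 = Normal, 2 = Slow.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
```

In this toy example the per-class F1 scores are 0.5, 0.8, and 2/3, so the macro average is about 0.656.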
Winners
- Regression: `GradientBoostingRegressor`, R² = 0.319
- Classification: `GradientBoostingClassifier`, Macro F1 = 0.533
Repository Files
| File | Description |
|---|---|
| `regression_model.pkl` | Trained GradientBoostingRegressor + scaler + features |
| `classification_model.pkl` | Trained GradientBoostingClassifier + thresholds |
| `notebook.ipynb` | Full analysis, EDA, training and evaluation code |
Classification Strategy
Delivery times are split into three equal-sized classes at the tertiles:
- Fast (0): under 38.1 minutes
- Normal (1): 38.1 to 51.4 minutes
- Slow (2): over 51.4 minutes
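
The thresholds above translate into a small labeling function. The sketch below is illustrative rather than the notebook's actual code: the name `delivery_class` is hypothetical, and treating a value exactly on a boundary as Normal is an assumption.

```python
FAST_MAX_MIN = 38.1   # lower tertile boundary listed above
SLOW_MIN_MIN = 51.4   # upper tertile boundary listed above
CLASS_NAMES = {0: "Fast", 1: "Normal", 2: "Slow"}


def delivery_class(duration_seconds: float) -> int:
    """Map a delivery duration in seconds to 0 (Fast), 1 (Normal), or 2 (Slow)."""
    minutes = duration_seconds / 60.0
    if minutes < FAST_MAX_MIN:
        return 0
    if minutes <= SLOW_MIN_MIN:  # ASSUMPTION: boundary values count as Normal
        return 1
    return 2


# 30, 45, and 60 minutes, expressed in seconds.
labels = [delivery_class(s) for s in (1800, 2700, 3600)]
names = [CLASS_NAMES[c] for c in labels]
```

Because the cut points are tertiles of the training data, each class should hold roughly a third of the orders.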

