| --- |
| language: |
| - en |
| license: mit |
| tags: |
| - tabular-regression |
| - tabular-classification |
| - doordash |
| - delivery-time |
| --- |
| |
| <div align="center"> |
| <video width="100%" controls autoplay loop muted> |
| <source src="https://huggingface.co/almador2002/doordash-delivery-predictor/resolve/main/d08aecd4dc994e8b9fae392148fa0c6d.mp4" type="video/mp4"> |
| Your browser does not support the video tag. |
| </video> |
| </div> |
| |
| # DoorDash Delivery Time Predictor |
|
|
| ## Project Overview |
| This project analyzes and predicts DoorDash delivery times using historical order data. |
| We tackle the problem as both a **regression task** (predict exact duration) and a |
| **classification task** (predict Fast / Normal / Slow delivery). |
|
|
| --- |
|
|
| ## Dataset |
| | Property | Value | |
| |----------|-------| |
| | Source | Kaggle: DoorDash Historical Delivery Data | |
| | Rows | 197,428 orders | |
| | Features | 16 columns (numeric + categorical) | |
| | Target | `delivery_duration_seconds` | |
|
|
| **Key Features:** |
| - `total_onshift_dashers`: how many dashers are available |
| - `total_busy_dashers`: how many dashers are currently occupied |
| - `total_outstanding_orders`: current order backlog |
| - `store_primary_category`: type of restaurant |
| - `estimated_store_to_consumer_driving_duration`: estimated driving time |
|
|
|  |
|
|
| --- |
|
|
| ## Research Question |
| > *Can we predict how long a DoorDash delivery will take based on order details, |
| > restaurant category, and real-time dasher availability?* |
|
|
| --- |
|
|
| ## Feature Engineering |
| We created 9 new features from the raw data: |
| - **`dasher_util_ratio`**: share of available dashers that are busy (busy / onshift) |
| - **`order_pressure`**: outstanding orders per available dasher |
| - **`is_peak_lunch`** / **`is_peak_dinner`**: peak-hour flags |
| - **`is_weekend`**: weekend flag |
| - **`log_subtotal`**: log-transformed order value |
| - **`price_spread`**: difference between the most and least expensive item prices |
| - **`dasher_idle`**: on-shift dashers who are not currently busy |
| - **`cluster`**: KMeans cluster ID (5 clusters) |
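
The engineered features listed above could be derived along these lines. This is a minimal sketch, not the notebook's actual code. It assumes the raw columns `created_at`, `subtotal`, `min_item_price`, and `max_item_price` exist, that "available" means `total_onshift_dashers`, and that the peak windows are 11:00 to 13:59 and 17:00 to 20:59 (illustrative guesses). `add_features` is a hypothetical helper, and `cluster` is left out because the card does not say which columns were fed to KMeans.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with engineered columns added (illustrative only)."""
    out = df.copy()
    created = pd.to_datetime(out["created_at"])

    # Treat zero on-shift dashers as missing so ratios become NaN, not inf.
    onshift = out["total_onshift_dashers"].replace(0, np.nan)

    out["dasher_util_ratio"] = out["total_busy_dashers"] / onshift
    out["order_pressure"] = out["total_outstanding_orders"] / onshift
    out["dasher_idle"] = out["total_onshift_dashers"] - out["total_busy_dashers"]

    # Peak windows are assumptions; the card does not state the exact hours.
    hour = created.dt.hour
    out["is_peak_lunch"] = hour.between(11, 13).astype(int)
    out["is_peak_dinner"] = hour.between(17, 20).astype(int)
    out["is_weekend"] = (created.dt.dayofweek >= 5).astype(int)

    # log1p keeps a zero subtotal finite.
    out["log_subtotal"] = np.log1p(out["subtotal"].clip(lower=0))
    out["price_spread"] = out["max_item_price"] - out["min_item_price"]
    return out


# Two made-up orders, for illustration only.
sample = pd.DataFrame({
    "created_at": ["2015-02-06 12:24:17", "2015-02-07 18:05:00"],
    "subtotal": [3441, 1900],
    "min_item_price": [557, 1400],
    "max_item_price": [1239, 1400],
    "total_onshift_dashers": [33, 0],
    "total_busy_dashers": [14, 0],
    "total_outstanding_orders": [21, 2],
})
features = add_features(sample)
print(features[["dasher_util_ratio", "order_pressure", "is_weekend"]])
```

Mapping zero on-shift dashers to NaN is one possible choice; the ratios would otherwise be infinite or undefined, and the source does not say how the notebook handles that case.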
| |
| --- |
| |
| ## Models |
| |
| ### Regression |
| | Model | MAE | RMSE | R² | |
| |-------|-----|------|----| |
| | Baseline Linear Regression | 11.2 min | 14.7 min | 0.233 | |
| | Linear Regression (Engineered) | 11.2 min | 14.7 min | 0.240 | |
| | Random Forest | 10.8 min | 14.1 min | 0.294 | |
| | **Gradient Boosting** | **10.5 min** | **13.9 min** | **0.319** | |
| |
| <img src="model_comparison.png" width="800"/> |
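
For reference, the three regression metrics in the table can be computed as in the sketch below. The numbers are made up for illustration and are not the project's data. The sketch assumes predictions are in seconds, like the `delivery_duration_seconds` target, and reports errors in minutes; `regression_report` is a hypothetical helper name.

```python
import math


def regression_report(y_true_s, y_pred_s):
    """MAE and RMSE in minutes, plus R^2, for durations given in seconds."""
    n = len(y_true_s)
    errors_min = [(p - t) / 60.0 for t, p in zip(y_true_s, y_pred_s)]
    mae = sum(abs(e) for e in errors_min) / n
    rmse = math.sqrt(sum(e * e for e in errors_min) / n)

    # R^2 = 1 - SS_res / SS_tot (unit-free, so seconds are fine here).
    mean_true = sum(y_true_s) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true_s, y_pred_s))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true_s)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae_min": mae, "rmse_min": rmse, "r2": r2}


# Toy values for illustration only.
y_true = [1800, 2400, 3000, 3600]
y_pred = [2100, 2400, 2700, 3300]
report = regression_report(y_true, y_pred)
print(report)
```

RMSE is never smaller than MAE, and it grows faster when a few predictions miss badly, which is why the two are usually reported together.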
| |
| ### Classification (Fast / Normal / Slow) |
| | Model | Macro F1 | |
| |-------|----------| |
| | Logistic Regression | 0.502 | |
| | Random Forest | 0.518 | |
| | **Gradient Boosting** | **0.533** | |
| |
| <img src="confusion_matrices.png" width="800"/> |
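
Macro F1 is the unweighted mean of the per-class F1 scores, so Fast, Normal, and Slow each count equally. The sketch below spells out that definition on toy labels (not the project's predictions); `macro_f1` is a hypothetical helper, and scoring an empty class as 0 is a choice made here.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels: 0 = Fast, 1 = Normal, 2 = Slow.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity. With three equal-sized classes, uniform random guessing lands near 0.33, which gives a rough floor for reading the table.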
| |
| --- |
| |
| ## Winners |
| - **Regression:** `GradientBoostingRegressor`, R² = 0.319 |
| - **Classification:** `GradientBoostingClassifier`, Macro F1 = 0.533 |
| |
| --- |
| |
| ## Repository Files |
| | File | Description | |
| |------|-------------| |
| | `regression_model.pkl` | Trained GradientBoostingRegressor + scaler + features | |
| | `classification_model.pkl` | Trained GradientBoostingClassifier + thresholds | |
| | `notebook.ipynb` | Full analysis, EDA, training and evaluation code | |
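
The card does not document how the `.pkl` files are laid out internally, so a cautious first step is to load one and inspect it before relying on any particular structure. The sketch below shows one way to do that. `load_bundle` and `describe` are hypothetical helpers, and the stand-in file with `model`, `scaler`, and `features` keys is an assumption made so the example runs on its own, not the repository's confirmed layout.

```python
import pickle
import tempfile
from pathlib import Path


def load_bundle(path):
    """Unpickle a model file. Only do this for files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(bundle):
    """Summarize a loaded object without assuming its layout."""
    if isinstance(bundle, dict):
        return {key: type(value).__name__ for key, value in bundle.items()}
    return type(bundle).__name__


# Stand-in file so the sketch runs without the real artifacts;
# the keys below are hypothetical, not the repository's actual layout.
demo_path = Path(tempfile.mkdtemp()) / "demo_bundle.pkl"
with open(demo_path, "wb") as fh:
    pickle.dump({"model": None, "scaler": None, "features": ["log_subtotal"]}, fh)

summary = describe(load_bundle(demo_path))
print(summary)
```

Unpickling can execute arbitrary code, so load only files from a source you trust. Loading the real artifacts also requires scikit-learn to be installed, ideally in a version compatible with the one used for training; if the files were written with `joblib`, `joblib.load` would be the matching reader.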
| |
| --- |
| |
| ## Classification Strategy |
| Delivery times are split into three equal-sized classes at the tertile boundaries: |
| - 🟢 **Fast (0):** under 38.1 minutes |
| - 🟡 **Normal (1):** 38.1 to 51.4 minutes |
| - 🔴 **Slow (2):** over 51.4 minutes |
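
The class boundaries above can be applied to a duration as in the sketch below. It uses the rounded thresholds printed in this card (the stored ones may be more precise) and assumes the input is in seconds, like the regression target. `delivery_class` is a hypothetical helper, and sending a value that sits exactly on a boundary to the higher class is an assumption.

```python
from bisect import bisect_right

# Thresholds in minutes, taken from the class definitions above.
THRESHOLDS_MIN = (38.1, 51.4)
CLASS_NAMES = ("Fast", "Normal", "Slow")


def delivery_class(duration_seconds):
    """Map a duration in seconds to class ID 0 (Fast), 1 (Normal), or 2 (Slow)."""
    minutes = duration_seconds / 60.0
    # Boundary handling is an assumption: a value equal to a threshold
    # falls into the higher class.
    return bisect_right(THRESHOLDS_MIN, minutes)


for seconds in (1800, 2700, 3600):
    label = delivery_class(seconds)
    print(seconds, label, CLASS_NAMES[label])
```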
| |
| --- |
| |
| ## Tech Stack |
|  |
|  |
|  |
|  |