🚀 DoorDash Delivery Time Predictor

📋 Project Overview

This project analyzes and predicts DoorDash delivery times using historical order data. We tackle the problem as both a regression task (predict exact duration) and a classification task (predict Fast / Normal / Slow delivery).


📦 Dataset

| Property | Value |
|---|---|
| Source | Kaggle: DoorDash Historical Delivery Data |
| Rows | 197,428 orders |
| Features | 16 columns (numeric + categorical) |
| Target | delivery_duration_seconds |

Key Features:

  • total_onshift_dashers: how many dashers are available
  • total_busy_dashers: how many dashers are currently occupied
  • total_outstanding_orders: current order backlog
  • store_primary_category: type of restaurant
  • estimated_store_to_consumer_driving_duration: estimated driving time

🔍 Research Question

Can we predict how long a DoorDash delivery will take based on order details, restaurant category, and real-time dasher availability?


⚙️ Feature Engineering

We created 9 new features from the raw data:

  • dasher_util_ratio: how busy the available dashers are (busy / onshift)
  • order_pressure: outstanding orders per available dasher
  • is_peak_lunch / is_peak_dinner: peak hour flags
  • is_weekend: weekend flag
  • log_subtotal: log-transformed order value
  • price_spread: range between cheapest and most expensive item
  • dasher_idle: dashers available but not currently occupied
  • cluster: KMeans cluster ID (5 clusters)
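
The features listed above could be derived roughly as follows. This is a minimal sketch, not the notebook's actual code: the column names `created_at`, `subtotal`, `min_item_price` and `max_item_price`, the peak-hour windows (11:00 to 13:59 and 17:00 to 20:59) and the zero-dasher handling are all assumptions. The `cluster` feature is left out because this card does not say which columns were passed to KMeans.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with eight of the nine engineered features added."""
    out = df.copy()

    # Treat zero on-shift dashers as missing so the ratios become NaN
    # instead of raising or producing inf (assumption).
    onshift = out["total_onshift_dashers"].replace(0, np.nan)
    out["dasher_util_ratio"] = out["total_busy_dashers"] / onshift
    out["order_pressure"] = out["total_outstanding_orders"] / onshift

    # Dashers on shift but not busy; clipped at 0 in case busy > onshift.
    idle = out["total_onshift_dashers"] - out["total_busy_dashers"]
    out["dasher_idle"] = idle.clip(lower=0)

    # Time flags from the order timestamp (hypothetical column name and hours).
    hour = out["created_at"].dt.hour
    out["is_peak_lunch"] = hour.between(11, 13).astype(int)
    out["is_peak_dinner"] = hour.between(17, 20).astype(int)
    out["is_weekend"] = (out["created_at"].dt.dayofweek >= 5).astype(int)

    # log1p keeps a subtotal of 0 finite.
    out["log_subtotal"] = np.log1p(out["subtotal"])
    out["price_spread"] = out["max_item_price"] - out["min_item_price"]
    return out
```

If the timestamps are stored in UTC, they would need converting to local time before the hour-based flags mean what their names suggest.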

🤖 Models

Regression

| Model | MAE | RMSE | R² |
|---|---|---|---|
| Baseline Linear Regression | 11.2 min | 14.7 min | 0.233 |
| Linear Regression (Engineered) | 11.2 min | 14.7 min | 0.240 |
| Random Forest | 10.8 min | 14.1 min | 0.294 |
| Gradient Boosting ✓ | 10.5 min | 13.9 min | 0.319 |
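
For reference, the three regression metrics can be computed as in the sketch below. It uses only the standard library to make the definitions explicit; the helper name `regression_metrics` is hypothetical, and the notebook may well use `sklearn.metrics` instead. Because the target is in seconds while the table reports minutes, the sketch converts MAE and RMSE to minutes (R² is unit-free).

```python
import math


def regression_metrics(y_true_sec, y_pred_sec):
    """Return MAE and RMSE in minutes, plus R², for durations given in seconds."""
    if len(y_true_sec) != len(y_pred_sec) or len(y_true_sec) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true_sec)
    errors = [t - p for t, p in zip(y_true_sec, y_pred_sec)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R² compares the model against always predicting the mean duration.
    mean_true = sum(y_true_sec) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true_sec)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae_min": mae / 60.0, "rmse_min": rmse / 60.0, "r2": r2}
```

Read this way, an R² of 0.319 means the best model explains roughly a third of the variance in delivery duration relative to a constant mean prediction.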

Classification (Fast / Normal / Slow)

| Model | Macro F1 |
|---|---|
| Logistic Regression | 0.502 |
| Random Forest | 0.518 |
| Gradient Boosting ✓ | 0.533 |
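
Macro F1 is the unweighted mean of the per-class F1 scores, so Fast, Normal and Slow count equally. A minimal standard-library sketch of that definition follows; the function name `macro_f1` is hypothetical, and scoring a class with no true or predicted samples as 0.0 is an assumption.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        # F1 = 2*TP / (2*TP + FP + FN); a class that never appears scores 0.0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With three balanced classes, a classifier that always predicts one class would score about 0.17 on this metric, which gives a rough floor for reading the table.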

🏆 Winners

  • Regression: GradientBoostingRegressor (R² = 0.319)
  • Classification: GradientBoostingClassifier (Macro F1 = 0.533)

📁 Repository Files

| File | Description |
|---|---|
| regression_model.pkl | Trained GradientBoostingRegressor + scaler + features |
| classification_model.pkl | Trained GradientBoostingClassifier + thresholds |
| notebook.ipynb | Full analysis, EDA, training and evaluation code |
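
This card does not document how the model, scaler and feature list are laid out inside each `.pkl` file, so a cautious first step is to load a file and inspect its top-level structure before calling anything on it. The sketch below makes no assumption about key names; `load_artifact` and `summarize` are hypothetical helpers.

```python
import pickle
from pathlib import Path


def load_artifact(path):
    """Unpickle one artifact. Only use this on files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def summarize(artifact):
    """Describe the top-level layout of a loaded artifact without assuming it."""
    if isinstance(artifact, dict):
        return {key: type(value).__name__ for key, value in artifact.items()}
    return {"<root>": type(artifact).__name__}
```

For example, `summarize(load_artifact("regression_model.pkl"))` would list whatever top-level entries the file contains. Unpickling scikit-learn estimators requires scikit-learn to be installed, ideally in a version close to the one used for training.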

🏷️ Classification Strategy

Delivery times are split into three equal-sized classes at the tercile boundaries:

  • 🟢 Fast (0): under 38.1 minutes
  • 🟡 Normal (1): 38.1 to 51.4 minutes
  • 🔴 Slow (2): over 51.4 minutes
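
The labelling rule above can be sketched as follows. The thresholds 38.1 and 51.4 minutes come from this card; sending a value exactly on a boundary to Normal, and the helper names, are assumptions. `tercile_thresholds` shows one way such cut points could be derived from training durations and is not necessarily how the notebook computed them.

```python
import statistics

FAST, NORMAL, SLOW = 0, 1, 2


def tercile_thresholds(durations_min):
    """Return the two cut points that split the durations into three equal groups."""
    lower, upper = statistics.quantiles(durations_min, n=3, method="inclusive")
    return lower, upper


def label_delivery(minutes, lower=38.1, upper=51.4):
    """Map a delivery duration in minutes to Fast (0), Normal (1) or Slow (2)."""
    if minutes < lower:
        return FAST
    if minutes <= upper:  # boundary values count as Normal (assumption)
        return NORMAL
    return SLOW
```

The thresholds should be computed on the training split only and then reused unchanged at inference time, which is presumably why they are stored alongside the classifier.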

🛠️ Tech Stack

Python, scikit-learn, pandas, numpy
