| # Lead Scoring with Bank Marketing Dataset |
|
|
|
|
| --- |
|
|
| ## Overview |
|
|
| This notebook demonstrates building a **lead scoring model** using the Bank Marketing dataset. The goal is to predict whether a client will **convert** (sign up for a service) based on various features. |
|
|
| We cover: |
|
|
| 1. Data preparation and handling missing values. |
| 2. Feature importance using ROC AUC for numerical variables. |
| 3. Logistic regression modeling with **one-hot encoding**. |
| 4. Precision, recall, and F1 score analysis to select thresholds. |
| 5. 5-fold cross-validation to check model stability. |
| 6. Hyperparameter tuning to select the best regularization parameter. |
|
|
| --- |
|
|
| ## Key Results |
|
|
| - **Best numerical feature (ROC AUC):** `number_of_courses_viewed` |
| - **Validation AUC:** `0.794` |
| - **Threshold where precision ≈ recall:** `0.59` |
| - **Threshold with max F1:** `0.47` |
| - **Standard deviation of AUC across folds:** `0.01` |
| - **Best regularization parameter C:** `0.001` |
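
Threshold figures like the two above are typically found by scanning candidate thresholds and computing precision, recall, and F1 at each one. The sketch below shows one way to do that with NumPy. The labels and scores are made-up toy values, so it does not reproduce `0.59` or `0.47`, and the helper names are hypothetical.

```python
import numpy as np

def scan_thresholds(y_true, y_score, thresholds):
    """Return (threshold, precision, recall, f1) for each candidate threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    rows = []
    for t in thresholds:
        pred = y_score >= t
        tp = int(np.sum(pred & y_true))
        fp = int(np.sum(pred & ~y_true))
        fn = int(np.sum(~pred & y_true))
        # Convention chosen for this example: undefined ratios become 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        rows.append((float(t), precision, recall, f1))
    return rows

def best_f1_threshold(rows):
    """Threshold with the highest F1 (first one wins on ties)."""
    return max(rows, key=lambda r: r[3])[0]

def balanced_threshold(rows):
    """Threshold where precision and recall are closest to each other.

    Rows with F1 == 0 are skipped so that a threshold predicting no
    positives (precision = recall = 0) is not mistaken for a crossing point.
    """
    usable = [r for r in rows if r[3] > 0]
    return min(usable, key=lambda r: abs(r[1] - r[2]))[0]

# Toy labels and scores, made up for illustration.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.12, 0.42, 0.36, 0.81, 0.66, 0.23, 0.93, 0.57]

rows = scan_thresholds(y_true, y_score, [0.3, 0.5, 0.7])
print(best_f1_threshold(rows), balanced_threshold(rows))
```

On real validation predictions, a finer grid such as `np.linspace(0, 1, 101)` would be a natural choice for `thresholds`.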
|
|
| --- |
|
|
| ## Lessons Learned |
|
|
| - ROC AUC can help identify predictive features even before modeling. |
| - Logistic regression combined with one-hot encoding provides a strong baseline. |
| - Threshold tuning is crucial for balancing precision and recall based on business needs. |
| - Cross-validation shows how stable the model's AUC is across folds and helps detect overfitting; it does not prevent it. |
| - Tuning the regularization parameter `C` can improve validation performance and makes the choice of model less arbitrary. |
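
The cross-validation and tuning points can be illustrated with a short sketch. It assumes scikit-learn's `KFold` and a synthetic dataset from `make_classification`. The candidate values for `C` and the helper `cv_auc` are hypothetical and chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

# Synthetic data standing in for the prepared feature matrix and target.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=1
)

def cv_auc(X, y, C, n_splits=5, seed=1):
    """Mean and standard deviation of validation AUC across K folds."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kfold.split(X):
        model = LogisticRegression(C=C, max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict_proba(X[val_idx])[:, 1]
        scores.append(roc_auc_score(y[val_idx], y_pred))
    return float(np.mean(scores)), float(np.std(scores))

# Hypothetical candidate grid; smaller C means stronger regularization.
candidates = [0.000001, 0.001, 1.0]
results = {C: cv_auc(X, y, C) for C in candidates}

for C, (mean_auc, std_auc) in results.items():
    print(f"C={C}: {mean_auc:.3f} +/- {std_auc:.3f}")

# Simple selection rule for this sketch: highest mean AUC.
best_C = max(results, key=lambda C: results[C][0])
```

A small standard deviation across folds suggests the score does not depend much on which rows land in the validation split. A large gap between training and validation AUC would be the sign of overfitting.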
|
|
| --- |
|
|
| ## Environment |
|
|
| - Python 3.12 |
| - Jupyter Notebook |
| - Libraries: `pandas`, `numpy`, `scikit-learn`, `matplotlib`, `seaborn` |
|
|
| --- |
|
|
| ## Dataset |
|
|
| The Bank Marketing dataset used in this project is publicly available: |
| [Bank Marketing Dataset CSV](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv) |
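
Once the CSV is loaded, the missing-value handling mentioned in the overview could look like the sketch below. The fill values (`"NA"` for non-numeric columns, `0.0` for numeric ones) are an assumption rather than something stated in this README, and the column names in the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

DATA_URL = (
    "https://raw.githubusercontent.com/alexeygrigorev/datasets/"
    "master/course_lead_scoring.csv"
)
# df = pd.read_csv(DATA_URL)  # requires network access; not run here

def fill_missing(df):
    """Return a copy with missing values filled.

    Assumed convention: numeric columns get 0.0, all other columns get "NA".
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(0.0)
        else:
            out[col] = out[col].fillna("NA")
    return out

# Toy frame with hypothetical columns, used only to show the behavior.
toy = pd.DataFrame(
    {
        "lead_source": ["ads", None],
        "annual_income": [50000.0, np.nan],
    }
)
clean = fill_missing(toy)
print(clean)
```

Filling with a constant keeps every row available for modeling. Whether `0.0` is a sensible stand-in for a missing numeric value depends on the column, so it is worth checking before relying on it.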
|
|
| --- |
|
|
| ## Author |
|
|
| Created as part of **ML Zoomcamp 2025 Homework 4**. |
|
|
|
|