Machine Learning Zoomcamp 2025 - Homework 3
Homework 3: Machine Learning for Classification
This repository contains solutions for Homework 3 of Machine Learning Zoomcamp 2025, focused on classification tasks using the Bank Marketing dataset.
Project Overview
- Dataset: Bank Marketing Dataset
- Target variable: converted (whether the client signed up)
- Objective: Data preprocessing, exploratory analysis, feature selection, and training logistic regression models (regularized and unregularized).
Tech Stack:
- Python 3.11: core programming language
- Pandas: data manipulation
- NumPy: numerical operations
- Scikit-Learn: machine learning models, feature selection, evaluation
- Jupyter Notebook: interactive coding and documentation
Questions & Answers
| Question | Task | Answer |
|---|---|---|
| 1 | Mode of industry | retail |
| 2 | Biggest correlation (numerical features) | annual_income and interaction_count |
| 3 | Biggest mutual information (categorical features) | lead_source |
| 4 | Logistic regression validation accuracy | 0.74 |
| 5 | Least useful feature (feature elimination) | lead_score |
| 6 | Best C value for regularized logistic regression | 1 |
Approach / Key Steps
Data Cleaning & Preparation
- Filled missing values: categorical → 'NA', numerical → 0.0
- Verified feature types and correlations
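The imputation step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the toy DataFrame and its values are made up, and only the column names are borrowed from this README.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "lead_source": ["paid_ads", None, "referral"],
    "industry": ["retail", "finance", None],
    "annual_income": [50000.0, np.nan, 72000.0],
    "interaction_count": [3, 1, 4],
    "converted": [1, 0, 1],
})

# Treat every non-numeric column as categorical; the rest (minus the target) as numerical.
categorical = list(df.select_dtypes(exclude="number").columns)
numerical = [c for c in df.columns if c not in categorical and c != "converted"]

# Categorical gaps become the string 'NA', numerical gaps become 0.0.
df[categorical] = df[categorical].fillna("NA")
df[numerical] = df[numerical].fillna(0.0)

print(df.isnull().sum())
```

Selecting columns by dtype keeps the two fill rules separate without hard-coding every column name.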
Exploratory Analysis
- Mode of categorical variables
- Correlation matrix for numerical features
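A small sketch of both exploratory checks, assuming a toy DataFrame with invented values (the real notebook runs these on the full dataset):

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "industry": ["retail", "finance", "retail", "NA"],
    "annual_income": [10.0, 20.0, 30.0, 40.0],
    "interaction_count": [1, 2, 3, 5],
    "lead_score": [0.9, 0.1, 0.5, 0.4],
})

# Most frequent category; mode() returns a Series, so take its first entry.
industry_mode = df["industry"].mode()[0]

# Pairwise Pearson correlations between numerical features.
numerical = ["annual_income", "interaction_count", "lead_score"]
corr = df[numerical].corr()

# Pick the pair of distinct features with the largest absolute correlation.
best_pair = max(combinations(numerical, 2), key=lambda p: abs(corr.loc[p[0], p[1]]))

print(industry_mode)
print(corr.round(2))
print(best_pair)
```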
Feature Selection
- Calculated mutual information for categorical variables using mutual_info_score
- Identified least useful features via feature elimination
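As a rough sketch of the mutual information step, the snippet below scores each categorical column against the target with scikit-learn's `mutual_info_score`. The data is a made-up toy sample, so the scores do not match the homework results.

```python
import pandas as pd
from sklearn.metrics import mutual_info_score

# Hypothetical training sample; values are invented for illustration only.
df_train = pd.DataFrame({
    "lead_source": ["ads", "ads", "referral", "referral", "events", "events"],
    "industry": ["retail", "finance", "retail", "finance", "retail", "finance"],
    "converted": [1, 1, 0, 0, 1, 0],
})
categorical = ["lead_source", "industry"]

# Mutual information between each categorical feature and the target.
mi = {
    col: mutual_info_score(df_train[col], df_train["converted"])
    for col in categorical
}
best_feature = max(mi, key=mi.get)

for col, score in sorted(mi.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col}: {score:.2f}")
```

A higher score means that knowing the category tells you more about whether the client converted.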
Model Training
- Logistic Regression with one-hot encoded categorical variables
- Regularized logistic regression with hyperparameter tuning (C values)
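The training, feature elimination, and C tuning steps could look roughly like the sketch below. Several things here are assumptions rather than details taken from the notebook: the synthetic data, the 80/20 split, the use of `DictVectorizer` for one-hot encoding, the `liblinear` solver, the candidate C grid, and the helper name `train_and_score`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the real notebook uses the course dataset.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "lead_source": rng.choice(["paid_ads", "referral", "events"], size=n),
    "industry": rng.choice(["retail", "finance", "NA"], size=n),
    "interaction_count": rng.integers(0, 10, size=n),
    "lead_score": rng.random(n),
})
df["converted"] = (df["interaction_count"] + rng.normal(0, 2, n) > 4).astype(int)

features = ["lead_source", "industry", "interaction_count", "lead_score"]
df_train, df_val = train_test_split(df, test_size=0.2, random_state=42)


def train_and_score(columns, C=1.0):
    """Hypothetical helper: one-hot encode, fit, and return validation accuracy."""
    dv = DictVectorizer(sparse=False)  # one-hot encodes string values, passes numbers through
    X_train = dv.fit_transform(df_train[columns].to_dict(orient="records"))
    X_val = dv.transform(df_val[columns].to_dict(orient="records"))
    model = LogisticRegression(solver="liblinear", C=C, max_iter=1000, random_state=42)
    model.fit(X_train, df_train["converted"])
    return accuracy_score(df_val["converted"], model.predict(X_val))


# Baseline with all features.
baseline = train_and_score(features)

# Feature elimination: drop one feature at a time and compare with the baseline.
diffs = {f: baseline - train_and_score([c for c in features if c != f]) for f in features}
least_useful = min(diffs, key=lambda f: abs(diffs[f]))

# Regularization tuning: on ties, max() keeps the first (smallest) C in the grid.
scores = {C: round(train_and_score(features, C=C), 3) for C in [0.01, 0.1, 1, 10, 100]}
best_C = max(scores, key=scores.get)

print(f"baseline accuracy: {baseline:.2f}")
print(f"least useful feature: {least_useful}")
print(f"best C: {best_C}")
```

The feature whose removal changes accuracy the least is treated as the least useful one. Because the data is random, the printed values will differ from the answers reported in this README.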
Results
- Baseline logistic regression accuracy: 0.74
- Least useful feature: lead_score
- Best regularization parameter C: 1
How to Run
Clone the repository:
git clone https://github.com/yourusername/ml-zoomcamp-hw3.git
Install requirements:
pip install -r requirements.txt
Open the Jupyter Notebook and run cells sequentially:
jupyter notebook
References
- Bank Marketing Dataset
- Scikit-Learn Documentation
- Pandas Documentation
- NumPy Documentation
- Jupyter Notebook Documentation