
Machine Learning Zoomcamp 2025 - Homework 3



Homework 3: Machine Learning for Classification

This repository contains solutions for Homework 3 of Machine Learning Zoomcamp 2025, focused on classification tasks using a lead scoring dataset.


📂 Project Overview

  • Dataset: Lead Scoring dataset
  • Target variable: converted (whether the client signed up)
  • Objective: data preprocessing, exploratory analysis, feature selection, and training logistic regression models, including tuning the regularization parameter C.

Tech Stack:

  • Python 3.11 – core programming language
  • Pandas – data manipulation
  • NumPy – numerical operations
  • Scikit-Learn – machine learning models, feature selection, evaluation
  • Jupyter Notebook – interactive coding and documentation

🔹 Questions & Answers

| Question | Task | Answer |
|---|---|---|
| 1 | Mode of industry | retail |
| 2 | Pair of numerical features with the biggest correlation | annual_income and interaction_count |
| 3 | Categorical feature with the biggest mutual information with the target | lead_source |
| 4 | Logistic regression validation accuracy | 0.74 |
| 5 | Least useful feature (feature elimination) | lead_score |
| 6 | Best C value for regularized logistic regression | 1 |

📌 Approach / Key Steps

  1. Data Cleaning & Preparation

    • Filled missing values: categorical → 'NA', numerical → 0.0
    • Verified feature types and correlations
  2. Exploratory Analysis

    • Computed the mode of categorical variables
    • Computed the correlation matrix for numerical features
  3. Feature Selection

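    • Calculated mutual information between each categorical variable and the target using mutual_info_score
    • Identified the least useful feature via feature elimination: dropped each feature in turn and compared validation accuracy with the baseline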
  4. Model Training

    • Trained logistic regression on one-hot encoded categorical variables
    • Tuned the regularization parameter C and compared validation accuracy across values
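
A minimal sketch of steps 1 to 3 with pandas and scikit-learn is shown below. It is not the notebook's code: the tiny DataFrame is hypothetical, its column names are borrowed from the questions table, and its values are made up, so its outputs say nothing about the real answers. Ranking correlations by absolute value is also an assumption of the sketch.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score

# Hypothetical toy data: column names follow the questions table, values are made up
df = pd.DataFrame({
    "industry": ["retail", "finance", None, "retail", "education", "retail"],
    "lead_source": ["organic_search", "referral", "events", None, "referral", "events"],
    "annual_income": [52000.0, np.nan, 61000.0, 48000.0, 75000.0, 58000.0],
    "interaction_count": [3, 1, 4, 2, 5, 0],
    "lead_score": [0.8, 0.2, np.nan, 0.5, 0.9, 0.1],
    "converted": [1, 0, 1, 0, 1, 0],
})
categorical = ["industry", "lead_source"]
numerical = ["annual_income", "interaction_count", "lead_score"]

# Step 1: fill missing values ('NA' for categorical, 0.0 for numerical)
df[categorical] = df[categorical].fillna("NA")
df[numerical] = df[numerical].fillna(0.0)

# Step 2: mode of a categorical column, and the numerical pair with the
# largest absolute Pearson correlation
industry_mode = df["industry"].mode()[0]
corr = df[numerical].corr()
top_pair = max(combinations(numerical, 2), key=lambda p: abs(corr.loc[p[0], p[1]]))

# Step 3: mutual information between each categorical feature and the target
mi = {col: mutual_info_score(df[col], df["converted"]) for col in categorical}
best_mi = max(mi, key=mi.get)

print(industry_mode, top_pair, best_mi)
```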

📈 Results

  • Baseline logistic regression validation accuracy: 0.74
  • Least useful feature: lead_score
  • Best regularization parameter C: 1
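
The procedure behind these results (one-hot encoding, validation accuracy, feature elimination, C tuning) can be sketched as below. This is an illustration under assumptions, not the notebook's code: the data is synthetic, and the 75/25 split, DictVectorizer encoding, liblinear solver settings, candidate C grid and the helper train_and_score are all choices made for the sketch, so the numbers it produces will not match the results above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the homework dataset
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "industry": rng.choice(["retail", "finance", "education", "NA"], size=n),
    "lead_source": rng.choice(["referral", "events", "organic_search"], size=n),
    "annual_income": rng.normal(60000, 15000, size=n),
    "interaction_count": rng.integers(0, 6, size=n),
    "lead_score": rng.random(n),
})
# Make the target depend on interaction_count so the model has something to learn
df["converted"] = (df["interaction_count"] + rng.normal(0, 1.5, size=n) > 2.5).astype(int)

features = ["industry", "lead_source", "annual_income", "interaction_count", "lead_score"]
df_train, df_val = train_test_split(df, test_size=0.25, random_state=42)


def train_and_score(cols, C=1.0):
    """Hypothetical helper: one-hot encode with DictVectorizer, fit, return validation accuracy."""
    dv = DictVectorizer(sparse=False)  # string values become one-hot columns
    X_train = dv.fit_transform(df_train[cols].to_dict(orient="records"))
    X_val = dv.transform(df_val[cols].to_dict(orient="records"))
    model = LogisticRegression(solver="liblinear", C=C, max_iter=1000, random_state=42)
    model.fit(X_train, df_train["converted"])
    return accuracy_score(df_val["converted"], model.predict(X_val))


baseline = train_and_score(features)

# Feature elimination: drop one feature at a time and measure the accuracy change
diffs = {f: baseline - train_and_score([c for c in features if c != f]) for f in features}
least_useful = min(diffs, key=diffs.get)  # smallest drop (or largest gain) when removed

# Regularization: try several C values and keep the most accurate one
scores = {C: train_and_score(features, C=C) for C in [0.01, 0.1, 1, 10, 100]}
best_C = max(scores, key=scores.get)

print(round(baseline, 2), least_useful, best_C)
```

In scikit-learn, C is the inverse of the regularization strength, so smaller values penalize large coefficients more heavily. Because the C grid is listed in ascending order and max returns the first maximum it meets, ties resolve to the smallest C.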

⚙ How to Run

  1. Clone the repository:

    git clone https://github.com/yourusername/ml-zoomcamp-hw3.git
    
  2. Install requirements:

    pip install -r requirements.txt
    
  3. Open the Jupyter Notebook and run cells sequentially:

    jupyter notebook
    

📚 References