{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 14 - Gradient Boosting & XGBoost\n", "\n", "Welcome to Module 14! We're moving into **Boosting**, where we train models sequentially so that each new model corrects the errors of its predecessors. This includes **Gradient Boosting** and its optimized implementation, **XGBoost**.\n", "\n", "### Resources:\n", "Refer to the **[Boosting Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for a comparison of Bagging vs. Boosting and interactive diagrams of residual refinement.\n", "\n", "### Objectives:\n", "1. **Boosting Principle**: How weak learners become strong learners.\n", "2. **XGBoost**: Extreme Gradient Boosting and its hardware efficiency.\n", "3. **Tuning**: Learning rates, tree depth, and subsampling.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup\n", "We will use the **Wine recognition** dataset from Scikit-Learn (a 3-class classification problem)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.datasets import load_wine\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.ensemble import GradientBoostingClassifier\n", "from sklearn.metrics import accuracy_score, classification_report\n", "\n", "# For XGBoost, you'll need the library installed\n", "# (pip install xgboost)\n", "import xgboost as xgb\n", "\n", "# Load dataset\n", "wine = load_wine()\n", "X = wine.data\n", "y = wine.target\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Gradient Boosting\n", "\n", "### Task 1: Scikit-Learn Gradient Boosting\n", "Train a `GradientBoostingClassifier` and evaluate it on the test set."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)\n", "gb.fit(X_train, y_train)\n", "y_pred = gb.predict(X_test)\n", "print(\"GB Accuracy:\", accuracy_score(y_test, y_pred))\n", "```\n", "</details>" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3. XGBoost (The Kaggle Champion)\n", "\n", "### Task 2: Training XGBoost\n", "Use the `XGBClassifier` to train a model and check its performance. On larger datasets, you should notice a clear speed advantage over the Scikit-Learn implementation.\n", "\n", "*Web Reference: [XGBoost Section on your site](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)*" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "# Note: use_label_encoder is no longer needed (deprecated and later removed in XGBoost)\n", "xgb_model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, eval_metric='mlogloss')\n", "xgb_model.fit(X_train, y_train)\n", "y_pred_xgb = xgb_model.predict(X_test)\n", "print(\"XGB Accuracy:\", accuracy_score(y_test, y_pred_xgb))\n", "```\n", "</details>" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Power Move!\n", "You've learned how to harness Gradient Boosting. These models are often among the most accurate for structured (tabular) data.\n", "Next: **Dimensionality Reduction (PCA)**." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }