{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 14 - Gradient Boosting & XGBoost\n",
"\n",
"Welcome to Module 14! We're moving into **Boosting**, where we train models sequentially to correct previous errors. This includes **Gradient Boosting** and its optimized version, **XGBoost**.\n",
"\n",
"### Resources:\n",
"Refer to the **[Boosting Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for a comparison of Bagging vs. Boosting and interactive diagrams of residual refinement.\n",
"\n",
"### Objectives:\n",
"1. **Boosting Principle**: How weak learners become strong learners.\n",
"2. **XGBoost**: Extreme Gradient Boosting and its hardware efficiency.\n",
"3. **Tuning**: Learning rates, tree depth, and subsampling.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"We will use the **Wine recognition** dataset from Scikit-Learn (a 3-class classification problem)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.datasets import load_wine\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.metrics import accuracy_score, classification_report\n",
"\n",
"# For XGBoost, you'll need the library installed\n",
"# (pip install xgboost)\n",
"import xgboost as xgb\n",
"\n",
"# Load dataset\n",
"wine = load_wine()\n",
"X = wine.data\n",
"y = wine.target\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Gradient Boosting\n",
"\n",
"### Task 1: Scikit-Learn Gradient Boosting\n",
"Train a `GradientBoostingClassifier` and evaluate it on the test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)\n",
"gb.fit(X_train, y_train)\n",
"y_pred = gb.predict(X_test)\n",
"print(\"GB Accuracy:\", accuracy_score(y_test, y_pred))\n",
"```\n",
"</details>"
]
},
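{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional: Watching Weak Learners Become Strong\n",
"The cell below is an optional sketch of Objective 1, not a graded task. `staged_predict` replays the ensemble's predictions after each boosting round, so you can watch test accuracy climb as trees are added; it refits the same model as the Task 1 solution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: same settings as the Task 1 solution\n",
"gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)\n",
"gb.fit(X_train, y_train)\n",
"\n",
"# staged_predict yields test-set predictions after each boosting iteration\n",
"staged_acc = [accuracy_score(y_test, y_pred)\n",
"              for y_pred in gb.staged_predict(X_test)]\n",
"\n",
"print(\"Accuracy after 1 tree:   \", round(staged_acc[0], 3))\n",
"print(\"Accuracy after 10 trees: \", round(staged_acc[9], 3))\n",
"print(\"Accuracy after 100 trees:\", round(staged_acc[-1], 3))"
]
},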
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. XGBoost (The Kaggle Champion)\n",
"\n",
"### Task 2: Training XGBoost\n",
"Use the `XGBClassifier` to train a model and check its performance. On a dataset this small both libraries train almost instantly; XGBoost's speed and memory optimizations pay off on larger datasets.\n",
"\n",
"*Web Reference: [XGBoost Section on your site](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"# Note: use_label_encoder was deprecated and removed in XGBoost 2.0\n",
"xgb_model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, eval_metric='mlogloss', random_state=42)\n",
"xgb_model.fit(X_train, y_train)\n",
"y_pred_xgb = xgb_model.predict(X_test)\n",
"print(\"XGB Accuracy:\", accuracy_score(y_test, y_pred_xgb))\n",
"```\n",
"</details>"
]
},
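{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional: Tuning Sketch\n",
"Objective 3 mentions learning rates, tree depth, and subsampling, so here is a minimal sketch of searching over those three knobs with `GridSearchCV`. The grid values are illustrative, not recommendations; in practice you would widen the grid (or use randomized search) once you see which direction helps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"# Illustrative grid over learning rate, tree depth, and row subsampling\n",
"param_grid = {\n",
"    'learning_rate': [0.05, 0.1],\n",
"    'max_depth': [2, 3],\n",
"    'subsample': [0.8, 1.0],\n",
"}\n",
"\n",
"search = GridSearchCV(\n",
"    xgb.XGBClassifier(n_estimators=100, eval_metric='mlogloss', random_state=42),\n",
"    param_grid, cv=3, scoring='accuracy')\n",
"search.fit(X_train, y_train)\n",
"\n",
"print(\"Best params:\", search.best_params_)\n",
"print(\"Test accuracy:\", accuracy_score(y_test, search.best_estimator_.predict(X_test)))"
]
},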
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Power Move! \n",
"You've learned how to harness Gradient Boosting. These models are often the most accurate for structured data.\n",
"Next: **Dimensionality Reduction (PCA)**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}