{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 13 - Naive Bayes\n",
"\n",
"Welcome to Module 13! We're exploring **Naive Bayes**, a probabilistic classifier based on Bayes' Theorem with the \"naive\" assumption of independence between features.\n",
"\n",
"### Resources:\n",
"Refer to the **[Naive Bayes Section](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for the mathematical derivation of $P(A|B)$ and how it's used in spam filtering.\n",
"\n",
"### Objectives:\n",
"1. **Bayes Theorem**: Calculating posterior probability.\n",
"2. **Different Variants**: Gaussian vs Multinomial vs Bernoulli.\n",
"3. **Text Classification**: Using Naive Bayes for NLP tasks.\n",
"\n",
"---"
]
},
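{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a warm-up for objective 1, here is a minimal sketch of Bayes' Theorem applied to spam filtering. The probabilities below are made-up numbers chosen purely for illustration:\n",
"\n",
"```python\n",
"# Hypothetical values, for illustration only (not from any dataset)\n",
"p_spam = 0.4                 # prior: P(spam)\n",
"p_ham = 1 - p_spam           # prior: P(ham)\n",
"p_word_given_spam = 0.5      # likelihood: P('free' | spam)\n",
"p_word_given_ham = 0.05      # likelihood: P('free' | ham)\n",
"\n",
"# Evidence P('free') via the law of total probability\n",
"p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham\n",
"\n",
"# Bayes' Theorem: posterior P(spam | 'free')\n",
"posterior = p_word_given_spam * p_spam / p_word\n",
"print(round(posterior, 3))\n",
"```\n",
"\n",
"Even with a modest prior, the posterior is high because the word is ten times more likely under spam than under ham."
]
},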
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"We will use a small text dataset for **Spam detection**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd \n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"from sklearn.naive_bayes import MultinomialNB\n",
"from sklearn.metrics import accuracy_score, confusion_matrix\n",
"\n",
"# Sample Text Data\n",
"data = {\n",
" 'text': [\n",
" 'Free money now!', \n",
" 'Hi, how are you?', \n",
" 'Limited offer, buy now!', \n",
" 'Meeting at 5pm', \n",
" 'Win a prize today!', \n",
" 'Review the documents'\n",
" ],\n",
" 'label': [1, 0, 1, 0, 1, 0] # 1 = Spam, 0 = Ham\n",
"}\n",
"df = pd.DataFrame(data)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Text Preprocessing\n",
"\n",
"### Task 1: Vectorization\n",
"Machine learning models can't read text directly. Use `CountVectorizer` to convert text into a matrix of token counts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"cv = CountVectorizer(stop_words='english')\n",
"X = cv.fit_transform(df['text'])\n",
"y = df['label']\n",
"```\n",
" "
]
},
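{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what `CountVectorizer` actually produced, you can inspect the learned vocabulary and the count matrix. A self-contained sketch on two of the messages (the token names shown depend on sklearn's built-in English stop-word list):\n",
"\n",
"```python\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"texts = ['Free money now!', 'Hi, how are you?']\n",
"cv = CountVectorizer(stop_words='english')\n",
"X = cv.fit_transform(texts)\n",
"\n",
"# Each column is one surviving token; stop words like 'how' are dropped\n",
"print(cv.get_feature_names_out())\n",
"print(X.toarray())\n",
"```\n",
"\n",
"Each row of the dense matrix is one message, and each entry counts how often that token appears in it."
]
},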
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training & Prediction\n",
"\n",
"### Task 2: Multinomial NB\n",
"Fit a `MultinomialNB` model and predict the class for a new message: \"Win money buy now\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"nb = MultinomialNB()\n",
"nb.fit(X, y)\n",
"\n",
"new_msg = [\"Win money buy now\"]\n",
"new_vec = cv.transform(new_msg)\n",
"prediction = nb.predict(new_vec)\n",
"print(\"Spam\" if prediction[0] == 1 else \"Ham\")\n",
"```\n",
" "
]
},
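{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hard label hides how confident the model is. `MultinomialNB.predict_proba` exposes the posterior for each class; here is a self-contained sketch on the same toy data:\n",
"\n",
"```python\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"from sklearn.naive_bayes import MultinomialNB\n",
"\n",
"texts = ['Free money now!', 'Hi, how are you?', 'Limited offer, buy now!',\n",
"         'Meeting at 5pm', 'Win a prize today!', 'Review the documents']\n",
"labels = [1, 0, 1, 0, 1, 0]  # 1 = Spam, 0 = Ham\n",
"\n",
"cv = CountVectorizer(stop_words='english')\n",
"X = cv.fit_transform(texts)\n",
"\n",
"nb = MultinomialNB()\n",
"nb.fit(X, labels)\n",
"\n",
"# Columns follow nb.classes_, i.e. [P(ham), P(spam)] here\n",
"probs = nb.predict_proba(cv.transform(['Win money buy now']))\n",
"print(probs)\n",
"```\n",
"\n",
"Because 'win', 'money', and 'buy' appear only in spam messages, the spam column should dominate."
]
},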
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Excellent Probabilistic Thinking! \n",
"Naive Bayes is often the baseline for NLP projects because it's fast and effective.\n",
"Next: **Gradient Boosting & XGBoost**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}