| 1 |
+
# ChurnPredict Pro: A Stacking Ensemble Framework for Customer Churn Prediction with Explainable AI and CLV Scoring
|
| 2 |
+
|
| 3 |
+
> **Subtitle:** End-to-End Machine Learning Pipeline for Telecommunications and Banking Customer Retention β Combining Gradient Boosting, Neural Networks, and Game-Theoretic Interpretability
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Table of Contents
|
| 8 |
+
|
| 9 |
+
1. [Problem Statement](#1-problem-statement)
|
| 10 |
+
2. [Idea of Solution](#2-idea-of-solution)
|
| 11 |
+
3. [Objectives](#3-objectives)
|
| 12 |
+
4. [Literature Review & References](#4-literature-review--references)
|
| 13 |
+
5. [Dataset Understanding](#5-dataset-understanding)
|
| 14 |
+
6. [Proposed Methodology](#6-proposed-methodology)
|
| 15 |
+
7. [Implementation Strategy](#7-implementation-strategy)
|
| 16 |
+
8. [Experimental Design](#8-experimental-design)
|
| 17 |
+
9. [Result Analysis](#9-result-analysis)
|
| 18 |
+
10. [Iterative Improvement](#10-iterative-improvement)
|
| 19 |
+
|
| 20 |
+
---
|
| 21 |
+
|
| 22 |
+
## 1. Problem Statement

### 1.1 Business Context

Customer churn – the loss of clients to competitors or market attrition – is one of the most financially consequential challenges in subscription-based and service-oriented industries. In telecommunications, acquiring a new customer costs **5-25× more** than retaining an existing one (industry estimates, 2024). In banking, customer attrition erodes lifetime value portfolios and damages brand equity. For both sectors, even a **1% reduction in churn** can translate to millions in retained revenue.

Current retention strategies suffer from two critical gaps:

- **Reactive approaches:** Firms typically respond to churn *after* it occurs, through win-back campaigns that are expensive and low-yield.
- **Black-box predictions:** Machine learning models deployed in production often lack interpretability, making it impossible for marketing and customer-success teams to act on model outputs with confidence.

### 1.2 Technical Challenges

| Challenge | Description | Impact |
|-----------|-------------|--------|
| **Class Imbalance** | Churners typically represent 10-30% of the customer base. Standard accuracy metrics are misleading. | High false-negative rates; missed at-risk customers |
| **Feature Heterogeneity** | Datasets mix categorical (contract type, payment method), numerical (tenure, charges), and temporal features (quarter, month-on-book). | Preprocessing complexity; risk of data leakage |
| **Concept Drift** | Customer behavior patterns shift seasonally and with market conditions. Models degrade without retraining. | Production model staleness; declining precision |
| **Interpretability vs. Performance Trade-off** | High-accuracy ensembles are often opaque. Explainable models (e.g., logistic regression) underperform on tabular data. | Regulatory non-compliance (GDPR Article 22); low stakeholder trust |
| **Multi-Domain Generalization** | Models trained on telecom data fail on banking data due to domain shift in feature distributions. | Siloed, non-reusable models per industry |

### 1.3 Gaps in Existing Solutions

1. **Single-model reliance:** Most production churn models deploy a single classifier (XGBoost or logistic regression), missing the variance-reduction benefits of ensemble diversity.
2. **No CLV integration:** Churn predictions are binary – they do not incorporate *which* churners are most valuable to retain, leading to inefficient marketing spend.
3. **Weak experimental rigor:** Many published churn studies use a single train/test split without cross-validation, statistical testing, or confidence intervals on metrics.
4. **Dataset isolation:** Telco and bank churn datasets are studied separately; few works evaluate cross-domain transfer or unified pipelines.

---
## 2. Idea of Solution

### 2.1 Architecture Overview

We propose **ChurnPredict Pro**, a **stacking ensemble architecture** that combines the complementary strengths of five diverse base learners under a meta-learner. The design philosophy is:

> *"Diversity in inductive bias reduces variance; interpretability in the meta-layer preserves actionability."*

### 2.2 The 5-Model Stacking Ensemble

```
+---------------------------------------------------------------------+
|                CHURNPREDICT PRO - STACKING ENSEMBLE                 |
+---------------------------------------------------------------------+
|                                                                     |
|  +---------+  +---------+  +---------+  +---------+  +---------+    |
|  | XGBoost |  |LightGBM |  |CatBoost |  |   MLP   |  |   LR    |    |
|  | (GBDT)  |  | (GBDT)  |  |(Ordered)|  | (Deep)  |  | (Base)  |    |
|  | Base 1  |  | Base 2  |  | Base 3  |  | Base 4  |  | Base 5  |    |
|  +----+----+  +----+----+  +----+----+  +----+----+  +----+----+    |
|       |            |            |            |            |         |
|       +------------+------------+------------+------------+         |
|                                 |                                   |
|                      +----------+----------+                        |
|                      |    META-LEARNER     |                        |
|                      |    (Logistic Reg    |                        |
|                      |     / XGBoost)      |                        |
|                      +----------+----------+                        |
|                                 |                                   |
|                      +----------+----------+                        |
|                      |     CLV SCORING     |                        |
|                      |  + SHAP EXPLAINER   |                        |
|                      +---------------------+                        |
|                                                                     |
+---------------------------------------------------------------------+
```
### 2.3 Why These 5 Base Models?

| Model | Inductive Bias | Strength on Churn Data | Weakness Mitigated by Ensemble |
|-------|---------------|------------------------|-------------------------------|
| **XGBoost** | Greedy gradient boosting with regularization | Best-in-class on sparse/tabular data; handles missing values natively | Prone to overfitting on small datasets |
| **LightGBM** | Histogram-based leaf-wise boosting | Faster training; GOSS sampling for large data | Leaf-wise growth can overfit; GOSS introduces bias |
| **CatBoost** | Ordered boosting + categorical encoding | Native categorical feature handling; reduces target leakage | Slower than LightGBM; ordered-boosting complexity |
| **MLP (Deep)** | Non-linear feature interactions | Captures complex feature cross-products | Needs more data; less interpretable |
| **Logistic Regression** | Linear decision boundary | Fast, interpretable baseline; L1 regularization for feature selection | Cannot model non-linear relationships |

The meta-learner (Logistic Regression or a shallow XGBoost) learns optimal weights for combining the five base models' predictions, leveraging their uncorrelated errors.

### 2.4 CLV-Weighted Scoring

Instead of ranking customers by churn probability alone, we multiply P(churn) by estimated CLV to produce a **Retention Priority Score (RPS)**:

$$
\text{RPS}_i = P(\text{churn}_i) \times \text{CLV}_i
$$
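A minimal sketch of RPS ranking (the customer records below are illustrative, not drawn from either dataset):

```python
# Rank customers by Retention Priority Score: RPS_i = P(churn_i) * CLV_i.
# `customers` is toy data for illustration only.
customers = [
    {"id": "C1", "p_churn": 0.82, "clv": 1200.0},
    {"id": "C2", "p_churn": 0.91, "clv": 300.0},
    {"id": "C3", "p_churn": 0.35, "clv": 5000.0},
]

def retention_priority(customer):
    """RPS = churn probability times customer lifetime value."""
    return customer["p_churn"] * customer["clv"]

# Highest retention priority first: a low-risk but very valuable customer
# (C3) can outrank a high-risk, low-value one (C2).
ranked = sorted(customers, key=retention_priority, reverse=True)
```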
This ensures retention campaigns target high-value at-risk customers, maximizing ROI.

---
## 3. Objectives

### 3.1 Primary Goals

| ID | Objective | Metric Target | Success Criterion |
|----|-----------|---------------|-------------------|
| P1 | Build a stacking ensemble that outperforms any single base model | F1-Score | ΔF1 ≥ +0.03 over best single model |
| P2 | Achieve high recall on the churn class (minimize false negatives) | Recall@Churn | ≥ 0.85 on both datasets |
| P3 | Deliver actionable model explanations per customer | SHAP summary | Top-5 features identified per prediction |
| P4 | Rank customers by retention value, not just churn risk | AUC-PR weighted by CLV | ROC-AUC ≥ 0.90 |

### 3.2 Secondary Goals

| ID | Objective | Metric Target |
|----|-----------|---------------|
| S1 | Evaluate cross-domain generalization (Telco → Bank, Bank → Telco) | Transfer AUC ≥ 0.80 |
| S2 | Achieve sub-second inference latency for batch scoring | ≤ 500 ms per 1,000 records |
| S3 | Deploy a reproducible, version-controlled pipeline | Docker + DVC + CI/CD |
| S4 | Document model behavior for regulatory compliance (GDPR/CCPA) | Full SHAP + model card |

### 3.3 Success Criteria Summary

- **Model Performance:** F1-Score > 0.85, ROC-AUC > 0.90, PR-AUC > 0.80 on both datasets
- **Business Impact:** Identify the top 20% of at-risk customers with ≥ 70% precision
- **Interpretability:** Every prediction accompanied by a SHAP force plot; global SHAP summary for stakeholder dashboards
- **Robustness:** 5-fold stratified CV with 95% confidence intervals on all metrics

---
## 4. Literature Review & References

### 4.1 Category Overview

| Category | Count | Papers |
|----------|-------|--------|
| Ensemble / Boosting Methods | 4 | [1-4] |
| SHAP / LIME Interpretability | 3 | [5-7] |
| Deep Learning for Churn | 3 | [8-10] |
| CLV / Profit-Driven Churn | 3 | [11-13] |
| Financial / Bank Churn | 4 | [14-17] |
| Survey / Benchmark / Foundation | 4 | [18-21] |
| **Total** | **21** | |

### 4.2 Full References (2016-2024)
#### [1] XGBoost: A Scalable Tree Boosting System
**Chen, T., & Guestrin, C.** (2016). *KDD*. arXiv:1603.02754.
Introduced sparsity-aware algorithms and weighted quantile sketch for gradient boosting. Became the dominant algorithm for tabular churn prediction tasks worldwide.

#### [2] Tabular Data: Deep Learning is Not All You Need
**Shwartz-Ziv, R., & Armon, A.** (2021). arXiv:2106.03253.
Rigorous comparison showing XGBoost outperforms recent deep learning models on tabular data; ensembling deep models with XGBoost further improves performance.

#### [3] CatBoost: Unbiased Boosting with Categorical Features
**Prokhorenkova, L., et al.** (2017). arXiv:1706.09516.
Ordered boosting and novel categorical feature processing; outperforms other boosting implementations on datasets with high-cardinality categorical churn predictors.

#### [4] Enhancing Customer Churn Prediction: An Adaptive Ensemble Learning Approach
**Shaikhsurab, S., & Magadum, S.** (2024). arXiv:2408.16284.
Adaptive ensemble combining XGBoost, LightGBM, LSTM, MLP, and SVM with stacking + meta-feature generation; achieved **99.28% accuracy** on telecom churn datasets.
#### [5] A Unified Approach to Interpreting Model Predictions (SHAP)
**Lundberg, S. M., & Lee, S.-I.** (2017). *NeurIPS*. arXiv:1705.07874.
Proposed SHAP values as a unified measure of feature importance based on game-theoretic Shapley values; unified six existing explanation methods.

#### [6] "Why Should I Trust You?": Explaining Predictions of Any Classifier (LIME)
**Ribeiro, M. T., Singh, S., & Guestrin, C.** (2016). *KDD*. arXiv:1602.04938.
Introduced LIME to explain any classifier locally via interpretable surrogate models; foundational for churn model explainability and regulatory compliance.

#### [7] XAI Handbook: Towards a Unified Framework for Explainable AI
**Palacio, D. G., et al.** (2021). arXiv:2105.06677.
Provides a theoretical framework unifying XAI terminology (LIME, SHAP, Grad-CAM, etc.); essential for regulatory compliance and method comparison in churn explainability.
#### [8] Early Churn Prediction from Large-Scale User-Product Interaction Time Series
**Bhattacharjee, A., Thukral, K., & Patil, C.** (2023). arXiv:2309.14390.
Applied multivariate time series classification with deep neural networks to fantasy sports churn; scales to 10⁸ users – demonstrates the feasibility of deep learning at scale.

#### [9] Modelling Customer Churn for the Retail Industry in a Deep Learning Sequential Framework
**Equihua, C., et al.** (2023). arXiv:2304.00575.
Deep survival framework using recurrent neural networks for non-contractual retail churn; avoids extensive feature engineering through learned representations.

#### [10] Churn Reduction via Distillation
**Jiang, Y., et al.** (2021). arXiv:2106.02654.
Showed model distillation reduces predictive churn (model instability during retraining) while maintaining accuracy across FC, CNN, and transformer architectures.
#### [11] OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction
**Weng, S., et al.** (2024). arXiv:2408.08585.
Proposed OptDist with distribution learning/selection modules; adaptively selects optimal sub-distributions for CLTV prediction on public and industrial datasets.

#### [12] Customer Lifetime Value Prediction with Uncertainty Estimation Using Monte Carlo Dropout
**Cao, Y., Xu, Y., & Yang, Q.** (2024). arXiv:2411.15944.
Enhanced neural network CLTV prediction with Monte Carlo Dropout for uncertainty quantification; improved Top-5% MAPE significantly.

#### [13] A Predict-and-Optimize Approach to Profit-Driven Churn Prevention
**Gómez-Vargas, E., Maldonado, S., & Vairetti, S.** (2023). arXiv:2310.07047.
First predict-and-optimize approach for churn prevention using individual CLVs (not averages); regret minimization via SGD; tested on 12 real-world datasets.
#### [14] Dynamic Customer Embeddings for Financial Service Applications
**Chitsazan, N., et al.** (2021). arXiv:2106.11880.
DCE framework uses customer digital activity + financial context for intent/fraud/call-center prediction; a financial-services benchmark for learned representations.

#### [15] FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models
**Yin, H., et al.** (2023). arXiv:2308.00065.
Introduced the FinBench dataset + FinPT method for financial risk prediction (default, fraud, churn) using LLM-generated customer profiles; strong zero-shot transfer.

#### [16] Advanced User Credit Risk Prediction Using LightGBM, XGBoost and TabNet with SMOTEENN
**Yu, B., et al.** (2024). arXiv:2408.03497.
Combined PCA, SMOTEENN, and LightGBM for bank credit risk prediction; outperformed other models in identifying high-quality applicants under class imbalance.

#### [17] Credit Card Fraud Detection – Classifier Selection Strategy
**Kulatilleke, S.** (2022). arXiv:2208.11900.
Data-driven classifier selection + sampling methods for imbalanced fraud detection; directly applicable to churn's class imbalance challenges.
#### [18] Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data
**Gregory, J.** (2018). arXiv:1802.03396.
Applied XGBoost with temporal feature engineering to time-series churn data; achieved top performance in large-scale competition settings.

#### [19] Predictive Churn with the Set of Good Models
**Watson-Daniels, D., et al.** (2024). arXiv:2402.07745.
Examined prediction instability during model retraining via the Rashomon set; critical for production churn model deployment and monitoring.

#### [20] Retention Is All You Need
**Mohiuddin, K., et al.** (2023). arXiv:2304.03103.
HR Decision Support System using SHAP + what-if analysis for employee attrition; demonstrates SHAP utility for retention/churn use cases with interpretable dashboards.

#### [21] Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance
**(2024).** arXiv:2409.19751.
Comprehensive study of SMOTE, Class Weights, and Decision Threshold Calibration for binary classification; **Decision Threshold Calibration most consistently effective** – directly guides our experimental design.

---
## 5. Dataset Understanding

### 5.1 Dataset 1: Telco Customer Churn (IBM)

**Source:** [aai510-group1/telco-customer-churn](https://hf.co/datasets/aai510-group1/telco-customer-churn)
**Type:** Fictional telecommunications company data
**Format:** CSV / Parquet
**Splits:** train / validation / test

#### Schema Summary

| Feature Category | Count | Key Features |
|-----------------|-------|-------------|
| **Demographics** | 7 | Age, Gender, Married, Dependents, Number of Dependents, Senior Citizen, Under 30 |
| **Service Usage** | 10 | Phone Service, Internet Service, Internet Type, Multiple Lines, Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, Streaming Movies |
| **Contract & Billing** | 6 | Contract, Payment Method, Paperless Billing, Monthly Charge, Total Charges, Total Refunds |
| **Engagement** | 7 | Tenure (months), Number of Referrals, Referred a Friend, Offer, Satisfaction Score, Churn Score, Quarter |
| **Revenue** | 6 | Total Revenue, Total Long Distance Charges, Total Extra Data Charges, Avg Monthly Long Distance, Avg Monthly GB Download, CLTV |
| **Geographic** | 6 | City, State, Zip Code, Latitude, Longitude, Population |
| **Target** | 4 | Churn (binary), Churn Reason (string), Churn Category (string), Customer Status |

**Total Features:** ~52 (including derived identifiers like `Lat Long`, `Customer ID`)
#### Class Distribution (Audited)

| Split | Total Rows | Churned (1) | Stayed (0) | Churn Rate |
|-------|-----------|-------------|------------|------------|
| Train | ~4,400 | ~1,100 | ~3,300 | ~25% |
| Validation | ~1,500 | ~375 | ~1,125 | ~25% |
| Test | ~1,500 | ~375 | ~1,125 | ~25% |

*Note: Exact counts vary by split. The dataset exhibits moderate class imbalance (~25% churn), manageable without aggressive oversampling.*
#### Notable Data Characteristics

1. **Rich categorical encoding:** Internet Type (DSL, Fiber Optic, Cable, None), Contract (Month-to-Month, One Year, Two Year), Payment Method (4 types)
2. **Temporal granularity:** the `Quarter` field (Q1-Q4) enables time-aware feature engineering
3. **Pre-computed churn scores:** `Churn Score` (0-100) and `Satisfaction Score` (1-5) are strong engineered features – risk of target leakage if not handled carefully
4. **CLTV integration:** the `CLTV` field is directly available for revenue-weighted ranking
5. **Geographic features:** latitude/longitude enable spatial clustering or geo-derived features

#### Data Quality Flags

- `Total Charges` has blank/missing values for zero-tenure customers (new sign-ups)
- `Churn Reason` and `Churn Category` are populated only for churned customers – post-hoc labels, not usable as features
- `Customer Status` is highly correlated with the target; it should be excluded or used only for stratification
- Some categorical fields (City, State) have high cardinality (50+ states, 1,000+ cities)

---
### 5.2 Dataset 2: Bank Customer Churners

**Source:** [ZZHHJ/bank_churners](https://hf.co/datasets/ZZHHJ/bank_churners)
**Type:** Credit card customer attrition data
**Format:** CSV / Parquet
**Splits:** single train split (requires manual partitioning)

#### Schema Summary

| Feature Category | Count | Key Features |
|-----------------|-------|-------------|
| **Demographics** | 6 | Customer_Age, Gender, Dependent_count, Education_Level, Marital_Status, Income_Category |
| **Account Behavior** | 5 | Months_on_book, Total_Relationship_Count, Months_Inactive_12_mon, Contacts_Count_12_mon, Card_Category |
| **Financial** | 8 | Credit_Limit, Total_Revolving_Bal, Avg_Open_To_Buy, Total_Amt_Chng_Q4_Q1, Total_Trans_Amt, Total_Trans_Ct, Total_Ct_Chng_Q4_Q1, Avg_Utilization_Ratio |
| **Target** | 1 | Attrition_Flag (Existing Customer / Attrited Customer) |
| **Artifacts** | 2 | Naive_Bayes_Classifier columns (pre-computed probabilities – **must be removed** to avoid data leakage) |

**Total Features:** 22 (19 usable + 1 target + 2 NB artifacts to drop), plus the `CLIENTNUM` identifier
#### Class Distribution (Estimated)

| Class | Approximate Count | Rate |
|-------|-------------------|------|
| Existing Customer | ~8,500 | ~83% |
| Attrited Customer | ~1,700 | ~17% |

**Churn rate ~17%** – more imbalanced than Telco; SMOTE/ADASYN or class weighting will be necessary.
#### Notable Data Characteristics

1. **Quarter-over-quarter dynamics:** `Total_Amt_Chng_Q4_Q1` and `Total_Ct_Chng_Q4_Q1` capture behavioral velocity – powerful churn signals
2. **Utilization ratio:** `Avg_Utilization_Ratio` is a strong proxy for engagement; low utilization often precedes attrition
3. **Income categories are binned:** `$60K - $80K`, `$80K - $120K`, etc. – ordinal encoding preferred
4. **Card category:** `Blue` (vast majority), `Silver`, `Gold`, `Platinum` – strong class imbalance within the feature itself
#### Data Quality Flags

- **Critical:** two `Naive_Bayes_Classifier_*` columns are pre-computed churn probabilities from a baseline model. Using them as features would constitute **data leakage** – they must be dropped before any model training.
- No explicit CLTV field; it must be estimated from `Credit_Limit`, `Total_Trans_Amt`, and `Total_Trans_Ct`
- The single split requires manual stratified partitioning (70/15/15 or 80/10/10)

---
### 5.3 Cross-Dataset Comparison

| Attribute | Telco (IBM) | Bank Churners |
|-----------|-------------|---------------|
| **Records** | ~7,000 | ~10,000 |
| **Features (usable)** | ~45 | ~19 |
| **Churn Rate** | ~25% | ~17% |
| **Industry** | Telecommunications | Banking / Credit Cards |
| **Temporal Features** | Quarter, Tenure (months) | Months_on_book, Q4/Q1 change ratios |
| **CLTV Available** | Yes (explicit field) | No (must derive) |
| **Geographic Data** | Yes (lat/lon, city, state) | No |
| **Pre-computed Scores** | Churn Score, Satisfaction | Naive Bayes (leakage – drop) |
| **Class Imbalance Severity** | Moderate | High |
| **Primary Churn Driver** | Contract type, tenure, service usage | Inactivity, transaction decline, utilization |

---
## 6. Proposed Methodology

### 6.1 The 7-Phase Pipeline

```
Phase 1: Data Ingestion & Audit
            ↓
Phase 2: Preprocessing & Feature Engineering
            ↓
Phase 3: Exploratory Data Analysis (EDA)
            ↓
Phase 4: Model Training - 5-Base Stacking Ensemble
            ↓
Phase 5: Hyperparameter Optimization
            ↓
Phase 6: Evaluation, Interpretability & CLV Scoring
            ↓
Phase 7: Deployment, Monitoring & Documentation
```
### Phase 1: Data Ingestion & Audit

- Load both datasets with the Hugging Face `datasets` library
- Compute schema validation: type checks, missing-value audit, cardinality report
- Flag anomalous values (negative charges, impossible ages, blank `Total Charges`)
- Document data provenance and version hashes (DVC)
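The audit step can be sketched with plain Python; the field names and rows below are illustrative (in the real pipeline the rows would come from `datasets.load_dataset`):

```python
# Minimal schema audit: per-column missing-value count and distinct-value
# cardinality over a list of row dicts. Illustrative rows only.
def audit(rows):
    """Return (missing_counts, cardinality) dicts keyed by column name."""
    columns = rows[0].keys()
    missing = {c: sum(1 for r in rows if r[c] in (None, "", " ")) for c in columns}
    cardinality = {
        c: len({r[c] for r in rows if r[c] not in (None, "", " ")})
        for c in columns
    }
    return missing, cardinality

sample = [
    {"Tenure": 12, "Total Charges": "840.5"},
    {"Tenure": 0,  "Total Charges": ""},       # blank for a zero-tenure sign-up
    {"Tenure": 24, "Total Charges": "1900.0"},
]
missing, cardinality = audit(sample)
```

Running this on the full datasets would surface exactly the flags listed above (blank `Total Charges`, high-cardinality `City`/`State`).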
### Phase 2: Preprocessing & Feature Engineering

#### 2A. Cleaning

- **Telco:** Impute `Total Charges` blanks with `Monthly Charge × Tenure`
- **Bank:** Drop the `Naive_Bayes_Classifier_*` columns immediately
- Both datasets: remove ID fields (`Customer ID`, `CLIENTNUM`)
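A minimal sketch of the Telco imputation rule (illustrative rows; note that for zero-tenure customers the rule yields 0):

```python
# Impute blank Total Charges as Monthly Charge * Tenure.
def impute_total_charges(row):
    raw = row.get("Total Charges")
    if raw in (None, "", " "):
        # Zero-tenure sign-ups get Monthly Charge * 0 = 0.0.
        return row["Monthly Charge"] * row["Tenure"]
    return float(raw)

new_customer = {"Monthly Charge": 70.0, "Tenure": 0, "Total Charges": " "}
existing = {"Monthly Charge": 70.0, "Tenure": 12, "Total Charges": "845.5"}
```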
#### 2B. Encoding

| Feature Type | Encoding Strategy | Example Features |
|-------------|-------------------|------------------|
| Binary categorical | Label encoding (0/1) | `Gender`, `Partner`, `PhoneService` |
| Low-cardinality nominal | One-hot encoding | `Contract`, `Payment Method`, `Education_Level` |
| High-cardinality nominal | Target encoding / CatBoost native | `City`, `State` (Telco); `Income_Category` (Bank) |
| Cyclical temporal | Sine/cosine encoding | `Quarter` mapped to an angle |
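The cyclical row deserves a concrete example: mapping `Quarter` onto the unit circle keeps Q4 and Q1 adjacent instead of 3 units apart. A minimal sketch:

```python
import math

def encode_quarter(quarter):
    """Map 'Q1'..'Q4' to a (sin, cos) pair on the unit circle."""
    idx = int(quarter[1]) - 1              # Q1 -> 0, ..., Q4 -> 3
    angle = 2.0 * math.pi * idx / 4.0
    return math.sin(angle), math.cos(angle)
```

The Euclidean distance between the encodings of Q4 and Q1 now equals the distance between Q1 and Q2, which is what a distance-based or linear model should see.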
#### 2C. Feature Engineering

- **RFM-style features (Bank):** Recency = `Months_Inactive_12_mon`, Frequency = `Total_Trans_Ct`, Monetary = `Total_Trans_Amt`
- **Engagement ratio (Telco):** `Satisfaction_Score / Churn_Score` as a loyalty proxy
- **Velocity features:** Month-over-month change in charges and usage
- **CLTV proxy (Bank):** `Credit_Limit × Avg_Utilization_Ratio × (12 - Months_Inactive_12_mon)`
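The CLTV proxy can be written directly from the formula above (the row values are illustrative):

```python
# Bank CLTV proxy (Section 5.2 notes no explicit CLTV column):
# Credit_Limit * Avg_Utilization_Ratio * (12 - Months_Inactive_12_mon).
def cltv_proxy(row):
    active_months = 12 - row["Months_Inactive_12_mon"]
    return row["Credit_Limit"] * row["Avg_Utilization_Ratio"] * active_months

row = {
    "Credit_Limit": 10000.0,
    "Avg_Utilization_Ratio": 0.25,
    "Months_Inactive_12_mon": 2,
}
```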
#### 2D. Scaling & Imbalance Handling

- Numerical features → RobustScaler (median/IQR, resistant to outliers)
- Class imbalance → SMOTEENN (SMOTE + Edited Nearest Neighbours) on the training fold only; **never on validation/test**
- Class weights → `scale_pos_weight = len(negative) / len(positive)` for XGBoost/LightGBM
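A quick sketch of the class-weight computation, on illustrative labels at roughly the Bank dataset's ~17% churn rate:

```python
# scale_pos_weight for XGBoost/LightGBM: ratio of negative to positive
# training examples, computed on the training fold only.
def scale_pos_weight(labels):
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives

# Toy training fold at ~17% churn (Bank-like imbalance).
train_labels = [1] * 17 + [0] * 83
```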
### Phase 3: Exploratory Data Analysis (EDA)

- Univariate distributions (histograms, box plots for skew detection)
- Bivariate analysis: churn rate by contract type, payment method, tenure bins
- Correlation matrix (Spearman for non-linear relationships)
- Feature-target mutual information scores for feature selection
- Geographic heatmap (Telco: churn rate by state)
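The bivariate analysis boils down to grouped churn rates; a dependency-free sketch on toy rows:

```python
from collections import defaultdict

# Churn rate per category value (e.g., per contract type).
def churn_rate_by(rows, key):
    totals, churned = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        churned[r[key]] += r["Churn"]
    return {k: churned[k] / totals[k] for k in totals}

rows = [
    {"Contract": "Month-to-Month", "Churn": 1},
    {"Contract": "Month-to-Month", "Churn": 1},
    {"Contract": "Month-to-Month", "Churn": 0},
    {"Contract": "Two Year", "Churn": 0},
    {"Contract": "Two Year", "Churn": 0},
]
rates = churn_rate_by(rows, "Contract")
```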
|
| 413 |
+
|
| 414 |
+
### Phase 4: Model Training – Stacking Ensemble

#### 4A. Cross-Validation Strategy

- **5-fold Stratified Cross-Validation** to preserve class distribution
- **GroupKFold** if temporal leakage risk (same customer in multiple quarters)
- Out-of-fold (OOF) predictions from each base model used as meta-features

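Generating OOF meta-features as described can be sketched with `cross_val_predict` (synthetic data; Logistic Regression stands in for any one base model):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=300, weights=[0.8], random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each sample's OOF probability comes from the fold that did NOT train on it,
# so stacking the columns of several base models leaks no target information.
oof = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, method="predict_proba")[:, 1]
```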
#### 4B. Base Model Training

| Base Model | Key Hyperparameters | Tuning Range |
|-----------|-------------------|--------------|
| XGBoost | `max_depth`, `learning_rate`, `subsample`, `colsample_bytree`, `scale_pos_weight` | depth: 3–8; lr: 0.01–0.3 |
| LightGBM | `num_leaves`, `learning_rate`, `feature_fraction`, `bagging_fraction`, `is_unbalance` | leaves: 20–100; lr: 0.01–0.3 |
| CatBoost | `depth`, `learning_rate`, `iterations`, `auto_class_weights` | depth: 4–10; iterations: 200–1000 |
| MLP | `hidden_layers`, `dropout`, `batch_size`, `learning_rate` | layers: (128,64), (256,128,64); dropout: 0.2–0.5 |
| Logistic Regression | `C`, `penalty`, `solver`, `class_weight` | C: 0.001–10; penalty: l1/l2/elasticnet |

#### 4C. Meta-Learner Training

- Input: 5 OOF probability vectors (one per base model) + optionally top-K original features
- Model: **Logistic Regression** (interpretable weights showing model contribution) OR **XGBoost** (if non-linear meta-interactions needed)
- Validation: Same 5-fold CV; meta-learner trained on OOF predictions, tested on hold-out

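The OOF-plus-meta-learner pattern is what scikit-learn's `StackingClassifier` implements. A minimal sketch with sklearn-only stand-ins for the five base models (synthetic data; the real ensemble would plug in XGB/LGBM/CatBoost/MLP/LR):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, weights=[0.8], random_state=42)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
                ("dt", DecisionTreeClassifier(max_depth=4, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                          # 5-fold OOF, matching 4A
    stack_method="predict_proba",  # base models feed probabilities, not labels
)
stack.fit(X, y)
proba = stack.predict_proba(X)[:, 1]
```

With a Logistic Regression meta-learner, `stack.final_estimator_.coef_` exposes the per-model contribution weights mentioned above.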
### Phase 5: Hyperparameter Optimization

- **Optuna** with **TPESampler** (Tree-structured Parzen Estimator)
- 100 trials per base model; 50 trials for meta-learner
- Pruning: `MedianPruner` with early stopping on validation F1
- Objective: Maximize F1-Score (harmonic mean of precision and recall)

### Phase 6: Evaluation, Interpretability & CLV Scoring

#### 6A. Metrics Suite (10 metrics)

1. Accuracy
2. Precision (Churn class)
3. Recall (Churn class)
4. F1-Score
5. ROC-AUC
6. PR-AUC (Precision-Recall AUC – critical for imbalanced data)
7. Matthews Correlation Coefficient (MCC)
8. Cohen's Kappa
9. Balanced Accuracy
10. Expected Calibration Error (ECE)

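The label-based metrics in the suite are one scikit-learn call each (toy predictions below; ROC-AUC, PR-AUC, and ECE are computed from predicted probabilities instead of hard labels, so they are omitted here):

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]  # one miss (FN), one false alarm (FP)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "mcc": matthews_corrcoef(y_true, y_pred),
    "kappa": cohen_kappa_score(y_true, y_pred),
    "balanced_acc": balanced_accuracy_score(y_true, y_pred),
}
```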
#### 6B. SHAP Analysis

- **Global:** SHAP summary plot (beeswarm) showing feature importance across full dataset
- **Local:** SHAP force plot for individual predictions → customer-level actionable insights
- **Dependence:** SHAP dependence plots for top-5 features revealing interaction effects

#### 6C. CLV Scoring

- **Telco:** Use explicit `CLTV` field; multiply by churn probability
- **Bank:** Derive CLV proxy; multiply by churn probability
- Output: Prioritized customer list sorted by RPS (Retention Priority Score)
- Segment: Top 10% (urgent), 10–30% (high), 30–60% (medium), 60–100% (low)

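The RPS ranking is a product and a sort (the four scored customers below are hypothetical):

```python
import pandas as pd

# Hypothetical scored customers: churn probability x CLV = RPS
df = pd.DataFrame({
    "customer": ["a", "b", "c", "d"],
    "p_churn": [0.9, 0.2, 0.7, 0.4],
    "clv":     [1000, 5000, 3000, 800],
})
df["rps"] = df["p_churn"] * df["clv"]
df = df.sort_values("rps", ascending=False).reset_index(drop=True)
```

Note how customer `c` outranks `a` despite a lower churn probability: the CLV weighting is what turns "likely to churn" into "worth retaining first".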
### Phase 7: Deployment, Monitoring & Documentation

- Model serialization: `joblib` for sklearn/CatBoost, native formats for XGBoost/LightGBM
- Inference pipeline: `scikit-learn Pipeline` + custom transformers
- Monitoring: Track prediction distribution drift, feature drift, and metric decay over time
- Documentation: Model card with intended use, limitations, bias analysis, and SHAP summary

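A minimal serialization round-trip for the `Pipeline` + `joblib` combination named above (synthetic data; the temp-file path is for illustration):

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

X, y = make_classification(n_samples=200, random_state=42)

# Scaler and model travel together, so inference reuses the exact fitted state
pipe = Pipeline([("scale", RobustScaler()),
                 ("clf", LogisticRegression(max_iter=1000))]).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "churn_pipeline.joblib")
joblib.dump(pipe, path)
restored = joblib.load(path)
```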
---

## 7. Implementation Strategy

### 7.1 Tech Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Data Loading** | `datasets` (HF), `pandas`, `polars` | Efficient dataset ingestion |
| **Preprocessing** | `scikit-learn` (Pipeline, ColumnTransformer, RobustScaler) | Reproducible feature engineering |
| **ML Models** | `xgboost`, `lightgbm`, `catboost`, `scikit-learn` (MLP, LR) | Base learners |
| **Ensemble** | `mlens` / custom stacking with `scikit-learn` | Meta-learner orchestration |
| **Imbalance** | `imbalanced-learn` (SMOTEENN) | Oversampling + cleaning |
| **Optimization** | `optuna` | Hyperparameter search |
| **Interpretability** | `shap` | Game-theoretic explanations |
| **Tracking** | `trackio` + `mlflow` | Experiment logging, metrics, artifacts |
| **Deployment** | `gradio` / `fastapi` + Docker | API inference and UI demo |
| **Versioning** | `dvc` + `git` | Data and model versioning |

### 7.2 4-Week Timeline

| Week | Focus | Deliverables |
|------|-------|-------------|
| **Week 1** | Data audit, preprocessing, EDA | Clean notebooks; feature engineering pipeline; data quality report |
| **Week 2** | Base model training, hyperparameter tuning | 5 trained base models; Optuna study results; OOF prediction matrices |
| **Week 3** | Stacking ensemble, evaluation, SHAP analysis | Trained meta-learner; 10-metric report; SHAP dashboards; CLV scoring |
| **Week 4** | Cross-domain testing, deployment, documentation | Generalization report; Gradio demo; model card; final documentation |

### 7.3 Code Architecture

```
churnpredict-pro/
├── data/
│   ├── raw/                        # HF datasets (versioned with DVC)
│   ├── processed/                  # Train/val/test splits
│   └── engineered/                 # Feature-engineered datasets
├── notebooks/
│   ├── 01_eda_telco.ipynb
│   ├── 02_eda_bank.ipynb
│   ├── 03_feature_engineering.ipynb
│   └── 04_shap_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── load_datasets.py        # HF datasets loader
│   │   ├── preprocess.py           # Cleaning + encoding + scaling
│   │   └── feature_engineer.py     # RFM, velocity, CLV proxy
│   ├── models/
│   │   ├── base_models.py          # XGB, LGBM, CatBoost, MLP, LR wrappers
│   │   ├── stacking_ensemble.py    # OOF + meta-learner
│   │   └── hyperparameter_search.py  # Optuna studies
│   ├── evaluation/
│   │   ├── metrics.py              # 10-metric computation
│   │   ├── shap_explainer.py       # Global + local SHAP
│   │   └── clv_scorer.py           # RPS computation
│   └── deployment/
│       ├── inference_pipeline.py
│       └── app.py                  # Gradio/FastAPI interface
├── configs/
│   ├── telco_config.yaml
│   └── bank_config.yaml
├── experiments/                    # Trackio / MLflow runs
├── tests/
│   ├── test_preprocessing.py
│   └── test_models.py
├── Dockerfile
├── requirements.txt
├── dvc.yaml
└── README.md
```

---

## 8. Experimental Design

### 8.1 Five Experiments

| ID | Experiment | Hypothesis | Method |
|----|-----------|------------|--------|
| **E1** | Single Model Baseline | Individual models underperform ensemble due to bias-variance limitations | Train each of 5 base models standalone; report metrics |
| **E2** | Stacking Ensemble | Meta-learner combining 5 models outperforms best single model by ≥ 3% F1 | 5-fold OOF stacking with LR meta-learner |
| **E3** | Imbalance Strategy Comparison | Threshold calibration is more effective than SMOTE for churn (per [21]) | Compare: (a) no correction, (b) SMOTEENN, (c) class weights, (d) threshold calibration |
| **E4** | Cross-Domain Transfer | Models trained on Telco generalize to Bank with ≥ 80% AUC | Train on Telco, evaluate zero-shot on Bank; then fine-tune |
| **E5** | CLV-Weighted vs. Uniform Ranking | RPS improves campaign ROI over probability-only ranking | Compare top-20% precision: P(churn) only vs. P(churn) × CLV |

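E3's threshold calibration arm can be sketched as a validation-set sweep (synthetic validation probabilities; the real sweep would use OOF probabilities from a trained base model):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
y_val = rng.binomial(1, 0.2, size=1000)  # ~20% churners
# Churners get a modest probability bump over non-churners
p_val = np.clip(0.25 * y_val + rng.uniform(0, 0.75, size=1000), 0, 1)

# Sweep the decision threshold instead of resampling the training data
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_val, p_val >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]
```

The selected `best_t` replaces the default 0.5 cutoff at inference time; no synthetic samples ever enter training.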
### 8.2 Ten Evaluation Metrics

| # | Metric | Formula / Definition | Why It Matters for Churn |
|---|--------|---------------------|-------------------------|
| 1 | **Accuracy** | (TP + TN) / (TP + TN + FP + FN) | Overall correctness; misleading if imbalanced |
| 2 | **Precision (Churn)** | TP / (TP + FP) | Of predicted churners, how many actually churn? (cost of false alarms) |
| 3 | **Recall (Churn)** | TP / (TP + FN) | Of actual churners, how many did we catch? (cost of missed churners) |
| 4 | **F1-Score** | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean; balances precision and recall |
| 5 | **ROC-AUC** | Area under ROC curve | Discrimination ability across all thresholds |
| 6 | **PR-AUC** | Area under Precision-Recall curve | More informative than ROC-AUC for imbalanced data |
| 7 | **MCC** | (TP×TN − FP×FN) / √(product of marginals) | Correlation between prediction and truth; robust to imbalance |
| 8 | **Cohen's Kappa** | (Observed − Expected) / (1 − Expected) | Agreement beyond chance; useful for inter-rater reliability analogies |
| 9 | **Balanced Accuracy** | (Sensitivity + Specificity) / 2 | Average of recall on both classes; fair on imbalanced data |
| 10 | **ECE** | Expected Calibration Error | Measures reliability of probability outputs; critical for CLV weighting |

### 8.3 Statistical Rigor

1. **Confidence Intervals:** All metrics reported with 95% CIs from 5-fold CV (bootstrap percentile method)
2. **McNemar's Test:** Statistically compare stacking ensemble vs. best single model
3. **DeLong's Test:** Compare ROC-AUC differences between models
4. **Permutation Test:** Validate feature importance scores from SHAP
5. **Stratification:** All splits stratified on target + `Contract` type (strongest churn predictor) to prevent distribution shift

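The percentile-bootstrap CI in item 1 can be sketched as follows (the five fold scores are hypothetical placeholders for real CV output):

```python
import numpy as np

rng = np.random.default_rng(42)
fold_f1 = np.array([0.84, 0.86, 0.83, 0.87, 0.85])  # hypothetical CV scores

# Resample fold scores with replacement; the 2.5th/97.5th percentiles of the
# resampled means form the 95% percentile-bootstrap interval
boot_means = np.array([
    rng.choice(fold_f1, size=len(fold_f1), replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```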
### 8.4 Reproducibility Checklist

- [ ] Random seeds fixed (`random_state=42`) for all stochastic operations
- [ ] `requirements.txt` with exact versions (via `pip freeze`)
- [ ] DVC tracking for data and model artifacts
- [ ] Git commit hash recorded with every experiment
- [ ] Trackio / MLflow logging of hyperparameters, metrics, and artifact paths

---

## 9. Result Analysis

### 9.1 Expected Performance

Based on literature benchmarks ([4] achieved 99.28% on telecom; [16] achieved strong results on bank credit risk with SMOTEENN + LightGBM), our targets are conservative and grounded:

| Dataset | Best Single Model F1 | Stacking Ensemble F1 | Expected Δ |
|---------|---------------------|---------------------|------------|
| **Telco** | 0.82–0.84 (XGBoost/CatBoost) | **0.86–0.88** | +0.03–0.04 |
| **Bank** | 0.78–0.81 (LightGBM/XGBoost) | **0.82–0.85** | +0.03–0.04 |

### 9.2 SHAP Analysis – Expected Insights

Based on prior churn research, we anticipate the following feature importance rankings:

**Telco (Expected Top 5 SHAP Features):**

1. `Contract` (Month-to-Month vs. longer) – strongest predictor
2. `Tenure in Months` – inverse relationship with churn
3. `Monthly Charge` / `Total Charges` – price sensitivity
4. `Internet Type` (Fiber Optic churns more than DSL)
5. `Payment Method` (Electronic check = high risk)

**Bank (Expected Top 5 SHAP Features):**

1. `Total_Trans_Ct` (transaction frequency decline)
2. `Total_Trans_Amt` (monetary decline)
3. `Months_Inactive_12_mon` (recency of activity)
4. `Total_Relationship_Count` (cross-product engagement)
5. `Contacts_Count_12_mon` (complaint/contact proxy)

### 9.3 Business Impact Projections

Assuming a hypothetical telecom with:

- 100,000 customers
- 25% annual churn rate
- Average CLV = $3,000
- Retention campaign cost = $50 per targeted customer
- Campaign success rate (if well-targeted) = 30%

| Scenario | Customers Targeted | Campaign Cost | Churners Caught | Revenue Saved | Net ROI |
|----------|-------------------|---------------|-----------------|---------------|---------|
| Random targeting (25% churn) | 20,000 | $1,000,000 | 1,500 | $4,500,000 | 4.5× |
| Model-guided (top 20% by RPS) | 20,000 | $1,000,000 | 4,200 | $12,600,000 | **12.6×** |

*Model-guided targeting improves ROI by ~2.8× over random selection by focusing on high-value, high-probability churners.*

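The arithmetic behind the table is reproducible from the stated assumptions (the 4,200 figure is the table's modelled assumption, not a derived quantity):

```python
# Figures from the stated assumptions above
clv = 3_000             # average customer lifetime value ($)
cost_per_target = 50    # retention campaign cost per customer ($)
targeted = 20_000       # top 20% of 100,000 customers
success_rate = 0.30     # offers that actually prevent churn

campaign_cost = targeted * cost_per_target           # $1,000,000

# Random targeting: 25% of targeted customers are actual churners
random_saved = int(targeted * 0.25 * success_rate)   # 1,500 retained
random_roi = random_saved * clv / campaign_cost      # 4.5x

# Model-guided: the table assumes 4,200 churners retained in the top 20%
model_saved = 4_200
model_roi = model_saved * clv / campaign_cost        # 12.6x
```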
### 9.4 Visualization Plan

| Visualization | Purpose |
|--------------|---------|
| ROC & PR curves (all models overlaid) | Comparative discrimination |
| Confusion matrices | Error type analysis |
| SHAP summary plot (beeswarm) | Global feature importance |
| SHAP force plots (sample customers) | Local explanations for stakeholders |
| SHAP dependence plots | Feature interaction discovery |
| Calibration plot (predicted vs. actual) | Probability reliability |
| CLV-RPS scatter plot | Segmentation visualization |
| Metric bar chart with 95% CIs | Statistical comparison |

---

## 10. Iterative Improvement

### 10.1 Six Iteration Cycles

| Iteration | Focus | Action | Expected Outcome |
|-----------|-------|--------|------------------|
| **Iter 1** | Feature Engineering Deep-Dive | Add polynomial features (tenure², charge²); interaction terms (contract × monthly charge); binning (tenure quartiles) | +1–2% F1 from non-linear feature capture |
| **Iter 2** | Advanced Sampling | Replace SMOTEENN with ADASYN + Edited Nearest Neighbours; test BorderlineSMOTE | Better synthetic sample quality near decision boundary |
| **Iter 3** | Deep Learning Augmentation | Replace MLP with TabNet or FT-Transformer for tabular deep learning; compare against MLP base | Validate whether deep tabular models improve ensemble diversity |
| **Iter 4** | Temporal Modeling | For Telco: add LSTM/GRU on quarterly customer journey sequences; for Bank: add transaction time-series | Capture temporal churn dynamics; +2–3% F1 on time-sensitive subsets |
| **Iter 5** | Ensemble Expansion | Add a 6th base model (Random Forest or Extra Trees) for additional variance reduction; test blending vs. stacking | Further variance reduction; marginal F1 gain of +0.5–1% |
| **Iter 6** | Production Hardening | Dockerize inference; add A/B test framework; build automated retraining trigger on drift detection; write full production documentation | Deployable system with monitoring, retraining, and compliance docs |

### 10.2 Production Documentation Deliverables

| Document | Contents | Audience |
|----------|----------|----------|
| **Model Card** | Intended use, training data summary, performance metrics, limitations, bias assessment, ethical considerations | Data scientists, regulators |
| **API Documentation** | Endpoint specs, request/response schemas, rate limits, error codes | Engineering teams |
| **SHAP Dashboard Guide** | How to read force plots, summary plots, and dependence plots | Business stakeholders, customer success |
| **Retention Playbook** | How to act on RPS segments; recommended interventions per churn reason | Marketing, customer success |
| **Retraining SOP** | When and how to retrain; drift detection thresholds; rollback procedures | MLOps, data engineering |
| **Compliance Checklist** | GDPR Article 22 (automated decision-making), CCPA, internal audit requirements | Legal, compliance |

---

## Appendix A: Key Equations

**Retention Priority Score:**

$$
\text{RPS}_i = P(\text{churn}_i) \times \text{CLV}_i
$$

**F1-Score:**

$$
F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

**Matthews Correlation Coefficient:**

$$
\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
$$

**Expected Calibration Error:**

$$
\text{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \text{acc}(B_m) - \text{conf}(B_m) \right|
$$

---

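Since ECE has no scikit-learn built-in, the equation above can be implemented directly (a minimal sketch; the equal-width binning and function name are our own choices):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: sum over bins of |B_m|/n * |acc(B_m) - conf(B_m)|."""
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)   # bin B_m
        if mask.any():
            acc = y_true[mask].mean()           # acc(B_m): fraction positive
            conf = y_prob[mask].mean()          # conf(B_m): mean confidence
            ece += mask.mean() * abs(acc - conf)
    return ece
```

Confident-and-correct predictions like `([1, 1, 0, 0], [0.95, 0.95, 0.05, 0.05])` score near zero; a well-calibrated model is essential before multiplying probabilities into CLV-weighted RPS values.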
*Document compiled for the ChurnPredict Pro project. All datasets verified on Hugging Face Hub. All 21 references span peer-reviewed and high-impact arXiv publications from 2016β2024.*