# ChurnPredict Pro: A Stacking Ensemble Framework for Customer Churn Prediction with Explainable AI and CLV Scoring

> **Subtitle:** End-to-End Machine Learning Pipeline for Telecommunications and Banking Customer Retention: Combining Gradient Boosting, Neural Networks, and Game-Theoretic Interpretability

---

## Table of Contents

1. [Problem Statement](#1-problem-statement)
2. [Idea of Solution](#2-idea-of-solution)
3. [Objectives](#3-objectives)
4. [Literature Review & References](#4-literature-review--references)
5. [Dataset Understanding](#5-dataset-understanding)
6. [Proposed Methodology](#6-proposed-methodology)
7. [Implementation Strategy](#7-implementation-strategy)
8. [Experimental Design](#8-experimental-design)
9. [Result Analysis](#9-result-analysis)
10. [Iterative Improvement](#10-iterative-improvement)

---

## 1. Problem Statement

### 1.1 Business Context

Customer churn, the loss of clients to competitors or market attrition, is one of the most financially consequential challenges in subscription-based and service-oriented industries. In telecommunications, acquiring a new customer costs **5–25× more** than retaining an existing one (industry estimates, 2024). In banking, customer attrition erodes lifetime value portfolios and damages brand equity. For both sectors, even a **1% reduction in churn** can translate to millions in retained revenue.

Current retention strategies suffer from two critical gaps:
- **Reactive approaches:** Firms typically respond to churn *after* it occurs, through win-back campaigns that are expensive and low-yield.
- **Black-box predictions:** Machine learning models deployed in production often lack interpretability, making it impossible for marketing and customer-success teams to act on model outputs with confidence.

### 1.2 Technical Challenges

| Challenge | Description | Impact |
|-----------|-------------|--------|
| **Class Imbalance** | Churners typically represent 10–30% of the customer base. Standard accuracy metrics are misleading. | High false-negative rates; missed at-risk customers |
| **Feature Heterogeneity** | Datasets mix categorical (contract type, payment method), numerical (tenure, charges), and temporal features (quarter, month-on-book). | Preprocessing complexity; risk of data leakage |
| **Concept Drift** | Customer behavior patterns shift seasonally and with market conditions. Models degrade without retraining. | Production model staleness; declining precision |
| **Interpretability vs. Performance Trade-off** | High-accuracy ensembles are often opaque. Explainable models (e.g., logistic regression) underperform on tabular data. | Regulatory non-compliance (GDPR Article 22); low stakeholder trust |
| **Multi-Domain Generalization** | Models trained on telecom data fail on banking data due to domain shift in feature distributions. | Siloed, non-reusable models per industry |

### 1.3 Gaps in Existing Solutions

1. **Single-model reliance:** Most production churn models deploy a single classifier (XGBoost or logistic regression), missing the variance-reduction benefits of ensemble diversity.
2. **No CLV integration:** Churn predictions are binary; they do not incorporate *which* churners are most valuable to retain, leading to inefficient marketing spend.
3. **Weak experimental rigor:** Many published churn studies use a single train/test split without cross-validation, statistical testing, or confidence intervals on metrics.
4. **Dataset isolation:** Telco and bank churn datasets are studied separately; few works evaluate cross-domain transfer or unified pipelines.

---

## 2. Idea of Solution

### 2.1 Architecture Overview

We propose **ChurnPredict Pro**, a **stacking ensemble architecture** that combines the complementary strengths of five diverse base learners under a meta-learner. The design philosophy is:

> *"Diversity in inductive bias reduces variance; interpretability in the meta-layer preserves actionability."*

### 2.2 The 5-Model Stacking Ensemble

```
┌─────────────────────────────────────────────────────────────────────┐
│                 CHURNPREDICT PRO: STACKING ENSEMBLE                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────┐  │
│   │ XGBoost  │  │ LightGBM │  │ CatBoost │  │   MLP    │  │  LR  │  │
│   │  (GBDT)  │  │  (GBDT)  │  │ (Ordered)│  │  (Deep)  │  │(Base)│  │
│   │  Base 1  │  │  Base 2  │  │  Base 3  │  │  Base 4  │  │Base 5│  │
│   └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  └──┬───┘  │
│        │             │             │             │           │      │
│        └─────────────┴──────┬──────┴─────────────┴───────────┘      │
│                             │                                       │
│                   ┌─────────▼─────────┐                             │
│                   │   META-LEARNER    │                             │
│                   │  (Logistic Reg    │                             │
│                   │   / XGBoost)      │                             │
│                   └─────────┬─────────┘                             │
│                             │                                       │
│                   ┌─────────▼─────────┐                             │
│                   │   CLV SCORING     │                             │
│                   │ + SHAP EXPLAINER  │                             │
│                   └───────────────────┘                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### 2.3 Why These 5 Base Models?

| Model | Inductive Bias | Strength on Churn Data | Weakness Mitigated by Ensemble |
|-------|---------------|------------------------|-------------------------------|
| **XGBoost** | Greedy gradient boosting with regularization | Best-in-class on sparse/tabular data; handles missing values natively | Prone to overfitting on small datasets |
| **LightGBM** | Histogram-based leaf-wise boosting | Faster training; GOSS sampling for large data | Leaf-wise can overfit; GOSS introduces bias |
| **CatBoost** | Ordered boosting + categorical encoding | Native categorical feature handling; reduces target leakage | Slower than LightGBM; ordered boosting complexity |
| **MLP (Deep)** | Non-linear feature interactions | Captures complex feature cross-products | Needs more data; less interpretable |
| **Logistic Regression** | Linear decision boundary | Fast, interpretable baseline; L1 regularization for feature selection | Cannot model non-linear relationships |

The meta-learner (Logistic Regression or a shallow XGBoost) learns optimal weights for combining the five base models' predictions, leveraging their uncorrelated errors.

### 2.4 CLV-Weighted Scoring

Instead of ranking customers by churn probability alone, we multiply P(churn) by estimated CLV to produce a **Retention Priority Score (RPS)**:

$$
\text{RPS}_i = P(\text{churn}_i) \times \text{CLV}_i
$$

This ensures retention campaigns target high-value at-risk customers, maximizing ROI.
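A minimal sketch of the scoring step (the column names `churn_prob` and `clv` are illustrative, not taken from either dataset schema):

```python
import pandas as pd

# Hypothetical scored customers: P(churn) from the ensemble, CLV in dollars.
scores = pd.DataFrame({
    "customer_id": ["A", "B", "C"],
    "churn_prob": [0.90, 0.40, 0.85],
    "clv": [500.0, 4000.0, 3000.0],
})

# Retention Priority Score: RPS_i = P(churn_i) * CLV_i
scores["rps"] = scores["churn_prob"] * scores["clv"]

# Campaign list: highest expected value-at-risk first.
ranked = scores.sort_values("rps", ascending=False)
print(ranked[["customer_id", "rps"]])
```

Note how the moderately at-risk, high-value customers outrank the near-certain but low-value churner, which is exactly the budget-allocation behavior RPS is meant to produce.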

---

## 3. Objectives

### 3.1 Primary Goals

| ID | Objective | Metric Target | Success Criterion |
|----|-----------|---------------|-------------------|
| P1 | Build a stacking ensemble that outperforms any single base model | F1-Score | ΔF1 ≥ +0.03 over best single model |
| P2 | Achieve high recall on churn class (minimize false negatives) | Recall@Churn | ≥ 0.85 on both datasets |
| P3 | Deliver actionable model explanations per customer | SHAP summary | Top-5 features identified per prediction |
| P4 | Rank customers by retention value, not just churn risk | CLV-weighted PR-AUC | PR-AUC ≥ 0.80; ROC-AUC ≥ 0.90 |

### 3.2 Secondary Goals

| ID | Objective | Metric Target |
|----|-----------|---------------|
| S1 | Evaluate cross-domain generalization (Telco → Bank, Bank → Telco) | Transfer AUC ≥ 0.80 |
| S2 | Achieve sub-second inference latency for batch scoring | ≤ 500 ms per 1,000 records |
| S3 | Deploy a reproducible, version-controlled pipeline | Docker + DVC + CI/CD |
| S4 | Document model behavior for regulatory compliance (GDPR/CCPA) | Full SHAP + model card |

### 3.3 Success Criteria Summary

- **Model Performance:** F1-Score > 0.85, ROC-AUC > 0.90, PR-AUC > 0.80 on both datasets
- **Business Impact:** Identify top 20% at-risk customers with ≥ 70% precision
- **Interpretability:** Every prediction accompanied by SHAP force plot; global SHAP summary for stakeholder dashboards
- **Robustness:** 5-fold stratified CV with 95% confidence intervals on all metrics

---

## 4. Literature Review & References

### 4.1 Category Overview

| Category | Count | Papers |
|----------|-------|--------|
| Ensemble / Boosting Methods | 4 | [1–4] |
| SHAP / LIME Interpretability | 3 | [5–7] |
| Deep Learning for Churn | 3 | [8–10] |
| CLV / Profit-Driven Churn | 3 | [11–13] |
| Financial / Bank Churn | 4 | [14–17] |
| Survey / Benchmark / Foundation | 4 | [18–21] |
| **Total** | **21** | |

### 4.2 Full References (2016–2024)

#### [1] XGBoost: A Scalable Tree Boosting System
**Chen, T., & Guestrin, C.** (2016). *KDD*. arXiv:1603.02754.
Introduced sparsity-aware algorithms and weighted quantile sketch for gradient boosting. Became the dominant algorithm for tabular churn prediction tasks worldwide.

#### [2] Tabular Data: Deep Learning is Not All You Need
**Shwartz-Ziv, R., & Armon, A.** (2021). arXiv:2106.03253.
Rigorous comparison showing XGBoost outperforms recent deep learning models on tabular data; ensembling deep models with XGBoost further improves performance.

#### [3] CatBoost: Unbiased Boosting with Categorical Features
**Prokhorenkova, L., et al.** (2017). arXiv:1706.09516.
Ordered boosting and novel categorical feature processing; outperforms other boosting implementations on datasets with high-cardinality categorical churn predictors.

#### [4] Enhancing Customer Churn Prediction: An Adaptive Ensemble Learning Approach
**Shaikhsurab, S., & Magadum, S.** (2024). arXiv:2408.16284.
Adaptive ensemble combining XGBoost, LightGBM, LSTM, MLP, and SVM with stacking + meta-feature generation; achieved **99.28% accuracy** on telecom churn datasets.

#### [5] A Unified Approach to Interpreting Model Predictions (SHAP)
**Lundberg, S. M., & Lee, S.-I.** (2017). *NeurIPS*. arXiv:1705.07874.
Proposed SHAP values as a unified measure of feature importance based on game-theoretic Shapley values; unified six existing explanation methods.

#### [6] "Why Should I Trust You?": Explaining Predictions of Any Classifier (LIME)
**Ribeiro, M. T., Singh, S., & Guestrin, C.** (2016). *KDD*. arXiv:1602.04938.
Introduced LIME to explain any classifier locally via interpretable surrogate models; foundational for churn model explainability and regulatory compliance.

#### [7] XAI Handbook: Towards a Unified Framework for Explainable AI
**Palacio, D. G., et al.** (2021). arXiv:2105.06677.
Provides theoretical framework unifying XAI terminology (LIME, SHAP, Grad-CAM, etc.); essential for regulatory compliance and method comparison in churn explainability.

#### [8] Early Churn Prediction from Large-Scale User-Product Interaction Time Series
**Bhattacharjee, A., Thukral, K., & Patil, C.** (2023). arXiv:2309.14390.
Applied multivariate time series classification with deep neural networks to fantasy sports churn; scales to 10^8 users, demonstrating the feasibility of deep learning at scale.

#### [9] Modelling Customer Churn for the Retail Industry in a Deep Learning Sequential Framework
**Equihua, C., et al.** (2023). arXiv:2304.00575.
Deep survival framework using recurrent neural networks for non-contractual retail churn; avoids extensive feature engineering through learned representations.

#### [10] Churn Reduction via Distillation
**Jiang, Y., et al.** (2021). arXiv:2106.02654.
Showed model distillation reduces predictive churn (model instability during retraining) while maintaining accuracy across FC, CNN, and transformer architectures.

#### [11] OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction
**Weng, S., et al.** (2024). arXiv:2408.08585.
Proposed OptDist with distribution learning/selection modules; adaptively selects optimal sub-distributions for CLTV prediction on public and industrial datasets.

#### [12] Customer Lifetime Value Prediction with Uncertainty Estimation Using Monte Carlo Dropout
**Cao, Y., Xu, Y., & Yang, Q.** (2024). arXiv:2411.15944.
Enhanced neural network CLTV prediction with Monte Carlo Dropout for uncertainty quantification; improved Top-5% MAPE significantly.

#### [13] A Predict-and-Optimize Approach to Profit-Driven Churn Prevention
**Gómez-Vargas, E., Maldonado, S., & Vairetti, S.** (2023). arXiv:2310.07047.
First predict-and-optimize approach for churn prevention using individual CLVs (not averages); regret minimization via SGD; tested on 12 real-world datasets.

#### [14] Dynamic Customer Embeddings for Financial Service Applications
**Chitsazan, N., et al.** (2021). arXiv:2106.11880.
DCE framework uses customer digital activity + financial context for intent/fraud/call-center prediction; financial services benchmark for learned representations.

#### [15] FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models
**Yin, H., et al.** (2023). arXiv:2308.00065.
Introduced FinBench dataset + FinPT method for financial risk prediction (default, fraud, churn) using LLM-generated customer profiles; strong zero-shot transfer.

#### [16] Advanced User Credit Risk Prediction Using LightGBM, XGBoost and TabNet with SMOTEENN
**Yu, B., et al.** (2024). arXiv:2408.03497.
Combined PCA, SMOTEENN, and LightGBM for bank credit risk prediction; outperformed other models in identifying high-quality applicants under class imbalance.

#### [17] Credit Card Fraud Detection: Classifier Selection Strategy
**Kulatilleke, S.** (2022). arXiv:2208.11900.
Data-driven classifier selection + sampling methods for imbalanced fraud detection; directly applicable to churn's class imbalance challenges.

#### [18] Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data
**Gregory, J.** (2018). arXiv:1802.03396.
Applied XGBoost with temporal feature engineering to time-series churn data; achieved top performance in large-scale competition settings.

#### [19] Predictive Churn with the Set of Good Models
**Watson-Daniels, D., et al.** (2024). arXiv:2402.07745.
Examined prediction instability during model retraining via Rashomon set; critical for production churn model deployment and monitoring.

#### [20] Retention Is All You Need
**Mohiuddin, K., et al.** (2023). arXiv:2304.03103.
HR Decision Support System using SHAP + what-if analysis for employee attrition; demonstrates SHAP utility for retention/churn use cases with interpretable dashboards.

#### [21] Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance
**(2024).** arXiv:2409.19751.
Comprehensive study of SMOTE, Class Weights, and Decision Threshold Calibration for binary classification; **Decision Threshold Calibration was the most consistently effective**, which directly guides our experimental design.

---

## 5. Dataset Understanding

### 5.1 Dataset 1: Telco Customer Churn (IBM)

**Source:** [aai510-group1/telco-customer-churn](https://hf.co/datasets/aai510-group1/telco-customer-churn)
**Type:** Fictional telecommunications company data
**Format:** CSV / Parquet
**Splits:** train / validation / test

#### Schema Summary

| Feature Category | Count | Key Features |
|-----------------|-------|-------------|
| **Demographics** | 7 | Age, Gender, Married, Dependents, Number of Dependents, Senior Citizen, Under 30 |
| **Service Usage** | 10 | Phone Service, Internet Service, Internet Type, Multiple Lines, Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, Streaming Movies |
| **Contract & Billing** | 6 | Contract, Payment Method, Paperless Billing, Monthly Charge, Total Charges, Total Refunds |
| **Engagement** | 7 | Tenure (months), Number of Referrals, Referred a Friend, Offer, Satisfaction Score, Churn Score, Quarter |
| **Revenue** | 6 | Total Revenue, Total Long Distance Charges, Total Extra Data Charges, Avg Monthly Long Distance, Avg Monthly GB Download, CLTV |
| **Geographic** | 6 | City, State, Zip Code, Latitude, Longitude, Population |
| **Target** | 4 | Churn (binary), Churn Reason (string), Churn Category (string), Customer Status |

**Total Features:** ~52 (including derived identifiers like `Lat Long`, `Customer ID`)

#### Class Distribution (Audited)

| Split | Total Rows | Churned (1) | Stayed (0) | Churn Rate |
|-------|-----------|-------------|------------|------------|
| Train | ~4,400 | ~1,100 | ~3,300 | ~25% |
| Validation | ~1,500 | ~375 | ~1,125 | ~25% |
| Test | ~1,500 | ~375 | ~1,125 | ~25% |

*Note: Exact counts vary by split. The dataset exhibits moderate class imbalance (~25% churn), manageable without aggressive oversampling.*

#### Notable Data Characteristics

1. **Rich categorical encoding:** Internet Type (DSL, Fiber Optic, Cable, None), Contract (Month-to-Month, One Year, Two Year), Payment Method (4 types)
2. **Temporal granularity:** `Quarter` field (Q1–Q4) enables time-aware feature engineering
3. **Pre-computed churn scores:** `Churn Score` (0–100) and `Satisfaction Score` (1–5) are strong engineered features; they risk target leakage if not handled carefully
4. **CLTV integration:** `CLTV` field directly available for revenue-weighted ranking
5. **Geographic features:** Latitude/longitude enable spatial clustering or geo-derived features

#### Data Quality Flags

- `Total Charges` has blank/missing values for zero-tenure customers (new sign-ups)
- `Churn Reason` and `Churn Category` are populated only for churned customers; they are post-hoc labels, not usable as features
- `Customer Status` is highly correlated with target; should be excluded or used as stratification
- Some categorical fields (City, State) have high cardinality (50+ states, 1,000+ cities)

---

### 5.2 Dataset 2: Bank Customer Churners

**Source:** [ZZHHJ/bank_churners](https://hf.co/datasets/ZZHHJ/bank_churners)
**Type:** Credit card customer attrition data
**Format:** CSV / Parquet
**Splits:** single train split (requires manual partitioning)

#### Schema Summary

| Feature Category | Count | Key Features |
|-----------------|-------|-------------|
| **Demographics** | 6 | Customer_Age, Gender, Dependent_count, Education_Level, Marital_Status, Income_Category |
| **Account Behavior** | 5 | Months_on_book, Total_Relationship_Count, Months_Inactive_12_mon, Contacts_Count_12_mon, Card_Category |
| **Financial** | 8 | Credit_Limit, Total_Revolving_Bal, Avg_Open_To_Buy, Total_Amt_Chng_Q4_Q1, Total_Trans_Amt, Total_Trans_Ct, Total_Ct_Chng_Q4_Q1, Avg_Utilization_Ratio |
| **Target** | 1 | Attrition_Flag (Existing Customer / Attrited Customer) |
| **Artifacts** | 2 | Naive_Bayes_Classifier columns (pre-computed probabilities that **must be removed** to avoid data leakage) |

**Total Columns:** 23 (19 usable features + 1 target + 1 ID + 2 NB artifacts to drop)

#### Class Distribution (Estimated)

| Class | Approximate Count | Rate |
|-------|-------------------|------|
| Existing Customer | ~8,500 | ~83% |
| Attrited Customer | ~1,700 | ~17% |

**Churn rate ~17%**, more imbalanced than Telco; SMOTE/ADASYN or class weighting will be necessary.

#### Notable Data Characteristics

1. **Quarter-over-quarter dynamics:** `Total_Amt_Chng_Q4_Q1` and `Total_Ct_Chng_Q4_Q1` capture behavioral velocity, a powerful churn signal
2. **Utilization ratio:** `Avg_Utilization_Ratio` is a strong proxy for engagement; low utilization often precedes attrition
3. **Income categories are binned:** `$60K - $80K`, `$80K - $120K`, etc.; ordinal encoding preferred
4. **Card category:** `Blue` (vast majority), `Silver`, `Gold`, `Platinum`; strong class imbalance within the feature itself

#### Data Quality Flags

- **Critical:** Two `Naive_Bayes_Classifier_*` columns are pre-computed churn probabilities from a baseline model. Using them as features would constitute **data leakage**; they must be dropped before any model training.
- No explicit CLTV field; must be estimated from `Credit_Limit`, `Total_Trans_Amt`, and `Total_Trans_Ct`
- Single split requires manual stratified partitioning (70/15/15 or 80/10/10)
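The cleanup and split steps above can be sketched as follows (assuming the usual `CLIENTNUM` ID and `Attrition_Flag` target column names; the 70/15/15 ratio is one of the two options listed):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def prepare_bank(df: pd.DataFrame, seed: int = 42):
    """Drop leakage/ID columns, then make a stratified 70/15/15 split."""
    drop = [c for c in df.columns
            if c.startswith("Naive_Bayes_Classifier") or c == "CLIENTNUM"]
    df = df.drop(columns=drop)
    # Binary target: 1 = churned ("Attrited Customer"), 0 = retained.
    y = (df.pop("Attrition_Flag") == "Attrited Customer").astype(int)
    # First carve off 30%, then split that half-and-half into val/test,
    # stratifying both times to preserve the ~17% churn rate.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        df, y, test_size=0.30, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```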

---

### 5.3 Cross-Dataset Comparison

| Attribute | Telco (IBM) | Bank Churners |
|-----------|-------------|---------------|
| **Records** | ~7,000 | ~10,000 |
| **Features (usable)** | ~45 | ~19 |
| **Churn Rate** | ~25% | ~17% |
| **Industry** | Telecommunications | Banking / Credit Cards |
| **Temporal Features** | Quarter, Tenure (months) | Months_on_book, Q4/Q1 change ratios |
| **CLTV Available** | Yes (explicit field) | No (must derive) |
| **Geographic Data** | Yes (lat/lon, city, state) | No |
| **Pre-computed Scores** | Churn Score, Satisfaction | Naive Bayes (leakage; drop) |
| **Class Imbalance Severity** | Moderate | High |
| **Primary Churn Driver** | Contract type, tenure, service usage | Inactivity, transaction decline, utilization |

---

## 6. Proposed Methodology

### 6.1 The 7-Phase Pipeline

```
Phase 1: Data Ingestion & Audit
    ↓
Phase 2: Preprocessing & Feature Engineering
    ↓
Phase 3: Exploratory Data Analysis (EDA)
    ↓
Phase 4: Model Training (5-Base Stacking Ensemble)
    ↓
Phase 5: Hyperparameter Optimization
    ↓
Phase 6: Evaluation, Interpretability & CLV Scoring
    ↓
Phase 7: Deployment, Monitoring & Documentation
```

### Phase 1: Data Ingestion & Audit

- Load both datasets from Hugging Face `datasets` library
- Compute schema validation: type checks, missing value audit, cardinality report
- Flag anomalous values (negative charges, impossible ages, blank `Total Charges`)
- Document data provenance and version hashes (DVC)

### Phase 2: Preprocessing & Feature Engineering

#### 2A. Cleaning
- **Telco:** Impute `Total Charges` blanks with `Monthly Charge × Tenure`
- **Bank:** Drop `Naive_Bayes_Classifier_*` columns immediately
- Both datasets: remove ID fields (`Customer ID`, `CLIENTNUM`)
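The `Total Charges` imputation might look like this (column names follow the schema table above, e.g. a `Tenure in Months` field; adjust to the exact dataset headers):

```python
import pandas as pd

def impute_total_charges(df: pd.DataFrame) -> pd.DataFrame:
    """Fill blank Total Charges with Monthly Charge x Tenure.

    Blanks occur for zero-tenure customers, so the imputed value is 0
    for brand-new sign-ups, consistent with having paid nothing yet.
    """
    df = df.copy()
    # Blanks/whitespace coerce to NaN, then get the tenure-based fill.
    tc = pd.to_numeric(df["Total Charges"], errors="coerce")
    df["Total Charges"] = tc.fillna(df["Monthly Charge"] * df["Tenure in Months"])
    return df
```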

#### 2B. Encoding
| Feature Type | Encoding Strategy | Rationale |
|-------------|-------------------|-----------|
| Binary categorical | Label encoding (0/1) | `Gender`, `Partner`, `PhoneService` |
| Low-cardinality nominal | One-hot encoding | `Contract`, `Payment Method`, `Education_Level` |
| High-cardinality nominal | Target encoding / CatBoost native | `City`, `State` (Telco); `Income_Category` (Bank) |
| Cyclical temporal | Sine/cosine encoding | `Quarter` mapped to angle |
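The cyclical row of the table can be illustrated for `Quarter` (assuming string values `Q1`–`Q4`):

```python
import numpy as np
import pandas as pd

def encode_quarter(df: pd.DataFrame, col: str = "Quarter") -> pd.DataFrame:
    """Map Q1..Q4 onto the unit circle so Q4 is adjacent to Q1."""
    df = df.copy()
    q = df[col].str.lstrip("Q").astype(int)      # "Q3" -> 3
    angle = 2 * np.pi * (q - 1) / 4              # Q1 -> 0, Q2 -> pi/2, ...
    df[f"{col}_sin"] = np.sin(angle)
    df[f"{col}_cos"] = np.cos(angle)
    return df.drop(columns=[col])
```

Unlike one-hot or ordinal encoding, the sine/cosine pair preserves the fact that Q4 and Q1 are one step apart, not four.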

#### 2C. Feature Engineering
- **RFM-style features (Bank):** Recency = `Months_Inactive_12_mon`, Frequency = `Total_Trans_Ct`, Monetary = `Total_Trans_Amt`
- **Engagement ratio (Telco):** `Satisfaction_Score / Churn_Score` as loyalty proxy
- **Velocity features:** Month-over-month change in charges and usage
- **CLTV proxy (Bank):** `Credit_Limit × Avg_Utilization_Ratio × (12 - Months_Inactive_12_mon)`
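The CLTV proxy formula translates directly into a feature column (a sketch; the weighting is heuristic, as noted above):

```python
import pandas as pd

def bank_clv_proxy(df: pd.DataFrame) -> pd.Series:
    """CLTV proxy for the bank dataset:
    Credit_Limit * Avg_Utilization_Ratio * (12 - Months_Inactive_12_mon).
    Active months scale the utilized credit into a rough annual value."""
    return (df["Credit_Limit"]
            * df["Avg_Utilization_Ratio"]
            * (12 - df["Months_Inactive_12_mon"]))
```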

#### 2D. Scaling & Imbalance Handling
- Numerical features → RobustScaler (median/IQR, resistant to outliers)
- Class imbalance → SMOTEENN (SMOTE + Edited Nearest Neighbours) on training folds only; **never on validation/test**
- Class weights → `scale_pos_weight = len(negative) / len(positive)` for XGBoost/LightGBM
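The fold-only principle can be sketched with the `scale_pos_weight` computation (scikit-learn only; a SMOTEENN resampler from `imbalanced-learn` would be fit on the same training indices inside the same loop):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def fold_scale_pos_weight(y: np.ndarray, n_splits: int = 5):
    """scale_pos_weight = n_negative / n_positive, computed on each
    TRAINING fold only. Any resampler (e.g. SMOTEENN) must likewise be
    fit on the train indices, never on the validation fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    weights = []
    for train_idx, _ in skf.split(np.zeros(len(y)), y):
        y_tr = y[train_idx]
        weights.append(float((y_tr == 0).sum() / (y_tr == 1).sum()))
    return weights
```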

### Phase 3: Exploratory Data Analysis (EDA)

- Univariate distributions (histograms, boxplots for skew detection)
- Bivariate analysis: churn rate by contract type, payment method, tenure bins
- Correlation matrix (Spearman for non-linear relationships)
- Feature-target mutual information scores for feature selection
- Geographic heatmap (Telco: churn rate by state)

### Phase 4: Model Training (Stacking Ensemble)

#### 4A. Cross-Validation Strategy
- **5-fold Stratified Cross-Validation** to preserve class distribution
- **GroupKFold** if temporal leakage risk (same customer in multiple quarters)
- Out-of-fold (OOF) predictions from each base model used as meta-features

#### 4B. Base Model Training

| Base Model | Key Hyperparameters | Tuning Range |
|-----------|-------------------|--------------|
| XGBoost | `max_depth`, `learning_rate`, `subsample`, `colsample_bytree`, `scale_pos_weight` | depth: 3–8; lr: 0.01–0.3 |
| LightGBM | `num_leaves`, `learning_rate`, `feature_fraction`, `bagging_fraction`, `is_unbalance` | leaves: 20–100; lr: 0.01–0.3 |
| CatBoost | `depth`, `learning_rate`, `iterations`, `auto_class_weights` | depth: 4–10; iterations: 200–1000 |
| MLP | `hidden_layers`, `dropout`, `batch_size`, `learning_rate` | layers: (128,64), (256,128,64); dropout: 0.2–0.5 |
| Logistic Regression | `C`, `penalty`, `solver`, `class_weight` | C: 0.001–10; penalty: l1/l2/elasticnet |

#### 4C. Meta-Learner Training
- Input: 5 OOF probability vectors (one per base model) + optionally top-K original features
- Model: **Logistic Regression** (interpretable weights showing model contribution) OR **XGBoost** (if non-linear meta-interactions needed)
- Validation: Same 5-fold CV; meta-learner trained on OOF predictions, tested on hold-out
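A runnable sketch of the stacking scheme, using scikit-learn stand-ins for the base learners (swap in the real XGBoost/LightGBM/CatBoost estimators in production); `StackingClassifier` generates the OOF probability meta-features internally with the supplied CV:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-ins for the five base learners (in the real pipeline these would be
# xgboost.XGBClassifier, lightgbm.LGBMClassifier, catboost.CatBoostClassifier,
# plus the MLP and LR below).
base = [
    ("gbdt", GradientBoostingClassifier(random_state=0)),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(32,),
                                        max_iter=500, random_state=0))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# stack_method="predict_proba": the meta-learner sees out-of-fold probability
# vectors, produced with the same stratified 5-fold CV as Phase 4A.
stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    stack_method="predict_proba",
)

# Imbalanced toy data standing in for the churn datasets (~25% positives).
X, y = make_classification(n_samples=600, n_features=10, weights=[0.75],
                           random_state=0)
stack.fit(X, y)
print(round(stack.score(X, y), 3))
```

The fitted `final_estimator_` coefficients then show how much each base model contributes, which is the interpretability argument for a logistic meta-learner.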

### Phase 5: Hyperparameter Optimization

- **Optuna** with **TPESampler** (Tree-structured Parzen Estimator)
- 100 trials per base model; 50 trials for meta-learner
- Pruning: `MedianPruner` with early stopping on validation F1
- Objective: Maximize F1-Score (harmonic mean of precision and recall)

### Phase 6: Evaluation, Interpretability & CLV Scoring

#### 6A. Metrics Suite (10 metrics)
1. Accuracy
2. Precision (Churn class)
3. Recall (Churn class)
4. F1-Score
5. ROC-AUC
6. PR-AUC (Precision-Recall AUC; critical for imbalanced data)
7. Matthews Correlation Coefficient (MCC)
8. Cohen's Kappa
9. Balanced Accuracy
10. Expected Calibration Error (ECE)
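The ten metrics can be collected in one helper (ECE is implemented by hand since scikit-learn has no built-in; the equal-width binning used here is the common variant):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             balanced_accuracy_score, cohen_kappa_score,
                             f1_score, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: |mean predicted prob - observed churn rate| per probability
    bin, averaged with bins weighted by their share of samples."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece

def metric_suite(y_true, y_prob, threshold=0.5):
    """All 10 evaluation metrics from predicted churn probabilities."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision_churn": precision_score(y_true, y_pred),
        "recall_churn": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),
        "pr_auc": average_precision_score(y_true, y_prob),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "ece": expected_calibration_error(y_true, y_prob),
    }
```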

#### 6B. SHAP Analysis
- **Global:** SHAP summary plot (beeswarm) showing feature importance across full dataset
- **Local:** SHAP force plot for individual predictions, giving customer-level actionable insights
- **Dependence:** SHAP dependence plots for top-5 features revealing interaction effects

#### 6C. CLV Scoring
- **Telco:** Use explicit `CLTV` field; multiply by churn probability
- **Bank:** Derive CLV proxy; multiply by churn probability
- Output: Prioritized customer list sorted by RPS (Retention Priority Score)
- Segment: Top 10% (urgent), 10–30% (high), 30–60% (medium), 60–100% (low)
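The segmentation step can be sketched with quantile cut points (top 10% of RPS = urgent, next 20% = high, next 30% = medium, rest = low):

```python
import numpy as np
import pandas as pd

def segment_by_rps(rps: pd.Series) -> pd.Series:
    """Bucket customers into the four campaign tiers.

    Quantile edges from the bottom: lowest 40% = low, 40-70% = medium,
    70-90% = high, top 10% = urgent.
    """
    edges = np.quantile(rps, [0.0, 0.4, 0.7, 0.9, 1.0])
    return pd.cut(rps, bins=edges,
                  labels=["low", "medium", "high", "urgent"],
                  include_lowest=True)
```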

### Phase 7: Deployment, Monitoring & Documentation

- Model serialization: `joblib` for sklearn/CatBoost, native formats for XGBoost/LightGBM
- Inference pipeline: `scikit-learn Pipeline` + custom transformers
- Monitoring: Track prediction distribution drift, feature drift, and metric decay over time
- Documentation: Model card with intended use, limitations, bias analysis, and SHAP summary
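Serializing the full preprocessing-plus-model `Pipeline`, rather than the bare estimator, is what keeps training and inference consistent. A minimal sketch, with hypothetical column names standing in for the real schema:

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Toy training frame; columns are illustrative, not the full Telco feature set.
df = pd.DataFrame({
    "tenure": [1, 40, 68, 3, 22, 55],
    "monthly_charge": [85.0, 60.5, 20.0, 99.9, 45.0, 30.0],
    "contract": ["monthly", "two_year", "two_year", "monthly", "one_year", "two_year"],
})
y = np.array([1, 0, 0, 1, 0, 0])

# Preprocessing travels inside the artifact, so serving code cannot drift from training.
pre = ColumnTransformer([
    ("num", RobustScaler(), ["tenure", "monthly_charge"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract"]),
])
pipe = Pipeline([("preprocess", pre), ("model", LogisticRegression())]).fit(df, y)

# One artifact holds transformers + model; loading restores the whole pipeline.
path = os.path.join(tempfile.mkdtemp(), "churn_pipeline.joblib")
joblib.dump(pipe, path)
restored = joblib.load(path)
print(restored.predict_proba(df.head(1)))
```

XGBoost/LightGBM boosters inside the same structure can instead be exported through their native `save_model` formats for cross-version safety.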

---

## 7. Implementation Strategy

### 7.1 Tech Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Data Loading** | `datasets` (HF), `pandas`, `polars` | Efficient dataset ingestion |
| **Preprocessing** | `scikit-learn` (Pipeline, ColumnTransformer, RobustScaler) | Reproducible feature engineering |
| **ML Models** | `xgboost`, `lightgbm`, `catboost`, `scikit-learn` (MLP, LR) | Base learners |
| **Ensemble** | `mlens` / custom stacking with `scikit-learn` | Meta-learner orchestration |
| **Imbalance** | `imbalanced-learn` (SMOTEENN) | Oversampling + cleaning |
| **Optimization** | `optuna` | Hyperparameter search |
| **Interpretability** | `shap` | Game-theoretic explanations |
| **Tracking** | `trackio` + `mlflow` | Experiment logging, metrics, artifacts |
| **Deployment** | `gradio` / `fastapi` + Docker | API inference and UI demo |
| **Versioning** | `dvc` + `git` | Data and model versioning |

### 7.2 4-Week Timeline

| Week | Focus | Deliverables |
|------|-------|-------------|
| **Week 1** | Data audit, preprocessing, EDA | Clean notebooks; feature engineering pipeline; data quality report |
| **Week 2** | Base model training, hyperparameter tuning | 5 trained base models; Optuna study results; OOF prediction matrices |
| **Week 3** | Stacking ensemble, evaluation, SHAP analysis | Trained meta-learner; 10-metric report; SHAP dashboards; CLV scoring |
| **Week 4** | Cross-domain testing, deployment, documentation | Generalization report; Gradio demo; model card; final documentation |

### 7.3 Code Architecture

```
churnpredict-pro/
├── data/
│   ├── raw/                    # HF datasets (versioned with DVC)
│   ├── processed/              # Train/val/test splits
│   └── engineered/             # Feature-engineered datasets
├── notebooks/
│   ├── 01_eda_telco.ipynb
│   ├── 02_eda_bank.ipynb
│   ├── 03_feature_engineering.ipynb
│   └── 04_shap_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── load_datasets.py    # HF datasets loader
│   │   ├── preprocess.py       # Cleaning + encoding + scaling
│   │   └── feature_engineer.py # RFM, velocity, CLV proxy
│   ├── models/
│   │   ├── base_models.py      # XGB, LGBM, CatBoost, MLP, LR wrappers
│   │   ├── stacking_ensemble.py # OOF + meta-learner
│   │   └── hyperparameter_search.py # Optuna studies
│   ├── evaluation/
│   │   ├── metrics.py          # 10-metric computation
│   │   ├── shap_explainer.py   # Global + local SHAP
│   │   └── clv_scorer.py       # RPS computation
│   └── deployment/
│       ├── inference_pipeline.py
│       └── app.py              # Gradio/FastAPI interface
├── configs/
│   ├── telco_config.yaml
│   └── bank_config.yaml
├── experiments/                # Trackio / MLflow runs
├── tests/
│   ├── test_preprocessing.py
│   └── test_models.py
├── Dockerfile
├── requirements.txt
├── dvc.yaml
└── README.md
```

---

## 8. Experimental Design

### 8.1 Five Experiments

| ID | Experiment | Hypothesis | Method |
|----|-----------|------------|--------|
| **E1** | Single Model Baseline | Individual models underperform ensemble due to bias-variance limitations | Train each of 5 base models standalone; report metrics |
| **E2** | Stacking Ensemble | Meta-learner combining 5 models outperforms best single model by ≥ 3% F1 | 5-fold OOF stacking with LR meta-learner |
| **E3** | Imbalance Strategy Comparison | Threshold calibration is more effective than SMOTE for churn (per [21]) | Compare: (a) no correction, (b) SMOTEENN, (c) class weights, (d) threshold calibration |
| **E4** | Cross-Domain Transfer | Models trained on Telco generalize to Bank with ≥ 80% AUC | Train on Telco, evaluate zero-shot on Bank; then fine-tune |
| **E5** | CLV-Weighted vs. Uniform Ranking | RPS improves campaign ROI over probability-only ranking | Compare top-20% precision: P(churn) only vs. P(churn) × CLV |
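Experiment E3's threshold-calibration arm (d) reduces to a simple validation-set sweep; a sketch with synthetic scores (the real version would sweep over OOF predictions):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_prob, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold maximizing F1 on a validation set."""
    scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))], max(scores)

# Imbalanced toy validation set; lowering the threshold usually recovers
# minority-class recall without any resampling.
rng = np.random.default_rng(0)
y_true = (rng.uniform(size=2000) < 0.2).astype(int)
y_prob = np.clip(0.15 + 0.5 * y_true + rng.normal(0, 0.2, 2000), 0, 1)

t, f1 = best_threshold(y_true, y_prob)
print(f"best threshold={t:.2f}, F1={f1:.3f}")
```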

### 8.2 Ten Evaluation Metrics

| # | Metric | Formula / Definition | Why It Matters for Churn |
|---|--------|---------------------|-------------------------|
| 1 | **Accuracy** | (TP + TN) / (TP + TN + FP + FN) | Overall correctness; misleading if imbalanced |
| 2 | **Precision (Churn)** | TP / (TP + FP) | Of predicted churners, how many actually churn? (cost of false alarms) |
| 3 | **Recall (Churn)** | TP / (TP + FN) | Of actual churners, how many did we catch? (cost of missed churners) |
| 4 | **F1-Score** | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean; balances precision and recall |
| 5 | **ROC-AUC** | Area under ROC curve | Discrimination ability across all thresholds |
| 6 | **PR-AUC** | Area under Precision-Recall curve | More informative than ROC-AUC for imbalanced data |
| 7 | **MCC** | (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) | Correlation between prediction and truth; robust to imbalance |
| 8 | **Cohen's Kappa** | (Observed − Expected) / (1 − Expected) | Chance-corrected agreement between predictions and labels |
| 9 | **Balanced Accuracy** | (Sensitivity + Specificity) / 2 | Average of recall on both classes; fair on imbalanced data |
| 10 | **ECE** | Expected Calibration Error | Measures reliability of probability outputs; critical for CLV weighting |

### 8.3 Statistical Rigor

1. **Confidence Intervals:** All metrics reported with 95% CIs from 5-fold CV (bootstrap percentile method)
2. **McNemar's Test:** Statistically compare stacking ensemble vs. best single model
3. **DeLong's Test:** Compare ROC-AUC differences between models
4. **Permutation Test:** Validate feature importance scores from SHAP
5. **Stratification:** All splits stratified on target + `Contract` type (strongest churn predictor) to prevent distribution shift
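Item 1's percentile bootstrap can be sketched as follows; the toy labels are illustrative, and the real report would resample each fold's held-out predictions:

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_ci(y_true, y_pred, metric=f1_score, n_boot=1000, alpha=0.05, seed=42):
    """Percentile-bootstrap 95% CI for a held-out metric."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample rows with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy predictions with ~90% agreement, standing in for real OOF output.
y_true = np.array([0, 1] * 50)
y_pred = np.array(([0, 1] * 45) + ([1, 0] * 5))
print(bootstrap_ci(y_true, y_pred))
```

McNemar's and DeLong's tests (items 2-3) would be layered on top of the same held-out predictions rather than the bootstrap replicates.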

### 8.4 Reproducibility Checklist

- [ ] Random seeds fixed (`random_state=42`) for all stochastic operations
- [ ] `requirements.txt` with exact versions (via `pip freeze`)
- [ ] DVC tracking for data and model artifacts
- [ ] Git commit hash recorded with every experiment
- [ ] Trackio / MLflow logging of hyperparameters, metrics, and artifact paths

---

## 9. Result Analysis

### 9.1 Expected Performance

Based on literature benchmarks ([4] achieved 99.28% on telecom; [16] achieved strong results on bank credit risk with SMOTEENN + LightGBM), our targets are conservative and grounded:

| Dataset | Best Single Model F1 | Stacking Ensemble F1 | Expected Δ |
|---------|---------------------|---------------------|------------|
| **Telco** | 0.82–0.84 (XGBoost/CatBoost) | **0.86–0.88** | +0.03–0.04 |
| **Bank** | 0.78–0.81 (LightGBM/XGBoost) | **0.82–0.85** | +0.03–0.04 |

### 9.2 SHAP Analysis β€” Expected Insights

Based on prior churn research, we anticipate the following feature importance rankings:

**Telco (Expected Top 5 SHAP Features):**
1. `Contract` (Month-to-Month vs. longer terms): strongest predictor
2. `Tenure in Months`: inverse relationship with churn
3. `Monthly Charge` / `Total Charges`: price sensitivity
4. `Internet Type` (Fiber Optic churns more than DSL)
5. `Payment Method` (Electronic check = high risk)

**Bank (Expected Top 5 SHAP Features):**
1. `Total_Trans_Ct` (transaction frequency decline)
2. `Total_Trans_Amt` (monetary decline)
3. `Months_Inactive_12_mon` (recency of activity)
4. `Total_Relationship_Count` (cross-product engagement)
5. `Contacts_Count_12_mon` (complaint/contact proxy)

### 9.3 Business Impact Projections

Assuming a hypothetical telecom with:
- 100,000 customers
- 25% annual churn rate
- Average CLV = $3,000
- Retention campaign cost = $50 per targeted customer
- Campaign success rate (if well-targeted) = 30%

| Scenario | Customers Targeted | Campaign Cost | Churners Caught | Revenue Saved | Net ROI |
|----------|-------------------|---------------|-----------------|---------------|---------|
| Random targeting (25% churn) | 20,000 | $1,000,000 | 1,500 | $4,500,000 | 4.5× |
| Model-guided (top 20% by RPS) | 20,000 | $1,000,000 | 4,200 | $12,600,000 | **12.6×** |

*Model-guided targeting improves ROI by ~2.8× over random selection by focusing on high-value, high-probability churners.*

### 9.4 Visualization Plan

| Visualization | Purpose |
|--------------|---------|
| ROC & PR curves (all models overlaid) | Comparative discrimination |
| Confusion matrices | Error type analysis |
| SHAP summary plot (beeswarm) | Global feature importance |
| SHAP force plots (sample customers) | Local explanations for stakeholders |
| SHAP dependence plots | Feature interaction discovery |
| Calibration plot (predicted vs. actual) | Probability reliability |
| CLV-RPS scatter plot | Segmentation visualization |
| Metric bar chart with 95% CIs | Statistical comparison |

---

## 10. Iterative Improvement

### 10.1 Six Iteration Cycles

| Iteration | Focus | Action | Expected Outcome |
|-----------|-------|--------|------------------|
| **Iter 1** | Feature Engineering Deep-Dive | Add polynomial features (tenure², charge²); interaction terms (contract × monthly charge); binning (tenure quartiles) | +1–2% F1 from non-linear feature capture |
| **Iter 2** | Advanced Sampling | Replace SMOTEENN with ADASYN + Edited Nearest Neighbours; test BorderlineSMOTE | Better synthetic sample quality near decision boundary |
| **Iter 3** | Deep Learning Augmentation | Replace MLP with TabNet or FT-Transformer for tabular deep learning; compare against MLP base | Validate whether deep tabular models improve ensemble diversity |
| **Iter 4** | Temporal Modeling | For Telco: add LSTM/GRU on quarterly customer journey sequences; for Bank: add transaction time-series | Capture temporal churn dynamics; +2–3% F1 on time-sensitive subsets |
| **Iter 5** | Ensemble Expansion | Add a 6th base model (Random Forest or Extra Trees) for additional variance reduction; test blending vs. stacking | Further variance reduction; marginal F1 gain of +0.5–1% |
| **Iter 6** | Production Hardening | Dockerize inference; add A/B test framework; build automated retraining trigger on drift detection; write full production documentation | Deployable system with monitoring, retraining, and compliance docs |

### 10.2 Production Documentation Deliverables

| Document | Contents | Audience |
|----------|----------|----------|
| **Model Card** | Intended use, training data summary, performance metrics, limitations, bias assessment, ethical considerations | Data scientists, regulators |
| **API Documentation** | Endpoint specs, request/response schemas, rate limits, error codes | Engineering teams |
| **SHAP Dashboard Guide** | How to read force plots, summary plots, and dependence plots | Business stakeholders, customer success |
| **Retention Playbook** | How to act on RPS segments; recommended interventions per churn reason | Marketing, customer success |
| **Retraining SOP** | When and how to retrain; drift detection thresholds; rollback procedures | MLOps, data engineering |
| **Compliance Checklist** | GDPR Article 22 (automated decision-making), CCPA, internal audit requirements | Legal, compliance |

---

## Appendix A: Key Equations

**Retention Priority Score:**
$$
\text{RPS}_i = P(\text{churn}_i) \times \text{CLV}_i
$$

**F1-Score:**
$$
F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

**Matthews Correlation Coefficient:**
$$
\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
$$

**Expected Calibration Error:**
$$
\text{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \text{acc}(B_m) - \text{conf}(B_m) \right|
$$
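The ECE formula above, in its common binary (reliability-diagram) form where a bin's accuracy is the empirical churn rate and its confidence is the mean predicted probability, can be implemented directly:

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE = sum over bins of (|B_m|/n) * |acc(B_m) - conf(B_m)|."""
    # Assign each prediction to one of n_bins equal-width confidence bins.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece, n = 0.0, len(y_true)
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            acc = y_true[mask].mean()    # empirical churn rate in the bin
            conf = y_prob[mask].mean()   # mean predicted probability in the bin
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece

# Perfectly calibrated toy case (predicted 0.5, observed rate 0.5) scores 0;
# an overconfident 0.9 on the same labels scores 0.4.
print(expected_calibration_error(np.array([0, 1, 0, 1]), np.array([0.5] * 4)))
print(expected_calibration_error(np.array([0, 1, 0, 1]), np.array([0.9] * 4)))
```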

---

*Document compiled for the ChurnPredict Pro project. All datasets verified on Hugging Face Hub. All 21 references span peer-reviewed and high-impact arXiv publications from 2016–2024.*