---
license: cc-by-nc-4.0
library_name: joblib
pipeline_tag: tabular-classification
tags:
- relationships
- gottman
- survival-analysis
- cox-proportional-hazards
- xgboost
- lightgbm
- catboost
- shap
- ensemble
- tabular-classification
- couples
- social-science
datasets:
- mstz/speeddating
- vedastro-org/15000-Famous-People-Marriage-Divorce-Info
metrics:
- roc_auc
- accuracy
- f1
model-index:
- name: relationship-longevity-predictor-v2
  results:
  - task:
      type: tabular-classification
      name: Relationship Longevity Prediction
    dataset:
      name: Speed Dating + Gottman Divorce + Vedastro Marriages (composite)
      type: custom
    metrics:
    - type: roc_auc
      value: 0.8896
      name: AUC-ROC
    - type: accuracy
      value: 0.859
      name: Accuracy
    - type: f1
      value: 0.630
      name: F1
---
# Relationship Longevity Predictor v2.0
**An ensemble ML model that predicts long-term relationship compatibility from two people's profiles, grounded in Gottman's Four Horsemen and Cox proportional hazards survival analysis.**
**[Try the live demo →](https://huggingface.co/spaces/Builder-Neekhil/relationship-longevity-predictor-demo)**
---
## What this is (and isn't)
**Is:** A well-calibrated research artifact: an ensemble (XGBoost + LightGBM + CatBoost) trained on three open datasets, with Gottman behavioral proxies and survival priors layered in. Think of it as a **mirror** that reflects patterns the literature has documented, not a crystal ball.
**Isn't:** A decision tool. Don't break up, propose, or pick a partner based on its output. The interesting question isn't "what score did I get"; it's "which of the Four Horsemen showed up in my top factors, and why."
**Training data is narrow:** Columbia speed-daters (2002–2004), 170 Turkish couples from the Yöntem et al. Gottman study, and 14,688 public-figure marriages pulled from a dataset originally compiled by Vedastro for unrelated research (we used only the marriage/divorce metadata, no astrological features). Generalization beyond these cohorts is unverified; see Limitations & Ethics.
---
## Headline Results
| Metric | v1.0 Baseline | v2.0 Enhanced | Change |
| --- | --- | --- | --- |
| **AUC-ROC** | 0.8842 | **0.8896** | +0.0055 |
| **AUC-PR** | 0.6933 | **0.7108** | +0.0175 |
| **Brier Score** (lower is better) | 0.0960 | **0.0934** | -0.0026 |
| **Accuracy** | 83.5% | **85.9%** | +2.4 pp |
| **F1 Score** | 0.620 | **0.630** | +0.010 |
| **Precision** | 52.4% | **58.9%** | +6.5 pp |
**Key improvement: precision rose 6.5 percentage points (+12.3% relative)**, so a noticeably smaller share of the pairs v2 predicts as positive turn out to be false positives.
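The headline metrics can be reproduced from held-out labels and the ensemble's predicted probabilities. The dependency-free sketch below shows the definitions behind AUC-ROC, Brier score, accuracy, precision, and F1 on toy data. The 0.5 decision threshold is an assumption, since the card does not state the threshold behind the thresholded metrics. In practice, `sklearn.metrics` provides `roc_auc_score`, `brier_score_loss`, `precision_score`, and `f1_score`, and AUC-PR is commonly reported via `average_precision_score`.

```python
def roc_auc(labels, probs):
    """AUC-ROC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(labels, probs):
    """Mean squared error between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def threshold_metrics(labels, probs, threshold=0.5):
    """Accuracy, precision, recall, and F1 after thresholding (the 0.5 default is an assumption)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels and probabilities for illustration only (not the model's data).
y = [1, 0, 1, 0, 0, 1]
p = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
print(roc_auc(y, p), brier(y, p), threshold_metrics(y, p))
```

Precision is the metric that moved most in v2: it counts how many predicted positives are real, so fewer false positives raise it directly.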
### What Changed
v2.0 adds **20 new features** from two additional data sources:
| Source | Features Added | Signal |
|--------|:-:|---|
| **Gottman Behavioral Model** (Phase 1) | 13 | Contempt, criticism, defensiveness, and stonewalling proxy scores derived from the 170-couple divorce study |
| **Marriage Duration Survival Model** (Phase 2) | 7 | Longevity priors from 14,688 real marriages (age-risk, relationship-history risk, timing hazard) |
**8 of the 20 new features ranked in the top 30** most important features by SHAP:
| Rank | New Feature | SHAP | Source |
|:---:|---|:---:|---|
| 3 | `gottman_proxy_love_maps` | 0.447 | Gottman |
| 4 | `gottman_proxy_contempt_x_stonewalling` | 0.403 | Gottman |
| 8 | `gottman_proxy_ratio` | 0.306 | Gottman |
| 10 | `gottman_proxy_stonewalling` | 0.279 | Gottman |
| 12 | `gottman_proxy_horsemen` | 0.264 | Gottman |
| 21 | `gottman_proxy_net_risk` | 0.189 | Gottman |
| 27 | `survival_age_gap_risk` | 0.163 | Survival |
| 29 | `gottman_proxy_contempt` | 0.160 | Gottman |
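The card does not define the SHAP column. A common global summary, assumed here, is the mean absolute SHAP value per feature across samples. The sketch below ranks features that way from a per-sample SHAP matrix, such as the array produced by `shap.TreeExplainer`. The toy values are illustrative, not the model's.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples (assumed meaning of the SHAP column)."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    # Highest mean absolute contribution first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy SHAP values: 3 samples x 3 features (illustrative numbers only).
names = ["gottman_proxy_love_maps", "gottman_proxy_contempt", "survival_age_gap_risk"]
rows = [[0.5, -0.2, 0.1], [-0.4, 0.1, -0.1], [0.45, -0.15, 0.1]]
ranking = rank_by_mean_abs_shap(rows, names)
for rank, (name, score) in enumerate(ranking, start=1):
    print(rank, name, round(score, 3))
```

Taking absolute values before averaging means a feature that pushes some couples up and others down still counts as important. That is why a proxy like `gottman_proxy_love_maps` can rank high regardless of the sign of its effect.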
---
## Phase 1: Gottman Behavioral Model
**Dataset:** Yöntem et al. Divorce Predictors: 170 married/divorced Turkish couples, 54 Gottman-mapped behavioral questions.
**Standalone performance:** AUC = **0.998**, Accuracy = **98.2%** on predicting divorce from behavioral patterns.
The 54 questions map to Gottman's relationship theory:
| Gottman Dimension | Questions | What It Measures |
|---|:-:|---|
| **Shared Goals** | Q1-Q10 | Aligned life direction, quality time, common objectives |
| **Love Maps** | Q11-Q20 | Values alignment, role expectations, compatibility beliefs |
| **Love Maps Deep** | Q21-Q30 | Knowing partner's inner world, stress, hopes, anxieties |
| **Criticism** | Q31-Q32, Q37-Q38 | Attacking character, negative statements, sudden arguments |
| **Contempt** | Q33-Q36, Q39-Q40 | Insults, humiliation, anger escalation, hatred |
| **Defensiveness** | Q41, Q45-Q46, Q48-Q50 | Blame-shifting, victimhood, refusing responsibility |
| **Stonewalling** | Q42-Q44, Q47 | Silence, withdrawal, leaving, shutting down |
| **Deep Contempt** | Q51-Q54 | Attributing meanness, vindictiveness, pathology to partner |
**Top divorce predictor by SHAP:** the `love_maps × shared_goals` interaction. Couples who *both* lack shared goals *and* don't know each other's inner world face the highest divorce risk.
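The question-to-dimension table can be applied as a lookup: average each couple's answers within a dimension, then form interaction terms. The sketch below rests on three assumptions. Answers are on the dataset's 0–4 Likert scale. `love_maps` means Q11–Q20 only, because the card does not say whether Love Maps Deep is included. `dimension_scores` is a hypothetical helper, not the repo's code; the shipped mapping lives in `phase1_divorce_model/gottman_recipe.json`.

```python
# Question numbers per Gottman dimension, transcribed from the table above (1-based).
DIMENSIONS = {
    "shared_goals": list(range(1, 11)),
    "love_maps": list(range(11, 21)),
    "love_maps_deep": list(range(21, 31)),
    "criticism": [31, 32, 37, 38],
    "contempt": [33, 34, 35, 36, 39, 40],
    "defensiveness": [41, 45, 46, 48, 49, 50],
    "stonewalling": [42, 43, 44, 47],
    "deep_contempt": [51, 52, 53, 54],
}


def dimension_scores(answers):
    """Hypothetical helper: average a couple's answers (question -> score, assumed 0-4) per dimension."""
    scores = {dim: sum(answers[q] for q in qs) / len(qs) for dim, qs in DIMENSIONS.items()}
    # Interaction term mirroring the `love_maps × shared_goals` feature named in the card.
    scores["love_maps_x_shared_goals"] = scores["love_maps"] * scores["shared_goals"]
    return scores


# Sanity check: the mapping covers each of the 54 questions exactly once.
covered = sorted(q for qs in DIMENSIONS.values() for q in qs)
assert covered == list(range(1, 55))

example = {q: 2 for q in range(1, 55)}
example.update({q: 4 for q in range(1, 11)})  # raise only the shared-goals items
print(dimension_scores(example)["love_maps_x_shared_goals"])  # 2.0 * 4.0 = 8.0
```

A product term lets a tree model see the joint pattern in one split. Without it, the model must learn the "both dimensions at once" effect through two successive splits.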
### Gottman Proxy Features (mapped to speed dating data)
Since speed dating participants didn't answer the 54 Gottman questions, we created **proxy scores** by mapping their existing personality/perception data to Gottman dimensions:
| Proxy | Derived From |
|---|---|
| `gottman_proxy_contempt` | Low mutual scores + high perception gaps |
| `gottman_proxy_criticism` | Misaligned values + asymmetric ratings |
| `gottman_proxy_defensiveness` | Self-rating inflation vs partner perception |
| `gottman_proxy_stonewalling` | Low engagement, low liking, no shared interests |
| `gottman_proxy_love_maps` | Interest correlation + shared interests + mutual perception accuracy |
| `gottman_proxy_shared_goals` | Value alignment + interest overlap |
| `gottman_proxy_ratio` | Ratio of positive proxies to negative proxies, modeled on Gottman's 5:1 positive-to-negative ratio |
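The card lists the base proxies but not the formulas for the composite features that appear in the SHAP ranking (`gottman_proxy_horsemen`, `gottman_proxy_contempt_x_stonewalling`, `gottman_proxy_ratio`, `gottman_proxy_net_risk`). The sketch below is purely hypothetical. It assumes each base proxy is already scaled to [0, 1] and shows one plausible way such composites could be built. The formulas are illustrative assumptions, not the repo's implementation.

```python
def composite_proxies(p, eps=1e-6):
    """Hypothetical composites from base proxy scores in [0, 1]; every formula is an assumption."""
    horsemen = p["contempt"] + p["criticism"] + p["defensiveness"] + p["stonewalling"]
    positives = p["love_maps"] + p["shared_goals"]
    return {
        # Assumed: sum of the Four Horsemen proxies.
        "horsemen": horsemen,
        # Assumed: contempt and stonewalling occurring together.
        "contempt_x_stonewalling": p["contempt"] * p["stonewalling"],
        # Assumed: positive-to-negative balance, read against Gottman's 5:1 benchmark.
        "ratio": positives / (horsemen + eps),
        # Assumed: negatives minus positives.
        "net_risk": horsemen - positives,
    }


base = {"contempt": 0.1, "criticism": 0.1, "defensiveness": 0.1, "stonewalling": 0.1,
        "love_maps": 0.9, "shared_goals": 0.8}
c = composite_proxies(base)
print(round(c["ratio"], 2), c["ratio"] >= 5.0)
```

The small `eps` keeps the ratio finite for a pair whose negative proxies are all zero.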
---
## Phase 2: Marriage Duration Survival Model
**Dataset:** [vedastro-org/15000-Famous-People-Marriage-Divorce-Info](https://hf.co/datasets/vedastro-org/15000-Famous-People-Marriage-Divorce-Info): 14,688 marriage records from 12,353 famous people.
### Key Findings
| Finding | Statistic |
|---|---|
| **Overall divorce rate** | 34.5% |
| **Median divorce timing** | 7 years |
| **Most dangerous period** | 3-7 years (41.1% of all divorces) |
| **Love marriage divorce rate** | 34.1% |
| **Arranged marriage divorce rate** | 23.4% (p=0.006, significantly lower) |
| **First marriage divorce rate** | 27.8% |
| **Subsequent marriage divorce rate** | **69.3%** |
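The card reports p = 0.006 for the arranged-vs-love difference but does not name the test. A two-proportion z-test is one standard way to compare two divorce rates, and the sketch below implements it. The group sizes are made up to match the reported rates (the card does not give the arranged-marriage count), so the printed p-value is not the card's.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions, using the pooled rate."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts reproducing 23.4% vs 34.1%; NOT the dataset's real group sizes.
z, p = two_proportion_z_test(events_a=117, n_a=500, events_b=3410, n_b=10000)
print(round(z, 2), p)
```

With real counts, the arranged-marriage group is likely much smaller than the love-marriage group. The p-value depends heavily on that smaller group's size.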
### Cox Proportional Hazards Model (Concordance = 0.64)
| Factor | Hazard Ratio | p-value | Meaning |
|---|:---:|:---:|---|
| **Is first marriage** | **0.26** | <0.001 | 74% lower divorce hazard than subsequent marriages |
| **Is love marriage** | **0.77** | 0.002 | 23% lower hazard than non-love marriages |
| **Age at marriage** | **0.96** | <0.001 | Each year older → 4% lower divorce hazard |
| **Marriage number** | **1.34** | <0.001 | Each additional marriage β 34% higher hazard |
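In a Cox model, a hazard ratio is exp(β) for a one-unit change in a covariate, and effects multiply. The snippet below checks the "Meaning" column against the reported ratios and shows how a per-year ratio compounds. The 10-year figure is only an illustration and inherits the model's proportional-hazards and log-linearity assumptions. If you refit with lifelines, `CoxPHFitter` exposes the fitted ratios as `hazard_ratios_` and the concordance as `concordance_index_`.

```python
import math

# Hazard ratios as reported in the Cox table above.
HR = {"is_first_marriage": 0.26, "is_love_marriage": 0.77,
      "age_at_marriage": 0.96, "marriage_number": 1.34}


def pct_change(hr):
    """Percent change in hazard implied by a hazard ratio (negative = lower hazard)."""
    return (hr - 1.0) * 100.0


def compound(hr_per_unit, units):
    """HR for a multi-unit change: per-unit log-hazard effects add, so HRs multiply."""
    return math.exp(units * math.log(hr_per_unit))  # same as hr_per_unit ** units


for name, hr in HR.items():
    print(f"{name}: {pct_change(hr):+.0f}% hazard")

# Marrying 10 years older, other covariates held fixed.
print(round(compound(HR["age_at_marriage"], 10), 3))  # 0.96**10 ≈ 0.665
```

Compounding is why a seemingly small 4% per-year effect reduces the hazard by roughly a third over a decade.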
### Divorce Timing Distribution

### Kaplan-Meier Survival Curves
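Kaplan-Meier curves estimate the probability that a marriage is still intact at time t. Marriages that are ongoing, or that ended in some way other than divorce, are treated as censored. The footer lists lifelines, whose `KaplanMeierFitter` implements this estimator. The dependency-free sketch below shows the product-limit calculation itself on toy durations.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of S(t) at each distinct divorce time.

    durations: years until divorce or censoring; events: 1 = divorced, 0 = censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all records that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Divorced and censored records both leave the risk set after time t.
        at_risk -= removed
    return curve


# Toy data: six marriages, durations in years (illustrative only).
curve = kaplan_meier([2, 3, 3, 5, 7, 10], [1, 1, 0, 1, 0, 0])
print(curve)
```

Records censored at the same time as a divorce stay in the risk set for that step. This matches the usual Kaplan-Meier convention.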


---
## Model Architecture (v2.0)
**Ensemble of 3 gradient-boosted tree models** with **133 engineered features** (113 original + 13 Gottman + 7 survival):
| Model | Weight | v1 AUC | v2 AUC | Change |
|-------|:---:|:---:|:---:|:---:|
| XGBoost | 0.40 | 0.8852 | 0.8920 | +0.0068 |
| LightGBM | 0.35 | 0.8912 | **0.9011** | +0.0099 |
| CatBoost | 0.25 | 0.8661 | 0.8688 | +0.0027 |
| **Ensemble** | 1.00 | 0.8842 | **0.8896** | +0.0055 |
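The weight column suggests weighted soft voting: a weighted average of the three models' predicted probabilities. That reading is an assumption. The card does not show the combination rule, and `v2_enhanced/enhanced_config.json` holds the actual weights. A minimal sketch:

```python
WEIGHTS = {"xgboost": 0.40, "lightgbm": 0.35, "catboost": 0.25}  # from the table above


def ensemble_proba(model_probs, weights=WEIGHTS):
    """Weighted average of per-model positive-class probabilities (assumed soft voting).

    model_probs: dict mapping model name -> list of probabilities for the same rows.
    """
    total = sum(weights.values())
    n_rows = len(next(iter(model_probs.values())))
    return [
        sum(weights[m] * model_probs[m][i] for m in weights) / total
        for i in range(n_rows)
    ]


# With fitted classifiers, each list would come from e.g. `model.predict_proba(X)[:, 1]`.
probs = {"xgboost": [0.8, 0.1], "lightgbm": [0.7, 0.2], "catboost": [0.9, 0.3]}
print(ensemble_proba(probs))
```

Dividing by the weight total keeps the output a valid probability even if the weights were changed and no longer summed to 1.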
## Visualizations
### v1 vs v2 ROC Comparison

### Metrics Comparison

### Feature Source Contribution

### Enhanced SHAP Summary (v2)

### v1 Visualizations
---
## Training Data
| Dataset | Records | Role |
|---|:---:|---|
| [mstz/speeddating](https://hf.co/datasets/mstz/speeddating) | 1,048 encounters | Primary training data: individual profiles + match outcome |
| Yöntem et al. Divorce Predictors (Kaggle) | 170 couples | Phase 1: Gottman behavioral feature engineering |
| [vedastro-org/15000-Famous-People-Marriage-Divorce-Info](https://hf.co/datasets/vedastro-org/15000-Famous-People-Marriage-Divorce-Info) | 14,688 marriages | Phase 2: longevity priors + survival analysis |
## Literature Basis
| Paper | Contribution |
|-------|-------------|
| Grinsztajn et al. (NeurIPS 2022): *"Why do tree-based models still outperform deep learning on typical tabular data?"* | Benchmark showing tree-based models such as XGBoost still outperform deep learning on medium-sized tabular data; supports the choice of gradient-boosted trees |
| Fisman et al. (QJE 2006): *"Gender Differences in Mate Selection"* | Original speed dating experiment; ~70% accuracy with logistic regression |
| **Gottman & Silver (1999): *"The Seven Principles for Making Marriage Work"*** | **Four Horsemen framework: contempt, criticism, defensiveness, stonewalling** |
| **Yöntem et al. (2019): *"Divorce Prediction Using Correlation Based Feature Selection"*** | **54-question Gottman-mapped divorce predictor; published 97.7% accuracy** |
| Savcisens et al. (Nature Computational Science 2024): *"Using Sequences of Life-events to Predict Human Lives"* | life2vec: longitudinal prediction architecture |
## Repo Structure
```
├── # v1.0 Baseline Model
├── xgboost_model.joblib, lightgbm_model.joblib, catboost_model.cbm
├── ensemble_config.json, feature_columns.joblib
├── figures/ # v1 plots
│
├── # Phase 1: Gottman Behavioral Model
├── phase1_divorce_model/
│   ├── divorce_xgb.joblib, divorce_lgb.joblib, divorce_cat.cbm
│   ├── gottman_recipe.json # Dimension mappings + importance
│   ├── gottman_mapping.joblib
│   └── figures/ # SHAP, confusion matrix, dimension importance
│
├── # Phase 2: Survival Model
├── phase2_survival_model/
│   ├── longevity_priors.json # Base rates by type/era/age/marriage#
│   ├── survival_recipe.json # Cox PH + KM + timing distributions
│   └── figures/ # KM curves, Cox hazard ratios, timing
│
├── # v2.0 Enhanced Model (RECOMMENDED)
├── v2_enhanced/
│   ├── enhanced_xgb.joblib, enhanced_lgb.joblib, enhanced_cat.cbm
│   ├── enhanced_config.json # Weights, features, metrics, improvements
│   ├── enhanced_feature_columns.joblib
│   └── figures/ # Comparison plots, SHAP
│
├── # Training Scripts (fully reproducible)
├── train_relationship_predictor.py # v1 baseline
├── phase1_divorce_model.py # Gottman behavioral model
├── phase2_marriage_duration.py # Survival analysis
└── phase3_integration.py # Integration + comparison
```
## Limitations & Ethics
**Cohort bias.** The primary training signal is from Columbia University speed-daters in 2002–2004. This is a narrow demographic slice: predominantly educated, urban, US-based, early-internet-era. Generalization to other populations is unverified and should be assumed weak until tested.
**Celebrity bias in the survival priors.** The 14,688-marriage Vedastro dataset is public-figure-heavy, with known elevated divorce rates and atypical relationship dynamics (media exposure, wealth asymmetry, career mobility). The arranged-vs-love finding (23.4% vs 34.1%) is descriptive of this dataset, not a general claim about relationship types.
**Dataset provenance.** The Vedastro dataset was originally compiled for astrology research. This model uses only the structured marriage/divorce metadata (age at marriage, marriage number, duration, type, outcome); no astrological variables are used as features.
**Short-horizon proxy.** Speed-dating captures initial match decisions, not long-term outcomes. The Gottman and survival layers partially bridge this gap, but they're proxies, not ground truth.
**Small Gottman sample.** The underlying divorce predictor was trained on 170 couples. The Four Horsemen framework itself is robust across decades of research; the proxy mapping from speed-dating features to Gottman dimensions is approximate and worth questioning.
**Not a decision tool.** Outputs are probabilistic, directional, and should be treated as a conversation starter, not advice. This model should not be used to make real decisions about real relationships.
## License
cc-by-nc-4.0 (Creative Commons Attribution-NonCommercial 4.0): non-commercial research use. Based on publicly available academic datasets.
---
*Built with XGBoost, LightGBM, CatBoost, SHAP, lifelines, and scikit-learn.*