| --- |
| license: cc-by-nc-4.0 |
| library_name: joblib |
| pipeline_tag: tabular-classification |
| tags: |
| - relationships |
| - gottman |
| - survival-analysis |
| - cox-proportional-hazards |
| - xgboost |
| - lightgbm |
| - catboost |
| - shap |
| - ensemble |
| - tabular-classification |
| - couples |
| - social-science |
| datasets: |
| - mstz/speeddating |
| - vedastro-org/15000-Famous-People-Marriage-Divorce-Info |
| metrics: |
| - roc_auc |
| - accuracy |
| - f1 |
| model-index: |
| - name: relationship-longevity-predictor-v2 |
|   results: |
|   - task: |
|       type: tabular-classification |
|       name: Relationship Longevity Prediction |
|     dataset: |
|       name: Speed Dating + Gottman Divorce + Vedastro Marriages (composite) |
|       type: custom |
|     metrics: |
|     - type: roc_auc |
|       value: 0.8896 |
|       name: AUC-ROC |
|     - type: accuracy |
|       value: 0.859 |
|       name: Accuracy |
|     - type: f1 |
|       value: 0.630 |
|       name: F1 |
| --- |
| |
| # Relationship Longevity Predictor v2.0 |
|
|
| **An ensemble ML model that predicts long-term relationship compatibility from two people's profiles, grounded in Gottman's Four Horsemen and Cox proportional hazards survival analysis.** |
|
|
| **[Try the live demo →](https://huggingface.co/spaces/Builder-Neekhil/relationship-longevity-predictor-demo)** |
|
|
| --- |
|
|
| ## What this is (and isn't) |
|
|
| **Is:** A well-calibrated research artifact. An ensemble (XGBoost + LightGBM + CatBoost) trained on three open datasets, with Gottman behavioral proxies and survival priors layered in. Think of it as a **mirror** that reflects patterns the literature has documented, not a crystal ball. |
|
|
| **Isn't:** A decision tool. Don't break up, propose, or pick a partner based on its output. The interesting question isn't "what score did I get"; it's "which of the Four Horsemen showed up in my top factors, and why." |
|
|
| **Training data is narrow:** Columbia speed-daters (2002–2004), 170 Turkish couples from the Yöntem Gottman study, and 14,688 public-figure marriages pulled from a dataset originally compiled by Vedastro for unrelated research (we used only the marriage/divorce metadata, with no astrological features). Generalization beyond these cohorts is unverified. See Limitations & Ethics. |
|
|
| --- |
|
|
| ## Headline Results |
|
|
| | Metric | v1.0 Baseline | v2.0 Enhanced | Change | |
| | --- | --- | --- | --- | |
| | **AUC-ROC** | 0.8842 | **0.8896** | +0.0055 ✅ | |
| | **AUC-PR** | 0.6933 | **0.7108** | +0.0175 ✅ | |
| | **Brier Score** | 0.0960 | **0.0934** | -0.0026 ✅ | |
| | **Accuracy** | 83.5% | **85.9%** | +2.4% ✅ | |
| | **F1 Score** | 0.620 | **0.630** | +0.010 ✅ | |
| | **Precision** | 52.4% | **58.9%** | +6.5% ✅ | |
|
|
| **Key improvement: +12.3% relative precision gain** (from 52.4% to 58.9%, i.e. +6.5 percentage points), which means fewer false positives than v1. |
|
|
| ### What Changed |
|
|
| v2.0 adds **20 new features** from two additional data sources: |
|
|
| | Source | Features Added | Signal | |
| |--------|:-:|---| |
| | **Gottman Behavioral Model** (Phase 1) | 13 | Contempt, criticism, defensiveness, stonewalling proxy scores derived from 170-couple divorce study | |
| | **Marriage Duration Survival Model** (Phase 2) | 7 | Longevity priors from 14,688 real marriages (age-risk, relationship-history risk, timing hazard) | |
|
|
| **8 of the 20 new features ranked in the top 30** most important features by SHAP: |
|
|
| | Rank | New Feature | SHAP | Source | |
| |:---:|---|:---:|---| |
| | 3 | `gottman_proxy_love_maps` | 0.447 | 🔴 Gottman | |
| | 4 | `gottman_proxy_contempt_x_stonewalling` | 0.403 | 🔴 Gottman | |
| | 8 | `gottman_proxy_ratio` | 0.306 | 🔴 Gottman | |
| | 10 | `gottman_proxy_stonewalling` | 0.279 | 🔴 Gottman | |
| | 12 | `gottman_proxy_horsemen` | 0.264 | 🔴 Gottman | |
| | 21 | `gottman_proxy_net_risk` | 0.189 | 🔴 Gottman | |
| | 27 | `survival_age_gap_risk` | 0.163 | 🔵 Survival | |
| | 29 | `gottman_proxy_contempt` | 0.160 | 🔴 Gottman | |
|
|
| --- |
|
|
| ## Phase 1: Gottman Behavioral Model |
|
|
| **Dataset:** Yöntem et al. Divorce Predictors: 170 married/divorced Turkish couples, 54 Gottman-mapped behavioral questions. |
|
|
| **Standalone performance:** AUC = **0.998**, Accuracy = **98.2%** on predicting divorce from behavioral patterns. |
|
|
| The 54 questions map to Gottman's relationship theory: |
|
|
| | Gottman Dimension | Questions | What It Measures | |
| |---|:-:|---| |
| | **Shared Goals** | Q1-Q10 | Aligned life direction, quality time, common objectives | |
| | **Love Maps** | Q11-Q20 | Values alignment, role expectations, compatibility beliefs | |
| | **Love Maps Deep** | Q21-Q30 | Knowing partner's inner world, stress, hopes, anxieties | |
| | **Criticism** | Q31-Q32, Q37-Q38 | Attacking character, negative statements, sudden arguments | |
| | **Contempt** | Q33-Q36, Q39-Q40 | Insults, humiliation, anger escalation, hatred | |
| | **Defensiveness** | Q41, Q45-Q46, Q48-Q50 | Blame-shifting, victimhood, refusing responsibility | |
| | **Stonewalling** | Q42-Q44, Q47 | Silence, withdrawal, leaving, shutting down | |
| | **Deep Contempt** | Q51-Q54 | Attributing meanness, vindictiveness, pathology to partner | |
|
|
| **Top divorce predictor by SHAP:** the `love_maps × shared_goals` interaction. Couples who *both* lack shared goals *and* don't know each other's inner world face the highest divorce risk. |
|
|
| ### Gottman Proxy Features (mapped to speed dating data) |
|
|
| Since speed dating participants didn't answer the 54 Gottman questions, we created **proxy scores** by mapping their existing personality/perception data to Gottman dimensions: |
|
|
| | Proxy | Derived From | |
| |---|---| |
| | `gottman_proxy_contempt` | Low mutual scores + high perception gaps | |
| | `gottman_proxy_criticism` | Misaligned values + asymmetric ratings | |
| | `gottman_proxy_defensiveness` | Self-rating inflation vs partner perception | |
| | `gottman_proxy_stonewalling` | Low engagement, low liking, no shared interests | |
| | `gottman_proxy_love_maps` | Interest correlation + shared interests + mutual perception accuracy | |
| | `gottman_proxy_shared_goals` | Value alignment + interest overlap | |
| | `gottman_proxy_ratio` | Proxy for Gottman's 5:1 positive-to-negative interaction ratio | |
|
|
| --- |
|
|
| ## Phase 2: Marriage Duration Survival Model |
|
|
| **Dataset:** [vedastro-org/15000-Famous-People-Marriage-Divorce-Info](https://hf.co/datasets/vedastro-org/15000-Famous-People-Marriage-Divorce-Info): 14,688 marriage records from 12,353 famous people. |
|
|
| ### Key Findings |
|
|
| | Finding | Statistic | |
| |---|---| |
| | **Overall divorce rate** | 34.5% | |
| | **Median divorce timing** | 7 years | |
| | **Most dangerous period** | 3-7 years (41.1% of all divorces) | |
| | **Love marriage divorce rate** | 34.1% | |
| | **Arranged marriage divorce rate** | 23.4% (p=0.006, significantly lower) | |
| | **First marriage divorce rate** | 27.8% | |
| | **Subsequent marriage divorce rate** | **69.3%** | |
|
|
| ### Cox Proportional Hazards Model (Concordance = 0.64) |
|
|
| | Factor | Hazard Ratio | p-value | Meaning | |
| |---|:---:|:---:|---| |
| | **Is first marriage** | **0.26** | <0.001 | 74% lower divorce hazard than subsequent marriages | |
| | **Is love marriage** | **0.77** | 0.002 | 23% lower hazard than non-love marriages | |
| | **Age at marriage** | **0.96** | <0.001 | Each year older → 4% lower divorce hazard | |
| | **Marriage number** | **1.34** | <0.001 | Each additional marriage → 34% higher hazard | |
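
The "Meaning" column follows from the standard Cox relationship HR = exp(β): a hazard ratio below 1 lowers the hazard, one above 1 raises it, and effects multiply across units. The sketch below only redoes that arithmetic on the table's reported values; the dictionary keys are illustrative labels, not the model's actual column names.

```python
import math


def hazard_ratio_to_pct_change(hazard_ratio: float) -> float:
    """Percent change in hazard for a one-unit increase in the covariate."""
    return (hazard_ratio - 1.0) * 100.0


def compound_hazard_ratio(hazard_ratio: float, units: float) -> float:
    """Hazard ratio for a change of `units` in the covariate.

    Under Cox PH, HR = exp(beta), so a change of k units gives exp(k * beta).
    """
    return math.exp(math.log(hazard_ratio) * units)


# Hazard ratios copied from the table above (keys are hypothetical labels).
reported_hazard_ratios = {
    "is_first_marriage": 0.26,
    "is_love_marriage": 0.77,
    "age_at_marriage": 0.96,
    "marriage_number": 1.34,
}

pct_changes = {
    name: round(hazard_ratio_to_pct_change(hr))
    for name, hr in reported_hazard_ratios.items()
}

# Marrying ten years older compounds the per-year ratio: 0.96 ** 10.
ten_years_older = compound_hazard_ratio(0.96, 10)

for name, pct in pct_changes.items():
    print(f"{name}: {pct:+d}% hazard per unit")
print(f"10 years older at marriage: HR = {ten_years_older:.3f}")
```

Note that the per-year effect compounds rather than adds: ten years corresponds to a hazard ratio of about 0.66, not 0.60.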
|
|
| ### Divorce Timing Distribution |
|
|
|  |
|
|
| ### Kaplan-Meier Survival Curves |
|
|
|  |
|  |
|
|
| --- |
|
|
| ## Model Architecture (v2.0) |
|
|
| **Ensemble of 3 gradient-boosted tree models** with **133 engineered features** (113 original + 13 Gottman + 7 survival): |
|
|
| | Model | Weight | v1 AUC | v2 AUC | Change | |
| |-------|:---:|:---:|:---:|:---:| |
| | XGBoost | 0.40 | 0.8852 | 0.8920 | +0.0068 | |
| | LightGBM | 0.35 | 0.8912 | **0.9011** | +0.0099 | |
| | CatBoost | 0.25 | 0.8661 | 0.8688 | +0.0027 | |
| | **Ensemble** | n/a | 0.8842 | **0.8896** | +0.0055 | |
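
One plausible reading of the "Weight" column is a weighted average of each model's predicted probability (soft voting). The card does not state the exact blending rule, so treat the sketch below as an assumption. The weights come from the table; the function name and toy probabilities are hypothetical.

```python
def blend_probabilities(model_probs, weights):
    """Weighted average of per-model positive-class probabilities.

    model_probs: {model_name: [p_1, p_2, ...]} with equal-length lists.
    weights:     {model_name: weight}; normalized here so they sum to 1.
    """
    total_weight = sum(weights.values())
    n_samples = len(next(iter(model_probs.values())))
    return [
        sum(weights[name] * model_probs[name][i] for name in weights) / total_weight
        for i in range(n_samples)
    ]


# Weights from the architecture table above.
WEIGHTS = {"xgboost": 0.40, "lightgbm": 0.35, "catboost": 0.25}

# Made-up probabilities for two couples, for illustration only.
toy_probs = {
    "xgboost": [0.9, 0.2],
    "lightgbm": [0.8, 0.1],
    "catboost": [0.7, 0.3],
}

blended = blend_probabilities(toy_probs, WEIGHTS)
print([round(p, 3) for p in blended])
```

A fixed-weight blend like this can score below its best member on a single metric, which is consistent with the ensemble AUC (0.8896) sitting under LightGBM's (0.9011) in the table.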
|
|
| ## Visualizations |
|
|
| ### v1 vs v2 ROC Comparison |
|  |
|
|
| ### Metrics Comparison |
|  |
|
|
| ### Feature Source Contribution |
|  |
|
|
| ### Enhanced SHAP Summary (v2) |
|  |
|
|
| ### v1 Visualizations |
| | | | |
| |---|---| |
| |  |  | |
| |  |  | |
|
|
| --- |
|
|
| ## Training Data |
|
|
| | Dataset | Records | Role | |
| |---|:---:|---| |
| | [mstz/speeddating](https://hf.co/datasets/mstz/speeddating) | 1,048 encounters | Primary training data: individual profiles + match outcome | |
| | Yöntem et al. Divorce Predictors (Kaggle) | 170 couples | Phase 1: Gottman behavioral feature engineering | |
| | [vedastro-org/15000-Famous-People-Marriage-Divorce-Info](https://hf.co/datasets/vedastro-org/15000-Famous-People-Marriage-Divorce-Info) | 14,688 marriages | Phase 2: Longevity priors + survival analysis | |
|
|
| ## Literature Basis |
|
|
| | Paper | Contribution | |
| |-------|-------------| |
| | Grinsztajn et al. (NeurIPS 2022), *"Why do tree-based models still outperform deep learning on tabular data?"* | Found that tree-based models such as XGBoost remain state of the art on medium-sized tabular data | |
| | Fisman et al. (QJE 2006), *"Gender Differences in Mate Selection"* | Original speed dating experiment; ~70% accuracy with logistic regression | |
| | **Gottman & Silver (1999), *"The Seven Principles for Making Marriage Work"*** | **Four Horsemen framework: contempt, criticism, defensiveness, stonewalling** | |
| | **Yöntem et al. (2019), *"Divorce Prediction Using Correlation Based Feature Selection"*** | **54-question Gottman-mapped divorce predictor; published 97.7% accuracy** | |
| | Savcisens et al. (Nature Computational Science 2024), *"Using Sequences of Life-events to Predict Human Lives"* | life2vec: longitudinal prediction architecture | |
|
|
| ## Repo Structure |
|
|
| ``` |
| ├── # v1.0 Baseline Model |
| ├── xgboost_model.joblib, lightgbm_model.joblib, catboost_model.cbm |
| ├── ensemble_config.json, feature_columns.joblib |
| ├── figures/  # v1 plots |
| │ |
| ├── # Phase 1: Gottman Behavioral Model |
| ├── phase1_divorce_model/ |
| │   ├── divorce_xgb.joblib, divorce_lgb.joblib, divorce_cat.cbm |
| │   ├── gottman_recipe.json  # Dimension mappings + importance |
| │   ├── gottman_mapping.joblib |
| │   └── figures/  # SHAP, confusion matrix, dimension importance |
| │ |
| ├── # Phase 2: Survival Model |
| ├── phase2_survival_model/ |
| │   ├── longevity_priors.json  # Base rates by type/era/age/marriage# |
| │   ├── survival_recipe.json  # Cox PH + KM + timing distributions |
| │   └── figures/  # KM curves, Cox hazard ratios, timing |
| │ |
| ├── # v2.0 Enhanced Model (RECOMMENDED) |
| ├── v2_enhanced/ |
| │   ├── enhanced_xgb.joblib, enhanced_lgb.joblib, enhanced_cat.cbm |
| │   ├── enhanced_config.json  # Weights, features, metrics, improvements |
| │   ├── enhanced_feature_columns.joblib |
| │   └── figures/  # Comparison plots, SHAP |
| │ |
| ├── # Training Scripts (fully reproducible) |
| ├── train_relationship_predictor.py  # v1 baseline |
| ├── phase1_divorce_model.py  # Gottman behavioral model |
| ├── phase2_marriage_duration.py  # Survival analysis |
| └── phase3_integration.py  # Integration + comparison |
| ``` |
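
A rough sketch of loading the recommended v2 artifacts from a local clone, using the file names in the tree above. It assumes the `.joblib` files are pickled estimators readable with `joblib.load` and that the `.cbm` file is a CatBoost classifier; neither is confirmed by this card. The helper names `v2_artifact_paths` and `load_v2_models` are hypothetical.

```python
from pathlib import Path


def v2_artifact_paths(repo_root):
    """Map artifact names to paths under v2_enhanced/ (names from the tree above)."""
    base = Path(repo_root) / "v2_enhanced"
    return {
        "xgboost": base / "enhanced_xgb.joblib",
        "lightgbm": base / "enhanced_lgb.joblib",
        "catboost": base / "enhanced_cat.cbm",
        "config": base / "enhanced_config.json",
        "feature_columns": base / "enhanced_feature_columns.joblib",
    }


def load_v2_models(repo_root):
    """Load the three v2 models plus the feature column list.

    Third-party imports are kept inside the function so the path helper
    works without them. Requires joblib, xgboost, lightgbm and catboost.
    """
    import joblib
    from catboost import CatBoostClassifier  # assumption: classifier model

    paths = v2_artifact_paths(repo_root)
    catboost_model = CatBoostClassifier()
    catboost_model.load_model(str(paths["catboost"]))
    return {
        "xgboost": joblib.load(paths["xgboost"]),
        "lightgbm": joblib.load(paths["lightgbm"]),
        "catboost": catboost_model,
        "feature_columns": joblib.load(paths["feature_columns"]),
    }


paths = v2_artifact_paths(".")
print(paths["catboost"])
```

As with any pickled artifact, only call `joblib.load` on files from a source you trust.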
|
|
| ## Limitations & Ethics |
|
|
| **Cohort bias.** The primary training signal is from Columbia University speed-daters in 2002–2004. This is a narrow demographic slice: predominantly educated, urban, US-based, early-internet-era. Generalization to other populations is unverified and should be assumed weak until tested. |
|
|
| **Celebrity bias in the survival priors.** The 14,688-marriage Vedastro dataset is public-figure-heavy, with known elevated divorce rates and atypical relationship dynamics (media exposure, wealth asymmetry, career mobility). The arranged-vs-love finding (23.4% vs 34.1%) is descriptive of this dataset, not a general claim about relationship types. |
|
|
| **Dataset provenance.** The Vedastro dataset was originally compiled for astrology research. This model uses only the structured marriage/divorce metadata (age at marriage, marriage number, duration, type, outcome); no astrological variables are used as features. |
|
|
| **Short-horizon proxy.** Speed-dating captures initial match decisions, not long-term outcomes. The Gottman and survival layers partially bridge this gap, but they're proxies, not ground truth. |
|
|
| **Small Gottman sample.** The underlying divorce predictor was trained on 170 couples. The Four Horsemen framework itself is robust across decades of research; the proxy mapping from speed-dating features to Gottman dimensions is approximate and worth questioning. |
|
|
| **Not a decision tool.** Outputs are probabilistic, directional, and should be treated as a conversation starter, not advice. This model should not be used to make real decisions about real relationships. |
|
|
| ## License |
|
|
| cc-by-nc-4.0. Research use. Based on publicly available academic datasets. |
|
|
| --- |
|
|
| *Built with XGBoost, LightGBM, CatBoost, SHAP, lifelines, and scikit-learn.* |
|
|