🏥 Hospital Readmission Prediction (Logistic Regression)

Author: Isaac Tosin Adisa

📌 Overview

This model predicts 30-day hospital readmission risk using structured clinical features derived from the MIMIC-IV dataset. It serves as the linear baseline in an integrated multi-model comparative framework alongside XGBoost and LightGBM, designed to evaluate the trade-offs between model complexity, predictive performance, calibration quality, explainability, and subgroup fairness.

The model outputs calibrated probabilities suitable for downstream clinical risk-stratification workflows. As a logistic regression model, it is interpretable by design: each coefficient is the change in the log-odds of readmission per one-standard-deviation increase in the corresponding (standardized) feature, so the model is transparent and auditable without post-hoc explanation tools.
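The log-odds mapping can be written out with the StandardScaler transform and the post-hoc Platt step folded in. Here μ_j and σ_j are the training-set mean and standard deviation of feature j, and a, b denote the fitted Platt slope and intercept (symbols introduced for illustration, not taken from the released model):

```latex
\begin{aligned}
z_j &= \frac{x_j - \mu_j}{\sigma_j}, \qquad j = 1, \dots, 26 \\
\operatorname{logit} p(x) &= \eta(x) = \beta_0 + \sum_{j=1}^{26} \beta_j z_j \\
\operatorname{logit} p_{\text{cal}}(x) &= a\,\eta(x) + b \\
\mathrm{OR}_j &= e^{\beta_j} \ \text{(uncalibrated)}, \qquad \mathrm{OR}_j^{\text{cal}} = e^{a \beta_j}
\end{aligned}
```

For a single fitted Platt sigmoid, the calibrated model therefore stays linear in the features: every coefficient is rescaled by the same factor a, so the ranking of features by |β_j| is unchanged. Averaging several calibrated fold models, as scikit-learn's CalibratedClassifierCV does by default under cross-validation, breaks this exact form.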

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
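The 30-day label and the temporal split listed in the table above could be constructed roughly as follows. This is a minimal sketch, not the author's pipeline: it assumes the column names of the MIMIC-IV hosp/admissions table (subject_id, admittime, dischtime), and the helper names, the plain "next admission within 30 days of discharge" rule and the 70/15/15 fractions are illustrative assumptions. The card does not state its cohort exclusions (for example in-hospital deaths, planned readmissions or transfers) or its split proportions.

```python
import pandas as pd


def add_readmit_30d(adm: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label an admission 1 if the same patient is
    readmitted within 30 days of its discharge, else 0."""
    adm = adm.sort_values(["subject_id", "admittime"]).copy()
    # Admission time of the patient's next stay (NaT for their last stay).
    next_admit = adm.groupby("subject_id")["admittime"].shift(-1)
    gap = next_admit - adm["dischtime"]
    # Comparisons against NaT evaluate to False, so last stays get label 0.
    within_30d = (gap >= pd.Timedelta(0)) & (gap <= pd.Timedelta(days=30))
    adm["readmit_30d"] = within_30d.astype(int)
    return adm


def temporal_split(adm: pd.DataFrame, time_col: str = "admittime",
                   train_frac: float = 0.7, val_frac: float = 0.15):
    """Hypothetical helper: earliest rows train, later rows validate,
    latest rows test (fractions are assumptions)."""
    adm = adm.sort_values(time_col)
    n = len(adm)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (adm.iloc[:n_train],
            adm.iloc[n_train:n_train + n_val],
            adm.iloc[n_train + n_val:])


# Toy admissions for two patients (dates in MIMIC-IV's shifted style).
admissions = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2],
    "admittime": pd.to_datetime(["2130-01-01", "2130-01-20", "2130-06-01",
                                 "2131-03-01", "2131-05-01"]),
    "dischtime": pd.to_datetime(["2130-01-05", "2130-01-25", "2130-06-03",
                                 "2131-03-04", "2131-05-02"]),
})
labeled = add_readmit_30d(admissions)
train, val, test = temporal_split(labeled)
print(labeled[["subject_id", "admittime", "readmit_30d"]])
```

One caveat on the split: MIMIC-IV de-identification shifts each patient's dates by a patient-specific offset, so ordering raw admittime across different patients only approximates calendar order. The patients table's anchor_year_group is a coarser alternative that is consistent across patients; the card does not say which the author used.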

⚙️ Training

| Setting | Value |
|---|---|
| Framework | scikit-learn |
| Solver | lbfgs |
| Regularization | L2 (strength C tuned via cross-validation) |
| Class imbalance | class_weight="balanced" |
| Feature scaling | StandardScaler (applied pre-fit) |
| Calibration | Platt scaling (post-hoc) |
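The settings in the table above can be wired together in scikit-learn roughly as follows. This is a minimal sketch on synthetic data, not the author's training script: make_classification stands in for the 26 MIMIC-IV features, LogisticRegressionCV is one way to tune the L2 strength by cross-validation, and CalibratedClassifierCV(method="sigmoid") is scikit-learn's implementation of Platt scaling. The card does not say whether the sigmoid was fitted on the temporal validation split or on cross-validation folds; this sketch assumes folds (cv=3).

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 26 clinical features, ~18% positive class.
X, y = make_classification(n_samples=3000, n_features=26, n_informative=8,
                           weights=[0.82, 0.18], random_state=0)
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

# The scaler sits inside the pipeline, so it is fit on training data only.
base = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(
        Cs=10,                    # grid of inverse regularization strengths C
        cv=5,                     # C selected by 5-fold cross-validation
        scoring="roc_auc",
        solver="lbfgs",           # penalty left at its default (L2)
        class_weight="balanced",
        max_iter=1000,
    ),
)

# Platt scaling: a sigmoid fitted to the pipeline's scores on held-out folds.
model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
model.fit(X_train, y_train)

risk = model.predict_proba(X_test)[:, 1]
print(f"mean predicted risk {risk.mean():.3f} vs observed rate {y_test.mean():.3f}")
```

With ~18% prevalence, class_weight="balanced" weights each class by n_samples / (2 · n_class): about 1 / (2 × 0.18) ≈ 2.78 for readmissions and 1 / (2 × 0.82) ≈ 0.61 for non-readmissions. The model then effectively trains on a 50% base rate, which inflates its raw probabilities; that is one reason a post-hoc calibration step is needed before the outputs can be read as risks.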

📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | ~0.67 | Linear baseline discrimination |

📊 Logistic Regression serves as the interpretable linear baseline in this framework. Its performance is the reference point against which the marginal discrimination gains of the tree-based models (XGBoost, LightGBM) are weighed against their reduced transparency.
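Only AUC-ROC is reported above, although the overview lists calibration quality among the compared dimensions. A hedged sketch of how discrimination and calibration could be measured on held-out predictions with scikit-learn; the evaluate helper is hypothetical, the Brier score and quantile-binned calibration curve are illustrative additions rather than results from this card, and the risks below are synthetic.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score


def evaluate(y_true, p, n_bins=10):
    """Hypothetical helper: AUC-ROC plus two calibration views for risks p."""
    auc = roc_auc_score(y_true, p)
    brier = brier_score_loss(y_true, p)  # mean squared error of probabilities
    # Observed readmission rate vs. mean predicted risk in equal-count bins.
    frac_pos, mean_pred = calibration_curve(y_true, p, n_bins=n_bins,
                                            strategy="quantile")
    return {"auc": auc, "brier": brier, "bins": list(zip(mean_pred, frac_pos))}


# Synthetic risks; outcomes are drawn from them, so they are calibrated by construction.
rng = np.random.default_rng(0)
p = rng.beta(2, 9, size=5000)          # mean risk ~0.18
y = (rng.random(5000) < p).astype(int)

metrics = evaluate(y, p)
print(f"AUC={metrics['auc']:.3f}  Brier={metrics['brier']:.3f}")
for pred, obs in metrics["bins"]:
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```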

🔍 Explainability

Logistic Regression is inherently interpretable: no post-hoc explanation method is required.

  • Feature coefficients directly encode the direction and magnitude of each variable's contribution to the log-odds (per standard deviation, since inputs are standardized)
  • Odds ratios are obtained by exponentiating the model weights (OR = exp(β))
  • Compatible with standard clinical audit and regulatory review workflows
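Reading odds ratios off a fitted model takes only a few lines. The sketch below fits an unwrapped StandardScaler + LogisticRegression pipeline on synthetic data with hypothetical feature names (the real 26-feature schema is in the repository); because inputs are standardized, each odds ratio is per one-standard-deviation increase. If the released model wraps the regression in a Platt-scaling calibrator, the coefficients live on the wrapped estimator and the calibrated odds ratios become exp(a·β) for the fitted Platt slope a.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names standing in for the real schema.
feature_names = [f"feature_{i}" for i in range(26)]
X, y = make_classification(n_samples=2000, n_features=26, n_informative=6,
                           weights=[0.82, 0.18], random_state=1)

pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="lbfgs", class_weight="balanced", max_iter=1000),
)
pipe.fit(X, y)

coefs = pipe[-1].coef_.ravel()   # change in log-odds per +1 SD of each feature
odds_ratios = np.exp(coefs)      # multiplicative change in odds per +1 SD

# Top five features by absolute effect on the log-odds scale.
for i in np.argsort(-np.abs(coefs))[:5]:
    direction = "raises" if coefs[i] > 0 else "lowers"
    print(f"{feature_names[i]}: OR={odds_ratios[i]:.2f} per SD ({direction} risk)")
```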

⚖️ Fairness Evaluation

The model was evaluated across 16 demographic and clinical subgroups, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.

All subgroups satisfy the following thresholds:

| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.
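The subgroup criteria in the table above can be checked with a small function. This is a sketch under stated assumptions: the card gives neither the decision threshold used for FNR nor whether Δ is signed, so the hypothetical subgroup_gaps helper takes absolute gaps and a caller-supplied threshold (0.18, roughly the prevalence, is used in the example purely for illustration). The age bands and data are synthetic. Subgroups where AUC is undefined (only one outcome class) get NaN and are reported as failing.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def fnr(y_true, y_pred):
    """False-negative rate: share of true readmissions the model misses."""
    pos = y_true == 1
    return float(np.mean(y_pred[pos] == 0)) if pos.any() else float("nan")


def subgroup_gaps(y_true, p, groups, threshold, max_dauc=0.05, max_dfnr=0.10):
    """Hypothetical helper: per-subgroup |ΔAUC| and |ΔFNR| vs. the full cohort."""
    y_true, p, groups = np.asarray(y_true), np.asarray(p), np.asarray(groups)
    y_pred = (p >= threshold).astype(int)
    overall_auc = roc_auc_score(y_true, p)
    overall_fnr = fnr(y_true, y_pred)
    report = {}
    for g in np.unique(groups):
        m = groups == g
        # AUC is undefined when a subgroup contains a single outcome class.
        auc = (roc_auc_score(y_true[m], p[m])
               if len(np.unique(y_true[m])) == 2 else float("nan"))
        d_auc = abs(auc - overall_auc)
        d_fnr = abs(fnr(y_true[m], y_pred[m]) - overall_fnr)
        # NaN comparisons are False, so undefined gaps count as failures.
        report[g] = {"dAUC": d_auc, "dFNR": d_fnr,
                     "passes": bool(d_auc <= max_dauc and d_fnr <= max_dfnr)}
    return report


# Synthetic risks, outcomes and (hypothetical) age bands.
rng = np.random.default_rng(0)
p = rng.beta(2, 9, size=6000)
y = (rng.random(6000) < p).astype(int)
groups = rng.choice(["18-44", "45-64", "65+"], size=6000)

report = subgroup_gaps(y, p, groups, threshold=0.18)
for g, r in report.items():
    print(g, r)
```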

🚀 Usage

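```python
import joblib
import numpy as np

# Load the trained logistic regression model
model = joblib.load("logreg.pkl")

# Replace [...] with one row of the 26 clinical features, already transformed
# with the StandardScaler fitted during training; X must have shape (1, 26)
X = np.array([[...]])

# Column 1 of predict_proba is the 30-day readmission probability
# (assuming class labels 0 = no readmission, 1 = readmission)
pred = model.predict_proba(X)[0, 1]

print(f"Readmission risk: {pred:.3f}")
```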

⚠️ Before inference, input features must be transformed with the same StandardScaler that was fitted during training. See the repository for the full feature schema and preprocessing pipeline.
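Because logreg.pkl expects pre-scaled input, forgetting the scaling step silently produces wrong risks. If you retrain, one way to remove that failure mode is to save the scaler and classifier together as a single scikit-learn Pipeline, so raw features can be passed straight to predict_proba. The sketch uses synthetic data and a hypothetical filename; whether the released file already contains a Pipeline is not stated in the card.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=26, random_state=0)

# Scaler and classifier are fitted and persisted as one object.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "logreg_pipeline.pkl")  # hypothetical filename
joblib.dump(pipe, path)

loaded = joblib.load(path)
X_raw = X[:3]                              # raw, unscaled feature rows
risk = loaded.predict_proba(X_raw)[:, 1]   # scaling happens inside the pipeline
print(np.round(risk, 3))
```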

🎯 Intended Use

  • Linear baseline benchmarking against tree-based models
  • Clinical ML interpretability research
  • Demonstration of explainable and fair AI systems
  • Reproducibility and model comparison

⚠️ Limitations

  • Linear model: logistic regression cannot capture non-linear effects or feature interactions unless they are explicitly engineered as inputs; tree-based models may outperform it on discrimination metrics.
  • Retrospective validation only: the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
  • Single institution: MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
  • No causal claims: feature associations do not imply clinical causation.
  • Requires local validation before any deployment in a clinical decision support context.
  • Credentialed dataset: MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

📜 Citation

```bibtex
@misc{adisa2025readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

License

This model is released under the MIT License. The underlying MIMIC-IV dataset is subject to its own PhysioNet credentialed access agreement.
