🏥 Hospital Readmission Prediction (Logistic Regression)

Author: Isaac Tosin Adisa

📌 Overview

This model predicts 30-day hospital readmission risk using structured clinical features derived from the MIMIC-IV dataset. It serves as the linear baseline in an integrated multi-model comparative framework alongside XGBoost and LightGBM, designed to evaluate the trade-offs between model complexity, predictive performance, calibration quality, explainability, and subgroup fairness.

The model outputs calibrated probabilities suitable for downstream clinical risk stratification workflows. As a logistic regression model, it is interpretable by design. Each coefficient is the change in the log-odds of readmission per one-standard-deviation increase in its (standardized) feature. This makes the model transparent and auditable without post-hoc explanation tools.
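The model can be stated compactly as the equations below. The sketch assumes that Platt scaling is fit on the regression's linear score s. It also assumes that μ_j and σ_j are the per-feature mean and standard deviation learned by the StandardScaler on the training split.

```latex
% Standardized inputs and the linear score (log-odds) of the regression
s(\mathbf{x}) = \beta_0 + \sum_{j=1}^{26} \beta_j z_j,
\qquad z_j = \frac{x_j - \mu_j}{\sigma_j}

% Uncalibrated probability and per-feature odds ratio (per 1 SD increase in x_j)
\hat{p}_{\mathrm{raw}} = \frac{1}{1 + e^{-s(\mathbf{x})}},
\qquad \mathrm{OR}_j = e^{\beta_j}

% Platt scaling: A and B are fit on held-out data
\hat{p} = \frac{1}{1 + e^{-(A\,s(\mathbf{x}) + B)}}
```

With A > 0, the calibrated probability is a monotone function of s. The ranking of patients, and therefore the AUC, is unchanged, and every coefficient keeps its sign. On the calibrated log-odds scale, the per-SD effect of feature j becomes Aβ_j.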

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

📊 Dataset

Property	Value
Source	MIMIC-IV (v2.2)
Total admissions	415,231
30-day readmission prevalence	~18%
Feature count	26 structured clinical features
Split	Train / Validation / Test (temporal split)

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
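The sketch below shows how the 30-day label and the temporal split could be derived. It rests on several assumptions.

- The label uses a common simplified definition: another admission of the same patient within 30 days of discharge. Exclusions the card may apply, such as in-hospital deaths or planned readmissions, are not handled.
- Column names follow the MIMIC-IV `admissions` table.
- The 70/15/15 split fractions are hypothetical.
- MIMIC-IV shifts dates per patient, so a faithful calendar split may need `anchor_year_group` rather than raw `admittime`. This sketch ignores that.

```python
import pandas as pd


def add_readmission_label(adm: pd.DataFrame, days: int = 30) -> pd.DataFrame:
    """Flag each admission followed by another admission of the same patient
    within `days` days of discharge (simplified, assumed definition)."""
    adm = adm.sort_values(["subject_id", "admittime"]).copy()
    # Next admission time for the same patient (NaT for the last admission)
    next_admit = adm.groupby("subject_id")["admittime"].shift(-1)
    gap = next_admit - adm["dischtime"]
    # NaT gaps compare as False, so a patient's last admission is labeled 0
    adm["readmit_30d"] = (gap <= pd.Timedelta(days=days)).astype(int)
    return adm


def temporal_split(adm: pd.DataFrame, train_frac: float = 0.7, val_frac: float = 0.15):
    """Split by admission time so validation/test admissions come after training ones."""
    adm = adm.sort_values("admittime")
    t1 = adm["admittime"].quantile(train_frac)
    t2 = adm["admittime"].quantile(train_frac + val_frac)
    train = adm[adm["admittime"] < t1]
    val = adm[(adm["admittime"] >= t1) & (adm["admittime"] < t2)]
    test = adm[adm["admittime"] >= t2]
    return train, val, test


# Toy admissions (synthetic, shifted-style dates); NOT MIMIC-IV data
adm = pd.DataFrame({
    "subject_id": [1, 1, 1, 2],
    "hadm_id": [10, 11, 12, 20],
    "admittime": pd.to_datetime(["2150-01-01", "2150-01-20", "2150-06-01", "2150-03-01"]),
    "dischtime": pd.to_datetime(["2150-01-05", "2150-01-25", "2150-06-04", "2150-03-03"]),
})
labeled = add_readmission_label(adm)
train, val, test = temporal_split(labeled)
print(labeled[["hadm_id", "readmit_30d"]])
```

In the toy data, admission 10 is followed 15 days after discharge by admission 11, so only admission 10 is labeled as a readmission.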

⚙️ Training

Setting	Value
Framework	scikit-learn
Solver	lbfgs
Regularization	L2 (tuned via cross-validation)
Class imbalance	class_weight="balanced"
Feature scaling	StandardScaler (applied pre-fit)
Calibration	Platt scaling (post-hoc)
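The training settings in the table above could be expressed in scikit-learn roughly as shown below. This is a sketch, not the repository's training script. The following pieces are assumptions:

- the synthetic data;
- the `Cs` grid, fold counts, `max_iter` and `scoring`;
- the use of `CalibratedClassifierCV(method="sigmoid", cv=3)` for Platt scaling. The card may instead calibrate on its temporal validation split.

Calibration matters here because `class_weight="balanced"` reweights the minority class, which inflates raw predicted probabilities. Post-hoc calibration brings predicted risk back toward the observed rate.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 26 clinical features with ~18% positives; NOT MIMIC-IV
X, y = make_classification(n_samples=3000, n_features=26, n_informative=8,
                           weights=[0.82], random_state=0)
# Ordered split standing in for the temporal split (synthetic rows have no timestamps)
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

# Scaling + L2-regularized logistic regression (lbfgs); C is chosen by 5-fold CV
base = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, solver="lbfgs", scoring="roc_auc",
                         class_weight="balanced", max_iter=1000),
)

# Platt scaling: method="sigmoid" fits a logistic curve to the base model's scores
model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
model.fit(X_train, y_train)

risk = model.predict_proba(X_test)[:, 1]
print(f"mean predicted risk: {risk.mean():.3f}, observed rate: {y_test.mean():.3f}")
```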

📈 Performance

Metric	Value	Notes
AUC-ROC	~0.67	Linear baseline discrimination

📊 Logistic Regression serves as the interpretable linear baseline in this framework. Its performance sets the reference point for judging whether the tree-based models (XGBoost, LightGBM) improve discrimination enough to justify their reduced transparency.

🔍 Explainability

Logistic Regression is inherently interpretable; no post-hoc explanation method is required.

Feature coefficients encode the direction and magnitude of each variable's contribution on the log-odds scale, per one-standard-deviation increase, since inputs are standardized
Odds ratios are obtained by exponentiating the coefficients (OR = exp(β))
Compatible with standard clinical audit and regulatory review workflows
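Here is a minimal sketch of reading odds ratios from a fitted logistic regression. It uses synthetic data and hypothetical feature names; the real 26-feature schema is in the repository. The card does not say how the released `logreg.pkl` was saved, so how to reach its underlying `LogisticRegression` (for example, inside a calibration wrapper) is not specified here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names for illustration only
feature_names = ["age", "length_of_stay", "elixhauser_score", "prior_admissions"]

X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=1)
X_std = StandardScaler().fit_transform(X)  # coefficients become per-1-SD effects
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_std, y)

# Each coefficient is a change in log-odds per 1-SD increase; exp() gives the odds ratio
odds_ratios = np.exp(clf.coef_[0])
ranked = sorted(zip(feature_names, clf.coef_[0], odds_ratios), key=lambda t: -abs(t[1]))
for name, beta, odds_ratio in ranked:
    direction = "raises" if beta > 0 else "lowers"
    print(f"{name:>18}: beta={beta:+.3f}  OR={odds_ratio:.2f}  ({direction} risk)")
```

An odds ratio of 1.5 would mean that a one-SD increase in that feature multiplies the odds of readmission by 1.5, holding the other features fixed.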

⚖️ Fairness Evaluation

The model was evaluated across 16 demographic and clinical subgroups, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.

All subgroups satisfy the following thresholds:

Metric	Threshold
ΔAUC (vs. overall)	≤ 0.05
ΔFNR (vs. overall)	≤ 0.10

No subgroup exhibited clinically meaningful performance degradation under these criteria.
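The subgroup check could be implemented along the lines below. The card does not define how its ΔAUC and ΔFNR are computed, so the sketch makes three assumptions:

- the gaps are absolute differences from the overall cohort;
- FNR uses a single fixed threshold of 0.5;
- every subgroup contains both outcome classes.

The scores and the "sex" grouping are synthetic and hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def false_negative_rate(y_true, y_pred):
    """Share of true readmissions (label 1) that the model predicts as 0."""
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0))


def subgroup_gaps(y_true, y_score, groups, threshold=0.5, max_dauc=0.05, max_dfnr=0.10):
    """Return {group: (dAUC, dFNR, within_thresholds)} relative to the whole cohort.

    Assumed definitions: absolute gaps, one fixed decision threshold.
    """
    y_pred = (y_score >= threshold).astype(int)
    overall_auc = roc_auc_score(y_true, y_score)
    overall_fnr = false_negative_rate(y_true, y_pred)
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        d_auc = abs(roc_auc_score(y_true[mask], y_score[mask]) - overall_auc)
        d_fnr = abs(false_negative_rate(y_true[mask], y_pred[mask]) - overall_fnr)
        report[g] = (d_auc, d_fnr, bool(d_auc <= max_dauc and d_fnr <= max_dfnr))
    return report


# Toy example: synthetic labels/scores and a hypothetical two-level grouping
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=4000)
y_score = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, size=4000), 0, 1)
groups = rng.choice(["female", "male"], size=4000)

report = subgroup_gaps(y_true, y_score, groups)
for g, (d_auc, d_fnr, ok) in report.items():
    print(f"{g}: dAUC={d_auc:.3f}  dFNR={d_fnr:.3f}  within thresholds={ok}")
```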

🚀 Usage

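import joblib
import numpy as np

# Load the trained model (joblib uses pickle: only load files from a trusted source)
model = joblib.load("logreg.pkl")

# Replace with one row of the 26 clinical features, in the training feature order,
# already transformed with the StandardScaler fitted during training
X = np.array([[...]])

# predict_proba returns [P(no readmission), P(readmission)] for each row;
# column 1 is the 30-day readmission probability
pred = model.predict_proba(X)[0, 1]

print(f"Readmission risk: {pred:.3f}")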

⚠️ Input features must be scaled using the same StandardScaler fitted during training before inference. See the repository for the full feature schema and preprocessing pipeline.
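One way to avoid mismatched scaling is to persist the fitted scaler and the model together as a scikit-learn `Pipeline`. The pipeline then accepts raw features and applies the stored training-set scaling itself. The sketch below is an assumption-labeled illustration, not the released artifact. The data are synthetic, the filename `logreg_pipeline.joblib` is hypothetical, and the card does not say whether `logreg.pkl` is packaged this way.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 26 features; NOT MIMIC-IV
X_train, y_train = make_classification(n_samples=500, n_features=26, random_state=0)

# The scaler is fitted on training data only and is stored with the model
pipeline = make_pipeline(StandardScaler(),
                         LogisticRegression(class_weight="balanced", max_iter=1000))
pipeline.fit(X_train, y_train)

# Persist and reload (hypothetical filename in a temporary directory)
path = os.path.join(tempfile.mkdtemp(), "logreg_pipeline.joblib")
joblib.dump(pipeline, path)
loaded = joblib.load(path)  # only load pickled files from sources you trust

# Raw (unscaled) features go in; the pipeline applies the stored scaling first
x_new = X_train[:1]
risk = loaded.predict_proba(x_new)[0, 1]
print(f"Readmission risk: {risk:.3f}")
```

Bundling the two steps means callers cannot accidentally pass unscaled features or refit the scaler on inference data.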

🎯 Intended Use

Linear baseline benchmarking against tree-based models
Clinical ML interpretability research
Demonstration of explainable and fair AI systems
Reproducibility and model comparison

⚠️ Limitations

Linear model — logistic regression cannot capture non-linear feature interactions present in complex clinical data; tree-based models may outperform it on discrimination metrics.
Retrospective validation only — model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
Single institution — MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
No causal claims — feature associations do not imply clinical causation.
Requires local validation before any deployment in a clinical decision support context.
Credentialed dataset — MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

🔗 Links

📄 Paper: arXiv:2604.22535
💻 Code: github.com/Tomisin92/readmission-prediction

📜 Citation

@misc{adisa2026readmission,
  title={An Integrated Framework for Explainable, Fair, and Observable Hospital Readmission Prediction: Development and Validation on MIMIC-IV},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

License

This model is released under the MIT License. The underlying MIMIC-IV dataset is subject to its own PhysioNet credentialed access agreement.


Paper for IsaacT1992/hospital-readmission-logistic-regression: An Integrated Framework for Explainable, Fair, and Observable Hospital Readmission Prediction: Development and Validation on MIMIC-IV (arXiv:2604.22535)

Evaluation results: ROC AUC on MIMIC-IV (self-reported): 0.670