Hospital Readmission Prediction (Logistic Regression)
Author: Isaac Tosin Adisa
Overview
This model predicts 30-day hospital readmission risk using structured clinical features derived from the MIMIC-IV dataset. It serves as the linear baseline in an integrated multi-model comparative framework alongside XGBoost and LightGBM, designed to evaluate the trade-offs between model complexity, predictive performance, calibration quality, explainability, and subgroup fairness.
The model outputs calibrated probabilities suitable for downstream clinical risk stratification workflows. As a logistic regression model, it is interpretable by design: each coefficient is the change in log-odds associated with one feature, so the model is transparent and auditable without post-hoc explanation tools.
This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.
Dataset
| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |
Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
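The temporal split listed in the table can be illustrated with a minimal sketch. This card does not state the split fractions or which timestamp defines the ordering, so the 70/15/15 proportions, the `admittime` column name, and the `temporal_split` helper are assumptions for illustration only, not the pipeline's actual code.

```python
import pandas as pd


def temporal_split(df, time_col, train_frac=0.70, val_frac=0.15):
    """Split rows chronologically: earliest rows train, latest rows test.

    Hypothetical helper; the fractions are illustrative assumptions.
    """
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return (
        ordered.iloc[:train_end],
        ordered.iloc[train_end:val_end],
        ordered.iloc[val_end:],
    )


# Toy data standing in for admissions; column names are assumptions.
toy = pd.DataFrame(
    {
        "admittime": pd.date_range("2020-01-01", periods=100, freq="D"),
        "readmitted_30d": [i % 5 == 0 for i in range(100)],
    }
)
train, val, test = temporal_split(toy, "admittime")
```

Splitting by time rather than at random keeps later admissions out of the training data, which gives a more realistic estimate of how the model behaves on future patients.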
Training
| Setting | Value |
|---|---|
| Framework | scikit-learn |
| Solver | lbfgs |
| Regularization | L2 (tuned via cross-validation) |
| Class imbalance | class_weight="balanced" |
| Feature scaling | StandardScaler (applied pre-fit) |
| Calibration | Platt scaling (post-hoc) |
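The settings in the table could be combined roughly as follows. This is a sketch on synthetic data, not the released training code: the grid of `C` values, the 5-fold cross-validation, the use of ROC AUC for tuning, and the hand-rolled Platt step (a one-dimensional logistic fit on held-out scores) are all assumptions, and `predict_risk` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the 26 structured features (not MIMIC-IV data).
X = rng.normal(size=(2000, 26))
logits = X[:, 0] - 0.5 * X[:, 1] - 1.5
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
X_train, y_train = X[:1400], y[:1400]
X_val, y_val = X[1400:], y[1400:]

# Scaling + L2-regularized logistic regression (L2 is scikit-learn's default penalty).
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="lbfgs", class_weight="balanced", max_iter=1000),
)

# Tune the inverse regularization strength C by cross-validation (assumed grid).
search = GridSearchCV(
    pipe,
    {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# Platt scaling: fit a sigmoid on held-out decision scores.
val_scores = search.decision_function(X_val).reshape(-1, 1)
platt = LogisticRegression(C=1e6, solver="lbfgs")  # effectively unregularized
platt.fit(val_scores, y_val)


def predict_risk(X_new):
    """Calibrated probability of the positive class (hypothetical helper)."""
    scores = search.decision_function(X_new).reshape(-1, 1)
    return platt.predict_proba(scores)[:, 1]


risk = predict_risk(X_val)
```

Calibrating after fitting matters here because `class_weight="balanced"` shifts the raw predicted probabilities away from the true prevalence; the sigmoid fit on held-out data maps the scores back to that scale.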
Performance
| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | ~0.67 | Linear baseline discrimination |
Logistic Regression serves as the interpretable linear baseline in this framework. Its performance provides a lower-bound reference for evaluating the marginal gains of tree-based models (XGBoost, LightGBM) against the cost of reduced transparency.
Explainability
Logistic Regression is inherently interpretable; no post-hoc explanation method is required.
- Feature coefficients directly encode the direction and magnitude of each variable's contribution
- Odds ratios can be derived directly from model weights
- Compatible with standard clinical audit and regulatory review workflows
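As a minimal sketch of the odds-ratio point above, exponentiating each coefficient gives the multiplicative change in odds per unit increase of that feature. The data and feature names below are synthetic placeholders, and `odds_ratios` is a hypothetical helper rather than part of the released code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def odds_ratios(fitted_lr, feature_names):
    """Map each feature name to exp(coefficient) for a fitted binary model."""
    coefs = fitted_lr.coef_.ravel()
    return {name: float(np.exp(c)) for name, c in zip(feature_names, coefs)}


# Synthetic example: one risk-increasing, one risk-decreasing, one null feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
logits = 1.0 * X[:, 0] - 1.0 * X[:, 1] - 1.0
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_scaled = StandardScaler().fit_transform(X)
lr = LogisticRegression(solver="lbfgs").fit(X_scaled, y)

# Placeholder names, not the model's real feature schema.
ors = odds_ratios(lr, ["feature_a", "feature_b", "feature_c"])
for name, value in ors.items():
    print(f"{name}: odds ratio {value:.2f}")
```

Because inputs are standardized, each odds ratio describes a one-standard-deviation increase in the feature, not a one-unit increase in its original clinical units.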
Fairness Evaluation
The model was evaluated across 16 demographic and clinical subgroups, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.
All subgroups satisfy the following thresholds:
| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |
No subgroup exhibited clinically meaningful performance degradation under these criteria.
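A subgroup check of this kind might be computed as in the sketch below. The card does not say whether the gaps are signed or absolute, nor which decision threshold defines the false negative rate, so the absolute differences and the 0.3 threshold are assumptions; `subgroup_gaps` and the toy data are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def false_negative_rate(y_true, y_pred):
    """Share of true positives predicted as negative."""
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0))


def subgroup_gaps(y_true, y_prob, groups, threshold=0.5):
    """Absolute AUC and FNR gaps between each subgroup and the full sample."""
    y_pred = (y_prob >= threshold).astype(int)
    overall_auc = roc_auc_score(y_true, y_prob)
    overall_fnr = false_negative_rate(y_true, y_pred)
    gaps = {}
    for g in np.unique(groups):
        mask = groups == g
        auc_g = roc_auc_score(y_true[mask], y_prob[mask])
        fnr_g = false_negative_rate(y_true[mask], y_pred[mask])
        gaps[str(g)] = {
            "delta_auc": abs(auc_g - overall_auc),
            "delta_fnr": abs(fnr_g - overall_fnr),
        }
    return gaps


# Toy predictions with ~18% prevalence and two made-up subgroups.
rng = np.random.default_rng(0)
y_true = (rng.random(4000) < 0.18).astype(int)
y_prob = np.clip(0.18 + 0.25 * y_true + rng.normal(0.0, 0.2, 4000), 0.0, 1.0)
groups = rng.choice(np.array(["A", "B"]), size=4000)

gaps = subgroup_gaps(y_true, y_prob, groups, threshold=0.3)
passes = {
    g: v["delta_auc"] <= 0.05 and v["delta_fnr"] <= 0.10 for g, v in gaps.items()
}
```

Each subgroup needs both outcome classes for AUC to be defined, so very small strata may have to be merged or reported separately.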
Usage
```python
import joblib
import numpy as np

# Load the trained model
model = joblib.load("logreg.pkl")

# Replace with your 26 clinical features (must be StandardScaler-transformed);
# the expected shape is (n_samples, 26)
X = np.array([[...]])

# Probability of 30-day readmission (positive class) for the first row
pred = model.predict_proba(X)[0, 1]
print(f"Readmission risk: {pred:.3f}")
```
Note: Input features must be scaled with the same StandardScaler that was fitted during training before inference. See the repository for the full feature schema and preprocessing pipeline.
Intended Use
- Linear baseline benchmarking against tree-based models
- Clinical ML interpretability research
- Demonstration of explainable and fair AI systems
- Reproducibility and model comparison
Limitations
- Linear model: logistic regression cannot capture non-linear effects or feature interactions unless they are engineered explicitly; tree-based models may outperform it on discrimination metrics.
- Retrospective validation only: the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- Single institution: MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
- No causal claims: feature associations do not imply clinical causation.
- Local validation is required before any deployment in a clinical decision support context.
- Credentialed dataset: MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.
Links
- Paper: arXiv:2604.22535
- Code: github.com/Tomisin92/readmission-prediction
Citation
```bibtex
@misc{adisa2025readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
License
This model is released under the MIT License. The underlying MIMIC-IV dataset is subject to its own PhysioNet credentialed access agreement.