Credit Scoring XGBoost

Summary

A baseline XGBoost model for binary credit risk classification trained on the Home Credit dataset. The model is intended for experimentation and educational use, not production.

Model Details

  • Model type: XGBoost classifier
  • Task: Binary classification (default risk)
  • Input: Tabular features engineered from Home Credit raw tables
  • Output: Probability of default
  • Artifact: credit_scoring_xgb.pkl

Intended Use

  • Demonstrate a credit scoring pipeline with MLflow tracking
  • Provide a lightweight baseline for experimentation

Limitations

  • Not validated for real-world credit decisions
  • No fairness or regulatory compliance audit
  • Performance depends on reproducing the data preprocessing used in the training pipeline

How to Use

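Load the artifact locally with joblib (the xgboost package must be installed so the pickled model can be deserialized):

import joblib

# X must be a feature matrix built with the training pipeline's preprocessing.
model = joblib.load("credit_scoring_xgb.pkl")
preds = model.predict_proba(X)[:, 1]  # column 1 = probability of default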
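
The call above returns one probability per row. If a hard 0/1 label is needed for experimentation, a cutoff has to be chosen. The following is a minimal sketch, assuming an illustrative threshold of 0.5 (not a value taken from this project) and a hypothetical helper name `to_labels`:

```python
from typing import Iterable, List


def to_labels(probs: Iterable[float], threshold: float = 0.5) -> List[int]:
    """Map default probabilities to 0/1 labels (1 = predicted default).

    The 0.5 default is only a placeholder; a suitable cutoff depends on
    the class balance and on the cost of each kind of error.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # A probability equal to the threshold is treated as a predicted default.
    return [1 if p >= threshold else 0 for p in probs]
```

With the snippet above, `to_labels(preds)` would turn the predicted probabilities into labels. Given the limitations listed earlier, such labels are suitable for experiments only.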

Input Expectations

  • The model expects the same feature engineering used in the training pipeline.
  • See the project notebooks for the exact preprocessing steps and column names.
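
Because the model depends on the training-time feature set, it can help to compare column names before calling `predict_proba`. The following is a minimal sketch, assuming the expected column list has been copied from the project notebooks; the helper name `compare_columns` is hypothetical and not part of the project code:

```python
from typing import List, Sequence, Tuple


def compare_columns(
    expected: Sequence[str], actual: Sequence[str]
) -> Tuple[List[str], List[str], bool]:
    """Compare the training-time column list with the columns at hand.

    Returns (missing, unexpected, same_order):
      missing     - expected columns absent from `actual`
      unexpected  - columns in `actual` that the model was not trained on
      same_order  - True only if both sequences match exactly, in order
    """
    expected_set = set(expected)
    actual_set = set(actual)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    same_order = list(expected) == list(actual)
    return missing, unexpected, same_order
```

For a pandas DataFrame `X`, the second argument would be `list(X.columns)`. An input is ready for scoring only when both lists come back empty and `same_order` is true.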

Reproducibility

  • Training code and experiments are tracked in the main project repo:
    • src/train.py
    • notebooks/01_data_preparation.ipynb
    • notebooks/03_model_comparison.ipynb

Training Data

Home Credit (public dataset). CSVs are tracked via Git LFS in the project repo.

Metrics

See the project repo notebooks for evaluation details and MLflow logs.
