Home Credit - Credit Risk Model Stability


This repository contains the complete source code, pretrained models, and documentation for a State-of-the-Art (SOTA) credit risk stability model. The solution was developed for the Home Credit - Credit Risk Model Stability competition, focusing on predicting default probability while maintaining performance stability over time.

🏆 Key Performance

| Metric | Score | Description |
|---|---|---|
| Stability Score | 0.67701 | Official competition metric (Gini with stability penalty). |
| AUC | 0.8308 | Raw predictive power (single fold). |
| Slope | ~0.00 | Performance degradation over time (near zero is ideal). |
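
As a rough illustration of how a "Gini with stability penalty" score of this kind can be computed, the sketch below evaluates Gini per week, fits a line through the weekly values, and penalizes a negative slope and noisy residuals. The weights 88.0 and 0.5 and all function names are assumptions based on the publicly described competition metric, not code from this repository; check them against the official Kaggle metric before relying on the numbers. It also assumes every week contains both classes and that there are at least two weeks.

```python
from statistics import mean, pstdev


def auc(y_true, y_score):
    """Pairwise AUC; quadratic in sample count, fine for a small illustration."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stability_score(weeks, y_true, y_score, slope_weight=88.0, std_weight=0.5):
    """Mean weekly Gini, penalized for a falling trend and for volatility.

    slope_weight and std_weight are assumed values, not taken from this repo.
    """
    by_week = {}
    for w, y, s in zip(weeks, y_true, y_score):
        targets, scores = by_week.setdefault(w, ([], []))
        targets.append(y)
        scores.append(s)

    # Gini = 2 * AUC - 1, computed separately for each week in time order.
    ginis = [2 * auc(*by_week[w]) - 1 for w in sorted(by_week)]

    # Ordinary least-squares line through (0, g0), (1, g1), ...
    xs = list(range(len(ginis)))
    x_bar, g_bar = mean(xs), mean(ginis)
    slope = sum((x - x_bar) * (g - g_bar) for x, g in zip(xs, ginis)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = g_bar - slope * x_bar
    residuals = [g - (slope * x + intercept) for x, g in zip(xs, ginis)]

    # Only a negative slope (degradation over time) is penalized.
    return g_bar + slope_weight * min(0.0, slope) - std_weight * pstdev(residuals)


weeks = [0, 0, 1, 1, 2, 2]
targets = [0, 1, 0, 1, 0, 1]
stable_scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]    # perfect ranking every week
decaying_scores = [0.1, 0.9, 0.5, 0.5, 0.9, 0.1]  # ranking degrades week by week

print(stability_score(weeks, targets, stable_scores))
print(stability_score(weeks, targets, decaying_scores))
```

Under these assumed weights, even a small negative slope dominates the score, which is why a slope near zero matters as much as raw AUC.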

📂 Repository Structure

This project is organized as a production-ready Python package with clear separation of concerns.

.
├── models/                 # Pretrained CatBoost models (10GB+)
│   ├── catboost_fold_1.cbm
│   └── ...
├── src/                    # Core source code
│   ├── data/               # Polars-based data pipeline & aggregation logic
│   ├── features/           # Feature engineering & adversarial selection
│   ├── models/             # Trainer wrapper for CatBoost/LGBM
│   └── validation/         # Stability-aware cross-validation splitters
├── notebooks/              # Experimentation labs (Jupyter)
│   ├── 01_baseline...      # Initial feasibility study
│   ├── 02_feature...       # Deep feature engineering (Depth 0/1/2)
│   ├── 05_champion...      # FINAL Training script (GPU required)
│   └── ...
├── docs/                   # Detailed technical reports
│   └── reports/            # Technical evolution, summary, and appendices
├── training_artifacts/     # Logs and OOF predictions
└── verify_model.py         # Quick inference verification script

🚀 Getting Started

1. Prerequisites

  • Python 3.10+
  • NVIDIA GPU (Recommended for training, optional for inference)
  • RAM: 32GB+ (for full data processing)

2. Installation

Clone the repository and install dependencies. Note that this repo uses Git LFS for model weights.

# Install Git LFS first
git lfs install

# Clone repository
git clone https://huggingface.co/Lyes930/home-credit-risk-model-v1
cd home-credit-risk-model-v1

# Install Python dependencies
pip install -r requirements.txt

3. Data Preparation

Due to licensing, the raw dataset cannot be hosted here. Please download it from Kaggle:

  1. Go to the Competition Data Page.
  2. Download and unzip the data.
  3. Place the csv_files or parquet_files folders inside a data/ directory in the root of this repo.

Structure should look like:

data/
  ├── parquet_files/
  │   ├── train/
  β”‚   └── test/
  └── feature_definitions.csv
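
Before running anything heavier, it can help to confirm that the layout above is in place. The following is a minimal sketch of such a check; the helper name `missing_data_paths` is hypothetical and not part of this repository, and it only covers the Parquet layout shown here (adjust the list if you use `csv_files` instead).

```python
from pathlib import Path

# Paths taken from the layout shown above, relative to the data/ directory.
EXPECTED_PATHS = (
    "parquet_files/train",
    "parquet_files/test",
    "feature_definitions.csv",
)


def missing_data_paths(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```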

🛠 Usage

Inference (Verification)

To verify the pretrained models and run predictions on the training data (as a smoke test):

python verify_model.py

This script will load the models from the root directory, generate features, and output the AUC score.

Training (Retrain from Scratch)

If you have a GPU environment, you can reproduce the training process:

  1. Open notebooks/05_champion_optimization.ipynb.
  2. Ensure your data/ directory is populated.
  3. Run all cells. This will train 5 folds of CatBoost models and save them.
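
This README does not spell out how the five folds are formed. One plausible stability-aware scheme, shown here purely as an assumption, keeps every row from the same week (for example the competition's WEEK_NUM column) in a single fold, so that validation never sees a week the model trained on. The function name and the round-robin assignment are illustrative choices, not the notebook's actual splitter.

```python
def week_grouped_folds(weeks, n_splits=5):
    """Yield (train_idx, val_idx) so that each week lives in exactly one fold."""
    unique_weeks = sorted(set(weeks))
    if len(unique_weeks) < n_splits:
        raise ValueError("need at least as many distinct weeks as folds")
    # Round-robin assignment spreads early and late weeks across all folds.
    fold_of_week = {w: i % n_splits for i, w in enumerate(unique_weeks)}
    for k in range(n_splits):
        train_idx = [i for i, w in enumerate(weeks) if fold_of_week[w] != k]
        val_idx = [i for i, w in enumerate(weeks) if fold_of_week[w] == k]
        yield train_idx, val_idx


weeks = [w for w in range(10) for _ in range(3)]  # 10 weeks, 3 rows each
for fold, (train_idx, val_idx) in enumerate(week_grouped_folds(weeks), start=1):
    val_weeks = sorted({weeks[i] for i in val_idx})
    print(f"fold {fold}: validation weeks {val_weeks}, {len(train_idx)} training rows")
```

Grouping by week keeps near-duplicate temporal patterns from leaking between training and validation, which would otherwise inflate the per-fold AUC.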

🧠 Technical Highlights

This solution differentiates itself through robust engineering rather than complex ensembles:

  1. Polars Data Engine: Replaced Pandas with Polars and its Lazy API to process 1.5M rows x 1600 columns within a manageable memory footprint.
  2. Depth-2 Aggregation: Implemented a two-stage aggregation (Payment -> Contract -> User) to capture deep historical credit behavior.
  3. Adversarial Validation: Used a time-based discriminator to identify and remove features that drift significantly over time, improving model stability.
  4. No "Metric Hacking": We found that artificially lowering scores to game the stability metric hurts performance on robust models, so we kept honest probabilities.

For a deep dive into the architectural decisions, please read the Technical Evolution Path.
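
As a simplified stand-in for the adversarial validation step listed in the highlights, the sketch below screens each feature on its own: it measures how well a single feature separates early rows from late rows and flags those that separate them too well. This is a univariate simplification, not the classifier-based discriminator the project describes, and the function names, the 0.7 threshold, and the toy columns are all assumptions.

```python
def separation_auc(early_values, late_values):
    """Probability that a random late value exceeds a random early one (ties = 0.5)."""
    wins = sum(
        (late > early) + 0.5 * (late == early)
        for late in late_values
        for early in early_values
    )
    return wins / (len(early_values) * len(late_values))


def drifting_features(rows, time_key, split_at, threshold=0.7):
    """Return feature names whose values alone tell early rows from late rows."""
    early = [r for r in rows if r[time_key] < split_at]
    late = [r for r in rows if r[time_key] >= split_at]
    flagged = []
    for name in rows[0]:
        if name == time_key:
            continue
        score = separation_auc([r[name] for r in early], [r[name] for r in late])
        # AUC near 0.5 means no drift; far from 0.5 in either direction means drift.
        if max(score, 1 - score) >= threshold:
            flagged.append(name)
    return flagged


# Toy data: "stable" repeats the same pattern in both periods, "drifting" grows with time.
rows = [{"week": w, "stable": (w % 4) + 1, "drifting": w + 1} for w in range(8)]
print(drifting_features(rows, time_key="week", split_at=4))
```

A full adversarial validation would instead train one model on all features to predict the time period and drop the features it relies on most; the per-feature screen above only catches drift that is visible in a single column.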


🀝 Citation & Acknowledgements

If you use this code or ideas in your research, please cite:

@misc{home-credit-risk-v1,
  author = {Lyes930},
  title = {Home Credit - Credit Risk Model Stability Solution},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Lyes930/home-credit-risk-model-v1}}
}

Special thanks to the Kaggle community and Home Credit Group for the challenging dataset.
