# Home Credit - Credit Risk Model Stability
This repository contains the complete source code, pretrained models, and documentation for a state-of-the-art (SOTA) credit risk stability model. The solution was developed for the Kaggle Home Credit - Credit Risk Model Stability competition and focuses on predicting default probability while maintaining stable performance over time.
## Key Performance
| Metric | Score | Description |
|---|---|---|
| Stability Score | 0.67701 | Official competition metric (Gini with stability penalty). |
| AUC | 0.8308 | Raw predictive power (Single Fold). |
| Slope | ~0.00 | Performance degradation over time (near zero is ideal). |
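The Stability Score above can be reproduced from per-week Gini values. The sketch below assumes the formula published on the competition page: mean weekly Gini, plus a falloff penalty of `88.0 * min(0, slope)` from a linear trend fit, minus `0.5 *` the standard deviation of the trend residuals.

```python
import numpy as np

def stability_score(weekly_gini: np.ndarray) -> float:
    """Stability metric sketch: mean weekly Gini, penalized for a
    downward trend and for high week-to-week variance."""
    weeks = np.arange(len(weekly_gini))
    slope, intercept = np.polyfit(weeks, weekly_gini, 1)
    residuals = weekly_gini - (slope * weeks + intercept)
    return weekly_gini.mean() + 88.0 * min(0.0, slope) - 0.5 * residuals.std()

# A flat Gini series keeps its mean; a declining one is penalized hard,
# which is why a near-zero slope matters as much as raw AUC.
print(stability_score(np.array([0.55, 0.55, 0.55, 0.55])))
print(stability_score(np.array([0.60, 0.55, 0.50, 0.45])))
```

The harsh `88.0` multiplier on a negative slope explains the table above: a model with slightly lower AUC but a flat weekly trend can outscore a stronger but degrading one.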
## Repository Structure
This project is organized as a production-ready Python package with clear separation of concerns.
```
.
├── models/                 # Pretrained CatBoost models (10GB+)
│   ├── catboost_fold_1.cbm
│   └── ...
├── src/                    # Core source code
│   ├── data/               # Polars-based data pipeline & aggregation logic
│   ├── features/           # Feature engineering & adversarial selection
│   ├── models/             # Trainer wrapper for CatBoost/LGBM
│   └── validation/         # Stability-aware cross-validation splitters
├── notebooks/              # Experimentation labs (Jupyter)
│   ├── 01_baseline...      # Initial feasibility study
│   ├── 02_feature...       # Deep feature engineering (Depth 0/1/2)
│   ├── 05_champion...      # FINAL training script (GPU required)
│   └── ...
├── docs/                   # Detailed technical reports
│   └── reports/            # Technical evolution, summary, and appendices
├── training_artifacts/     # Logs and OOF predictions
└── verify_model.py         # Quick inference verification script
```
## Getting Started
### 1. Prerequisites
- Python 3.10+
- NVIDIA GPU (Recommended for training, optional for inference)
- RAM: 32GB+ (for full data processing)
### 2. Installation
Clone the repository and install dependencies. Note that this repo uses Git LFS for model weights.
```bash
# Install Git LFS first
git lfs install

# Clone repository
git clone https://huggingface.co/Lyes930/home-credit-risk-model-v1
cd home-credit-risk-model-v1

# Install Python dependencies
pip install -r requirements.txt
```
### 3. Data Preparation
Due to licensing, the raw dataset cannot be hosted here. Please download it from Kaggle:
- Go to the Competition Data Page.
- Download and unzip the data.
- Place the `csv_files` or `parquet_files` folders inside a `data/` directory in the root of this repo.
Structure should look like:
```
data/
├── parquet_files/
│   ├── train/
│   └── test/
└── feature_definitions.csv
```
## Usage
### Inference (Verification)
To verify the pretrained models and run predictions on the training data (as a smoke test):
```bash
python verify_model.py
```
This script will load the models from the root directory, generate features, and output the AUC score.
### Training (Retrain from Scratch)
If you have a GPU environment, you can reproduce the training process:
1. Open `notebooks/05_champion_optimization.ipynb`.
2. Ensure your `data/` directory is populated.
3. Run all cells. This will train 5 folds of CatBoost models and save them.
## Technical Highlights
This solution differentiates itself through robust engineering rather than complex ensembles:
- Polars Data Engine: Replaced Pandas with Polars to handle 1.5M rows x 1600 columns with highly efficient memory usage (Lazy API).
- Depth-2 Aggregation: Implemented a double-aggregation strategy (`Payment -> Contract -> User`) to capture deep historical credit behavior.
- Adversarial Validation: Used a time-based discriminator to remove features that drift significantly over time, ensuring model stability.
- No "Metric Hacking": We proved that artificial score reduction (hacking) hurts performance on robust models. We stuck to honest probabilities.
For a deep dive into the architectural decisions, please read the Technical Evolution Path.
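The adversarial-validation step listed above can be sketched as follows. This is a minimal illustration using scikit-learn's `GradientBoostingClassifier` as the time discriminator; the actual selection logic lives in `src/features/`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def drifting_features(X_early, X_late, names, top_k=1):
    """Fit a discriminator that separates early rows from late rows;
    the features it relies on most are the ones drifting over time
    and are candidates for removal."""
    X = np.vstack([X_early, X_late])
    y = np.array([0] * len(X_early) + [1] * len(X_late))
    clf = GradientBoostingClassifier(random_state=0).fit(X, y)
    order = np.argsort(clf.feature_importances_)[::-1]
    return [names[i] for i in order[:top_k]]
```

If the discriminator cannot beat chance after the flagged features are dropped, the remaining feature set is (approximately) stationary, which is exactly what the stability metric rewards.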
## Citation & Acknowledgements
If you use this code or ideas in your research, please cite:
```bibtex
@misc{home-credit-risk-v1,
  author       = {Lyes930},
  title        = {Home Credit - Credit Risk Model Stability Solution},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Lyes930/home-credit-risk-model-v1}}
}
```
Special thanks to the Kaggle community and Home Credit Group for the challenging dataset.