Fraud Detection System for Financial Transactions

A comprehensive end-to-end fraud detection system using machine learning, featuring 10 models, explainability analysis, and a production-ready API.

Results Summary

Model                Precision  Recall  F1      ROC-AUC  PR-AUC  MCC
XGBoost (best)       0.9048     0.8028  0.8507  0.9735   0.8166  0.8520
Voting Ensemble      0.8636     0.8028  0.8321  0.9783   0.8007  0.8324
LightGBM (Tuned)     0.7073     0.8169  0.7582  0.9318   0.7958  0.7597
XGBoost (Tuned)      0.8382     0.8028  0.8201  0.9697   0.7929  0.8200
RF (Tuned)           0.8730     0.7746  0.8209  0.9675   0.7926  0.8221
Random Forest        0.8333     0.7746  0.8029  0.9526   0.7710  0.8031
MLP                  0.6914     0.7887  0.7368  0.9433   0.7522  0.7380
Logistic Regression  0.0488     0.8873  0.0924  0.9615   0.7350  0.2042
Autoencoder          0.0033     1.0000  0.0067  0.9604   0.0442  0.0409

Best Model: XGBoost, with PR-AUC 0.8166 and F1 0.8507 at the default threshold (F1 rises to 0.8636 at threshold=0.55)
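The XGBoost row can be cross-checked from confusion-matrix counts. The sketch below assumes 57 true positives, 6 false positives, and 14 false negatives. These counts are not stated in this README; they are inferred from the reported precision and recall together with the 6 false positives quoted under Business Impact. The helper name precision_recall_f1 is illustrative.

```python
# Counts inferred from the reported metrics (an assumption, not taken from the repo).
TP, FP, FN = 57, 6, 14

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1(TP, FP, FN)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# precision=0.9048 recall=0.8028 f1=0.8507
```

MCC is left out because it also needs the true-negative count, which this README does not give.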

System Architecture

[Figure: system architecture diagram; see figures/architecture_diagram.*]

Project Structure

fraud_detection/
├── config.py                    # Configuration settings
├── eda.py                       # Exploratory Data Analysis
├── preprocessing.py             # Feature engineering & splitting
├── train_all.py                 # Model training pipeline
├── evaluation.py                # Comprehensive evaluation
├── explainability.py            # SHAP & LIME analysis
├── error_analysis.py            # FN/FP & drift analysis
├── ae_model.py                  # Autoencoder model classes
├── architecture.py              # Architecture diagram generator
├── generate_pdf.py              # PDF paper generator
├── requirements.txt             # Python dependencies
├── api/
│   └── app.py                   # FastAPI production endpoint
├── models/
│   ├── all_models.joblib        # All trained models
│   ├── all_models_with_ae.joblib
│   ├── autoencoder.pt           # PyTorch autoencoder weights
│   ├── scaler.joblib            # Fitted RobustScaler
│   └── tuning_results.joblib    # Optuna best params
├── figures/                     # All figures (PNG + PDF, 300 DPI)
│   ├── class_distribution.*
│   ├── amount_analysis.*
│   ├── time_analysis.*
│   ├── correlation_heatmap.*
│   ├── feature_distributions.*
│   ├── roc_curves.*
│   ├── pr_curves.*
│   ├── confusion_matrices.*
│   ├── threshold_analysis.*
│   ├── feature_importance.*
│   ├── shap_summary.*
│   ├── shap_top10.*
│   ├── lime_explanation.*
│   ├── error_analysis.*
│   ├── architecture_diagram.*
│   ├── model_comparison.csv
│   ├── business_impact.csv
│   └── shap_feature_importance.csv
├── paper/
│   ├── fraud_detection_paper.tex  # IEEE LaTeX source
│   └── fraud_detection_paper.pdf  # Compiled PDF
└── data/
    ├── creditcard.csv             # Raw dataset
    ├── processed_data.joblib      # Preprocessed data
    └── evaluation_results.joblib  # Evaluation results

Quick Start

Installation

pip install -r requirements.txt

Run Full Pipeline

# 1. EDA
python eda.py

# 2. Preprocessing
python preprocessing.py

# 3. Training
python train_all.py

# 4. Evaluation
python evaluation.py

# 5. Explainability
python explainability.py

# 6. Error Analysis
python error_analysis.py
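The six steps can also be chained from a single Python script so that a failure in one stage stops the rest. This is a minimal sketch: the script names and their order come from the steps above, while PIPELINE_STEPS, build_commands, and run_pipeline are hypothetical helper names that are not part of the repository.

```python
import subprocess
import sys

# Script names and order are taken from the steps listed above.
PIPELINE_STEPS = [
    "eda.py",
    "preprocessing.py",
    "train_all.py",
    "evaluation.py",
    "explainability.py",
    "error_analysis.py",
]

def build_commands(python=sys.executable):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, script] for script in PIPELINE_STEPS]

def run_pipeline(cwd="."):
    """Run each step in order; check=True raises at the first failing script."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)

# Usage, from the directory that contains the scripts:
#   run_pipeline()
```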

Run API

cd fraud_detection
uvicorn api.app:app --host 0.0.0.0 --port 8000

API Usage

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "Time": 406.0,
    "V1": -2.312, "V2": 1.951, "V3": -1.609, "V4": 3.997,
    "V5": -0.522, "V6": -1.426, "V7": -2.537, "V8": 1.391,
    "V9": -2.770, "V10": -2.772, "V11": 3.202, "V12": -2.899,
    "V13": -0.595, "V14": -4.289, "V15": 0.389, "V16": -1.140,
    "V17": -2.830, "V18": -0.016, "V19": 0.416, "V20": 0.126,
    "V21": 0.517, "V22": -0.035, "V23": -0.465, "V24": -0.018,
    "V25": -0.010, "V26": -0.002, "V27": -0.154, "V28": -0.048,
    "Amount": 239.93
  }'
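The same request can be sent from Python using only the standard library. The sketch below copies the URL, header, and feature values from the curl example; build_payload, build_request, and predict are illustrative names, and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# V1..V28 values copied from the curl example above.
V_VALUES = [-2.312, 1.951, -1.609, 3.997, -0.522, -1.426, -2.537, 1.391,
            -2.770, -2.772, 3.202, -2.899, -0.595, -4.289, 0.389, -1.140,
            -2.830, -0.016, 0.416, 0.126, 0.517, -0.035, -0.465, -0.018,
            -0.010, -0.002, -0.154, -0.048]

def build_payload():
    """Assemble the JSON body: Time, V1..V28, Amount."""
    payload = {"Time": 406.0}
    payload.update({f"V{i}": v for i, v in enumerate(V_VALUES, start=1)})
    payload["Amount"] = 239.93
    return payload

def build_request(url="http://localhost:8000/predict"):
    """Create a POST request with a JSON body, without sending it."""
    body = json.dumps(build_payload()).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

def predict(url="http://localhost:8000/predict", timeout=5.0):
    """Send the request and decode the JSON reply; the API must be running."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)
```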

Response:

{
  "transaction_id": "TXN-1714297654321",
  "fraud_probability": 0.999943,
  "decision": "BLOCKED - SUSPECTED FRAUD",
  "risk_level": "CRITICAL",
  "top_risk_factors": [...],
  "response_time_ms": 5.62,
  "threshold_used": 0.55,
  "model_used": "XGBoost (Optimized)"
}
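The decision field follows from comparing fraud_probability with threshold_used. The sketch below illustrates that comparison and is not the code in api/app.py: the blocked label is copied from the example response, while the "APPROVED" label and the use of >= at the boundary are assumptions, since this README shows neither.

```python
THRESHOLD = 0.55  # "threshold_used" in the example response

def decide(fraud_probability, threshold=THRESHOLD):
    """Map a fraud probability to a decision string."""
    if fraud_probability >= threshold:  # boundary handling is an assumption
        return "BLOCKED - SUSPECTED FRAUD"
    return "APPROVED"  # hypothetical label, not shown in this README

print(decide(0.999943))
```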

Key Findings

5 Key Observations from EDA

  1. Extreme Class Imbalance: Only 0.173% fraud (1:577 ratio)
  2. Amount Patterns: Fraud mean $122.21 (median $9.25) vs legit mean $88.29
  3. Temporal Patterns: Night fraud rate 0.518% vs day 0.137%
  4. Key Features: V17, V14, V12 most negatively correlated with fraud
  5. Data Quality: No missing values, 1,081 duplicates removed
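The imbalance figures in the first observation can be reproduced from the raw dataset counts quoted in this README (284,807 transactions, 492 fraud cases). The sketch uses the raw counts, so it does not reflect the duplicate removal mentioned in the fifth observation.

```python
TOTAL = 284_807  # transactions in the raw dataset
FRAUD = 492      # fraud cases in the raw dataset

fraud_rate = FRAUD / TOTAL
legit_per_fraud = (TOTAL - FRAUD) / FRAUD

print(f"fraud rate: {fraud_rate:.3%}")
print(f"legitimate transactions per fraud case: {legit_per_fraud:.1f}")
```

The second value comes out just under 578, so the quoted 1:577 ratio appears to be truncated rather than rounded.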

Business Impact (Test Set)

  • XGBoost catches 80.3% of fraud with only 6 false positives
  • Net savings: $6,936 on test set
  • API response time: <10ms average (P95: 9.27ms)
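Mean and P95 latency of the kind reported above can be computed from a list of per-request timings. The sketch below uses the standard library; the sample timings are made up for illustration and are not measurements from this project, and latency_summary is a hypothetical helper name.

```python
import statistics

def latency_summary(latencies_ms):
    """Return (mean, P95) for a list of response times in milliseconds."""
    mean = statistics.fmean(latencies_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]
    return mean, p95

# Made-up sample timings, not measurements from this project.
sample = [4.8, 5.1, 5.6, 5.9, 6.2, 6.4, 7.0, 7.3, 8.1, 9.5]
mean_ms, p95_ms = latency_summary(sample)
print(f"mean={mean_ms:.2f} ms, P95={p95_ms:.2f} ms")
```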

Threshold Optimization

  • Default threshold (0.5): F1 = 0.8507
  • Optimal threshold (0.55): F1 = 0.8636 (+0.0129 absolute, about 1.5% relative improvement)
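A threshold like this is typically found by sweeping candidate values and keeping the one with the highest F1. The sketch below shows that sweep in plain Python on toy labels and scores; it is an illustration under those assumptions, not the procedure used in evaluation.py, and the function names are hypothetical.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when a sample is flagged if y_prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, y_prob, candidates):
    """Return (threshold, f1) with the highest F1; ties keep the lowest threshold."""
    scored = ((t, f1_at_threshold(y_true, y_prob, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])

# Toy labels and scores for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.52, 0.53, 0.30, 0.60, 0.80, 0.20, 0.95, 0.40]
candidates = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05 ... 0.95
threshold, f1 = best_threshold(y_true, y_prob, candidates)
print(f"best threshold={threshold:.2f}, F1={f1:.4f}")
```

In practice the sweep should run on a validation split rather than the test set, otherwise the reported F1 at the chosen threshold is optimistic.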

Explainability

Top 10 Features (SHAP Analysis)

  1. V4 (Mean |SHAP| = 1.913)
  2. V14 (1.843)
  3. PCA_magnitude (1.113)
  4. V12 (0.834)
  5. V3 (0.749)
  6. V11 (0.638)
  7. V10 (0.582)
  8. V8 (0.516)
  9. V10_V14_interaction (0.513)
  10. V15 (0.454)
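The ranking above uses mean |SHAP|: the absolute SHAP value of each feature, averaged over samples. The sketch below shows that aggregation on a toy matrix in plain Python. The values are illustrative only, and mean_abs_shap is a hypothetical helper rather than a function from the shap library or from explainability.py.

```python
def mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| over the rows (samples) of a SHAP matrix."""
    n_samples = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy SHAP matrix (3 samples x 3 features); values are illustrative only.
features = ["V4", "V14", "V12"]
toy_shap = [
    [2.0, -1.5, 0.5],
    [-1.0, 2.5, -1.0],
    [3.0, -1.0, 0.0],
]
ranking = mean_abs_shap(toy_shap, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.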

Future Scope

  • Graph Neural Networks for fraud ring detection
  • Real-time streaming with Apache Kafka
  • Federated Learning across banks
  • LLM-generated compliance explanations
  • Temporal modeling with Transformers

IEEE Paper

The full research paper is available in the paper/ directory:

  • LaTeX source: paper/fraud_detection_paper.tex
  • Compiled PDF: paper/fraud_detection_paper.pdf

Dataset

European Cardholder Credit Card Fraud Detection: 284,807 transactions with 492 fraud cases (0.173%).

License

MIT License
