EnYa32's picture
Update README.md
360f4b3 verified
metadata
title: UnsupervisedCustumerPrediction
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Streamlit app that predicts cluster labels from uploaded CSV
license: mit

🧩 Clustering Predictor (KMeans / GMM)

This Space predicts cluster labels for uploaded tabular data using a saved preprocessing pipeline:

  • StandardScaler
  • PCA (95% explained variance)
  • A clustering model (KMeans or Gaussian Mixture Model)

✅ What this app does

  • Upload a CSV file
  • The app checks required feature columns
  • Applies scaler + PCA
  • Outputs Predicted cluster label for each row
  • Lets you download the predictions as a CSV

📦 Required files (must be in the repo root)

Place these files next to app.py:

  • feature_names.pkl
  • scaler.pkl
  • pca.pkl
  • kmeans_model_k9.pkl (optional, if you want KMeans)
  • gmm_model_k9.pkl (optional, if you want GMM)

🧾 Input format

Your CSV must include all feature columns stored in feature_names.pkl.

Optional:

  • You may include an id or Id column.
    If present, it will be included in the output as Id.

▶️ Run locally

pip install -r requirements.txt
streamlit run app.py
📝 Notes
This is an unsupervised project, so cluster quality is evaluated on Kaggle using the leaderboard score.

Visual separation in 2D does not always reflect the Kaggle metric.