--- title: UnsupervisedCustumerPrediction emoji: 🧩 colorFrom: indigo colorTo: blue sdk: docker app_port: 8501 tags: - streamlit pinned: false short_description: Streamlit app that predicts cluster labels from uploaded CSV license: mit --- # 🧩 Clustering Predictor (KMeans / GMM) This Space predicts cluster labels for uploaded tabular data using a saved preprocessing pipeline: - **StandardScaler** - **PCA (95% explained variance)** - A clustering model (**KMeans** or **Gaussian Mixture Model**) ## ✅ What this app does - Upload a CSV file - The app checks required feature columns - Applies **scaler + PCA** - Outputs **Predicted** cluster label for each row - Lets you download the predictions as a CSV ## 📦 Required files (must be in the repo root) Place these files next to `app.py`: - `feature_names.pkl` - `scaler.pkl` - `pca.pkl` - `kmeans_model_k9.pkl` *(optional, if you want KMeans)* - `gmm_model_k9.pkl` *(optional, if you want GMM)* ## 🧾 Input format Your CSV must include all feature columns stored in `feature_names.pkl`. Optional: - You may include an `id` or `Id` column. If present, it will be included in the output as `Id`. ## ▶️ Run locally ```bash pip install -r requirements.txt streamlit run app.py 📝 Notes This is an unsupervised project, so cluster quality is evaluated on Kaggle using the leaderboard score. Visual separation in 2D does not always reflect the Kaggle metric.