metadata
title: Solar_Culient_Predictor
app_file: enhanced_app.py
sdk: gradio
sdk_version: 4.26.0

SOLAI Scoring Dashboard (Gradio)

A lightweight UI that trains a baseline logistic regression on your solar leads dataset and generates probability_to_buy predictions. It uses the same candidate features and preprocessing approach as scripts/batch_scoring.py.

  • Default dataset: examples/synthetic_v2
  • Outputs are always written to /Users/git/solai/scores and are also downloadable from the UI.

Features

  • Choose data source:
    • Use preset example data: examples/synthetic_v2/leads_features.csv and examples/synthetic_v2/outcomes.csv
    • Upload your own CSVs (features and outcomes)
  • Train + score with a single click
  • Evaluation metrics (test split):
    • ROC AUC, PR AUC, Brier score (ROC AUC and PR AUC are reported as “N/A” when the labels contain only one class)
  • Preview:
    • predictions.csv (lead_id, probability_to_buy)
    • leads_features_scored.csv (features merged with probability_to_buy)
  • Download both files from the UI in addition to saving to disk (/Users/git/solai/scores)
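
The relationship between the two preview files can be sketched with pandas. This is only an illustration of a join on lead_id, not the app's actual code: the toy values are made up, and the left join is an assumption about how the scored file is assembled.

```python
import pandas as pd

# Toy stand-ins for the two inputs; the values are made up for illustration.
features = pd.DataFrame(
    {
        "lead_id": [101, 102, 103],
        "average_monthly_kwh": [900.0, 1200.0, 650.0],
        "has_pool": [0, 1, 0],
    }
)
predictions = pd.DataFrame(
    {
        "lead_id": [101, 102, 103],
        "probability_to_buy": [0.12, 0.71, 0.08],
    }
)

# Left-join on lead_id so every feature row is kept and gains a score column
# (assumed join strategy; the app may merge differently).
scored = features.merge(predictions, on="lead_id", how="left")
print(scored)
```

A left join keeps leads that have no prediction (their probability_to_buy would be NaN), which makes gaps visible instead of silently dropping rows.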

Requirements

  • Python 3.9+ recommended
  • Developed on macOS; should also work on Linux and Windows

Install dependencies (ideally in a virtual environment):

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r dashboard_gradio/requirements.txt

Run the App

python dashboard_gradio/app.py

Gradio will launch on a local URL (typically http://127.0.0.1:7860). Open it in your browser.

Usage

  1. Start the app.
  2. Select a data source:
    • Default: “Use example synthetic_v2”
    • Or switch to “Upload CSVs” and provide:
      • Features CSV (must include lead_id and a subset of feature columns listed below)
      • Outcomes CSV (must include lead_id and sold columns)
  3. Click “Train + Score”.
  4. Review metrics and preview tables.
  5. Download the generated files or find them on disk under /Users/git/solai/scores.

Expected Columns

  • Features CSV must contain lead_id and some subset of these candidate features:

    • living_area_sqft
    • average_monthly_kwh
    • average_monthly_bill_usd
    • shading_factor
    • roof_suitability_score
    • seasonality_index
    • electric_panel_amperage
    • has_pool
    • is_remote_worker_household
    • tdsp
    • rate_structure
    • credit_score_range
    • household_income_bracket
    • preferred_financing_type
    • neighborhood_type
  • Outcomes CSV must contain:

    • lead_id
    • sold (0/1)
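
The column rules above could be checked along these lines before training. This is a minimal sketch: the helper names select_candidate_features and check_outcome_columns are hypothetical and not taken from app.py, and only the “No candidate features found” message comes from this README.

```python
# Candidate feature names as listed in this README.
CANDIDATE_FEATURES = [
    "living_area_sqft",
    "average_monthly_kwh",
    "average_monthly_bill_usd",
    "shading_factor",
    "roof_suitability_score",
    "seasonality_index",
    "electric_panel_amperage",
    "has_pool",
    "is_remote_worker_household",
    "tdsp",
    "rate_structure",
    "credit_score_range",
    "household_income_bracket",
    "preferred_financing_type",
    "neighborhood_type",
]


def select_candidate_features(columns):
    """Return the candidate features present in `columns`, in README order.

    Hypothetical helper: raises ValueError if lead_id is missing or if none
    of the candidate features is present.
    """
    columns = set(columns)
    if "lead_id" not in columns:
        raise ValueError("Features CSV must include a lead_id column")
    found = [name for name in CANDIDATE_FEATURES if name in columns]
    if not found:
        raise ValueError("No candidate features found")
    return found


def check_outcome_columns(columns):
    """Hypothetical helper: raise ValueError unless lead_id and sold exist."""
    missing = {"lead_id", "sold"} - set(columns)
    if missing:
        raise ValueError(f"Outcomes CSV is missing columns: {sorted(missing)}")


# zip_code is a made-up extra column; unknown columns are simply ignored.
print(select_candidate_features(["lead_id", "tdsp", "has_pool", "zip_code"]))
# prints ['has_pool', 'tdsp']
```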

Outputs

Saved to /Users/git/solai/scores with a timestamp suffix:

  • predictions_YYYYMMDD_HHMMSS.csv
    • Columns: lead_id, probability_to_buy
  • leads_features_scored_YYYYMMDD_HHMMSS.csv
    • Original features merged with probability_to_buy

Both files are also offered as downloads directly in the UI.
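
The YYYYMMDD_HHMMSS suffix corresponds to the strftime pattern %Y%m%d_%H%M%S. A small sketch of how such file names can be built follows; the function name timestamped_output_paths and the relative "scores" directory are assumptions for illustration only.

```python
from datetime import datetime
from pathlib import Path


def timestamped_output_paths(out_dir, now=None):
    """Build both output paths with a shared YYYYMMDD_HHMMSS suffix.

    Hypothetical helper; `now` can be injected to make the result predictable.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    out_dir = Path(out_dir)
    return (
        out_dir / f"predictions_{stamp}.csv",
        out_dir / f"leads_features_scored_{stamp}.csv",
    )


pred_path, scored_path = timestamped_output_paths(
    "scores", now=datetime(2024, 1, 2, 3, 4, 5)
)
print(pred_path.name)    # predictions_20240102_030405.csv
print(scored_path.name)  # leads_features_scored_20240102_030405.csv
```

Using one timestamp for both files keeps each predictions file paired with its scored-features file.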

Notes and Troubleshooting

  • If the outcomes data has only a single class (all sold=0 or all sold=1), ROC AUC and PR AUC are undefined; the app shows “N/A” for those metrics but still computes Brier score and produces predictions.
  • If you see “No candidate features found”, ensure your features CSV contains at least one of the listed feature names.
  • If port 7860 is in use, Gradio picks another port automatically and prints the new URL in the terminal.
  • Training time grows with dataset size but should stay short for typical CSV sizes.
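
The single-class behaviour described above can be sketched as follows. This is an illustrative version, not the app's implementation: the function name evaluate is hypothetical, PR AUC is assumed to mean scikit-learn's average_precision_score, and the Brier score is computed directly as the mean squared difference between predicted probability and label.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def evaluate(y_true, y_prob):
    """Return ROC AUC, PR AUC and Brier score; AUCs are None for one class."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)

    # The Brier score is defined even when every label is the same.
    brier = float(np.mean((y_prob - y_true) ** 2))

    # Ranking metrics need both classes; the UI would show "N/A" for None.
    if np.unique(y_true).size < 2:
        return {"roc_auc": None, "pr_auc": None, "brier": brier}

    return {
        "roc_auc": float(roc_auc_score(y_true, y_prob)),
        # Assumption: PR AUC is approximated by average precision.
        "pr_auc": float(average_precision_score(y_true, y_prob)),
        "brier": brier,
    }


single_class = evaluate([0, 0, 0], [0.1, 0.2, 0.3])
both_classes = evaluate([0, 1, 0, 1], [0.2, 0.8, 0.3, 0.9])
print(single_class)
print(both_classes)
```

Checking the number of distinct labels up front avoids relying on how a particular scikit-learn version reacts to single-class input.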

Development

  • Core logic is in dashboard_gradio/app.py.
  • The pipeline mirrors scripts/batch_scoring.py: ColumnTransformer with passthrough numeric features and OneHotEncoder for categoricals, then LogisticRegression.
  • Extend easily with additional visualizations (e.g., calibration plots), feature importance, or a data dictionary viewer.
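
For orientation, the pipeline shape described above might look roughly like this in scikit-learn. It is a sketch under assumptions, not a copy of app.py or scripts/batch_scoring.py: the build_pipeline name, the numeric/categorical split, the handle_unknown and max_iter settings, and the toy data are all illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_pipeline(numeric_cols, categorical_cols):
    """Passthrough numerics + one-hot categoricals, then logistic regression."""
    preprocess = ColumnTransformer(
        transformers=[
            ("num", "passthrough", numeric_cols),
            # handle_unknown="ignore" is an assumption: unseen categories at
            # scoring time are encoded as all zeros instead of raising.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocess", preprocess),
            ("model", LogisticRegression(max_iter=1000)),  # assumed setting
        ]
    )


# Made-up toy rows; which columns count as numeric is assumed here.
toy = pd.DataFrame(
    {
        "average_monthly_kwh": [900.0, 1200.0, 650.0, 1500.0, 800.0, 1100.0],
        "shading_factor": [0.1, 0.4, 0.2, 0.05, 0.6, 0.3],
        "rate_structure": ["fixed", "variable", "fixed", "variable", "fixed", "variable"],
        "sold": [0, 1, 0, 1, 0, 1],
    }
)
numeric_cols = ["average_monthly_kwh", "shading_factor"]
categorical_cols = ["rate_structure"]

pipeline = build_pipeline(numeric_cols, categorical_cols)
pipeline.fit(toy[numeric_cols + categorical_cols], toy["sold"])

# Column 1 of predict_proba is the probability of the positive class (sold=1).
probability_to_buy = pipeline.predict_proba(toy[numeric_cols + categorical_cols])[:, 1]
print(probability_to_buy.round(3))
```

Keeping preprocessing inside the Pipeline means the same encoding is applied at training and scoring time.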