SuperKart Sales Predictor

A tuned Random Forest regressor, packaged as a scikit-learn pipeline and trained to forecast Product_Store_Sales_Total for a retail chain operating supermarkets and food marts across Tier 1-3 cities.

Model Description

  • Model type: Random Forest (tuned)
  • Task: Regression (continuous sales revenue prediction)
  • Framework: scikit-learn Pipeline with embedded preprocessing
  • Training size: 7,010 product-store observations
  • Features: 10 raw features (numeric + categorical)

The pipeline bundles preprocessing (StandardScaler on numerics + OneHotEncoder on categoricals) with the trained regressor. The 10 features, including the derived Product_Category and Store_Age, go in unscaled and unencoded and a prediction comes out -- no external scaling or encoding required.
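
To illustrate the bundling described above, a pipeline of this shape could be assembled roughly as follows. This is a sketch, not the author's training code: the numeric/categorical column grouping is inferred from the value types in the usage example, and `handle_unknown="ignore"`, the step names, and the hyperparameters are assumptions, since the card does not state them.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumption: grouping inferred from the value types in the usage example.
NUMERIC_FEATURES = [
    "Product_Weight", "Product_Allocated_Area", "Product_MRP", "Store_Age",
]
CATEGORICAL_FEATURES = [
    "Product_Sugar_Content", "Product_Type", "Store_Size",
    "Store_Location_City_Type", "Store_Type", "Product_Category",
]


def build_pipeline(random_state=42):
    """Return an unfitted preprocessing + Random Forest pipeline."""
    preprocess = ColumnTransformer(
        transformers=[
            # Scale numeric columns to zero mean and unit variance.
            ("num", StandardScaler(), NUMERIC_FEATURES),
            # One-hot encode categoricals; unseen categories encode as all zeros
            # (assumed setting, not confirmed by the card).
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_FEATURES),
        ]
    )
    # Placeholder hyperparameters; the tuned values are not published here.
    regressor = RandomForestRegressor(n_estimators=100, random_state=random_state)
    return Pipeline(steps=[("preprocess", preprocess), ("regressor", regressor)])
```

Because the transformers live inside the pipeline, calling `fit` or `predict` on a DataFrame with these ten columns applies scaling and encoding automatically.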

Intended Use

Primary use case: Per-product, per-store sales forecasting to drive inventory procurement, shelf-space allocation, and regional sales strategy decisions.

Out of scope:

  • Predicting sales for novel store formats not seen in training (only 4 stores in the training set).
  • True time-series forecasting -- this model predicts expected per-product revenue given product and store attributes, not temporal trends.
  • High-stakes decisions where a single prediction drives an irreversible outcome.

Performance

Evaluated on a held-out 20% test set (n=1,753):

Metric   Value
RMSE     277.28
MAE      107.66
R²       0.9326
MAPE     0.0391
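
The card does not say how these figures were computed. For readers who want to compare against their own evaluation, here is a minimal sketch of the standard definitions in plain Python. It assumes MAPE is reported as a fraction (so 0.0391 would be about 3.9%), which matches the convention of scikit-learn's `mean_absolute_percentage_error`; the function name `regression_metrics` is illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R-squared, and MAPE (as a fraction) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        # Assumes no true value is zero; otherwise MAPE is undefined.
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }
```

RMSE and MAE are in the same units as the target, while R-squared and MAPE are unitless.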

How to Use

from huggingface_hub import hf_hub_download
import joblib
import pandas as pd

# Download and load the pipeline
model_path = hf_hub_download(
    repo_id="jeremygracey-ai/superkart-sales-predictor",
    filename="superkart_model.joblib"
)
model = joblib.load(model_path)

# Prepare input -- all 10 features required
sample = pd.DataFrame([{
    "Product_Weight": 12.5,
    "Product_Sugar_Content": "Low Sugar",
    "Product_Allocated_Area": 0.05,
    "Product_Type": "Dairy",
    "Product_MRP": 150.0,
    "Store_Size": "Medium",
    "Store_Location_City_Type": "Tier 2",
    "Store_Type": "Supermarket Type2",
    "Product_Category": "Food",
    "Store_Age": 0,
}])

prediction = model.predict(sample)[0]
print(f"Predicted sales: ${prediction:,.2f}")
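
Since all 10 features are required, it can help to check the input columns before calling `predict`. The helper below is a hypothetical convenience, not part of the published model; the feature names are copied from the example above.

```python
# Feature names taken from the usage example in this card.
REQUIRED_FEATURES = [
    "Product_Weight", "Product_Sugar_Content", "Product_Allocated_Area",
    "Product_Type", "Product_MRP", "Store_Size", "Store_Location_City_Type",
    "Store_Type", "Product_Category", "Store_Age",
]


def missing_features(columns):
    """Return the required feature names absent from `columns`, in card order."""
    present = set(columns)
    return [name for name in REQUIRED_FEATURES if name not in present]
```

For the sample above, `missing_features(sample.columns)` should return an empty list; a non-empty result names the columns to add before predicting.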

Training Data

The SuperKart retail sales dataset covers 8,763 product-store observations across 4 stores (Departmental Store, Supermarket Type1, Supermarket Type2, Food Mart) in Tier 1-3 cities. Of these, 7,010 were used for training and 1,753 were held out for testing. Target variable: Product_Store_Sales_Total (continuous, range roughly $33 to $8,000).
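
The training size (7,010) and test size (1,753) quoted in this card are consistent with an 80/20 split of the 8,763 rows. The arithmetic can be checked as below, assuming the test count is rounded up, which is how scikit-learn's `train_test_split` handles a fractional `test_size`; the card does not confirm which splitting routine was used.

```python
import math

total_rows = 8763
test_fraction = 0.20

# Assumption: test size is rounded up, as scikit-learn's train_test_split does.
test_rows = math.ceil(total_rows * test_fraction)
train_rows = total_rows - test_rows

print(f"train={train_rows}, test={test_rows}")  # train=7010, test=1753
```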

Feature Engineering

  • Product_Category derived from Product_Id prefix (FD=Food, NC=Non-Consumable, DR=Drinks)
  • Store_Age derived from Store_Establishment_Year (reference year: 2009)
  • Product_Sugar_Content normalized (collapsed "reg" into "Regular", set non-consumables to "No Sugar")
  • Product_Id, Store_Id, Store_Establishment_Year dropped to prevent identity leakage and help the model generalize
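
The steps above can be sketched with pandas as follows. This is an approximation of the listed transformations, not the original training code: the function name is illustrative, and computing Store_Age as the reference year minus the establishment year is an assumption (the card gives only the reference year).

```python
import pandas as pd

# Prefix-to-category mapping as listed in this card.
PREFIX_TO_CATEGORY = {"FD": "Food", "NC": "Non-Consumable", "DR": "Drinks"}


def engineer_features(df, reference_year=2009):
    """Apply the card's feature engineering to a raw SuperKart DataFrame."""
    out = df.copy()

    # Product_Category from the two-letter Product_Id prefix.
    out["Product_Category"] = out["Product_Id"].str[:2].map(PREFIX_TO_CATEGORY)

    # Assumption: age is measured backwards from the reference year.
    out["Store_Age"] = reference_year - out["Store_Establishment_Year"]

    # Collapse the "reg" spelling, then mark non-consumables as sugar-free.
    out["Product_Sugar_Content"] = out["Product_Sugar_Content"].replace({"reg": "Regular"})
    non_consumable = out["Product_Category"] == "Non-Consumable"
    out.loc[non_consumable, "Product_Sugar_Content"] = "No Sugar"

    # Drop identifier columns so the model cannot memorize specific items or stores.
    return out.drop(columns=["Product_Id", "Store_Id", "Store_Establishment_Year"])
```

Under that assumption, a store established in 2009 gets Store_Age 0, which matches the value used in the usage example.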

Limitations

  • Trained on only 4 distinct stores -- predictions for novel store configurations are extrapolation, not forecasting.
  • Snapshot data, not time series -- no seasonality or trend features.
  • No demographic, foot-traffic, or promotional data included.

License

Apache 2.0
