SuperKart Sales Predictor

A tuned Random Forest regressor, packaged as a scikit-learn pipeline, trained to forecast Product_Store_Sales_Total for a retail chain operating supermarkets and food marts across Tier 1-3 cities.

Model Description

Model type: Random Forest (tuned)
Task: Regression (continuous sales revenue prediction)
Framework: scikit-learn Pipeline with embedded preprocessing
Training size: 7,010 product-store observations
Features: 10 raw features (numeric + categorical)

The pipeline bundles preprocessing (StandardScaler on numerics + OneHotEncoder on categoricals) with the trained regressor, so scaling and encoding need no external code. Inputs must still carry the two engineered columns, Product_Category and Store_Age, described under Feature Engineering.
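A pipeline of this shape could be assembled roughly as follows. This is a minimal sketch, not the training code behind the published model: the numeric/categorical column split is inferred from the example input in this card, the hyperparameters are placeholders rather than the tuned values, and the toy rows are made up purely to show that raw rows pass straight through fit and predict.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split, inferred from the example input in this card.
NUMERIC = ["Product_Weight", "Product_Allocated_Area", "Product_MRP", "Store_Age"]
CATEGORICAL = [
    "Product_Sugar_Content", "Product_Type", "Store_Size",
    "Store_Location_City_Type", "Store_Type", "Product_Category",
]


def build_pipeline(random_state=0):
    """Scale numerics, one-hot encode categoricals, then fit a random forest."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    # Placeholder hyperparameters; the tuned values are not given in this card.
    model = RandomForestRegressor(n_estimators=50, random_state=random_state)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Made-up rows and targets, for illustration only.
toy = pd.DataFrame({
    "Product_Weight": [12.5, 9.0, 15.2, 11.1],
    "Product_Sugar_Content": ["Low Sugar", "Regular", "No Sugar", "Low Sugar"],
    "Product_Allocated_Area": [0.05, 0.10, 0.03, 0.07],
    "Product_Type": ["Dairy", "Dairy", "Household", "Dairy"],
    "Product_MRP": [150.0, 120.0, 180.0, 140.0],
    "Store_Size": ["Medium", "Small", "High", "Medium"],
    "Store_Location_City_Type": ["Tier 2", "Tier 3", "Tier 1", "Tier 2"],
    "Store_Type": ["Supermarket Type2", "Food Mart", "Departmental Store", "Supermarket Type2"],
    "Product_Category": ["Food", "Food", "Non-Consumable", "Food"],
    "Store_Age": [0, 11, 10, 0],
})
toy_target = [3000.0, 2500.0, 4100.0, 3300.0]

pipe = build_pipeline().fit(toy, toy_target)
toy_predictions = pipe.predict(toy)
```

Setting handle_unknown="ignore" on the encoder is a design choice in this sketch: an unseen category encodes as all zeros instead of raising an error. Whether the published pipeline does the same is not stated here.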

Intended Use

Primary use case: Per-product, per-store sales forecasting to drive inventory procurement, shelf-space allocation, and regional sales strategy decisions.

Out of scope:

Predicting sales for novel store formats not seen in training (only 4 stores in the training set).
True time-series forecasting -- this model predicts expected per-product revenue given product and store attributes, not temporal trends.
Sensitive decisioning where a single prediction drives irreversible outcomes.

Performance

Evaluated on a held-out 20% test set (n=1,753):

Metric	Value
RMSE	277.28
MAE	107.66
R²	0.9326
MAPE	0.0391 (as a fraction, i.e. about 3.9%)
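For readers who want to recompute these metrics on their own hold-out split, the standard definitions can be sketched in plain Python. The function name and the three-point example are illustrative only. MAPE is returned as a fraction, which is assumed to be the convention behind the value reported above (it is also what scikit-learn's mean_absolute_percentage_error returns).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R² and MAPE (as a fraction) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        # Undefined when a true value is zero; fine for strictly positive sales.
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up values, chosen so the arithmetic is easy to follow by hand.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

RMSE penalizes large misses more heavily than MAE. The gap between the two reported values (277.28 vs 107.66) therefore suggests a small number of large errors rather than uniform noise.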

How to Use

from huggingface_hub import hf_hub_download
import joblib
import pandas as pd

# Download and load the pipeline
model_path = hf_hub_download(
    repo_id="jeremygracey-ai/superkart-sales-predictor",
    filename="superkart_model.joblib"
)
model = joblib.load(model_path)

# Prepare input -- all 10 features required
sample = pd.DataFrame([{
    "Product_Weight": 12.5,
    "Product_Sugar_Content": "Low Sugar",
    "Product_Allocated_Area": 0.05,
    "Product_Type": "Dairy",
    "Product_MRP": 150.0,
    "Store_Size": "Medium",
    "Store_Location_City_Type": "Tier 2",
    "Store_Type": "Supermarket Type2",
    "Product_Category": "Food",
    "Store_Age": 0,
}])

prediction = model.predict(sample)[0]
print(f"Predicted sales: ${prediction:,.2f}")
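Because all 10 features are required, it can help to check a record before building the DataFrame. The helper below is a hypothetical sketch and not part of the published model; the feature names are copied from the example above, and the Store_Id value is made up.

```python
# Feature names copied from the example input above.
REQUIRED_FEATURES = [
    "Product_Weight", "Product_Sugar_Content", "Product_Allocated_Area",
    "Product_Type", "Product_MRP", "Store_Size", "Store_Location_City_Type",
    "Store_Type", "Product_Category", "Store_Age",
]


def check_features(record):
    """Return (missing, unexpected) feature names for one input record."""
    missing = [name for name in REQUIRED_FEATURES if name not in record]
    unexpected = [name for name in record if name not in REQUIRED_FEATURES]
    return missing, unexpected


# A record that forgot Store_Age and still carries a dropped identifier.
record = {
    "Product_Weight": 12.5,
    "Product_Sugar_Content": "Low Sugar",
    "Product_Allocated_Area": 0.05,
    "Product_Type": "Dairy",
    "Product_MRP": 150.0,
    "Store_Size": "Medium",
    "Store_Location_City_Type": "Tier 2",
    "Store_Type": "Supermarket Type2",
    "Product_Category": "Food",
    "Store_Id": "S1",  # hypothetical identifier value
}
missing, unexpected = check_features(record)
```

Checking up front gives a readable message before the call to model.predict, instead of an error raised from inside the pipeline.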

Training Data

The SuperKart retail sales data covers 8,763 product-store observations (7,010 used for training, 1,753 held out for testing) across 4 stores (Departmental Store, Supermarket Type 1, Supermarket Type 2, Food Mart) in Tier 1-3 cities. Target variable: Product_Store_Sales_Total (continuous, range ~$33 - $8,000).

Feature Engineering

Product_Category derived from Product_Id prefix (FD=Food, NC=Non-Consumable, DR=Drinks)
Store_Age derived from Store_Establishment_Year (reference year: 2009)
Product_Sugar_Content normalized (collapsed "reg" into "Regular", set non-consumables to "No Sugar")
Product_Id, Store_Id, and Store_Establishment_Year dropped to prevent identity leakage and help the model generalize
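The steps above can be sketched on a single raw row as follows. This is an illustration, not the original preprocessing code. It assumes Store_Age is the reference year minus the establishment year, that the category prefix is the first two characters of Product_Id, and that the non-consumable rule overrides any other sugar label. The product and store identifiers are made up.

```python
REFERENCE_YEAR = 2009
CATEGORY_BY_PREFIX = {"FD": "Food", "NC": "Non-Consumable", "DR": "Drinks"}
DROPPED = ("Product_Id", "Store_Id", "Store_Establishment_Year")


def engineer_features(row):
    """Apply the four feature-engineering steps to one raw row (a dict)."""
    out = dict(row)
    # 1. Category from the two-letter Product_Id prefix.
    out["Product_Category"] = CATEGORY_BY_PREFIX[row["Product_Id"][:2]]
    # 2. Assumption: age is counted back from the reference year.
    out["Store_Age"] = REFERENCE_YEAR - row["Store_Establishment_Year"]
    # 3. Normalize sugar labels; non-consumables always get "No Sugar".
    sugar = row["Product_Sugar_Content"]
    if sugar == "reg":
        sugar = "Regular"
    if out["Product_Category"] == "Non-Consumable":
        sugar = "No Sugar"
    out["Product_Sugar_Content"] = sugar
    # 4. Drop identifier columns so the model cannot memorize them.
    for name in DROPPED:
        out.pop(name, None)
    return out


# Hypothetical raw rows; the IDs are invented for this example.
food_row = engineer_features({
    "Product_Id": "FD0001", "Store_Id": "S1",
    "Store_Establishment_Year": 2009, "Product_Sugar_Content": "reg",
})
household_row = engineer_features({
    "Product_Id": "NC0002", "Store_Id": "S2",
    "Store_Establishment_Year": 1999, "Product_Sugar_Content": "Low Sugar",
})
```

Under these assumptions, a store established in the reference year gets Store_Age 0, which matches the value used in the example input under How to Use.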

Limitations

Trained on only 4 distinct stores -- predictions for novel store configurations are extrapolation, not forecasting.
Snapshot data, not time series -- no seasonality or trend features.
No demographic, foot-traffic, or promotional data included.

License

Apache 2.0
