SuperKart Sales Predictor
A scikit-learn pipeline built around a tuned Random Forest regressor, trained to forecast
`Product_Store_Sales_Total` for a retail chain that operates supermarkets and food
marts across Tier 1-3 cities.
Model Description
- Model type: Random Forest (tuned)
- Task: Regression (continuous sales revenue prediction)
- Framework: scikit-learn Pipeline with embedded preprocessing
- Training size: 7,010 product-store observations
- Features: 10 raw features (numeric + categorical)
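The pipeline bundles preprocessing (`StandardScaler` on numeric features, `OneHotEncoder` on categorical features) with the trained regressor, so no separate scaling or encoding step is needed: a DataFrame with the 10 features goes in and a prediction comes out. The features must already be in their engineered form, however (for example `Product_Category` and `Store_Age`, described under Feature Engineering); the raw `Product_Id` and `Store_Establishment_Year` columns are not model inputs.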
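To make the bundled preprocessing concrete, here is a minimal sketch of how such a pipeline could be assembled, assuming a `ColumnTransformer` that routes numeric and categorical columns to the two transformers, followed by a `RandomForestRegressor`. The numeric/categorical split is inferred from the example input under How to Use, the hyperparameters are placeholders (the tuned values are not given in this card), and the synthetic training frame is purely illustrative. This is not the published model's exact configuration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split, inferred from the example input (not confirmed).
NUMERIC = ["Product_Weight", "Product_Allocated_Area", "Product_MRP", "Store_Age"]
CATEGORICAL = [
    "Product_Sugar_Content", "Product_Type", "Store_Size",
    "Store_Location_City_Type", "Store_Type", "Product_Category",
]


def build_pipeline(random_state: int = 42) -> Pipeline:
    """Scale numerics, one-hot encode categoricals, then fit a random forest."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # "ignore" encodes unseen categories as all zeros instead of raising;
        # the published model's setting is not stated in the card.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    # Placeholder hyperparameters; the tuned values are not published here.
    regressor = RandomForestRegressor(n_estimators=100, random_state=random_state)
    return Pipeline([("preprocess", preprocess), ("regressor", regressor)])


# Tiny synthetic frame for illustration only (not SuperKart data).
rng = np.random.default_rng(0)
n = 50
X = pd.DataFrame({
    "Product_Weight": rng.uniform(5, 20, n),
    "Product_Allocated_Area": rng.uniform(0.01, 0.2, n),
    "Product_MRP": rng.uniform(50, 250, n),
    "Store_Age": rng.integers(0, 25, n),
    "Product_Sugar_Content": rng.choice(["Low Sugar", "Regular", "No Sugar"], n),
    "Product_Type": rng.choice(["Dairy", "Snack Foods"], n),
    "Store_Size": rng.choice(["Small", "Medium", "High"], n),
    "Store_Location_City_Type": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
    "Store_Type": rng.choice(["Food Mart", "Supermarket Type2"], n),
    "Product_Category": rng.choice(["Food", "Drinks"], n),
})
y = 20 * X["Product_MRP"] + rng.normal(0, 100, n)

pipe = build_pipeline().fit(X, y)
preds = pipe.predict(X.head(3))
```

Because the scaler and encoder are fitted inside the pipeline, saving the whole `Pipeline` object is what lets callers skip a separate preprocessing step at prediction time.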
Intended Use
Primary use case: Per-product, per-store sales forecasting to drive inventory procurement, shelf-space allocation, and regional sales strategy decisions.
Out of scope:
- Predicting sales for novel store formats not seen in training (only 4 stores in the training set).
- True time-series forecasting -- this model predicts expected per-product revenue given product and store attributes, not temporal trends.
- High-stakes decisions where a single prediction drives an irreversible outcome.
Performance
Evaluated on a held-out 20% test set (n=1,753):
| Metric | Value |
|---|---|
| RMSE | 277.28 |
| MAE | 107.66 |
| R² | 0.9326 |
| MAPE | 0.0391 (3.91%) |
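The card does not show how these metrics were computed. A minimal sketch, assuming scikit-learn's metric functions were used, is below; the helper name `regression_report` and the toy arrays are hypothetical. Note that scikit-learn returns MAPE as a fraction rather than a percentage.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def regression_report(y_true, y_pred) -> dict:
    """Compute the four metrics listed under Performance."""
    return {
        # RMSE as sqrt(MSE), which works across scikit-learn versions.
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
        # Returned as a fraction: 0.05 means 5%.
        "MAPE": float(mean_absolute_percentage_error(y_true, y_pred)),
    }


# Toy values for illustration only, not SuperKart predictions.
y_true = np.array([1000.0, 2000.0, 4000.0])
y_pred = np.array([1100.0, 1900.0, 4000.0])
report = regression_report(y_true, y_pred)
print(report)
```

RMSE weights large errors more heavily than MAE, which is why the reported RMSE (277.28) sits well above the MAE (107.66): a minority of product-store pairs carry much larger errors than the typical one.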
How to Use
```python
from huggingface_hub import hf_hub_download
import joblib
import pandas as pd

# Download and load the pipeline.
# joblib files are pickle-based: only load them from sources you trust
# (see https://skops.readthedocs.io/en/stable/persistence.html).
model_path = hf_hub_download(
    repo_id="jeremygracey-ai/superkart-sales-predictor",
    filename="superkart_model.joblib",
)
model = joblib.load(model_path)

# Prepare input: all 10 features are required
sample = pd.DataFrame([{
    "Product_Weight": 12.5,
    "Product_Sugar_Content": "Low Sugar",
    "Product_Allocated_Area": 0.05,
    "Product_Type": "Dairy",
    "Product_MRP": 150.0,
    "Store_Size": "Medium",
    "Store_Location_City_Type": "Tier 2",
    "Store_Type": "Supermarket Type2",
    "Product_Category": "Food",
    "Store_Age": 0,
}])

prediction = model.predict(sample)[0]
print(f"Predicted sales: ${prediction:,.2f}")
```
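When scoring many rows at once, it can help to check the columns before calling `model.predict`. The helper below is a hypothetical sketch (`REQUIRED_FEATURES` and `prepare_batch` are not part of the published model): it raises on missing features and keeps only the 10 expected columns, in the order used in the example above.

```python
import pandas as pd

# The 10 feature names used in the example input (hypothetical constant).
REQUIRED_FEATURES = [
    "Product_Weight", "Product_Sugar_Content", "Product_Allocated_Area",
    "Product_Type", "Product_MRP", "Store_Size", "Store_Location_City_Type",
    "Store_Type", "Product_Category", "Store_Age",
]


def prepare_batch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fail fast on missing columns, keep only the 10 features."""
    missing = [c for c in REQUIRED_FEATURES if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required features: {missing}")
    # Select and order only the expected columns so the frame matches the
    # example input exactly (extra columns such as identifiers are dropped).
    return df[REQUIRED_FEATURES]


# Illustrative batch with an extra, hypothetical identifier column.
batch = pd.DataFrame([{
    "Product_Id": "FD0001",  # hypothetical ID, dropped by prepare_batch
    "Product_Weight": 12.5,
    "Product_Sugar_Content": "Low Sugar",
    "Product_Allocated_Area": 0.05,
    "Product_Type": "Dairy",
    "Product_MRP": 150.0,
    "Store_Size": "Medium",
    "Store_Location_City_Type": "Tier 2",
    "Store_Type": "Supermarket Type2",
    "Product_Category": "Food",
    "Store_Age": 0,
}])
X = prepare_batch(batch)
# predictions = model.predict(X)  # `model` as loaded in the example above
```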
Training Data
The model was trained on SuperKart retail sales data: 8,763 product-store
observations in total (7,010 used for training, 1,753 held out for testing)
from 4 stores (Departmental Store, Supermarket Type 1, Supermarket Type 2, Food Mart)
in Tier 1-3 cities. Target variable: `Product_Store_Sales_Total`
(continuous, ranging from roughly $33 to $8,000).
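The training and test counts are consistent with an 80/20 split of the 8,763 rows. The card does not state how the split was made, so the following is a sketch assuming scikit-learn's `train_test_split`; the placeholder arrays and the `random_state` value are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the dataset's row count; no real data needed.
X = np.arange(8763).reshape(-1, 1)
y = np.zeros(8763)

# With a float test_size, the test set gets ceil(0.2 * 8763) = 1753 rows
# and the training set gets the remaining 7010.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # random_state is an assumption
)
print(len(X_train), len(X_test))  # 7010 1753
```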
Feature Engineering
- `Product_Category` derived from the `Product_Id` prefix (FD = Food, NC = Non-Consumable, DR = Drinks)
- `Store_Age` derived from `Store_Establishment_Year` (reference year: 2009)
- `Product_Sugar_Content` normalized (collapsed "reg" into "Regular"; set non-consumables to "No Sugar")
- `Product_Id`, `Store_Id`, and `Store_Establishment_Year` dropped to prevent identity leakage and enable generalization
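The steps above can be sketched in pandas as follows. This is a minimal illustration, not the author's preprocessing code: it assumes the category code is the first two characters of `Product_Id`, that `Store_Age` is 2009 minus the establishment year, and it uses hypothetical IDs and rows.

```python
import pandas as pd

REFERENCE_YEAR = 2009  # reference year stated in the Feature Engineering list
PREFIX_TO_CATEGORY = {"FD": "Food", "NC": "Non-Consumable", "DR": "Drinks"}


def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    # Assumes the category code is the first two characters of Product_Id.
    df["Product_Category"] = df["Product_Id"].str[:2].map(PREFIX_TO_CATEGORY)
    # Assumes age = reference year minus establishment year.
    df["Store_Age"] = REFERENCE_YEAR - df["Store_Establishment_Year"]
    # Collapse the "reg" spelling into "Regular".
    df["Product_Sugar_Content"] = df["Product_Sugar_Content"].replace({"reg": "Regular"})
    # Non-consumable products carry no sugar content.
    df.loc[df["Product_Category"] == "Non-Consumable", "Product_Sugar_Content"] = "No Sugar"
    # Drop identifier columns to prevent identity leakage.
    return df.drop(columns=["Product_Id", "Store_Id", "Store_Establishment_Year"])


# Hypothetical raw rows (IDs are made up for illustration).
raw = pd.DataFrame({
    "Product_Id": ["FD0001", "NC0002", "DR0003"],
    "Product_Sugar_Content": ["reg", "Low Sugar", "Regular"],
    "Store_Id": ["S1", "S2", "S3"],
    "Store_Establishment_Year": [2009, 1987, 1999],
})
features = engineer_features(raw)
print(features)
```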
Limitations
- Trained on only 4 distinct stores -- predictions for novel store configurations are extrapolation, not forecasting.
- Snapshot data, not time series -- no seasonality or trend features.
- No demographic, foot-traffic, or promotional data included.
License
Apache 2.0
Evaluation results
- Root Mean Squared Error on SuperKart retail sales (self-reported): 277.280
- Mean Absolute Error on SuperKart retail sales (self-reported): 107.660
- R-squared on SuperKart retail sales (self-reported): 0.933
- Mean Absolute Percentage Error on SuperKart retail sales (self-reported): 0.039