Vayu: AQI Prediction Models

Pretrained ML model artifacts for Vayu, an end-to-end Air Quality Index prediction system for 29 Indian cities.

Models

Forecaster (/forecaster)

XGBoost regressors trained on 846,372 hourly pollutant readings (2015–2024) to predict AQI at three horizons.

File          Horizon     R²       RMSE (AQI)
xgb_6h.pkl    +6 hours    0.9691   6.94
xgb_12h.pkl   +12 hours   0.9038   12.25
xgb_24h.pkl   +24 hours   0.7764   18.68
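R² and RMSE in the table above are the standard regression metrics. The minimal NumPy sketch below shows how they are computed from true and predicted AQI values. The toy arrays are purely illustrative, and the card does not state which data split produced the reported scores.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-squared error, in the same units as the target (AQI)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy AQI values, illustrative only (not from the Vayu evaluation data)
y_true = np.array([120.0, 150.0, 180.0, 210.0])
y_pred = np.array([125.0, 145.0, 185.0, 205.0])

print(f"RMSE = {rmse(y_true, y_pred):.2f}")  # every error is ±5, so RMSE = 5.00
print(f"R²   = {r2(y_true, y_pred):.4f}")    # 1 - 100/4500 = 0.9778
```

Because RMSE is in AQI points, the +24h model's 18.68 can be read directly as a typical error of roughly 19 AQI points.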

Classifier (/classifier)

XGBoost classifier that maps current pollutant levels to Central Pollution Control Board (CPCB) AQI categories.

File                       Description
xgb_classifier.pkl         XGBoost 4-class CPCB classifier
best_classifier.pkl        Deployed model (copy of xgb_classifier.pkl)
classifier_metadata.json   Label maps, class names, evaluation metrics
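The classifier's own four classes and label maps live in classifier_metadata.json and are not reproduced here. For bucketing the forecasters' numeric output, the sketch below uses the six standard CPCB National AQI bands (Good 0–50, Satisfactory 51–100, Moderate 101–200, Poor 201–300, Very Poor 301–400, Severe 401+). How Vayu merges these into four classes is not documented on this card, so treat the mapping as an assumption and check the metadata file before comparing the two.

```python
from bisect import bisect_left

# Upper bounds of the six standard CPCB AQI bands.
# Assumption: Vayu's 4-class scheme (see classifier_metadata.json) may merge some of these.
CPCB_UPPER = [50, 100, 200, 300, 400]
CPCB_NAMES = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def cpcb_category(aqi: float) -> str:
    """Return the CPCB band name for an AQI value; anything above 400 is Severe."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left keeps each upper bound inside its own band (e.g. 50 -> Good)
    return CPCB_NAMES[bisect_left(CPCB_UPPER, aqi)]


print(cpcb_category(187.0))  # Moderate
```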

Encoders (/encoders)

File               Description
city_encoder.pkl   LabelEncoder for the 29 city names (codes 0–28)
features.pkl       Ordered feature list shared across all models
nmf_scaler.pkl     MinMaxScaler for NMF pollutant preprocessing
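NMF only accepts non-negative input, which is presumably why the pollutant columns pass through nmf_scaler.pkl first. The NumPy sketch below mirrors what scikit-learn's MinMaxScaler does with its default [0, 1] range: it learns per-column minima and ranges at fit time and later applies (x - min) / (max - min). The class name MinMaxSketch and the toy matrix are illustrative only; which columns the real scaler was fitted on is not documented here.

```python
import numpy as np


class MinMaxSketch:
    """Column-wise min-max scaling to [0, 1], mirroring sklearn's default MinMaxScaler."""

    def fit(self, X: np.ndarray) -> "MinMaxSketch":
        self.min_ = X.min(axis=0)
        rng = X.max(axis=0) - self.min_
        # Constant columns would divide by zero; treat their range as 1 (as sklearn does)
        self.range_ = np.where(rng == 0, 1.0, rng)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # No clipping: values below the fitted minimum become negative,
        # which a downstream NMF model would reject
        return (X - self.min_) / self.range_


# Toy PM2.5 / PM10 readings in µg/m³, illustrative only
train = np.array([[20.0, 40.0], [60.0, 100.0], [100.0, 160.0]])
scaler = MinMaxSketch().fit(train)
print(scaler.transform(train))  # each column maps to 0.0, 0.5, 1.0
```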

SHAP & NMF (/shap)

File                     Description
shap_explainer_6h.pkl    SHAP TreeExplainer for the +6h model
shap_explainer_12h.pkl   SHAP TreeExplainer for the +12h model
shap_explainer_24h.pkl   SHAP TreeExplainer for the +24h model
nmf_model.pkl            NMF model for city-level pollution source attribution
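Source attribution with NMF factorises a non-negative matrix X of scaled pollutant readings (rows are observations, columns the six pollutants) into W @ H: each row of H acts as a pollution "source" profile, and W gives each observation's loading on those sources. The sketch below illustrates the idea with Lee and Seung's multiplicative updates in plain NumPy. The rank, iteration count, and toy data are assumptions, and the stored nmf_model.pkl (likely scikit-learn's NMF, whose default solver is coordinate descent) may differ in solver and setup.

```python
import numpy as np


def nmf(X: np.ndarray, k: int, iters: int = 500, seed: int = 0):
    """Factorise non-negative X (n x p) as W (n x k) @ H (k x p) via multiplicative updates."""
    if np.any(X < 0):
        raise ValueError("NMF requires non-negative input")
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, p)) + 1e-3
    eps = 1e-10  # guards against division by zero
    for _ in range(iters):
        # Updates for the Frobenius loss; multiplying by non-negative ratios keeps W, H >= 0
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy scaled readings: 4 observations x 6 pollutants (PM2.5, PM10, CO, NO2, SO2, O3)
X = np.array([
    [0.9, 0.8, 0.7, 0.6, 0.1, 0.2],
    [0.8, 0.9, 0.6, 0.7, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.2, 0.1, 0.8, 0.9],
])
W, H = nmf(X, k=2)
print(np.round(H, 2))                                 # candidate source profiles
print(np.round(W / W.sum(axis=1, keepdims=True), 2))  # per-observation source shares
```

Normalising each row of W gives a share per source, which is the usual way an NMF fit is read as "attribution".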

Input Features (14)

Feature       Description
pm2_5_ugm3    Fine particulate matter, µg/m³ (log1p transformed)
pm10_ugm3     Coarse particulate matter, µg/m³ (log1p transformed)
co_ugm3       Carbon monoxide, µg/m³ (log1p transformed)
no2_ugm3      Nitrogen dioxide, µg/m³ (log1p transformed)
so2_ugm3      Sulfur dioxide, µg/m³
o3_ugm3       Ground-level ozone, µg/m³ (log1p transformed)
hour          Hour of day (0–23)
month         Month (1–12)
day_of_week   Day of week (0 = Monday)
is_weekend    1 if Saturday or Sunday, else 0
city_enc      Label-encoded city integer (0–28)
AQI_lag_1     AQI 1 hour prior
AQI_lag_6     AQI 6 hours prior
AQI_lag_24    AQI 24 hours prior
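The table above implies a small amount of feature engineering before inference. The standard-library sketch below assembles one 14-value input row from raw readings, a timestamp, an encoded city, and an hourly AQI history. It assumes the feature order is the table order (confirm against encoders/features.pkl), that SO2 is left untransformed because the table does not mark it log1p, and that "prior" is measured from the reading's timestamp. The function name build_features, the FEATURES list, and the city code 7 are hypothetical.

```python
import math
from datetime import datetime

# Assumption: same order as the Input Features table / encoders/features.pkl
FEATURES = ["pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "so2_ugm3", "o3_ugm3",
            "hour", "month", "day_of_week", "is_weekend", "city_enc",
            "AQI_lag_1", "AQI_lag_6", "AQI_lag_24"]
# Assumption: SO2 is not log1p-transformed (the table marks only these five)
LOG1P_COLS = {"pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "o3_ugm3"}


def build_features(readings: dict, ts: datetime, city_code: int, aqi_history: list) -> list:
    """Hypothetical helper assembling the 14 inputs for one hour.

    aqi_history holds hourly AQI values, oldest first, ending with the hour before ts,
    so aqi_history[-1] is 1 hour prior and aqi_history[-24] is 24 hours prior.
    """
    if len(aqi_history) < 24:
        raise ValueError("need at least 24 hours of AQI history")
    row = {col: math.log1p(v) if col in LOG1P_COLS else v for col, v in readings.items()}
    row.update(
        hour=ts.hour,
        month=ts.month,
        day_of_week=ts.weekday(),           # 0 = Monday, matching the table
        is_weekend=int(ts.weekday() >= 5),  # Saturday = 5, Sunday = 6
        city_enc=city_code,
        AQI_lag_1=aqi_history[-1],
        AQI_lag_6=aqi_history[-6],
        AQI_lag_24=aqi_history[-24],
    )
    return [row[name] for name in FEATURES]


history = [float(v) for v in range(100, 124)]  # 24 hourly AQI values, oldest first
x = build_features(
    {"pm2_5_ugm3": 95.4, "pm10_ugm3": 142.3, "co_ugm3": 620.0,
     "no2_ugm3": 28.5, "so2_ugm3": 12.1, "o3_ugm3": 45.2},
    datetime(2024, 4, 6, 14),  # a Saturday, 14:00
    city_code=7,               # hypothetical encoded city
    aqi_history=history,
)
print(x)
```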

Usage

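import pickle

import numpy as np

# Load the +6h forecaster and the fitted city label encoder
with open("forecaster/xgb_6h.pkl", "rb") as f:
    model = pickle.load(f)

with open("encoders/city_encoder.pkl", "rb") as f:
    city_encoder = pickle.load(f)

# Raw pollutant readings in µg/m³
pm2_5, pm10, co, no2, so2, o3 = 95.4, 142.3, 620.0, 28.5, 12.1, 45.2

# Same order as the Input Features table; all pollutants except SO2 are log1p-transformed
features = [
    np.log1p(pm2_5), np.log1p(pm10), np.log1p(co), np.log1p(no2), so2, np.log1p(o3),
    14, 4, 0, 0,                           # hour, month, day_of_week (Monday), is_weekend
    city_encoder.transform(["Delhi"])[0],  # city_enc
    187, 181, 174,                         # AQI_lag_1, AQI_lag_6, AQI_lag_24
]

predicted_aqi = model.predict(np.array(features, dtype=float).reshape(1, -1))
print(f"Predicted AQI in 6 hours: {predicted_aqi[0]:.1f}")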

Training Data

  • Source: rachitgoyell/vayu-raw on Hugging Face
  • Size: 846,372 hourly readings
  • Cities: 29 Indian urban centres
  • Period: 2015–2024
  • Pollutants: PM2.5, PM10, CO, NO2, SO2, O3

Links

License

MIT
