---
license: mit
language:
- en
tags:
- air-quality
- aqi
- xgboost
- forecasting
- classification
- shap
- india
- environmental
---

# Vayu: AQI Prediction Models

Pretrained ML model artifacts for [Vayu](https://github.com/rachitgoyal14/vayu), an end-to-end Air Quality Index (AQI) prediction system for 29 Indian cities.

## Models

### Forecaster (`/forecaster`)

XGBoost regressors trained on 846,372 hourly pollutant readings (2015–2024) to predict AQI at three horizons.

| File | Horizon | R² | RMSE |
|---|---|---|---|
| `xgb_6h.pkl` | +6 hours | 0.9691 | 6.94 |
| `xgb_12h.pkl` | +12 hours | 0.9038 | 12.25 |
| `xgb_24h.pkl` | +24 hours | 0.7764 | 18.68 |

### Classifier (`/classifier`)

XGBoost classifier mapping current pollutant levels to CPCB AQI categories.

| File | Description |
|---|---|
| `xgb_classifier.pkl` | XGBoost 4-class CPCB classifier |
| `best_classifier.pkl` | Deployed model (copy of `xgb_classifier.pkl`) |
| `classifier_metadata.json` | Label maps, class names, evaluation metrics |

### Encoders (`/encoders`)

| File | Description |
|---|---|
| `city_encoder.pkl` | LabelEncoder for 29 city names (0–28) |
| `features.pkl` | Ordered feature list shared across all models |
| `nmf_scaler.pkl` | MinMaxScaler for NMF pollutant preprocessing |

### SHAP & NMF (`/shap`)

| File | Description |
|---|---|
| `shap_explainer_6h.pkl` | SHAP TreeExplainer for +6h model |
| `shap_explainer_12h.pkl` | SHAP TreeExplainer for +12h model |
| `shap_explainer_24h.pkl` | SHAP TreeExplainer for +24h model |
| `nmf_model.pkl` | NMF model for city-level pollution source attribution |

## Input Features (14)

| Feature | Description |
|---|---|
| `pm2_5_ugm3` | Fine particulate matter (log1p transformed) |
| `pm10_ugm3` | Coarse particulate matter (log1p transformed) |
| `co_ugm3` | Carbon monoxide (log1p transformed) |
| `no2_ugm3` | Nitrogen dioxide (log1p transformed) |
| `so2_ugm3` | Sulfur dioxide |
| `o3_ugm3` | Ground-level ozone (log1p transformed) |
| `hour` | Hour of day (0–23) |
| `month` | Month (1–12) |
| `day_of_week` | Day of week (0=Monday) |
| `is_weekend` | 1 if Saturday or Sunday, else 0 |
| `city_enc` | Label-encoded city integer (0–28) |
| `AQI_lag_1` | AQI 1 hour prior |
| `AQI_lag_6` | AQI 6 hours prior |
| `AQI_lag_24` | AQI 24 hours prior |

## Usage

```python
import pickle

import numpy as np

# Load the +6h forecaster and the city label encoder
with open("forecaster/xgb_6h.pkl", "rb") as f:
    model = pickle.load(f)
with open("encoders/city_encoder.pkl", "rb") as f:
    city_encoder = pickle.load(f)

# One row of 14 values, in the order listed under Input Features
features = [
    95.4, 142.3, 620.0, 28.5, 12.1, 45.2,   # pm2_5, pm10, co, no2, so2, o3
    14, 4, 0, 0,                            # hour, month, day_of_week, is_weekend
    city_encoder.transform(["Delhi"])[0],   # city_enc
    187, 181, 174,                          # AQI_lag_1, AQI_lag_6, AQI_lag_24
]
predicted_aqi = model.predict(np.array(features).reshape(1, -1))
print(predicted_aqi)
```

## Training Data

- **Source:** `rachitgoyell/vayu-raw` on Hugging Face
- **Size:** 846,372 hourly readings
- **Cities:** 29 Indian urban centres
- **Period:** 2015–2024
- **Pollutants:** PM2.5, PM10, CO, NO2, SO2, O3

## Links

- 🔗 GitHub: [rachitgoyal14/vayu](https://github.com/rachitgoyal14/vayu)
- 🌐 Live: [vayu.rachitgoyal.in](https://vayu.rachitgoyal.in)
- ⚙️ API: [vayu-6ss8.onrender.com](https://vayu-6ss8.onrender.com)

## License

MIT
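
## Example: Deriving the Calendar Features

The four calendar inputs (`hour`, `month`, `day_of_week`, `is_weekend`) can be derived from a timestamp. The following is a minimal sketch, not code from the Vayu repository: the helper name `calendar_features` is hypothetical, and it assumes timestamps are already in the local time used during training. Python's `datetime.weekday()` returns 0 for Monday, which matches the `day_of_week` convention stated for the input features.

```python
from datetime import datetime


def calendar_features(ts: datetime) -> dict:
    """Derive hour, month, day_of_week and is_weekend from a timestamp.

    Hypothetical helper for illustration; not part of the Vayu codebase.
    """
    day_of_week = ts.weekday()  # 0 = Monday ... 6 = Sunday
    return {
        "hour": ts.hour,                      # 0-23
        "month": ts.month,                    # 1-12
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week >= 5),  # 1 for Saturday or Sunday
    }


# Monday 1 April 2024 at 14:00
print(calendar_features(datetime(2024, 4, 1, 14, 0)))
```

For this timestamp the helper yields hour 14, month 4, day of week 0, and weekend flag 0, the same calendar values that appear in the Usage example.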