---
license: mit
language:
- en
tags:
- air-quality
- aqi
- xgboost
- forecasting
- classification
- shap
- india
- environmental
---

# Vayu: AQI Prediction Models

Pretrained ML model artifacts for [Vayu](https://github.com/rachitgoyal14/vayu), an end-to-end Air Quality Index (AQI) prediction system covering 29 Indian cities.

## Models

### Forecaster (`/forecaster`)
Three XGBoost regressors, trained on 846,372 hourly pollutant readings (2015–2024), that predict AQI 6, 12, and 24 hours ahead.

| File | Horizon | R² | RMSE (AQI points) |
|---|---|---|---|
| `xgb_6h.pkl` | +6 hours | 0.9691 | 6.94 |
| `xgb_12h.pkl` | +12 hours | 0.9038 | 12.25 |
| `xgb_24h.pkl` | +24 hours | 0.7764 | 18.68 |

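All three regressors take the same 14-feature row, so a single row can be scored at every horizon in one pass. The sketch below assumes the models have already been unpickled into a dict keyed by horizon in hours. `forecast_all_horizons` is a hypothetical helper and is not part of the repository.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Mapping, Sequence

import numpy as np


def forecast_all_horizons(
    models: Mapping[int, Any],
    features: Sequence[float],
    observed_at: datetime,
) -> Dict[datetime, float]:
    """Score one 14-feature row with every horizon model.

    `models` maps a horizon in hours (e.g. 6, 12, 24) to any object with a
    scikit-learn style `predict`, such as an unpickled forecaster. The result
    maps each forecast's target time to the predicted AQI, earliest first.
    """
    row = np.asarray(features, dtype=float).reshape(1, -1)  # a batch of one row
    forecasts: Dict[datetime, float] = {}
    for hours in sorted(models):
        prediction = models[hours].predict(row)
        forecasts[observed_at + timedelta(hours=hours)] = float(prediction[0])
    return forecasts
```

In practice, `models` would be built by unpickling `forecaster/xgb_6h.pkl`, `xgb_12h.pkl`, and `xgb_24h.pkl` into the keys 6, 12, and 24. The table above shows RMSE growing with the horizon, so the 24-hour value should be read as the least certain of the three.
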
### Classifier (`/classifier`)
An XGBoost classifier that maps current pollutant levels to CPCB (Central Pollution Control Board) AQI categories.

| File | Description |
|---|---|
| `xgb_classifier.pkl` | XGBoost 4-class CPCB AQI classifier |
| `best_classifier.pkl` | Deployed model (a copy of `xgb_classifier.pkl`) |
| `classifier_metadata.json` | Label maps, class names, and evaluation metrics |

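`classifier_metadata.json` is described only as holding label maps, class names, and metrics, so its exact schema is not documented here. The sketch below assumes a hypothetical top-level `"class_names"` list ordered by encoded label, and uses it to turn class probabilities into category names. Both helpers are illustrative and are not part of the repository.

```python
import json
from typing import List, Sequence

import numpy as np


def load_class_names(metadata_path: str) -> List[str]:
    """Read class names from classifier_metadata.json.

    ASSUMPTION: the file stores a "class_names" list ordered by encoded label
    (index 0 -> label 0). Check the real file and change the key if it differs.
    """
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    return list(metadata["class_names"])


def decode_predictions(probabilities: np.ndarray, class_names: Sequence[str]) -> List[str]:
    """Map an (n_rows, n_classes) probability matrix to category names."""
    probabilities = np.asarray(probabilities)
    if probabilities.ndim != 2 or probabilities.shape[1] != len(class_names):
        raise ValueError("expected one probability column per class")
    best = probabilities.argmax(axis=1)  # index of the most probable class per row
    return [class_names[i] for i in best]
```

If `best_classifier.pkl` unpickles to an `XGBClassifier` (XGBoost's scikit-learn wrapper, which provides `predict_proba`), a call would look like `decode_predictions(clf.predict_proba(X), load_class_names("classifier/classifier_metadata.json"))`. Starting from the probabilities rather than `predict` keeps each call's confidence available for display.
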
### Encoders (`/encoders`)
| File | Description |
|---|---|
| `city_encoder.pkl` | `LabelEncoder` for the 29 city names (encoded as 0–28) |
| `features.pkl` | Ordered feature list shared by all models |
| `nmf_scaler.pkl` | `MinMaxScaler` applied to pollutant values before NMF |

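`features.pkl` fixes the column order for every model. Building rows by name and then reordering them against that list protects against silently shifted columns. This sketch assumes the pickle holds a plain list of the 14 feature-name strings shown under Input Features. `to_model_row` is a hypothetical helper.

```python
from typing import Mapping, Sequence

import numpy as np


def to_model_row(values: Mapping[str, float], feature_order: Sequence[str]) -> np.ndarray:
    """Arrange named feature values in the exact column order the models expect.

    `feature_order` is assumed to be the list stored in encoders/features.pkl.
    Missing or unexpected names raise instead of silently shifting columns.
    """
    missing = [name for name in feature_order if name not in values]
    extra = sorted(set(values) - set(feature_order))
    if missing or extra:
        raise KeyError(f"missing={missing} unexpected={extra}")
    # Shape (1, n_features): a single-row batch ready for model.predict
    return np.array([[float(values[name]) for name in feature_order]])
```

For example, call `to_model_row(named_values, feature_order)` with `feature_order` unpickled from `encoders/features.pkl`. If `city_encoder.pkl` is a scikit-learn `LabelEncoder`, its `classes_` attribute lists the known cities and `inverse_transform` maps codes back to names.
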
### SHAP & NMF (`/shap`)
| File | Description |
|---|---|
| `shap_explainer_6h.pkl` | SHAP `TreeExplainer` for the +6h model |
| `shap_explainer_12h.pkl` | SHAP `TreeExplainer` for the +12h model |
| `shap_explainer_24h.pkl` | SHAP `TreeExplainer` for the +24h model |
| `nmf_model.pkl` | NMF model for city-level pollution source attribution |

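Below are two small post-processing helpers for these artifacts, sketched under stated assumptions. `shap_row` would be one row of `explainer.shap_values(X)` from an unpickled `TreeExplainer`. `weights` would be one row of `nmf_model.transform(nmf_scaler.transform(P))`, where `P` holds pollutant concentrations in whatever column order the scaler was fitted on (that order is not documented here). Reading normalised NMF weights as a "share per source" is an interpretation. Which physical source each component represents has to be read off the fitted components. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

import numpy as np


def top_drivers(
    shap_row: Sequence[float], feature_names: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k features with the largest absolute SHAP contribution for one row.

    Positive values pushed the forecast AQI up; negative values pulled it down.
    """
    contributions = np.asarray(shap_row, dtype=float)
    order = np.argsort(-np.abs(contributions))[:k]  # largest magnitude first
    return [(feature_names[i], float(contributions[i])) for i in order]


def source_shares(weights: Sequence[float]) -> np.ndarray:
    """Normalise one row of NMF weights (W) into fractions that sum to 1."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)  # NMF weights are non-negative
    total = w.sum()
    if total == 0:
        return np.zeros_like(w)  # no activity in any component
    return w / total
```

Ranking by absolute value surfaces the features that moved a given forecast the most, whatever the direction. The sign is kept so a UI can show whether a feature pushed the forecast up or down.
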
## Input Features (14)

| Feature | Description |
|---|---|
| `pm2_5_ugm3` | Fine particulate matter, PM2.5 (µg/m³, log1p-transformed) |
| `pm10_ugm3` | Coarse particulate matter, PM10 (µg/m³, log1p-transformed) |
| `co_ugm3` | Carbon monoxide (µg/m³, log1p-transformed) |
| `no2_ugm3` | Nitrogen dioxide (µg/m³, log1p-transformed) |
| `so2_ugm3` | Sulfur dioxide (µg/m³) |
| `o3_ugm3` | Ground-level ozone (µg/m³, log1p-transformed) |
| `hour` | Hour of day (0–23) |
| `month` | Month (1–12) |
| `day_of_week` | Day of week (0 = Monday, 6 = Sunday) |
| `is_weekend` | 1 if Saturday or Sunday, otherwise 0 |
| `city_enc` | City as a label-encoded integer (0–28) |
| `AQI_lag_1` | AQI 1 hour earlier |
| `AQI_lag_6` | AQI 6 hours earlier |
| `AQI_lag_24` | AQI 24 hours earlier |

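The table implies a fixed recipe for turning one raw reading into a model row. The sketch below follows it literally. It applies log1p to the five pollutants marked as transformed and leaves SO2 raw, derives the calendar fields from the timestamp, and looks up the AQI lags by hour. Treating `AQI_lag_k` as the AQI observed exactly k hours before the reading is an assumption about how the training pipeline aligned the lags. `build_features` is a hypothetical helper.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, Mapping

POLLUTANTS = ("pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "so2_ugm3", "o3_ugm3")
# Pollutants the table lists as log1p-transformed; so2_ugm3 is not among them.
LOG1P_POLLUTANTS = {"pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "o3_ugm3"}
LAG_HOURS = (1, 6, 24)


def build_features(
    observed_at: datetime,
    pollutants_ugm3: Mapping[str, float],
    city_code: int,
    aqi_by_hour: Mapping[datetime, float],
) -> Dict[str, float]:
    """Assemble the 14 named input features for one city and hour, in table order.

    `pollutants_ugm3` holds raw concentrations keyed by feature name, `city_code`
    comes from city_encoder.pkl, and `aqi_by_hour` maps past hourly timestamps to
    observed AQI (assumed to contain every lag needed).
    """
    features: Dict[str, float] = {}
    for name in POLLUTANTS:
        value = float(pollutants_ugm3[name])
        features[name] = math.log1p(value) if name in LOG1P_POLLUTANTS else value
    features["hour"] = observed_at.hour
    features["month"] = observed_at.month
    features["day_of_week"] = observed_at.weekday()  # Monday = 0 ... Sunday = 6
    features["is_weekend"] = int(observed_at.weekday() >= 5)
    features["city_enc"] = city_code
    for lag in LAG_HOURS:
        features[f"AQI_lag_{lag}"] = aqi_by_hour[observed_at - timedelta(hours=lag)]
    return features
```

Python dicts keep insertion order, so `list(build_features(...).values())` comes out in the table's order. Checking the keys against `encoders/features.pkl` before predicting is a cheap safeguard.
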
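## Usage

```python
import pickle

import numpy as np

# Unpickling these artifacts requires xgboost and scikit-learn to be installed.
with open("forecaster/xgb_6h.pkl", "rb") as f:
    model = pickle.load(f)

with open("encoders/city_encoder.pkl", "rb") as f:
    city_encoder = pickle.load(f)

# Raw concentrations in µg/m³
pm2_5, pm10, co, no2, so2, o3 = 95.4, 142.3, 620.0, 28.5, 12.1, 45.2

# Values in the order listed under "Input Features". The pollutants marked
# log1p-transformed there are transformed here too (SO2 is left raw).
features = [
    np.log1p(pm2_5), np.log1p(pm10), np.log1p(co), np.log1p(no2), so2, np.log1p(o3),
    14, 4, 0, 0,                           # hour, month, day_of_week (Monday), is_weekend
    city_encoder.transform(["Delhi"])[0],  # city_enc
    187, 181, 174,                         # AQI_lag_1, AQI_lag_6, AQI_lag_24
]

predicted_aqi = model.predict(np.array(features, dtype=float).reshape(1, -1))
print(predicted_aqi[0])
```
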
## Training Data

- **Source:** `rachitgoyell/vayu-raw` on Hugging Face
- **Size:** 846,372 hourly readings
- **Cities:** 29 Indian urban centres
- **Period:** 2015–2024
- **Pollutants:** PM2.5, PM10, CO, NO2, SO2, O3

## Links

- GitHub: [rachitgoyal14/vayu](https://github.com/rachitgoyal14/vayu)
- Live: [vayu.rachitgoyal.in](https://vayu.rachitgoyal.in)
- API: [vayu-6ss8.onrender.com](https://vayu-6ss8.onrender.com)

## License

MIT