Vayu – AQI Prediction Models
Pretrained ML model artifacts for Vayu, an end-to-end Air Quality Index prediction system for 29 Indian cities.
Models
Forecaster (/forecaster)
XGBoost regressors trained on 846,372 hourly pollutant readings (2015–2024) to predict AQI at three horizons.
| File | Horizon | R² | RMSE |
|---|---|---|---|
| xgb_6h.pkl | +6 hours | 0.9691 | 6.94 |
| xgb_12h.pkl | +12 hours | 0.9038 | 12.25 |
| xgb_24h.pkl | +24 hours | 0.7764 | 18.68 |
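As a reading aid for the table, the sketch below shows how R² and RMSE are conventionally computed from true and predicted AQI values. It is a generic illustration rather than the project's evaluation script, and the sample numbers are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (AQI points)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical AQI values, for illustration only
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 205.0]
print(rmse(actual, predicted), r2_score(actual, predicted))
```

A lower RMSE and an R² closer to 1 indicate a better fit, which matches the table's pattern of accuracy falling as the horizon grows.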
Classifier (/classifier)
XGBoost classifier mapping current pollutant levels to CPCB AQI categories.
| File | Description |
|---|---|
| xgb_classifier.pkl | XGBoost 4-class CPCB classifier |
| best_classifier.pkl | Deployed model (copy of xgb_classifier) |
| classifier_metadata.json | Label maps, class names, evaluation metrics |
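The layout of classifier_metadata.json is not documented here, so a reasonable first step is to inspect it before relying on any key. The sketch below assumes the file sits at classifier/classifier_metadata.json and that its top level is a JSON object; both are assumptions, and the helper name is hypothetical.

```python
import json

def summarize_metadata(path="classifier/classifier_metadata.json"):
    """Return each top-level key of the metadata file with the type of its value.

    Assumes the top level of the JSON file is an object (dict).
    """
    with open(path, "r", encoding="utf-8") as f:
        metadata = json.load(f)
    return {key: type(value).__name__ for key, value in metadata.items()}
```

Calling `summarize_metadata()` from the repository root should then list whichever label maps, class names, and metrics the file actually contains.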
Encoders (/encoders)
| File | Description |
|---|---|
| city_encoder.pkl | LabelEncoder for 29 city names (0–28) |
| features.pkl | Ordered feature list shared across all models |
| nmf_scaler.pkl | MinMaxScaler for NMF pollutant preprocessing |
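Because the models take a positional feature vector, features.pkl can be used to put named values into the expected order. This is a minimal sketch that assumes the file unpickles to a sequence of feature-name strings; the helper names are hypothetical.

```python
import pickle

def load_feature_order(path="encoders/features.pkl"):
    """Load the pickled feature list. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return list(pickle.load(f))

def order_features(values, feature_order):
    """Arrange a {name: value} dict into the order given by feature_order."""
    missing = [name for name in feature_order if name not in values]
    extra = [name for name in values if name not in feature_order]
    if missing or extra:
        raise KeyError(f"missing={missing}, unexpected={extra}")
    return [values[name] for name in feature_order]
```

Failing loudly on missing or unexpected names guards against silently shifting a value into the wrong column.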
SHAP & NMF (/shap)
| File | Description |
|---|---|
| shap_explainer_6h.pkl | SHAP TreeExplainer for +6h model |
| shap_explainer_12h.pkl | SHAP TreeExplainer for +12h model |
| shap_explainer_24h.pkl | SHAP TreeExplainer for +24h model |
| nmf_model.pkl | NMF model for city-level pollution source attribution |
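One common way to read SHAP output is to rank features by the absolute size of their contribution to a single prediction. The sketch below shows only that ranking step on hypothetical values; the commented lines indicate roughly how the pickled explainer would supply the values, which requires the shap package to be installed and is not executed here.

```python
def top_contributions(shap_row, feature_names, k=3):
    """Return the k (feature, value) pairs with the largest absolute SHAP value."""
    pairs = list(zip(feature_names, (float(v) for v in shap_row)))
    pairs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return pairs[:k]

# With the real artifacts this would look roughly like (assumed paths, not run here):
#   with open("shap/shap_explainer_6h.pkl", "rb") as f:
#       explainer = pickle.load(f)
#   shap_row = explainer.shap_values(X)[0]   # X: 2-D array with one row of 14 features

# Hypothetical attribution values, for illustration only
print(top_contributions([0.5, -12.0, 3.0], ["hour", "pm2_5_ugm3", "AQI_lag_1"], k=2))
```

The sign of each value shows whether the feature pushed the predicted AQI up or down relative to the explainer's baseline.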
Input Features (14)
| Feature | Description |
|---|---|
| pm2_5_ugm3 | Fine particulate matter (log1p transformed) |
| pm10_ugm3 | Coarse particulate matter (log1p transformed) |
| co_ugm3 | Carbon monoxide (log1p transformed) |
| no2_ugm3 | Nitrogen dioxide (log1p transformed) |
| so2_ugm3 | Sulfur dioxide |
| o3_ugm3 | Ground-level ozone (log1p transformed) |
| hour | Hour of day (0–23) |
| month | Month (1–12) |
| day_of_week | Day of week (0=Monday) |
| is_weekend | 1 if Saturday or Sunday |
| city_enc | Label-encoded city integer (0–28) |
| AQI_lag_1 | AQI 1 hour prior |
| AQI_lag_6 | AQI 6 hours prior |
| AQI_lag_24 | AQI 24 hours prior |
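The table can be turned into a small helper that derives the calendar features from a timestamp and assembles all 14 values. This is a sketch under two assumptions: that the table order matches the order stored in features.pkl, and that the caller decides whether log1p must be applied before prediction. The card does not say whether the pickled models expect pre-transformed inputs, so `apply_log1p` is an explicit switch rather than a default. The function and constant names are hypothetical.

```python
import math
from datetime import datetime

POLLUTANT_ORDER = ("pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "so2_ugm3", "o3_ugm3")
# Pollutants the feature table marks as log1p transformed (so2_ugm3 is not)
LOG1P_POLLUTANTS = ("pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "o3_ugm3")

def build_feature_row(timestamp, pollutants, city_enc, aqi_lags, apply_log1p):
    """Assemble the 14 features in table order.

    pollutants: dict of raw concentrations keyed by feature name
    aqi_lags: (AQI 1 h prior, AQI 6 h prior, AQI 24 h prior)
    apply_log1p: assumption switch; see the note above the code
    """
    row = []
    for name in POLLUTANT_ORDER:
        value = pollutants[name]
        if apply_log1p and name in LOG1P_POLLUTANTS:
            value = math.log1p(value)
        row.append(value)
    day_of_week = timestamp.weekday()  # 0 = Monday, as in the table
    row += [timestamp.hour, timestamp.month, day_of_week, int(day_of_week >= 5)]
    row.append(city_enc)
    row += list(aqi_lags)
    return row

# Hypothetical reading for a Monday afternoon in April
raw = {"pm2_5_ugm3": 95.4, "pm10_ugm3": 142.3, "co_ugm3": 620.0,
       "no2_ugm3": 28.5, "so2_ugm3": 12.1, "o3_ugm3": 45.2}
row = build_feature_row(datetime(2024, 4, 1, 14), raw, city_enc=0,
                        aqi_lags=(187, 181, 174), apply_log1p=True)
print(len(row))
```

The `city_enc=0` value is a placeholder; in practice it would come from city_encoder.pkl.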
Usage
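import pickle

import numpy as np

# Load the +6h forecaster and the city encoder
with open("forecaster/xgb_6h.pkl", "rb") as f:
    model = pickle.load(f)
with open("encoders/city_encoder.pkl", "rb") as f:
    city_encoder = pickle.load(f)

# Values follow the order of the Input Features table:
# six pollutants, hour, month, day_of_week, is_weekend, city_enc, AQI lags 1/6/24
features = [95.4, 142.3, 620.0, 28.5, 12.1, 45.2, 14, 4, 0, 0,
            city_encoder.transform(["Delhi"])[0], 187, 181, 174]

predicted_aqi = model.predict(np.array(features).reshape(1, -1))
print(predicted_aqi)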
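Since the feature list is shared across all models, the same input row can be passed to each horizon's regressor. The sketch below shows that pattern with a hypothetical helper, `forecast_all`, and uses a stand-in model class so it runs without the .pkl files; with the real artifacts, the dictionary would hold the unpickled xgb_6h, xgb_12h, and xgb_24h models.

```python
def forecast_all(models, row_2d):
    """Run each horizon's model on the same input row.

    models: dict such as {"+6h": model_6h, "+12h": model_12h, "+24h": model_24h}
    row_2d: input of shape (1, 14), e.g. np.array(features).reshape(1, -1)
    """
    return {horizon: float(model.predict(row_2d)[0]) for horizon, model in models.items()}

class _ConstantModel:
    """Stand-in used only to demonstrate the call pattern without the .pkl files."""
    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        return [self.value for _ in rows]

demo = forecast_all({"+6h": _ConstantModel(180.0), "+24h": _ConstantModel(165.0)}, [[0] * 14])
print(demo)
```

Returning plain floats keyed by horizon keeps the result easy to serialize, for example as a JSON API response.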
Training Data
- Source: rachitgoyell/vayu-raw on HuggingFace
- Size: 846,372 hourly readings
- Cities: 29 Indian urban centres
- Period: 2015–2024
- Pollutants: PM2.5, PM10, CO, NO2, SO2, O3
Links
- GitHub: rachitgoyal14/vayu
- Live: vayu.rachitgoyal.in
- API: vayu-6ss8.onrender.com
License
MIT