---
license: mit
language:
- en
tags:
- air-quality
- aqi
- xgboost
- forecasting
- classification
- shap
- india
- environmental
---

# Vayu: AQI Prediction Models

Pretrained ML model artifacts for [Vayu](https://github.com/rachitgoyal14/vayu), an end-to-end Air Quality Index prediction system for 29 Indian cities.

## Models

### Forecaster (`/forecaster`)
XGBoost regressors trained on 846,372 hourly pollutant readings (2015–2024) to predict AQI at three horizons.

| File | Horizon | R² | RMSE |
|---|---|---|---|
| `xgb_6h.pkl` | +6 hours | 0.9691 | 6.94 |
| `xgb_12h.pkl` | +12 hours | 0.9038 | 12.25 |
| `xgb_24h.pkl` | +24 hours | 0.7764 | 18.68 |
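
For readers comparing their own evaluation against the table above, the two reported metrics follow the standard definitions sketched below. This is a minimal pure-Python illustration; the card does not state which split the reported numbers were computed on, and the helper names `rmse` and `r2_score` are chosen here for illustration only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as AQI."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, not taken from the Vayu dataset.
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 210.0]
print(rmse(actual, predicted), r2_score(actual, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is why both metrics degrade together as the horizon grows from +6 to +24 hours.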

### Classifier (`/classifier`)
XGBoost classifier mapping current pollutant levels to CPCB AQI categories.

| File | Description |
|---|---|
| `xgb_classifier.pkl` | XGBoost 4-class CPCB classifier |
| `best_classifier.pkl` | Deployed model (copy of `xgb_classifier.pkl`) |
| `classifier_metadata.json` | Label maps, class names, evaluation metrics |

### Encoders (`/encoders`)
| File | Description |
|---|---|
| `city_encoder.pkl` | LabelEncoder for 29 city names (0–28) |
| `features.pkl` | Ordered feature list shared across all models |
| `nmf_scaler.pkl` | MinMaxScaler for NMF pollutant preprocessing |

### SHAP & NMF (`/shap`)
| File | Description |
|---|---|
| `shap_explainer_6h.pkl` | SHAP TreeExplainer for +6h model |
| `shap_explainer_12h.pkl` | SHAP TreeExplainer for +12h model |
| `shap_explainer_24h.pkl` | SHAP TreeExplainer for +24h model |
| `nmf_model.pkl` | NMF model for city-level pollution source attribution |
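
Assuming the pickled explainers are `shap.TreeExplainer` objects as the table describes, calling `shap_values` on a feature matrix returns one signed contribution per feature for each row. The sketch below shows one way to rank those contributions for a single prediction. The helper name `top_contributions` is hypothetical, and the values in the demo are made up rather than produced by a Vayu model.

```python
def top_contributions(shap_row, feature_names, k=3):
    """Return the k features with the largest absolute SHAP value, keeping the sign."""
    pairs = sorted(zip(feature_names, shap_row), key=lambda p: abs(p[1]), reverse=True)
    return [(name, float(value)) for name, value in pairs[:k]]


# With the real artifacts (requires the `shap` package to unpickle), the row
# would typically come from something like:
#     with open("shap/shap_explainer_6h.pkl", "rb") as f:
#         explainer = pickle.load(f)
#     shap_row = explainer.shap_values(X)[0]   # X has shape (1, 14)

# Made-up contributions for three of the 14 features.
demo_names = ["pm2_5_ugm3", "hour", "AQI_lag_1"]
demo_values = [4.2, -0.3, 11.7]
print(top_contributions(demo_values, demo_names, k=2))
```

A positive value pushes the predicted AQI above the explainer's baseline and a negative value pulls it below.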

## Input Features (14)

| Feature | Description |
|---|---|
| `pm2_5_ugm3` | Fine particulate matter (log1p transformed) |
| `pm10_ugm3` | Coarse particulate matter (log1p transformed) |
| `co_ugm3` | Carbon monoxide (log1p transformed) |
| `no2_ugm3` | Nitrogen dioxide (log1p transformed) |
| `so2_ugm3` | Sulfur dioxide |
| `o3_ugm3` | Ground-level ozone (log1p transformed) |
| `hour` | Hour of day (0–23) |
| `month` | Month (1–12) |
| `day_of_week` | Day of week (0=Monday) |
| `is_weekend` | 1 if Saturday or Sunday |
| `city_enc` | Label-encoded city integer (0–28) |
| `AQI_lag_1` | AQI 1 hour prior |
| `AQI_lag_6` | AQI 6 hours prior |
| `AQI_lag_24` | AQI 24 hours prior |
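
The table above can be turned into a small helper that assembles one input row from a timestamp, raw pollutant readings, an encoded city, and lagged AQI values. This is a sketch under two assumptions: the feature order matches the table (the authoritative order is stored in `encoders/features.pkl`), and the pollutants marked "log1p transformed" must be transformed by the caller before prediction. The card does not confirm the second point, so the transform is exposed as a flag. `build_feature_row` is a hypothetical name.

```python
import math
from datetime import datetime

# Order copied from the Input Features table; prefer features.pkl when available.
FEATURE_ORDER = [
    "pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "so2_ugm3", "o3_ugm3",
    "hour", "month", "day_of_week", "is_weekend", "city_enc",
    "AQI_lag_1", "AQI_lag_6", "AQI_lag_24",
]

# Pollutants the table marks as log1p transformed (so2_ugm3 is not marked).
LOG1P_FEATURES = {"pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "o3_ugm3"}


def build_feature_row(ts, pollutants, city_enc, aqi_lags, apply_log1p=True):
    """Return the 14 features as a list of floats in FEATURE_ORDER.

    apply_log1p reflects an assumption about preprocessing, not a confirmed
    requirement of the published models.
    """
    row = {}
    for name, value in pollutants.items():
        transform = apply_log1p and name in LOG1P_FEATURES
        row[name] = math.log1p(value) if transform else value
    row["hour"] = ts.hour
    row["month"] = ts.month
    row["day_of_week"] = ts.weekday()          # 0 = Monday, matching the table
    row["is_weekend"] = int(ts.weekday() >= 5)  # Saturday = 5, Sunday = 6
    row["city_enc"] = city_enc
    row["AQI_lag_1"], row["AQI_lag_6"], row["AQI_lag_24"] = aqi_lags
    return [float(row[name]) for name in FEATURE_ORDER]


raw = {"pm2_5_ugm3": 95.4, "pm10_ugm3": 142.3, "co_ugm3": 620.0,
       "no2_ugm3": 28.5, "so2_ugm3": 12.1, "o3_ugm3": 45.2}
# city_enc=7 is a placeholder; the real value comes from city_encoder.pkl.
example = build_feature_row(datetime(2024, 4, 1, 14), raw, 7, (187, 181, 174))
print(len(example))
```

Keeping the order in one list makes it easy to swap in the contents of `features.pkl` and to catch a missing feature early, since a missing key raises `KeyError` instead of silently shifting columns.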

## Usage

```python
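import pickle

import numpy as np

# Load the +6h forecaster and the city label encoder.
with open("forecaster/xgb_6h.pkl", "rb") as f:
    model = pickle.load(f)

with open("encoders/city_encoder.pkl", "rb") as f:
    city_encoder = pickle.load(f)

# Values follow the order of the Input Features table:
# pm2_5, pm10, co, no2, so2, o3, hour, month, day_of_week, is_weekend,
# city_enc, AQI_lag_1, AQI_lag_6, AQI_lag_24.
# NOTE: the table lists PM2.5, PM10, CO, NO2 and O3 as log1p transformed;
# confirm whether np.log1p must be applied to raw readings before predicting.
features = [95.4, 142.3, 620.0, 28.5, 12.1, 45.2, 14, 4, 0, 0,
            city_encoder.transform(["Delhi"])[0], 187, 181, 174]

# The model expects a 2D array of shape (n_samples, 14).
predicted_aqi = model.predict(np.array(features, dtype=float).reshape(1, -1))
print(predicted_aqi)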
```
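
The same loading pattern extends to all three horizons. The sketch below assumes the file layout from the Forecaster table and that each pickled model exposes a scikit-learn style `predict` method, as the example above does. `load_forecasters` and `forecast_all` are hypothetical helper names, not part of the Vayu codebase.

```python
import pickle
from pathlib import Path

HORIZONS = ("6h", "12h", "24h")


def load_forecasters(base_dir="forecaster"):
    """Load xgb_6h.pkl, xgb_12h.pkl and xgb_24h.pkl from base_dir."""
    models = {}
    for horizon in HORIZONS:
        with open(Path(base_dir) / f"xgb_{horizon}.pkl", "rb") as f:
            models[horizon] = pickle.load(f)
    return models


def forecast_all(models, X):
    """Return {horizon: predicted AQI} for a single-row input X of shape (1, 14)."""
    return {horizon: float(model.predict(X)[0]) for horizon, model in models.items()}


# Typical use, with the artifacts downloaded locally:
#     models = load_forecasters("forecaster")
#     X = np.array(features, dtype=float).reshape(1, -1)
#     print(forecast_all(models, X))
```

Unpickling executes arbitrary code, so load these files only from a source you trust.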

## Training Data

- **Source:** `rachitgoyell/vayu-raw` on HuggingFace
- **Size:** 846,372 hourly readings
- **Cities:** 29 Indian urban centres
- **Period:** 2015–2024
- **Pollutants:** PM2.5, PM10, CO, NO2, SO2, O3

## Links

- 🔗 GitHub: [rachitgoyal14/vayu](https://github.com/rachitgoyal14/vayu)
- 🌐 Live: [vayu.rachitgoyal.in](https://vayu.rachitgoyal.in)
- ⚙️ API: [vayu-6ss8.onrender.com](https://vayu-6ss8.onrender.com)

## License

MIT