---
license: mit
language:
- en
tags:
- air-quality
- aqi
- xgboost
- forecasting
- classification
- shap
- india
- environmental
---
# Vayu: AQI Prediction Models
Pretrained ML model artifacts for [Vayu](https://github.com/rachitgoyal14/vayu), an end-to-end Air Quality Index prediction system for 29 Indian cities.
## Models
### Forecaster (`/forecaster`)
XGBoost regressors trained on 846,372 hourly pollutant readings (2015–2024) to predict AQI at three horizons.
| File | Horizon | R² | RMSE |
|---|---|---|---|
| `xgb_6h.pkl` | +6 hours | 0.9691 | 6.94 |
| `xgb_12h.pkl` | +12 hours | 0.9038 | 12.25 |
| `xgb_24h.pkl` | +24 hours | 0.7764 | 18.68 |
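
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how R² and RMSE are conventionally computed. The numbers are made up for illustration and are not taken from the Vayu evaluation set; the function names are likewise illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative AQI values only, not Vayu results
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(round(rmse(observed, predicted), 2))
print(round(r2_score(observed, predicted), 4))
```

RMSE is in AQI units, so the table's 6.94 at +6 hours reads as a typical error of about 7 AQI points.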
### Classifier (`/classifier`)
XGBoost classifier mapping current pollutant levels to CPCB AQI categories.
| File | Description |
|---|---|
| `xgb_classifier.pkl` | XGBoost 4-class CPCB classifier |
| `best_classifier.pkl` | Deployed model (copy of `xgb_classifier.pkl`) |
| `classifier_metadata.json` | Label maps, class names, evaluation metrics |
### Encoders (`/encoders`)
| File | Description |
|---|---|
| `city_encoder.pkl` | LabelEncoder for 29 city names (0–28) |
| `features.pkl` | Ordered feature list shared across all models |
| `nmf_scaler.pkl` | MinMaxScaler for NMF pollutant preprocessing |
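
scikit-learn's `LabelEncoder` numbers classes in sorted order, which is how 29 city names end up as codes 0 to 28. Below is a stdlib-only sketch of that mapping. The three-city list is a hypothetical subset chosen for illustration; the real class list lives in `city_encoder.pkl` and is not reproduced here.

```python
# Hypothetical subset of city names; the real encoder covers 29 cities
cities = ["Mumbai", "Delhi", "Chennai"]

# LabelEncoder-style mapping: unique labels are sorted, then numbered from 0
city_to_code = {name: code for code, name in enumerate(sorted(set(cities)))}


def encode_city(name):
    """Return the integer code for a known city, or raise for an unseen one."""
    if name not in city_to_code:
        raise ValueError(f"unseen city label: {name!r}")
    return city_to_code[name]


print(encode_city("Delhi"))
```

The codes in this sketch apply only to the three-city subset. For real predictions, always obtain `city_enc` from the pickled encoder, since the code for a given city depends on the full list of classes.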
### SHAP & NMF (`/shap`)
| File | Description |
|---|---|
| `shap_explainer_6h.pkl` | SHAP TreeExplainer for +6h model |
| `shap_explainer_12h.pkl` | SHAP TreeExplainer for +12h model |
| `shap_explainer_24h.pkl` | SHAP TreeExplainer for +24h model |
| `nmf_model.pkl` | NMF model for city-level pollution source attribution |
## Input Features (14)
| Feature | Description |
|---|---|
| `pm2_5_ugm3` | Fine particulate matter (log1p transformed) |
| `pm10_ugm3` | Coarse particulate matter (log1p transformed) |
| `co_ugm3` | Carbon monoxide (log1p transformed) |
| `no2_ugm3` | Nitrogen dioxide (log1p transformed) |
| `so2_ugm3` | Sulfur dioxide |
| `o3_ugm3` | Ground-level ozone (log1p transformed) |
| `hour` | Hour of day (0–23) |
| `month` | Month (1–12) |
| `day_of_week` | Day of week (0=Monday) |
| `is_weekend` | 1 if Saturday or Sunday |
| `city_enc` | Label-encoded city integer (0–28) |
| `AQI_lag_1` | AQI 1 hour prior |
| `AQI_lag_6` | AQI 6 hours prior |
| `AQI_lag_24` | AQI 24 hours prior |
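
A minimal sketch of assembling one input row in the order listed above. It assumes the caller applies `log1p` to the five pollutants marked in the table and passes SO2 through unchanged. Whether the transform is expected at inference time should be checked against the training pipeline. The helper `build_feature_row` and its argument names are illustrative and not part of the Vayu codebase.

```python
import math
from datetime import datetime

# Pollutants the table marks as log1p transformed (SO2 is not marked)
LOG1P_FEATURES = {"pm2_5_ugm3", "pm10_ugm3", "co_ugm3", "no2_ugm3", "o3_ugm3"}
POLLUTANT_ORDER = ["pm2_5_ugm3", "pm10_ugm3", "co_ugm3",
                   "no2_ugm3", "so2_ugm3", "o3_ugm3"]


def build_feature_row(readings, timestamp, city_code, aqi_lags):
    """Assemble the 14 features in table order.

    readings  -- dict of raw pollutant concentrations keyed by feature name
    timestamp -- datetime of the observation
    city_code -- integer produced by the city encoder
    aqi_lags  -- (AQI 1h prior, AQI 6h prior, AQI 24h prior)
    """
    row = []
    for name in POLLUTANT_ORDER:
        value = readings[name]
        row.append(math.log1p(value) if name in LOG1P_FEATURES else value)
    day_of_week = timestamp.weekday()  # 0 = Monday
    is_weekend = 1 if day_of_week >= 5 else 0  # Saturday or Sunday
    row += [timestamp.hour, timestamp.month, day_of_week, is_weekend, city_code]
    row += list(aqi_lags)
    return row


row = build_feature_row(
    readings={"pm2_5_ugm3": 95.4, "pm10_ugm3": 142.3, "co_ugm3": 620.0,
              "no2_ugm3": 28.5, "so2_ugm3": 12.1, "o3_ugm3": 45.2},
    timestamp=datetime(2024, 4, 1, 14),  # a Monday afternoon
    city_code=0,                         # placeholder, use the real encoder
    aqi_lags=(187, 181, 174),
)
print(len(row))
```

Keeping the feature order in one place matters because the models receive a plain array; `encoders/features.pkl` holds the authoritative ordered list and can be used to cross-check a row built this way.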
## Usage
```python
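import pickle

import numpy as np

# Load the +6h forecaster and the city label encoder
with open("forecaster/xgb_6h.pkl", "rb") as f:
    model = pickle.load(f)
with open("encoders/city_encoder.pkl", "rb") as f:
    city_encoder = pickle.load(f)

# Values follow the order of the Input Features table.
# The table marks most pollutants as log1p transformed; confirm whether
# the model expects raw or transformed values before relying on this row.
features = [95.4, 142.3, 620.0, 28.5, 12.1, 45.2, 14, 4, 0, 0,
            city_encoder.transform(["Delhi"])[0], 187, 181, 174]
predicted_aqi = model.predict(np.array(features).reshape(1, -1))
print(predicted_aqi)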
```
## Training Data
- **Source:** `rachitgoyell/vayu-raw` on Hugging Face
- **Size:** 846,372 hourly readings
- **Cities:** 29 Indian urban centres
- **Period:** 2015–2024
- **Pollutants:** PM2.5, PM10, CO, NO2, SO2, O3
## Links
- 🔗 GitHub: [rachitgoyal14/vayu](https://github.com/rachitgoyal14/vayu)
- 🌐 Live: [vayu.rachitgoyal.in](https://vayu.rachitgoyal.in)
- ⚙️ API: [vayu-6ss8.onrender.com](https://vayu-6ss8.onrender.com)
## License
MIT