India Air Quality Prediction System
An artificial neural network for forecasting air pollution in six Indian cities.
This project implements a bidirectional LSTM with an attention mechanism that predicts PM2.5, PM10, NO2, SO2, O3, CO, and AQI for the next 6 hours from the preceding 24 hours of data.
Target Cities
| City | Profile | Typical AQI |
|---|---|---|
| Lucknow | Industrial + vehicular | Moderate-Poor |
| Noida | Delhi-NCR corridor, severe winters | Poor-Very Poor |
| Bengaluru | Traffic-dominated, moderate | Satisfactory-Moderate |
| Jaipur | Dust + industrial | Moderate |
| Srinagar | Low-moderate, seasonal | Good-Satisfactory |
| Mumbai | Coastal, monsoon washout | Satisfactory-Moderate |
Model Architecture
Input (24h × 93 features)
↓
Bidirectional LSTM (64 hidden, 1 layer)
↓
Attention Mechanism
↓
MLP Decoder (256 → 128 → 42)
↓
Output (6h × 7 targets: PM2.5, PM10, NO2, SO2, O3, CO, AQI)
Parameters: ~161K
Framework: PyTorch
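The diagram above leaves the attention scorer and the decoder's input width unspecified. The following is a minimal PyTorch sketch of one way the stack could be wired, not the contents of `model.py`. The class name `BiLSTMAttention`, the two-layer tanh attention scorer, the ReLU activations, and the reading of the decoder as 128 → 256 → 128 → 42 are all assumptions.

```python
import torch
import torch.nn as nn


class BiLSTMAttention(nn.Module):
    """Hypothetical sketch: BiLSTM encoder, attention pooling, MLP decoder."""

    def __init__(self, input_dim=93, hidden_dim=64, horizon=6, n_targets=7):
        super().__init__()
        self.horizon = horizon
        self.n_targets = n_targets
        # Bidirectional LSTM: each time step is encoded as 2 * hidden_dim values
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=1,
                            batch_first=True, bidirectional=True)
        # Assumed attention scorer: one scalar score per time step
        self.attn = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        # MLP decoder ending in horizon * n_targets = 42 outputs
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, horizon * n_targets),
        )

    def forward(self, x):
        # x: (batch, 24, input_dim)
        states, _ = self.lstm(x)                            # (batch, 24, 128)
        weights = torch.softmax(self.attn(states), dim=1)   # (batch, 24, 1)
        context = (weights * states).sum(dim=1)             # (batch, 128)
        out = self.decoder(context)                         # (batch, 42)
        return out.view(-1, self.horizon, self.n_targets)   # (batch, 6, 7)


model = BiLSTMAttention()
preds = model(torch.randn(8, 24, 93))
n_params = sum(p.numel() for p in model.parameters())
print(tuple(preds.shape), n_params)
```

With these assumed layer sizes the sketch has 161,067 trainable parameters, which is in line with the ~161K figure quoted above, although that agreement does not confirm the actual layout.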
Performance Metrics
| Metric | PM2.5 | PM10 | NO2 | SO2 | O3 | CO | AQI |
|---|---|---|---|---|---|---|---|
| RMSE | 0.25 | 0.25 | 0.38 | 0.15 | 0.89 | 0.23 | 18.71 |
| MAE | 0.20 | 0.20 | 0.34 | 0.12 | 0.89 | 0.20 | 15.13 |
| R² | 0.84 |
Metrics computed on held-out test set with inverse-transformed (original scale) values.
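For reference, the three metrics in the table follow their standard definitions. Below is a small pure-Python sketch of how they could be computed on original-scale values; the function name `regression_metrics` and the sample numbers are illustrative assumptions, not taken from `train.py`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length sequences of floats."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up AQI values, for illustration only
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 150.0, 155.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE squares the errors before averaging, it can never be smaller than MAE on the same data, which is a quick sanity check for any row pair in the table.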
Dataset
The dataset was generated using realistic pollution profiles based on CPCB (Central Pollution Control Board) documented patterns for Indian cities:
- Period: 3 years of hourly data (2023-2025)
- Records: 157,680 in total (26,280 hourly records per city × 6 cities)
- Features: Pollutants, meteorology, temporal cyclical encodings, lag features, rolling statistics
- Source: Synthetic, parameterized from published Indian air quality literature
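The derived feature families listed above (cyclical encodings, lag features, rolling statistics) can be illustrated with a short pure-Python sketch. The helper names, window sizes, and sample readings are assumptions made for illustration; the actual feature set is defined in `preprocess.py`.

```python
import math


def cyclical(value, period):
    """Map a periodic value (e.g. hour of day) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag(series, k):
    """Value k steps earlier; None where no history exists yet."""
    if k <= 0:
        return list(series)
    return [None] * k + list(series[:-k])


def rolling_mean(series, window):
    """Trailing mean over the last `window` values; None until the window is full."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


pm25 = [40.0, 44.0, 48.0, 52.0, 56.0]   # made-up hourly PM2.5 readings
hour_sin, hour_cos = cyclical(6, 24)    # 06:00 on a 24-hour cycle
pm25_lag1 = lag(pm25, 1)
pm25_roll3 = rolling_mean(pm25, 3)
```

The sine/cosine pair keeps 23:00 and 00:00 close together in feature space, which a raw `hour` column does not.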
Dataset Structure
data/india_air_quality_synthetic.csv
├── datetime, city
├── pm25, pm10, no2, so2, o3, co
├── temperature, humidity, wind_speed
├── hour, day_of_week, month, day_of_year
└── aqi, aqi_category
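The `aqi_category` column presumably follows the six bands of the CPCB National Air Quality Index. The sketch below maps an AQI value to its band label. The function name and the assumption that the dataset uses exactly these label strings are illustrative; treating values above 500 as Severe is a choice made here.

```python
# Upper bound of each CPCB National AQI band and its label
AQI_BANDS = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderate"),
    (300, "Poor"),
    (400, "Very Poor"),
    (500, "Severe"),
]


def aqi_category(aqi):
    """Return the CPCB band label for an AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return "Severe"  # assumption: values above 500 are treated as Severe


labels = [aqi_category(v) for v in (42, 100, 101, 250, 480)]
print(labels)
```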
Usage
Training the Model
pip install -r requirements.txt
python generate_synthetic_data.py # Create dataset
python preprocess.py # Feature engineering
python train.py # Train ANN
Running the Prediction App
python app.py
Python API
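import torch
from model import get_model

model = get_model(model_type='lstm_attn', input_dim=93)
# Load the checkpoint on CPU so the example also runs without a GPU
checkpoint = torch.load('models/best_model.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # switch to inference mode before predicting
# Input: (batch, 24h, 93_features) -> Output: (batch, 6h, 7_targets)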
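To make the input and output shapes concrete, here is a hedged NumPy sketch of how 24-hour input windows and 6-hour target windows could be cut from hourly arrays. The function `make_windows` and the random sample data are assumptions for illustration; the project's real windowing lives in `preprocess.py` and may differ.

```python
import numpy as np


def make_windows(features, targets, lookback=24, horizon=6):
    """Cut aligned (input, output) windows from hourly arrays.

    features: (n_hours, n_features); targets: (n_hours, n_targets).
    """
    last_start = len(features) - lookback - horizon
    if last_start < 0:
        raise ValueError("need at least lookback + horizon hours of data")
    X, y = [], []
    for start in range(last_start + 1):
        split = start + lookback
        X.append(features[start:split])          # the 24 hours of history
        y.append(targets[split:split + horizon])  # the next 6 hours
    return np.stack(X), np.stack(y)


rng = np.random.default_rng(0)
features = rng.normal(size=(48, 93))   # 48 hours of made-up features
targets = rng.normal(size=(48, 7))     # 48 hours of made-up targets
X, y = make_windows(features, targets)
print(X.shape, y.shape)
```

In a multi-city dataset, windows would normally be cut per city so that no window spans the boundary between two cities' records.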
Repository Structure
├── data/ # Raw and processed data
├── processed/ # Scaled sequences for training
├── models/ # Trained checkpoints
├── model.py # ANN architectures
├── train.py # Training script
├── preprocess.py # Data preprocessing pipeline
├── generate_synthetic_data.py # Dataset generator
├── app.py # Gradio prediction interface
└── requirements.txt # Dependencies
References
- CPCB Air Quality Portal: https://airquality.cpcb.gov.in
- CPCB AQI Calculation Methodology (CPCB, 2014)
- AirPhyNet: Physics-Guided Neural Networks for Air Quality Prediction (Hettige et al., 2024)
- AirCast: Multi-Variable Air Pollution Forecasting (Nedungadi et al., 2025)
Disclaimer
The current dataset is synthetic and parameterized from published pollution profiles for demonstration and model development. For production deployment, integrate with real-time CPCB CAAQMS data feeds or OpenAQ API with proper API keys.
License
Apache-2.0