🇮🇳 India Air Quality Prediction System

An artificial neural network that forecasts air pollution in Indian cities.

This project implements a bidirectional LSTM with an attention mechanism. Given the preceding 24 hours of hourly data, it predicts PM2.5, PM10, NO2, SO2, O3, CO, and AQI for each of the next 6 hours.
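The 24-hours-in, 6-hours-out framing can be illustrated with a sliding window over an hourly series. This is a minimal sketch in plain Python, not the project's preprocess.py; the function name `make_windows` and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]

def make_windows(
    rows: Sequence[Row],
    targets: Sequence[Row],
    lookback: int = 24,
    horizon: int = 6,
) -> Tuple[List[List[Row]], List[List[Row]]]:
    """Pair each `lookback`-hour feature window with the next `horizon` hours of targets."""
    if len(rows) != len(targets):
        raise ValueError("rows and targets must have the same length")
    X, y = [], []
    last_start = len(rows) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        X.append(list(rows[start:split]))               # hours [start, split)
        y.append(list(targets[split:split + horizon]))  # hours [split, split + horizon)
    return X, y

# Toy data (hypothetical): 48 hourly rows with 3 features and 2 targets each
features = [[float(t), float(t) * 0.5, 1.0] for t in range(48)]
labels = [[float(t), float(t) + 100.0] for t in range(48)]

X, y = make_windows(features, labels)
print(len(X), len(X[0]), len(y[0]))  # 19 24 6
```

With the real data, each input window would hold 93 features per hour and each target window 7 values per hour.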


🎯 Target Cities

| City | Profile | Typical AQI |
|---|---|---|
| Lucknow | Industrial + vehicular | Moderate-Poor |
| Noida | Delhi-NCR corridor, severe winters | Poor-Very Poor |
| Bengaluru | Traffic-dominated, moderate | Satisfactory-Moderate |
| Jaipur | Dust + industrial | Moderate |
| Srinagar | Low-moderate, seasonal | Good-Satisfactory |
| Mumbai | Coastal, monsoon washout | Satisfactory-Moderate |

🧠 Model Architecture

Input (24h × 93 features)
    ↓
Bidirectional LSTM (64 hidden, 1 layer)
    ↓
Attention Mechanism
    ↓
MLP Decoder (256 → 128 → 42)
    ↓
Output (6h × 7 targets: PM2.5, PM10, NO2, SO2, O3, CO, AQI)

Parameters: ~161K
Framework: PyTorch
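
As a rough check on the figures above, the parameter count can be tallied by hand. The layer shapes below are assumptions, since the diagram does not state all of them: a 128-dimensional BiLSTM output (2 × 64), a two-layer attention scorer (128 → 64 → 1), and a decoder that first expands 128 → 256. Under those assumptions, and with the parameter layout used by `torch.nn.LSTM`, the total lands close to the stated ~161K.

```python
def lstm_params(input_dim: int, hidden: int, bidirectional: bool = True) -> int:
    """Parameters of a single-layer LSTM as laid out by torch.nn.LSTM."""
    # Four gates, each with input weights, recurrent weights, and two bias vectors
    per_direction = 4 * hidden * (input_dim + hidden) + 2 * 4 * hidden
    return per_direction * (2 if bidirectional else 1)

def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out

encoder = lstm_params(input_dim=93, hidden=64)              # 81,408
attention = linear_params(128, 64) + linear_params(64, 1)   # assumed scorer: 8,321
decoder = (
    linear_params(128, 256)    # assumed expansion layer
    + linear_params(256, 128)
    + linear_params(128, 42)   # 42 = 6 hours x 7 targets
)                                                           # 71,338
total = encoder + attention + decoder
print(total)  # 161067
```

A different attention design or decoder input size would change the total, so this tally is only a plausibility check, not a description of model.py.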


📊 Performance Metrics

| Metric | PM2.5 | PM10 | NO2 | SO2 | O3 | CO | AQI |
|---|---|---|---|---|---|---|---|
| RMSE | 0.25 | 0.25 | 0.38 | 0.15 | 0.89 | 0.23 | 18.71 |
| MAE | 0.20 | 0.20 | 0.34 | 0.12 | 0.89 | 0.20 | 15.13 |

R²: 0.84

Metrics are computed on the held-out test set after inverse-transforming predictions and targets back to their original scale.
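
For reference, RMSE, MAE, and R² follow their standard definitions. The sketch below uses plain Python and made-up numbers, not project results; it assumes both sequences are already on the original scale.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(observed, predicted), 4))  # 2.1213
print(mae(observed, predicted))             # 2.0
print(round(r2(observed, predicted), 3))    # 0.964
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data.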


🗂️ Dataset

The dataset is synthetic. It was generated from pollution profiles modeled on patterns documented by the Central Pollution Control Board (CPCB) for Indian cities:

  • Period: 3 years of hourly data (2023-2025)
  • Records: 157,680 in total (26,280 hourly records for each of the 6 cities)
  • Features: Pollutants, meteorology, temporal cyclical encodings, lag features, rolling statistics
  • Source: Synthetic, parameterized from published Indian air quality literature
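
The feature families listed above can be sketched for a single pollutant series. This is an illustration only, not the project's preprocess.py: the lag offset, the 3-hour window, and the helper names are assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

def cyclical(value: float, period: float) -> Tuple[float, float]:
    """Encode a periodic value (e.g. hour of day) as a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def lag(series: Sequence[float], k: int) -> List[Optional[float]]:
    """Value observed k steps earlier; None where no history exists yet."""
    return [series[i - k] if i >= k else None for i in range(len(series))]

def rolling_mean(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing mean over the current and previous window-1 values; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out

# Hypothetical hourly PM2.5 readings
pm25 = [40.0, 42.0, 47.0, 55.0, 60.0, 58.0]

hour_sin, hour_cos = cyclical(6, 24)  # 06:00 is a quarter of the way around the day
print(round(hour_sin, 6), round(abs(hour_cos), 6))  # 1.0 0.0
print(lag(pm25, 1))              # [None, 40.0, 42.0, 47.0, 55.0, 60.0]
print(rolling_mean(pm25, 3)[:5])  # [None, None, 43.0, 48.0, 54.0]
```

The sine/cosine pair keeps hour 23 and hour 0 close together, which a raw integer hour would not. Rows with `None` values would need to be dropped or filled before training.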

Dataset Structure

data/india_air_quality_synthetic.csv
├── datetime, city
├── pm25, pm10, no2, so2, o3, co
├── temperature, humidity, wind_speed
├── hour, day_of_week, month, day_of_year
└── aqi, aqi_category
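
The `aqi` and `aqi_category` columns are related through the CPCB National AQI bands (0-50, 51-100, 101-200, 201-300, 301-400, 401-500). The following sketch maps a numeric AQI to a band. The label strings follow the wording used in this README and are an assumption about what the CSV actually contains; CPCB itself names the third band "Moderately polluted".

```python
from bisect import bisect_left

# Upper bound of each CPCB National AQI band, with the label used in this README
_UPPER_BOUNDS = [50, 100, 200, 300, 400, 500]
_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]

def aqi_category(aqi: float) -> str:
    """Return the AQI band label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    idx = bisect_left(_UPPER_BOUNDS, aqi)
    # Values above 500 are clamped into the top band
    return _LABELS[min(idx, len(_LABELS) - 1)]

for value in (42, 100, 101, 250, 350, 480):
    print(value, aqi_category(value))
```

A value that sits exactly on an upper bound, such as 100, stays in the lower band.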

🚀 Usage

Training the Model

pip install -r requirements.txt
python generate_synthetic_data.py  # Create dataset
python preprocess.py               # Feature engineering
python train.py                    # Train ANN

Running the Prediction App

python app.py

Python API

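import torch
from model import get_model

# Build the network, then restore the trained weights on CPU
model = get_model(model_type='lstm_attn', input_dim=93)
checkpoint = torch.load('models/best_model.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # switch to inference mode before predicting
# Input: (batch, 24h, 93_features) -> Output: (batch, 6h, 7_targets)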

📁 Repository Structure

├── data/                          # Raw and processed data
├── processed/                     # Scaled sequences for training
├── models/                        # Trained checkpoints
├── model.py                       # ANN architectures
├── train.py                       # Training script
├── preprocess.py                  # Data preprocessing pipeline
├── generate_synthetic_data.py     # Dataset generator
├── app.py                         # Gradio prediction interface
└── requirements.txt               # Dependencies

🔗 References

  • CPCB Air Quality Portal: https://airquality.cpcb.gov.in
  • CPCB AQI Calculation Methodology (CPCB, 2014)
  • AirPhyNet: Physics-Guided Neural Networks for Air Quality Prediction (Hettige et al., 2024)
  • AirCast: Multi-Variable Air Pollution Forecasting (Nedungadi et al., 2025)

⚖️ Disclaimer

The current dataset is synthetic, parameterized from published pollution profiles, and intended for demonstration and model development. For production deployment, integrate real-time CPCB CAAQMS data feeds or the OpenAQ API, with the appropriate API keys.


📜 License

Apache-2.0
