MoWE: Mixture of Weather Experts

A lightweight gating network that combines forecasts from multiple AI weather models into a single, superior prediction.

Results

Trained on WeatherBench2 pre-computed forecasts (GraphCast, FuXi, Pangu-Weather), evaluated on 2020 ERA5 at 64×32 resolution.

| Method          | Avg RMSE | vs Mean  |
|-----------------|----------|----------|
| MoWE            | 6.18     | -9.7%    |
| Mean of experts | 6.84     | baseline |
| GraphCast       | 7.36     | +7.7%    |
| FuXi            | 7.45     | +9.0%    |
| Pangu-Weather   | 8.15     | +19.2%   |

At a 48 h lead time, MoWE achieves more than 11% lower RMSE than the best individual expert.

Architecture

  • Type: Vision Transformer gating network (DiT blocks with adaLN-Zero)
  • Parameters: 4.95M
  • Input: Stacked expert forecasts (N experts × 11 channels × H × W) + lead time
  • Output: Per-expert weight maps (softmax-normalized) + bias map
  • Equation: Y = Σᵢ Wᵢ ⊙ Eᵢ + b (element-wise product of each weight map with its expert's forecast, summed over experts, plus a learned bias map)
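The combination step is a straightforward weighted sum. A minimal sketch (illustrative only; the repo's `mowe_combine` may broadcast weights differently, e.g. one map shared across channels):

```python
import torch

def combine(expert_forecasts, weights, bias):
    """Weighted sum of expert forecasts plus a learned bias map.

    expert_forecasts: (batch, n_experts, channels, H, W)
    weights:          same shape, softmax-normalized over the expert axis
    bias:             (batch, channels, H, W)
    """
    return (weights * expert_forecasts).sum(dim=1) + bias

# Toy shapes matching the card: 3 experts, 11 channels, 32x64 grid
e = torch.randn(2, 3, 11, 32, 64)
w = torch.softmax(torch.randn(2, 3, 11, 32, 64), dim=1)  # sums to 1 over experts
b = torch.zeros(2, 11, 32, 64)
y = combine(e, w, b)  # (2, 11, 32, 64)
```

With uniform weights (1/3 each) and zero bias, this reduces to the "mean of experts" baseline from the results table.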

Usage

```python
import torch
from hayati.model import MoWEGatingNetwork, mowe_combine

model = MoWEGatingNetwork(n_experts=3, n_channels=11, img_size=(32, 64))
ckpt = torch.load("model.pt", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# expert_forecasts: stacked GraphCast/FuXi/Pangu fields, (batch, 3, 11, 32, 64)
# lead_time: forecast lead time in hours, (batch,)
expert_forecasts = torch.randn(1, 3, 11, 32, 64)  # replace with real forecasts
lead_time = torch.tensor([48.0])

with torch.no_grad():
    weights, bias = model(expert_forecasts, lead_time)
prediction = mowe_combine(expert_forecasts, weights, bias)  # (1, 11, 32, 64)
```

Variables

11 channels: t2m, t500, t850, u10m, u500, u850, v10m, v500, v850, z500, z850
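For indexing into forecasts or predictions, the channel order is assumed to match the list above (verify against the repo's data pipeline before relying on these indices):

```python
# Assumed channel ordering, following the list in this card.
CHANNELS = ["t2m", "t500", "t850", "u10m", "u500", "u850",
            "v10m", "v500", "v850", "z500", "z850"]
CH_INDEX = {name: i for i, name in enumerate(CHANNELS)}

# e.g. slice 500 hPa geopotential from a prediction of shape (batch, 11, 32, 64):
# z500 = prediction[:, CH_INDEX["z500"]]
```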

Training

  • Data: WeatherBench2 (GraphCast + FuXi + Pangu forecasts, 2020)
  • Ground truth: ERA5 reanalysis
  • Loss: MSE with per-channel normalization
  • Optimizer: Adam, lr=1e-4
  • Epochs: 50
  • Resolution: 64×32 (5.6°)
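The per-channel normalization in the loss keeps variables on very different scales (e.g. z500 vs t2m) from dominating the objective. A sketch of this idea, with hypothetical names (`per_channel_mse`, `channel_std`, `loader`) not taken from the repo:

```python
import torch

def per_channel_mse(pred, target, channel_std):
    """MSE after dividing each channel by its standard deviation, so all
    variables contribute on a comparable scale."""
    # pred, target: (batch, C, H, W); channel_std: (C,)
    diff = (pred - target) / channel_std.view(1, -1, 1, 1)
    return diff.pow(2).mean()

# One optimization step might look like (model/data setup omitted):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# for epoch in range(50):
#     for experts, lead, target in loader:
#         weights, bias = model(experts, lead)
#         pred = mowe_combine(experts, weights, bias)
#         loss = per_channel_mse(pred, target, channel_std)
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
```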

Paper

Based on MoWE: A Mixture of Weather Experts (Chakraborty et al., 2025).

Limitations

  • Trained at low resolution (64×32); higher-resolution models are in progress.
  • Trained on 2020 data only; more training data should improve generalization.
  • Current experts: GraphCast, FuXi, Pangu. Future candidates: Aurora, AIFS, FCN3, Berkeley SFNO.