ModernTCN for Probabilistic Multivariate Forecasting

The backbone of this model is based on:

  • Donghao Luo, Xue Wang. ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. ICLR 2024 (Spotlight).

This version is adapted for probabilistic forecasting. Instead of only predicting future values, it predicts a Gaussian distribution for each future step and each variable.

Give the model a window of past values and it returns:

  • loc: the predicted mean
  • scale: the predicted standard deviation

That means you get both a forecast and uncertainty bands.

The current checkpoint in this repo is a starter model trained on the hourly ETTh1 benchmark. It is a practical first release, not a heavily tuned benchmark run.

Intended use

This model is a good fit for:

  • energy and load forecasting
  • traffic and mobility forecasting
  • retail demand forecasting
  • industrial sensor forecasting
  • other regularly sampled multivariate operational time series

Inputs and outputs

Input:

  • past_values: [batch_size, context_length, num_input_channels]

Optional training target:

  • future_values: [batch_size, prediction_length, num_input_channels]

Output:

  • loc: [batch_size, prediction_length, num_input_channels]
  • scale: [batch_size, prediction_length, num_input_channels]
  • loss: scalar, when future_values is provided
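To make these shapes concrete, here is a minimal sliding-window sketch that turns a [time, channels] array into past_values / future_values pairs with the lengths used by this checkpoint. The helper name make_windows is hypothetical and not part of this repo:

```python
import torch

def make_windows(series: torch.Tensor, context_length: int, prediction_length: int):
    # series: [time, channels] -> past [N, context, C], future [N, pred, C]
    time_steps, _ = series.shape
    total = context_length + prediction_length
    past, future = [], []
    for start in range(time_steps - total + 1):
        past.append(series[start : start + context_length])
        future.append(series[start + context_length : start + total])
    return torch.stack(past), torch.stack(future)

series = torch.randn(700, 7)  # e.g. 700 hourly steps, 7 ETTh1 variables
past_values, future_values = make_windows(series, context_length=512, prediction_length=96)
print(past_values.shape)    # torch.Size([93, 512, 7])
print(future_values.shape)  # torch.Size([93, 96, 7])
```

Each window's future_values starts exactly where its past_values ends, which is the layout the model expects during training.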

Current checkpoint

Current released checkpoint details:

  • Dataset: ETTh1
  • Frequency: hourly
  • Context length: 512
  • Prediction length: 96
  • Variables: HUFL, HULL, MUFL, MULL, LUFL, LULL, OT
  • External normalization: dataset-level standardization
  • In-model normalization: RevIN
  • Objective: Gaussian negative log-likelihood
  • Optimizer: AdamW
  • Learning rate: 1e-3
  • Batch size: 32
  • Backbone width: stage_dims=[16, 32, 64]
  • Backbone depth: blocks_per_stage=[1, 1, 2]
  • Best checkpoint: epoch 1

This checkpoint was trained with CPU-friendly settings so we could get a complete end-to-end model onto the Hub.

Evaluation

These are the numbers for the current starter checkpoint:

Split        Dataset   Horizon   MAE      RMSE     NLL
Validation   ETTh1     96        1.6252   3.0474   0.8281
Test         ETTh1     96        1.7555   3.0745   1.0163

For reference, this is a probabilistic adaptation of ModernTCN, so point-forecast metrics from the original paper are only a rough reference point, not a direct apples-to-apples comparison.
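The metrics above follow standard formulas: MAE and RMSE are computed on the predicted mean, and NLL on the full Gaussian. A self-contained sketch using dummy tensors in place of real model outputs:

```python
import torch

def point_and_nll_metrics(loc, scale, target):
    # MAE / RMSE evaluate the point forecast (the predicted mean);
    # NLL evaluates the full predictive Gaussian.
    mae = (loc - target).abs().mean()
    rmse = ((loc - target) ** 2).mean().sqrt()
    nll = -torch.distributions.Normal(loc, scale).log_prob(target).mean()
    return mae.item(), rmse.item(), nll.item()

torch.manual_seed(0)
loc = torch.randn(32, 96, 7)
scale = torch.rand(32, 96, 7) + 0.5   # strictly positive std
target = loc + scale * torch.randn_like(loc)

mae, rmse, nll = point_and_nll_metrics(loc, scale, target)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} NLL={nll:.4f}")
```

The same NLL expression, averaged over the training batch, is the Gaussian negative log-likelihood objective listed in the checkpoint details.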

How to use it

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "your-username/modern-tcn-probabilistic-etth1-cpu-starter",
    trust_remote_code=True,
)

past_values = torch.randn(1, 512, 7)
outputs = model(past_values=past_values)

print(outputs.loc.shape)
print(outputs.scale.shape)

If you want simple Gaussian uncertainty bands:

lower_95 = outputs.loc - 1.96 * outputs.scale
upper_95 = outputs.loc + 1.96 * outputs.scale

If you want forecast samples:

samples = model.sample(past_values, num_samples=100)
print(samples.shape)
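If you prefer empirical bands over the closed-form Gaussian ones, you can take quantiles across the sample dimension. This sketch assumes the samples are stacked along a leading num_samples axis, as in the model.sample call above, and uses random tensors as a stand-in:

```python
import torch

torch.manual_seed(0)
# stand-in for model.sample(...): [num_samples, batch, prediction_length, channels]
samples = torch.randn(100, 1, 96, 7)

lower_95 = torch.quantile(samples, 0.025, dim=0)
median = torch.quantile(samples, 0.5, dim=0)
upper_95 = torch.quantile(samples, 0.975, dim=0)

print(lower_95.shape)  # torch.Size([1, 96, 7])
```

Sample-based bands also work for derived quantities (sums over horizons, maxima, and so on) where the Gaussian formula no longer applies directly.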

Train on your own data

The training script supports:

  • .csv with a header row
  • .npy shaped [time, channels]
  • .npz with a values array or a single 2D array
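A hypothetical loader mirroring the three formats above might look like this; the training script's actual implementation may differ, and the CSV branch assumes all selected columns are numeric (a string timestamp column would need to be dropped first):

```python
import os
import tempfile
import numpy as np

def load_series(path):
    # Hypothetical sketch: return a float32 array shaped [time, channels].
    if path.endswith(".csv"):
        # assumes all columns are numeric after the header row
        return np.genfromtxt(path, delimiter=",", skip_header=1, dtype=np.float32)
    if path.endswith(".npy"):
        return np.load(path).astype(np.float32)
    if path.endswith(".npz"):
        data = np.load(path)
        arr = data["values"] if "values" in data else data[data.files[0]]
        return arr.astype(np.float32)
    raise ValueError(f"unsupported file type: {path}")

tmp = tempfile.mkdtemp()
npy_path = os.path.join(tmp, "series.npy")
np.save(npy_path, np.random.rand(100, 3))
series = load_series(npy_path)
print(series.shape)  # (100, 3)
```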

Example:

python scripts/train.py \
  --data-path data/series.csv \
  --timestamp-column timestamp \
  --value-columns load,temperature,price \
  --context-length 512 \
  --prediction-length 96 \
  --epochs 20 \
  --batch-size 32 \
  --output-dir runs/modern-tcn-probabilistic

After training, the script writes:

  • runs/.../best_model
  • runs/.../last_model
  • runs/.../history.json
  • runs/.../summary.json
  • runs/.../data_config.json

If you want dataset-level scaling:

python scripts/train.py --data-path data/series.csv --normalization standard
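Dataset-level standardization fits a per-channel mean and standard deviation on the training split, applies them to every split, and inverts them on the predictions. A minimal sketch of that idea (the script's internals may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(10.0, 3.0, size=(1000, 7)).astype(np.float32)

train = series[:800]                       # fit statistics on the training split only
mean = train.mean(axis=0, keepdims=True)
std = train.std(axis=0, keepdims=True) + 1e-8

normalized = (series - mean) / std         # feed this to the model
restored = normalized * std + mean         # invert on model predictions

print(np.allclose(restored, series, atol=1e-4))  # True
```

Fitting on the training split only avoids leaking validation and test statistics into the model.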

Limitations

  • The model expects the same variable ordering used in training.
  • Uncertainty quality still needs calibration checks before real deployment.
  • Performance can drop under strong distribution shift.
  • The current checkpoint is a starter CPU-trained run, not a fully tuned benchmark model.
  • The Keras implementation mirrors the architecture, but the Hugging Face packaging path is centered on PyTorch.
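One simple calibration check, as mentioned above, is the empirical coverage of the 95% interval: the fraction of held-out targets that actually fall inside loc ± 1.96 · scale. A self-contained sketch using synthetic, perfectly calibrated data in place of real model outputs:

```python
import torch

torch.manual_seed(0)
loc = torch.zeros(10000)
scale = torch.ones(10000)
# synthetic targets drawn from the predictive distribution itself
target = torch.distributions.Normal(loc, scale).sample()

lower = loc - 1.96 * scale
upper = loc + 1.96 * scale
coverage = ((target >= lower) & (target <= upper)).float().mean()
print(f"empirical 95% coverage: {coverage:.3f}")
```

On real held-out data, coverage well below 0.95 indicates overconfident scales; well above, underconfident ones.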

Citation

If you use this model, please cite the original ModernTCN paper:

@inproceedings{
  donghao2024moderntcn,
  title={Modern{TCN}: A Modern Pure Convolution Structure for General Time Series Analysis},
  author={Donghao Luo and Xue Wang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=vpJMJerXHU}
}
Model details

  • Format: safetensors
  • Parameters: 543k
  • Tensor type: F32