🏆 Best Monthly Time Series Forecast (2026 SOTA), v2: Optimized Hybrid Ensemble

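v2 Update: Added an optimized neural+statistical hybrid ensemble with scipy-optimized weights and a per-model comparison across 16 methods. The optimized ensemble improves sMAPE by 14% over the zero-shot Chronos baselines (11.07 to 9.52) and by about 1.3% over zero-shot TiRex (9.65 to 9.52).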

🔑 Key Finding: When AutoARIMA Beats Neural Models

If your AutoARIMA achieves ~1% sMAPE but neural models give 5-8%, your data has strong, clean monthly seasonality. Here's how to get the best of both worlds:

What Works (Ranked by M4-Monthly sMAPE)

| Rank | Method | sMAPE ↓ | MASE ↓ | Key Insight |
|---|---|---|---|---|
| 🥇 | Optimized Ensemble | 9.52 | 0.668 | Scipy-optimized weights: 75% TiRex + 11% Chronos-2 + 5% AutoARIMA |
| 🥈 | TiRex (zero-shot) | 9.65 | 0.680 | Best single model; xLSTM captures periodicity better than transformers |
| 🥉 | Top-3 Statistical Ensemble | 9.76 | 0.673 | OptTheta + AutoTheta + AutoARIMA average |
| 4 | Inv-sMAPE Weighted Ensemble | 9.93 | 0.670 | Automatic weight learning from error |
| 5 | AutoTheta (s=12) | 10.33 | 0.700 | Best single statistical model |
| 6 | OptimizedTheta (s=12) | 10.63 | 0.705 | |
| 7 | Chronos-2 (zero-shot) | 11.07 | 0.727 | Best for multivariate/covariates |
| 8 | Chronos-Bolt (zero-shot) | 11.08 | 0.765 | Fastest (37 series/s) |
| 9 | MSTL (s=12) | 11.09 | 0.708 | |
| 10 | AutoETS (s=12) | 11.19 | 0.702 | |
| 11 | AutoARIMA (s=12) | 12.03 | 0.709 | Surprisingly mid-pack on M4-Monthly |
| 12 | AutoCES (s=12) | 12.18 | 0.750 | |
| ❌ | STL + TiRex | 18.23 | 0.977 | Decomposition hurts! Neural models handle raw seasonality better |
| ❌ | STL + Chronos-Bolt | 16.73 | 0.992 | Decomposition hurts! |
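
For reference, the two metric columns can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming the usual M4-style definitions: sMAPE as 200 × mean(|forecast − actual| / (|forecast| + |actual|)), and MASE as the out-of-sample MAE divided by the in-sample MAE of a seasonal naive forecast with period 12. The function names `smape` and `mase` are illustrative and not taken from any library.

```python
import numpy as np

def smape(forecast, actual, eps=1e-8):
    """Symmetric MAPE in percent (range 0 to 200)."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    denom = np.abs(forecast) + np.abs(actual) + eps  # eps guards against 0/0
    return 200.0 * np.mean(np.abs(forecast - actual) / denom)

def mase(forecast, actual, history, season_length=12):
    """MAE of the forecast, scaled by the in-sample seasonal naive MAE."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    history = np.asarray(history, dtype=float)
    # In-sample error of "repeat the value from one season ago"
    naive_errors = np.abs(history[season_length:] - history[:-season_length])
    return np.mean(np.abs(forecast - actual)) / naive_errors.mean()
```

A MASE below 1 means the forecast beats the seasonal naive baseline on the training scale, which is why every row except the STL hybrids sits well under 1.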

What DOESN'T Work ❌

  • STL Decomposition + Neural: Decomposing then forecasting residuals is worse than raw neural forecasts. The foundation models already handle seasonality internally.
  • Equal-weight ensemble: Dilutes the best model. Optimized weights strongly favor TiRex (75%).
  • AutoARIMA alone: On diverse M4-Monthly data, AutoARIMA is mid-pack. It only dominates on single very regular series (like your 1.22% sMAPE case).
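
To make the weighting point concrete, the sketch below compares equal weights with weights proportional to inverse sMAPE, using four of the sMAPE scores from the ranking table. The helper `inverse_error_weights` is a hypothetical name, and this is only one plausible reading of the "Inv-sMAPE Weighted" row, not necessarily the exact scheme used for the benchmark.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized to sum to 1."""
    inverse = {name: 1.0 / err for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

# sMAPE scores copied from the ranking table
smape_scores = {
    "TiRex": 9.65,
    "AutoTheta": 10.33,
    "Chronos-2": 11.07,
    "AutoARIMA": 12.03,
}

weights = inverse_error_weights(smape_scores)
equal = {name: 1.0 / len(smape_scores) for name in smape_scores}

for name in smape_scores:
    print(f"{name:10s} equal={equal[name]:.3f} inverse-sMAPE={weights[name]:.3f}")
```

Because the four scores are close to each other, the inverse-sMAPE weights stay within a few points of 0.25, so this scheme shifts the blend only mildly toward the best model. The directly optimized weights reported later in this card are far more lopsided.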

🚀 How to Beat YOUR AutoARIMA (1.22% sMAPE)

Your AutoARIMA(1,2,1)(2,0,0,12) is extremely good because your data is likely:

  • Single series with very regular monthly seasonality
  • Low noise, predictable trend
  • Enough history for ARIMA to estimate its seasonal terms reliably
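
One rough way to check whether a series fits this description is to measure how much of its month-to-month change repeats at a 12-month lag. The sketch below is a heuristic under stated assumptions, not a method from the benchmark: `seasonal_autocorrelation` is a hypothetical helper that first-differences the series to remove a linear trend and then computes the lag-12 autocorrelation of those differences. Values near 1 suggest strong, regular seasonality; values near 0 suggest little seasonal structure.

```python
import numpy as np

def seasonal_autocorrelation(series, season_length=12):
    """Lag-`season_length` autocorrelation of the first-differenced series."""
    values = np.asarray(series, dtype=float)
    diffs = np.diff(values)                  # removes a linear trend
    centered = diffs - diffs.mean()
    denom = np.sum(centered ** 2)
    if denom == 0.0:
        return 0.0                           # constant differences: no seasonality
    num = np.sum(centered[season_length:] * centered[:-season_length])
    return float(num / denom)

# Synthetic illustration (made-up data, 10 years of monthly values)
months = np.arange(120)
clean = 0.5 * months + 10.0 * np.sin(2 * np.pi * months / 12)
rng = np.random.default_rng(0)
noisy = 0.5 * months + rng.normal(scale=10.0, size=months.size)

print("clean seasonal series:", seasonal_autocorrelation(clean))
print("trend plus noise:     ", seasonal_autocorrelation(noisy))
```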

Strategy 1: Fine-tune Chronos-2 on YOUR data (Most Promising)

pip install autogluon.timeseries

from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Your data: long-format DataFrame with columns [item_id, timestamp, target]
train_data = TimeSeriesDataFrame.from_data_frame(your_df, id_column="item_id", timestamp_column="timestamp")

predictor = TimeSeriesPredictor(
    prediction_length=12,  # your forecast horizon
    freq="ME",
    eval_metric="SMAPE",
).fit(
    train_data,
    hyperparameters={
        # Fine-tuned Chronos-2 (adapts to YOUR seasonal pattern)
        "Chronos2": [
            {"fine_tune": True, "fine_tune_steps": 2000, "fine_tune_lr": 1e-5,
             "ag_args": {"name_suffix": "FineTuned"}},
            {"ag_args": {"name_suffix": "ZeroShot"}},  # zero-shot baseline
        ],
        # Statistical models (AutoARIMA already works well for you)
        "AutoARIMA": {},
        "AutoETS": {},
        "Theta": {},  # AutoGluon's model zoo lists "Theta" and "DynamicOptimizedTheta", not "AutoTheta"
    },
    enable_ensemble=True,   # learns optimal blend of all models
    time_limit=3600,
)

# The weighted ensemble assigns one weight per model based on validation error,
# so AutoARIMA can still dominate the blend if it remains the most accurate model
predictions = predictor.predict(train_data)
predictor.leaderboard()

Strategy 2: Optimized Statistical Ensemble (Quick Win)

pip install statsforecast

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, AutoTheta, AutoCES, OptimizedTheta

sf = StatsForecast(
    models=[
        AutoARIMA(season_length=12),
        AutoETS(season_length=12),
        AutoTheta(season_length=12),
        OptimizedTheta(season_length=12),
        AutoCES(season_length=12),
    ],
    freq="ME",
    n_jobs=1,
)
sf.fit(your_df)  # DataFrame: unique_id, ds, y
predictions = sf.predict(h=12, level=[80, 95])

# Simple average of top models often beats any individual model
ensemble = predictions[["AutoARIMA", "AutoETS", "AutoTheta"]].mean(axis=1)

Strategy 3: TiRex + AutoARIMA Weighted Hybrid

pip install "tirex-ts[all]" statsforecast

import torch, numpy as np
from tirex import load_model
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# TiRex forecast
model = load_model("NX-AI/TiRex")
data = torch.tensor(your_history, dtype=torch.float32).unsqueeze(0)
_, tirex_forecast = model.forecast(context=data, prediction_length=12)

# AutoARIMA forecast
sf = StatsForecast(models=[AutoARIMA(season_length=12)], freq="ME")
sf.fit(your_df)  # same long-format DataFrame as in Strategy 2: unique_id, ds, y
arima_forecast = sf.predict(h=12)["AutoARIMA"].values

# Optimal blend (tune alpha on your validation data)
alpha = 0.3  # 30% ARIMA + 70% TiRex (typical for regular seasonal data)
hybrid = alpha * arima_forecast + (1-alpha) * tirex_forecast.numpy().flatten()
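
The advice to tune `alpha` on validation data can be made concrete with a simple grid search. The sketch below assumes you already hold validation-window forecasts from both models as arrays; `tune_alpha`, `arima_val`, `tirex_val`, and `actual_val` are hypothetical names, and the numbers are made up purely for illustration.

```python
import numpy as np

def smape(forecast, actual, eps=1e-8):
    """Symmetric MAPE in percent."""
    denom = np.abs(forecast) + np.abs(actual) + eps
    return 200.0 * np.mean(np.abs(forecast - actual) / denom)

def tune_alpha(arima_val, tirex_val, actual_val, grid=None):
    """Return the ARIMA weight in [0, 1] with the lowest validation sMAPE."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)     # steps of 0.05
    scores = [
        smape(a * arima_val + (1.0 - a) * tirex_val, actual_val) for a in grid
    ]
    best = int(np.argmin(scores))
    return float(grid[best]), float(scores[best])

# Toy validation window: ARIMA over-forecasts by 4, TiRex under-forecasts by 6
actual_val = np.array([100.0, 110.0, 120.0, 130.0])
arima_val = actual_val + 4.0
tirex_val = actual_val - 6.0

alpha, score = tune_alpha(arima_val, tirex_val, actual_val)
print(f"best alpha={alpha:.2f}, validation sMAPE={score:.4f}")
```

In this toy case the two models err in opposite directions, so a blend cancels the bias almost entirely. On real data the errors are usually correlated and the gain from blending is smaller, which is why the weight should be chosen on held-out data rather than fixed in advance.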

Strategy 4: Cross-Validation Weight Optimization

import numpy as np
from scipy.optimize import minimize

def optimize_blend(forecasts_dict, actuals):
    """Find optimal weights minimizing sMAPE."""
    names = list(forecasts_dict.keys())
    
    def objective(weights):
        w = np.abs(weights) / np.abs(weights).sum()
        blend = sum(w[i] * forecasts_dict[names[i]] for i in range(len(names)))
        return 200 * np.mean(np.abs(blend - actuals) / (np.abs(blend) + np.abs(actuals) + 1e-8))
    
    result = minimize(objective, x0=np.ones(len(names))/len(names), method="Nelder-Mead")
    weights = np.abs(result.x) / np.abs(result.x).sum()
    return dict(zip(names, weights))

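# Use on your CV folds. arima_cv, tirex_cv, c2_cv, and actual_cv are placeholders for
# equal-length 1-D arrays of out-of-sample forecasts and the matching actuals.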
optimal_weights = optimize_blend(
    {"AutoARIMA": arima_cv, "TiRex": tirex_cv, "Chronos2": c2_cv},
    actual_cv
)
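
The `*_cv` arrays above are assumed to hold out-of-sample forecasts. One common way to produce them is rolling-origin (expanding-window) cross-validation. The sketch below only generates the index splits; `rolling_origin_splits` is a hypothetical helper rather than part of any library used here.

```python
def rolling_origin_splits(n_obs, horizon=12, n_folds=3, step=None):
    """Yield (train_indices, test_indices) for expanding-window CV.

    The last fold ends at the final observation; each earlier fold is
    shifted back by `step` observations (default: one horizon).
    """
    step = horizon if step is None else step
    for fold in range(n_folds - 1, -1, -1):
        test_end = n_obs - fold * step
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("Series too short for the requested folds")
        yield range(0, test_start), range(test_start, test_end)

# Example: 5 years of monthly data, three 12-month test windows
splits = list(rolling_origin_splits(n_obs=60, horizon=12, n_folds=3))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

For each fold, fit every model on the training indices, forecast `horizon` steps, and concatenate the forecasts and actuals across folds before passing them to `optimize_blend`. For its own models, StatsForecast also provides a `cross_validation` method that performs this kind of split internally.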

📊 Full Benchmark (M4-Monthly, 48 stratified series)

Neural Models (Zero-Shot)

| Model | Params | sMAPE | MASE | MAE |
|---|---|---|---|---|
| TiRex | 35M | 9.65 | 0.680 | 453 |
| Chronos-2 | 120M | 11.07 | 0.727 | 521 |
| Chronos-Bolt | 205M | 11.08 | 0.765 | 512 |

Statistical Models

| Model | sMAPE | MASE | MAE |
|---|---|---|---|
| AutoTheta (s=12) | 10.33 | 0.700 | 479 |
| OptimizedTheta (s=12) | 10.63 | 0.705 | 491 |
| AutoETS (s=12) | 11.19 | 0.702 | 525 |
| MSTL (s=12) | 11.09 | 0.708 | 541 |
| AutoARIMA (s=12) | 12.03 | 0.709 | 511 |
| AutoCES (s=12) | 12.18 | 0.750 | 544 |
| SeasonalNaive | 15.80 | 1.132 | 728 |
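
The SeasonalNaive row is the baseline every other method should beat: it simply repeats the last observed seasonal cycle. A minimal sketch, assuming a 12-month season (the function name is illustrative; StatsForecast ships its own `SeasonalNaive` model):

```python
import numpy as np

def seasonal_naive(history, horizon=12, season_length=12):
    """Forecast by repeating the last full seasonal cycle of `history`."""
    history = np.asarray(history, dtype=float)
    if history.size < season_length:
        raise ValueError("Need at least one full season of history")
    last_cycle = history[-season_length:]
    repeats = -(-horizon // season_length)   # ceiling division
    return np.tile(last_cycle, repeats)[:horizon]

# 24 months of made-up history with values 1..24
history = np.arange(1.0, 25.0)
forecast = seasonal_naive(history, horizon=14)
print(forecast)   # months 13..24 repeated, then wrapping to 13, 14
```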

Ensembles & Hybrids

| Strategy | sMAPE | MASE | MAE |
|---|---|---|---|
| 🏆 Optimized Ensemble | 9.52 | 0.668 | 451 |
| Top-3 Statistical | 9.76 | 0.673 | 464 |
| Inv-sMAPE Weighted | 9.93 | 0.670 | 474 |
| Best Stat + Best Neural | 9.65 | 0.680 | 453 |

Optimized Ensemble Weights

{
  "TiRex": 0.751,
  "Chronos-2": 0.114,
  "AutoARIMA": 0.052,
  "AutoTheta": 0.041,
  "OptimizedTheta": 0.011,
  "STL+TiRex": 0.025
}
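
Applying these weights to per-model forecasts is a weighted sum. The listed values add up to 0.994 rather than 1, presumably because of rounding, so the sketch below renormalizes them first. `blend_forecasts` is a hypothetical helper, and the forecast arrays are placeholders rather than benchmark outputs.

```python
import numpy as np

WEIGHTS = {
    "TiRex": 0.751,
    "Chronos-2": 0.114,
    "AutoARIMA": 0.052,
    "AutoTheta": 0.041,
    "OptimizedTheta": 0.011,
    "STL+TiRex": 0.025,
}

def blend_forecasts(forecasts, weights):
    """Weighted sum of per-model forecasts, with weights renormalized to 1."""
    missing = set(weights) - set(forecasts)
    if missing:
        raise KeyError(f"No forecast supplied for: {sorted(missing)}")
    total = sum(weights.values())
    return sum(
        (w / total) * np.asarray(forecasts[name], dtype=float)
        for name, w in weights.items()
    )

# Placeholder forecasts: every model predicts 100, except TiRex at 110
forecasts = {name: np.full(12, 100.0) for name in WEIGHTS}
forecasts["TiRex"] = np.full(12, 110.0)

blend = blend_forecasts(forecasts, WEIGHTS)
print(blend[:3])
```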

🔬 Why TiRex Dominates

TiRex's xLSTM architecture has explicit state-tracking that captures periodicity better than transformer attention. Key advantages for monthly data:

  • Contiguous Patch Masking (CPM): Forces multi-step coherent predictions
  • State tracking: Naturally captures seasonal cycles via LSTM state
  • 35M params: Smaller than Chronos-2 (120M) but better on monthly data
  • NeurIPS 2025: arxiv:2505.23719
