---
tags:
- time-series-forecasting
- foundation-models
- pretrained-models
- time-series
- timeseries
- forecasting
- observability
- safetensors
- pytorch_model_hub_mixin
license: apache-2.0
pipeline_tag: time-series-forecasting
thumbnail:
results:
  - task: time-series-forecasting
    dataset: 
      name: GIFT-Eval
    metrics:
      - name: MASE
        type: mase
        value: 0.757
      - name: CRPS
        type: brier_score
        value: 0.524
    source:
      name: GIFT-Eval Time Series Forecasting Leaderboard
      url: https://huggingface.co/spaces/Salesforce/GIFT-Eval

  - task: time-series-forecasting
    dataset:
      name: BOOM
    metrics:
      - name: MASE
        type: mase
        value: 0.624
      - name: CRPS
        type: brier_score
        value: 0.717
    source:
      name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
      url: https://huggingface.co/spaces/Datadog/BOOM
---
# Toto-2.0-4m

Toto (**T**ime Series **O**ptimized **T**ransformer for [**O**bservability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). **Toto 2.0** is the current generation, featuring u-μP-scaled transformers ranging from 4M to 2.5B parameters. This repository hosts Toto-2.0-4m, the smallest checkpoint in the family.

---

## ✨ Key Features

- **Zero-Shot Forecasting**: Forecast without fine-tuning on your specific time series.
- **Multivariate Support**: Process multiple related variables jointly using alternating time-wise and variate-wise attention.
- **Probabilistic Predictions**: Generate point forecasts and uncertainty estimates via a quantile output head.
- **Decoder-Only Architecture**: Supports variable prediction horizons and context lengths.
- **u-μP Scaling**: The unit-scaled maximal update parametrization (u-μP) keeps training stable and lets hyperparameters transfer across all model sizes.
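
The card does not state how the quantile head is trained. The standard way to score a quantile forecast is the pinball (quantile) loss, and averaging it over a grid of quantile levels gives a finite approximation of CRPS (CRPS equals twice the pinball loss integrated over all levels). The minimal sketch below assumes the nine levels listed in the inference example; `pinball_loss` and `mean_quantile_loss` are hypothetical helpers, not part of the `toto2` package, and leaderboards such as GIFT-Eval may apply their own normalization.

```python
from typing import Sequence

# Quantile levels as listed in this card's inference example (assumption for this sketch).
QUANTILE_LEVELS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)


def pinball_loss(y: float, q_pred: float, level: float) -> float:
    """Pinball loss of one quantile prediction `q_pred` at quantile `level`."""
    diff = y - q_pred
    # Under-prediction (diff > 0) is weighted by `level`,
    # over-prediction (diff < 0) by (1 - level).
    return max(level * diff, (level - 1.0) * diff)


def mean_quantile_loss(
    y: Sequence[float],
    q_preds: Sequence[Sequence[float]],
    levels: Sequence[float] = QUANTILE_LEVELS,
) -> float:
    """Average pinball loss over all quantile levels and time steps.

    q_preds[k][t] is the forecast for quantile levels[k] at time step t.
    """
    if len(q_preds) != len(levels):
        raise ValueError("need one forecast row per quantile level")
    total = 0.0
    for level, row in zip(levels, q_preds):
        if len(row) != len(y):
            raise ValueError("each forecast row must match the target length")
        total += sum(pinball_loss(yt, qt, level) for yt, qt in zip(y, row))
    return total / (len(levels) * len(y))
```

A forecast whose every quantile equals the actual values scores zero; wider or miscentered quantiles are penalized asymmetrically according to their level.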

<div style="width: 100%; margin: auto; padding: 1rem;">
  <img src="figures/architecture.png" alt="Toto 2.0 architecture" style="width: 100%; height: auto;" />
  <em style="display: block; margin-top: 0.5rem; text-align: center;">
    Overview of the Toto 2.0 architecture.
  </em>
</div>
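
To illustrate the idea behind alternating time/variate attention, here is a heavily simplified NumPy sketch, not Toto's implementation. It assumes an input of shape `(n_variates, n_patches, d_model)`, uses a single head with no learned Q/K/V projections, and omits normalization and MLP layers. Causal masking along time is an assumption chosen to match the decoder-only design; variate attention is left unmasked. All function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray, causal: bool = False) -> np.ndarray:
    """Single-head attention over the second-to-last axis of x, shape (..., L, D).

    Queries, keys and values are x itself (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    if causal:
        length = x.shape[-2]
        # Position i may only attend to positions j <= i.
        future = np.triu(np.ones((length, length), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ x


def alternating_block(x: np.ndarray) -> np.ndarray:
    """One time-attention step followed by one variate-attention step.

    x has shape (n_variates, n_patches, d_model); residual connections included.
    """
    # 1) Time attention: each variate attends over its own current and past patches.
    x = x + self_attention(x, causal=True)
    # 2) Variate attention: at each time step, the variates attend to each other.
    xt = np.swapaxes(x, 0, 1)  # (n_patches, n_variates, d_model)
    xt = xt + self_attention(xt, causal=False)
    return np.swapaxes(xt, 0, 1)
```

Factoring attention this way keeps each score matrix at `n_patches × n_patches` or `n_variates × n_variates` instead of attending over every (variate, patch) pair at once, which is one reason such designs scale to many variables.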

---

## ⚡ Quick Start

Inference code is available on [GitHub](https://github.com/DataDog/toto).

### Installation

```bash
pip install "toto-2 @ git+https://github.com/DataDog/toto.git#subdirectory=toto2"
```

### Inference Example

```python
import torch
from toto2 import Toto2Model

model = Toto2Model.from_pretrained("Datadog/Toto-2.0-4m")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

# Input context of shape (batch, n_variates, time_steps)
target = torch.randn(1, 1, 512, device=device)
# Boolean mask of valid observations; all True because this series has no gaps
target_mask = torch.ones_like(target, dtype=torch.bool)
# Integer series IDs of shape (batch, n_variates)
series_ids = torch.zeros(1, 1, dtype=torch.long, device=device)

# Returns quantiles of shape (9, batch, n_variates, horizon)
# Quantile levels: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
quantiles = model.forecast(
    {"target": target, "target_mask": target_mask, "series_ids": series_ids},
    horizon=96,
    decode_block_size=768,
    has_missing_values=False,
)

# The 0.5 quantile (index 4) can serve as the median point forecast
median = quantiles[4]  # (batch, n_variates, horizon)
```
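
Since `forecast` returns nine quantiles stacked on the leading axis, a common next step is to read off a median point forecast and a central prediction interval. The sketch below is a hypothetical NumPy helper (not part of `toto2`) that assumes the shape `(9, batch, n_variates, horizon)` and the quantile levels documented above; convert the tensor first with `quantiles.detach().cpu().numpy()`.

```python
import numpy as np

# Quantile levels documented for `model.forecast` in the example above.
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def quantile_at(quantiles: np.ndarray, level: float) -> np.ndarray:
    """Select one level from an array of shape (9, batch, n_variates, horizon)."""
    idx = min(range(len(QUANTILE_LEVELS)), key=lambda i: abs(QUANTILE_LEVELS[i] - level))
    if abs(QUANTILE_LEVELS[idx] - level) > 1e-9:
        raise ValueError(f"level {level} is not one of {QUANTILE_LEVELS}")
    return quantiles[idx]


def summarize(quantiles: np.ndarray) -> dict:
    """Median point forecast plus the 80% central interval (0.1 to 0.9 quantiles)."""
    if quantiles.shape[0] != len(QUANTILE_LEVELS):
        raise ValueError("expected the quantile levels on the leading axis")
    return {
        "median": quantile_at(quantiles, 0.5),
        "lower_80": quantile_at(quantiles, 0.1),
        "upper_80": quantile_at(quantiles, 0.9),
    }


def interval_coverage(actual: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> float:
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = (actual >= lower) & (actual <= upper)
    return float(inside.mean())
```

On held-out data, an 80% interval that is well calibrated should have `interval_coverage` close to 0.8.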

For more examples, see the [Quick Start notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/quick_start.ipynb) and [GluonTS integration notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/gluonts_integration.ipynb).

---

## 💾 Available Checkpoints

| Checkpoint | Parameters |
|---|---|
| [Toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4M |
| [Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22M |
| [Toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313M |
| [Toto-2.0-1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B |
| [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B |

---

## 🔗 Additional Resources

- **[GitHub Repository](https://github.com/DataDog/toto)**
- **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**
- **[Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)**

---

## 📖 Citation

```bibtex
(citation coming soon)
```