---
tags:
- time-series-forecasting
- foundation-models
- pretrained-models
- time-series
- timeseries
- forecasting
- observability
- safetensors
- pytorch_model_hub_mixin
license: apache-2.0
pipeline_tag: time-series-forecasting
thumbnail:
results:
- task: time-series-forecasting
  dataset:
    name: GIFT-Eval
  metrics:
  - name: MASE
    type: mase
    value: 0.757
  - name: CRPS
    type: crps
    value: 0.524
  source:
    name: GIFT-Eval Time Series Forecasting Leaderboard
    url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
- task: time-series-forecasting
  dataset:
    name: BOOM
  metrics:
  - name: MASE
    type: mase
    value: 0.624
  - name: CRPS
    type: crps
    value: 0.717
  source:
    name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
    url: https://huggingface.co/spaces/Datadog/BOOM
---

# Toto-2.0-4m

Toto (**T**ime Series **O**ptimized **T**ransformer for [**O**bservability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). **Toto 2.0** is the current generation, featuring u-μP-scaled transformers ranging from 4M to 2.5B parameters.

---

## ✨ Key Features

- **Zero-Shot Forecasting**: Forecast without fine-tuning on your specific time series.
- **Multi-Variate Support**: Efficiently process multiple variables using alternating time/variate attention.
- **Probabilistic Predictions**: Generate point forecasts and uncertainty estimates via a quantile output head (see the sketch after the architecture figure below).
- **Decoder-Only Architecture**: Support for variable prediction horizons and context lengths.
- **u-μP Scaling**: Stable training transfer across all model sizes.
*Figure: Overview of the Toto 2.0 architecture.*
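Because of the quantile head, a forecast comes back as a stack of nine quantile tracks rather than a single trajectory. The helper below is a minimal sketch of how to consume that output; it is hypothetical (not part of the `toto2` package) and assumes the `(9, batch, n_variates, horizon)` shape and `[0.1, ..., 0.9]` quantile levels documented in the Quick Start example below.

```python
import torch

# Quantile levels emitted by Toto 2.0, per the Quick Start comments.
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def summarize_quantiles(quantiles: torch.Tensor, lower: float = 0.1, upper: float = 0.9):
    """Split a (9, batch, n_variates, horizon) quantile stack into
    (median, lower_band, upper_band), each (batch, n_variates, horizon)."""
    median = quantiles[QUANTILE_LEVELS.index(0.5)]       # 0.5 quantile = point forecast
    lower_band = quantiles[QUANTILE_LEVELS.index(lower)]  # lower edge of the interval
    upper_band = quantiles[QUANTILE_LEVELS.index(upper)]  # upper edge of the interval
    return median, lower_band, upper_band

# Dummy tensor shaped like a real forecast; sorting keeps the quantiles monotone.
quantiles = torch.randn(9, 1, 1, 96).sort(dim=0).values
median, lower_band, upper_band = summarize_quantiles(quantiles)  # 80% central interval
```

The defaults give the widest band the model emits (0.1 to 0.9); for a narrower interval, pick inner quantile levels.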
---

## ⚡ Quick Start

Inference code is available on [GitHub](https://github.com/DataDog/toto).

### Installation

```bash
pip install "toto-2 @ git+https://github.com/DataDog/toto.git#subdirectory=toto2"
```

### Inference Example

```python
import torch
from toto2 import Toto2Model

model = Toto2Model.from_pretrained("Datadog/Toto-2.0-22m")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

# (batch, n_variates, time_steps)
target = torch.randn(1, 1, 512, device=device)
target_mask = torch.ones_like(target, dtype=torch.bool)
series_ids = torch.zeros(1, 1, dtype=torch.long, device=device)

# Returns quantiles of shape (9, batch, n_variates, horizon)
# Quantile levels: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
quantiles = model.forecast(
    {"target": target, "target_mask": target_mask, "series_ids": series_ids},
    horizon=96,
    decode_block_size=768,
    has_missing_values=False,
)
```

For more examples, see the [Quick Start notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/quick_start.ipynb) and [GluonTS integration notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/gluonts_integration.ipynb).

---

## 💾 Available Checkpoints

| Checkpoint | Parameters |
|---|---|
| [Toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4M |
| [Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22M |
| [Toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313M |
| [Toto-2.0-1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B |
| [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B |

---

## 🔗 Additional Resources

- **[GitHub Repository](https://github.com/DataDog/toto)**
- **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**
- **[Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)**

---

## 📖 Citation

```bibtex
(citation coming soon)
```
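---

## 🧪 Multivariate Example (Illustrative)

Multivariate forecasting is the headline feature of the family, so here is a sketch extending the Quick Start to several variates of one series. The tensor shapes follow the Quick Start comments (`target` is `(batch, n_variates, time_steps)` and `series_ids` matches `(batch, n_variates)`); the assumption that variates of the same series share a single `series_ids` value is ours, not from the official docs, so consult the GluonTS integration notebook for the authoritative pattern.

```python
import torch
from toto2 import Toto2Model

model = Toto2Model.from_pretrained("Datadog/Toto-2.0-22m")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

# Three variates of one series, e.g. latency, error rate, and throughput.
target = torch.randn(1, 3, 512, device=device)           # (batch, n_variates, time_steps)
target_mask = torch.ones_like(target, dtype=torch.bool)  # True = observed
series_ids = torch.zeros(1, 3, dtype=torch.long, device=device)  # assumption: shared id

quantiles = model.forecast(
    {"target": target, "target_mask": target_mask, "series_ids": series_ids},
    horizon=96,
    decode_block_size=768,
    has_missing_values=False,
)

# quantiles: (9, batch=1, n_variates=3, horizon=96); index 4 is the 0.5 quantile.
median_forecast = quantiles[4]
```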