gorold committed (verified) · Commit 025dc59 · Parent: 8127dec

Update model card README

Files changed (1): README.md (+111, −5)
README.md CHANGED
@@ -1,10 +1,116 @@
  ---
  tags:
- - model_hub_mixin
+ - time-series-forecasting
+ - foundation-models
+ - pretrained-models
+ - time-series
+ - timeseries
+ - forecasting
+ - observability
+ - safetensors
  - pytorch_model_hub_mixin
+ license: apache-2.0
+ pipeline_tag: time-series-forecasting
+ datasets:
+ - Datadog/BOOM
  ---
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
+ # Toto-2.0-4m

+ Toto (**T**ime Series **O**ptimized **T**ransformer for [**O**bservability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). **Toto 2.0** is the current generation, featuring u-μP-scaled transformers ranging from 4M to 2.5B parameters.
+
+ **Toto-2.0-4m** is the 4M-parameter variant, the smallest in the family and suitable for low-latency or resource-constrained settings.
+
+ ---
+
+ ## ✨ Key Features
+
+ - **Zero-Shot Forecasting**: Forecast without fine-tuning on your specific time series.
+ - **Multi-Variate Support**: Efficiently process multiple variables using alternating time-wise and variate-wise attention (sketched below the figure).
+ - **Probabilistic Predictions**: Generate point forecasts and uncertainty estimates via a quantile output head.
+ - **Decoder-Only Architecture**: Support for variable prediction horizons and context lengths.
+ - **u-μP Scaling**: Stable training and hyperparameter transfer across all model sizes.
+
+ <div style="width: 100%; margin: auto; padding: 1rem;">
+ <img src="figures/architecture.png" alt="Toto 2.0 architecture" style="width: 100%; height: auto;" />
+ <em style="display: block; margin-top: 0.5rem; text-align: center;">
+ Overview of the Toto 2.0 architecture.
+ </em>
+ </div>
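+
+ The multi-variate support above relies on attention that alternates between the time axis and the variate axis. The following is a minimal, illustrative sketch of that pattern, not the actual Toto 2.0 block: module names, dimensions, normalization, and causal masking are simplified assumptions.
+
+ ```python
+ # Illustrative sketch of alternating time-wise / variate-wise attention.
+ # NOT the Toto 2.0 implementation; norms, masking, and feed-forward layers are omitted.
+ import torch
+ import torch.nn as nn
+
+ class AlternatingAttentionBlock(nn.Module):
+     def __init__(self, d_model: int = 64, n_heads: int = 4):
+         super().__init__()
+         self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
+         self.variate_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         b, v, t, d = x.shape
+         # Time-wise attention: each variate attends over its own time steps.
+         xt = x.reshape(b * v, t, d)
+         xt = xt + self.time_attn(xt, xt, xt, need_weights=False)[0]
+         # Variate-wise attention: each time step attends across variates.
+         xv = xt.reshape(b, v, t, d).transpose(1, 2).reshape(b * t, v, d)
+         xv = xv + self.variate_attn(xv, xv, xv, need_weights=False)[0]
+         return xv.reshape(b, t, v, d).transpose(1, 2)
+
+ x = torch.randn(2, 3, 128, 64)               # (batch, variates, time, d_model)
+ print(AlternatingAttentionBlock()(x).shape)  # torch.Size([2, 3, 128, 64])
+ ```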
+
+ ---
+
+ ## ⚡ Quick Start
+
+ Inference code is available on [GitHub](https://github.com/DataDog/toto).
+
+ ### Installation
+
+ ```bash
+ pip install "toto-2 @ git+https://github.com/DataDog/toto.git#subdirectory=toto2"
+ ```
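+
+ After installing, a quick sanity check that the package is importable (the module name `toto2` matches the inference example below):
+
+ ```bash
+ python -c "from toto2 import Toto2Model; print('toto2 import OK')"
+ ```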
+
+ ### Inference Example
+
+ ```python
+ import torch
+ from toto2 import Toto2Model
+
+ model = Toto2Model.from_pretrained("Datadog/Toto-2.0-4m")
+ model = model.to("cuda").eval()
+
+ # (batch, n_variates, time_steps)
+ target = torch.randn(1, 1, 512, device="cuda")
+ target_mask = torch.ones_like(target, dtype=torch.bool)
+ series_ids = torch.zeros(1, 1, dtype=torch.long, device="cuda")
+
+ # Returns quantiles of shape (9, batch, n_variates, horizon)
+ # Quantile levels: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
+ quantiles = model.forecast(
+     {"target": target, "target_mask": target_mask, "series_ids": series_ids},
+     horizon=96,
+ )
+ ```
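+
+ The returned tensor stacks the nine quantile forecasts along its first dimension, so a point forecast and a central prediction interval can be read off by index (positions follow the quantile ordering in the comment above):
+
+ ```python
+ # Continues the example above: quantiles has shape (9, batch, n_variates, horizon).
+ median_forecast = quantiles[4]               # 0.5 quantile used as the point forecast
+ lower, upper = quantiles[0], quantiles[8]    # 0.1 and 0.9 quantiles -> central 80% interval
+
+ print(median_forecast.shape)                 # (batch, n_variates, horizon) == torch.Size([1, 1, 96])
+ ```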
+
+ For more examples, see the [Quick Start notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/quick_start.ipynb) and [GluonTS integration notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/gluonts_integration.ipynb).
+
+ ---
+
+ ## 💾 Available Checkpoints
+
+ | Checkpoint | Parameters |
+ |---|---|
+ | [Toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4M |
+ | [Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22M |
+ | [Toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313M |
+ | [Toto-2.0-1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B |
+ | [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B |
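+
+ Assuming the other checkpoints expose the same `from_pretrained` interface as the Quick Start example (they are published as one family), switching models only changes the repository name, with memory and latency growing with parameter count:
+
+ ```python
+ from toto2 import Toto2Model
+
+ # Same loading pattern as the Quick Start; swap in any checkpoint name from the table.
+ model = Toto2Model.from_pretrained("Datadog/Toto-2.0-313m").to("cuda").eval()
+ ```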
+
+ ---
+
+ ## 🔗 Additional Resources
+
+ - **[GitHub Repository](https://github.com/DataDog/toto)**
+ - **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**
+ - **[Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)**
+
+ ---
+
+ ## 📖 Citation
+
+ A BibTeX entry for Toto 2.0 is coming soon. For Toto 1.0:
+
+ ```bibtex
+ @misc{cohen2025timedifferentobservabilityperspective,
+   title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
+   author={Ben Cohen and Emaad Khwaja and Youssef Doubli and Salahidine Lemaachi and Chris Lettieri and Charles Masson and Hugo Miccinilli and Elise Ramé and Qiqi Ren and Afshin Rostamizadeh and Jean Ogier du Terrail and Anna-Monica Toon and Kan Wang and Stephan Xie and Zongzhe Xu and Viktoriya Zhukova and David Asker and Ameet Talwalkar and Othmane Abou-Amal},
+   year={2025},
+   eprint={2505.14766},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2505.14766},
+ }
+ ```