gorold committed (verified) · Commit 025dc59 · Parent: 8127dec

Update model card README

Files changed (1): README.md (+111, −5)
README.md CHANGED
@@ -1,10 +1,116 @@
  ---
  tags:
- - model_hub_mixin
+ - time-series-forecasting
+ - foundation-models
+ - pretrained-models
+ - time-series
+ - timeseries
+ - forecasting
+ - observability
+ - safetensors
  - pytorch_model_hub_mixin
+ license: apache-2.0
+ pipeline_tag: time-series-forecasting
+ datasets:
+ - Datadog/BOOM
  ---
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
+ # Toto-2.0-4m

+ Toto (**T**ime Series **O**ptimized **T**ransformer for [**O**bservability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). **Toto 2.0** is the current generation, featuring u-μP-scaled transformers ranging from 4M to 2.5B parameters.
+
+ **Toto-2.0-4m** is the 4M-parameter variant, the smallest in the family and suitable for low-latency or resource-constrained settings.
+
+ ---
+
+ ## ✨ Key Features
+
+ - **Zero-Shot Forecasting**: Forecast without fine-tuning on your specific time series.
+ - **Multi-Variate Support**: Efficiently process multiple variables using alternating time-wise and variate-wise attention (sketched below the figure).
+ - **Probabilistic Predictions**: Generate point forecasts and uncertainty estimates via a quantile output head.
+ - **Decoder-Only Architecture**: Support for variable prediction horizons and context lengths.
+ - **u-μP Scaling**: Stable training and hyperparameter transfer across all model sizes.
+
+ <div style="width: 100%; margin: auto; padding: 1rem;">
+ <img src="figures/architecture.png" alt="Toto 2.0 architecture" style="width: 100%; height: auto;" />
+ <em style="display: block; margin-top: 0.5rem; text-align: center;">
+ Overview of the Toto 2.0 architecture.
+ </em>
+ </div>
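+
+ The multi-variate support above relies on attention that alternates between the time axis and the variate axis. The following is a minimal, illustrative sketch of that pattern, not the actual Toto 2.0 block: module names, dimensions, normalization, and causal masking are simplified assumptions.
+
+ ```python
+ # Illustrative sketch of alternating time-wise / variate-wise attention.
+ # NOT the Toto 2.0 implementation; norms, masking, and feed-forward layers are omitted.
+ import torch
+ import torch.nn as nn
+
+ class AlternatingAttentionBlock(nn.Module):
+     def __init__(self, d_model: int = 64, n_heads: int = 4):
+         super().__init__()
+         self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
+         self.variate_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         b, v, t, d = x.shape
+         # Time-wise attention: each variate attends over its own time steps.
+         xt = x.reshape(b * v, t, d)
+         xt = xt + self.time_attn(xt, xt, xt, need_weights=False)[0]
+         # Variate-wise attention: each time step attends across variates.
+         xv = xt.reshape(b, v, t, d).transpose(1, 2).reshape(b * t, v, d)
+         xv = xv + self.variate_attn(xv, xv, xv, need_weights=False)[0]
+         return xv.reshape(b, t, v, d).transpose(1, 2)
+
+ x = torch.randn(2, 3, 128, 64)               # (batch, variates, time, d_model)
+ print(AlternatingAttentionBlock()(x).shape)  # torch.Size([2, 3, 128, 64])
+ ```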
+
+ ---
+
+ ## ⚡ Quick Start
+
+ Inference code is available on [GitHub](https://github.com/DataDog/toto).
+
+ ### Installation
+
+ ```bash
+ pip install "toto-2 @ git+https://github.com/DataDog/toto.git#subdirectory=toto2"
+ ```
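+
+ After installing, a quick sanity check that the package is importable (the module name `toto2` matches the inference example below):
+
+ ```bash
+ python -c "from toto2 import Toto2Model; print('toto2 import OK')"
+ ```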
+
+ ### Inference Example
+
+ ```python
+ import torch
+ from toto2 import Toto2Model
+
+ model = Toto2Model.from_pretrained("Datadog/Toto-2.0-4m")
+ model = model.to("cuda").eval()
+
+ # (batch, n_variates, time_steps)
+ target = torch.randn(1, 1, 512, device="cuda")
+ target_mask = torch.ones_like(target, dtype=torch.bool)
+ series_ids = torch.zeros(1, 1, dtype=torch.long, device="cuda")
+
+ # Returns quantiles of shape (9, batch, n_variates, horizon)
+ # Quantile levels: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
+ quantiles = model.forecast(
+     {"target": target, "target_mask": target_mask, "series_ids": series_ids},
+     horizon=96,
+ )
+ ```
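+
+ The returned tensor stacks the nine quantile forecasts along its first dimension, so a point forecast and a central prediction interval can be read off by index (positions follow the quantile ordering in the comment above):
+
+ ```python
+ # Continues the example above: quantiles has shape (9, batch, n_variates, horizon).
+ median_forecast = quantiles[4]               # 0.5 quantile used as the point forecast
+ lower, upper = quantiles[0], quantiles[8]    # 0.1 and 0.9 quantiles -> central 80% interval
+
+ print(median_forecast.shape)                 # (batch, n_variates, horizon) == torch.Size([1, 1, 96])
+ ```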
+
+ For more examples, see the [Quick Start notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/quick_start.ipynb) and [GluonTS integration notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/gluonts_integration.ipynb).
+
+ ---
+
+ ## 💾 Available Checkpoints
+
+ | Checkpoint | Parameters |
+ |---|---|
+ | [Toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4M |
+ | [Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22M |
+ | [Toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313M |
+ | [Toto-2.0-1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B |
+ | [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B |
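+
+ Assuming the other checkpoints expose the same `from_pretrained` interface as the Quick Start example (they are published as one family), switching models only changes the repository name, with memory and latency growing with parameter count:
+
+ ```python
+ from toto2 import Toto2Model
+
+ # Same loading pattern as the Quick Start; swap in any checkpoint name from the table.
+ model = Toto2Model.from_pretrained("Datadog/Toto-2.0-313m").to("cuda").eval()
+ ```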
+
+ ---
+
+ ## 🔗 Additional Resources
+
+ - **[GitHub Repository](https://github.com/DataDog/toto)**
+ - **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**
+ - **[Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)**
+
+ ---
+
+ ## 📖 Citation
+
+ A BibTeX entry for Toto 2.0 is coming soon. For Toto 1.0:
+
+ ```bibtex
+ @misc{cohen2025timedifferentobservabilityperspective,
+   title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
+   author={Ben Cohen and Emaad Khwaja and Youssef Doubli and Salahidine Lemaachi and Chris Lettieri and Charles Masson and Hugo Miccinilli and Elise Ramé and Qiqi Ren and Afshin Rostamizadeh and Jean Ogier du Terrail and Anna-Monica Toon and Kan Wang and Stephan Xie and Zongzhe Xu and Viktoriya Zhukova and David Asker and Ameet Talwalkar and Othmane Abou-Amal},
+   year={2025},
+   eprint={2505.14766},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2505.14766},
+ }
+ ```