Emaad committed on
Commit 962b925 · verified · 1 Parent(s): 2372183

Refresh model card: add Pareto + architecture figures, TIME metrics, latency table

Files changed (4)
  1. .gitattributes +2 -0
  2. README.md +62 -30
  3. assets/architecture.png +3 -0
  4. assets/pareto.png +3 -0
.gitattributes CHANGED
@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  figures/architecture.png filter=lfs diff=lfs merge=lfs -text
37
+ assets/architecture.png filter=lfs diff=lfs merge=lfs -text
38
+ assets/pareto.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -15,7 +15,7 @@ thumbnail: https://corp.dd-static.net/img/about/presskit/kit/press_kit.png
15
  model-index:
16
  - name: Toto-2.0-4m
17
  results:
18
- - task:
19
  type: time-series-forecasting
20
  dataset:
21
  name: BOOM
@@ -23,16 +23,16 @@ model-index:
23
  metrics:
24
  - name: CRPS
25
  type: CRPS
26
- value: 0.717
27
  - name: MASE
28
  type: MASE
29
  value: 0.624
30
  source:
31
  name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
32
- url: https://huggingface.co/spaces/Datadog/BOOM
33
- - task:
34
  type: time-series-forecasting
35
- dataset:
36
  name: GIFT-Eval
37
  type: GIFT-Eval
38
  metrics:
@@ -45,28 +45,54 @@ model-index:
45
  source:
46
  name: GIFT-Eval Time Series Forecasting Leaderboard
47
  url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
48
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
49
  ---
 
50
  # Toto-2.0-4m
51
 
52
- Toto (**T**ime Series **O**ptimized **T**ransformer for [**O**bservability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). **Toto 2.0** is the current generation, featuring u-μP-scaled transformers ranging from 4M to 2.5B parameters.
 
 
53
 
54
  ---
55
 
56
  ## ✨ Key Features
57
 
58
- - **Zero-Shot Forecasting**: Forecast without fine-tuning on your specific time series.
59
- - **Multi-Variate Support**: Efficiently process multiple variables using alternating time/variate attention.
60
- - **Probabilistic Predictions**: Generate point forecasts and uncertainty estimates via a quantile output head.
61
- - **Decoder-Only Architecture**: Support for variable prediction horizons and context lengths.
62
- - **u-μP Scaling**: Stable training transfer across all model sizes.
63
 
64
- <div style="width: 100%; margin: auto; padding: 1rem;">
65
- <img src="figures/architecture.png" alt="Toto 2.0 architecture" style="width: 100%; height: auto;" />
66
- <em style="display: block; margin-top: 0.5rem; text-align: center;">
67
- Overview of the Toto 2.0 architecture.
68
- </em>
69
- </div>
 
 
 
 
 
 
 
 
 
70
 
71
  ---
72
 
@@ -86,7 +112,7 @@ pip install "toto-2 @ git+https://github.com/DataDog/toto.git#subdirectory=toto2
86
  import torch
87
  from toto2 import Toto2Model
88
 
89
- model = Toto2Model.from_pretrained("Datadog/Toto-2.0-22m")
90
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
91
  model = model.to(device).eval()
92
 
@@ -111,22 +137,28 @@ For more examples, see the [Quick Start notebook](https://github.com/DataDog/tot
111
 
112
  ## 💾 Available Checkpoints
113
 
114
- | Checkpoint | Parameters |
115
- |---|---|
116
- | [Toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4M |
117
- | [Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22M |
118
- | [Toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313M |
119
- | [Toto-2.0-1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B |
120
- | [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B |
 
 
 
 
121
 
122
  ---
123
 
124
  ## 🔗 Additional Resources
125
 
126
- - **[Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)**
127
- - **[GitHub Repository](https://github.com/DataDog/toto)**
128
- - **[BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)**
129
- - **[Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)**
 
 
130
 
131
  ---
132
 
 
15
  model-index:
16
  - name: Toto-2.0-4m
17
  results:
18
+ - task:
19
  type: time-series-forecasting
20
  dataset:
21
  name: BOOM
 
23
  metrics:
24
  - name: CRPS
25
  type: CRPS
26
+ value: 0.377
27
  - name: MASE
28
  type: MASE
29
  value: 0.624
30
  source:
31
  name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
32
+ url: https://huggingface.co/spaces/Datadog/BOOM
33
+ - task:
34
  type: time-series-forecasting
35
+ dataset:
36
  name: GIFT-Eval
37
  type: GIFT-Eval
38
  metrics:
 
45
  source:
46
  name: GIFT-Eval Time Series Forecasting Leaderboard
47
  url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
48
+ - task:
49
+ type: time-series-forecasting
50
+ dataset:
51
+ name: TIME
52
+ type: TIME
53
+ metrics:
54
+ - name: CRPS
55
+ type: CRPS
56
+ value: 0.574
57
+ - name: MASE
58
+ type: MASE
59
+ value: 0.689
60
+ source:
61
+ name: TIME Benchmark Leaderboard
62
+ url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
63
  ---
64
+
65
  # Toto-2.0-4m
66
 
67
+ Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). Toto 2.0 is the current generation, featuring u-μP-scaled transformers ranging from 4M to 2.5B parameters, all trained from a single recipe. Forecast quality improves reliably with parameter count across the family, with no sign of saturation at 2.5B.
68
+
69
+ The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
70
 
71
  ---
72
 
73
  ## ✨ Key Features
74
 
75
+ - **Zero-Shot Forecasting:** Forecast without fine-tuning on your specific time series.
76
+ - **Multi-Variate Support:** Efficiently process multiple variables using alternating time/variate attention.
77
+ - **Probabilistic Predictions:** Generate point forecasts and uncertainty estimates via a quantile output head.
78
+ - **Decoder-Only Architecture:** Support for variable prediction horizons and context lengths.
79
+ - **u-μP Scaling:** A single training recipe transfers cleanly across all five sizes (4M → 2.5B).
80
 
81
+ ---
82
+
83
+ ## 🏗️ Architecture
84
+
85
+ ![Overview of the Toto 2.0 architecture.](assets/architecture.png)
86
+
87
+ Toto 2.0 is a decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. It adds **contiguous patch masking (CPM)** for single-pass parallel decoding, a **quantile output head** trained with pinball loss, a robust arcsinh input scaler, and residual MLP patch projections, and it is trained with NorMuon. See the [technical report](#-additional-resources) for details.
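
The pinball (quantile) loss mentioned above can be illustrated with a short, dependency-free sketch. This is the textbook definition, not Toto's training code: the function names `pinball_loss` and `mean_pinball_loss` and the list-based inputs are assumptions made for illustration only.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for a single quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q * diff;
    # over-prediction (diff < 0) costs (1 - q) * |diff|.
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball_loss(targets, quantile_preds, levels):
    """Average pinball loss over all time steps and quantile levels.

    targets:        list of floats, one per time step
    quantile_preds: quantile_preds[t][k] is the forecast of quantile
                    levels[k] at step t (hypothetical layout)
    levels:         list of quantile levels, each in (0, 1)
    """
    total = 0.0
    count = 0
    for y, preds in zip(targets, quantile_preds):
        for q, y_hat in zip(levels, preds):
            total += pinball_loss(y, y_hat, q)
            count += 1
    return total / count
```

Minimizing this loss at level `q` pushes the prediction toward the `q`-th quantile of the target distribution, which is why a head trained on several levels yields both a median point forecast and uncertainty bands.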
88
+
89
+ ---
90
+
91
+ ## 📊 Performance
92
+
93
+ ![Pareto frontier on BOOM and GIFT-Eval](assets/pareto.png)
94
+
95
+ Every Toto 2.0 size sits on or near the Pareto frontier on both BOOM and GIFT-Eval. The three largest sizes take first, second, and third place among foundation models by GIFT-Eval CRPS rank. On TIME, Toto 2.0 sizes take the top three spots on every metric, ahead of every external foundation model evaluated.
96
 
97
  ---
98
 
 
112
  import torch
113
  from toto2 import Toto2Model
114
 
115
+ model = Toto2Model.from_pretrained("Datadog/Toto-2.0-4m")
116
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
117
  model = model.to(device).eval()
118
 
 
137
 
138
  ## 💾 Available Checkpoints
139
 
140
+ All five Toto 2.0 sizes share the same training recipe; pick a size based on your accuracy/latency budget. Latencies are forward-pass time for a 1,024-step forecast at batch size 8 on a single A100.
141
+
142
+ | Model | Params | Single-pass latency<br>(1,024 horizon) | Block decoding<br>(block=768) | Recommended for |
143
+ |---|---|---|---|---|
144
+ | [Toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4M | ~3.8 ms | ~10.0 ms | Edge / CPU deployment; tightest latency or memory budgets. |
145
+ | [Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22M | ~5.0 ms | ~12.8 ms | Efficient default: matches or beats Toto 1.0 quality with ~7× fewer parameters. |
146
+ | [Toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313M | ~15.4 ms | ~32.4 ms | Strong general-purpose checkpoint; top-3 foundation model on GIFT-Eval. |
147
+ | [Toto-2.0-1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B | ~20.9 ms | ~46.3 ms | Best quality / cost tradeoff for production workloads. |
148
+ | [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B | ~36.2 ms | ~78.0 ms | Highest accuracy; #1 foundation model on every benchmark. |
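
As a rough illustration of picking a size from a latency budget, the sketch below encodes the single-pass latencies from the table and returns the largest checkpoint that fits. The helper `pick_checkpoint` is hypothetical and not part of `toto2`; it also assumes that a larger checkpoint is the better choice whenever it fits, and that your hardware resembles the A100 setup the table was measured on.

```python
# Single-pass latencies in ms, copied from the table above
# (1,024-step horizon, batch size 8, single A100). Ordered smallest to largest.
SINGLE_PASS_MS = [
    ("Datadog/Toto-2.0-4m", 3.8),
    ("Datadog/Toto-2.0-22m", 5.0),
    ("Datadog/Toto-2.0-313m", 15.4),
    ("Datadog/Toto-2.0-1B", 20.9),
    ("Datadog/Toto-2.0-2.5B", 36.2),
]


def pick_checkpoint(budget_ms, table=SINGLE_PASS_MS):
    """Return the largest checkpoint whose listed latency fits the budget.

    Returns None when even the smallest checkpoint exceeds the budget.
    """
    # The table is sorted by size, so the last entry that fits is the largest.
    fits = [name for name, ms in table if ms <= budget_ms]
    return fits[-1] if fits else None


if __name__ == "__main__":
    print(pick_checkpoint(16.0))
```

The returned string can be passed to `Toto2Model.from_pretrained` as in the quick-start snippet; measure latency on your own hardware before relying on the table figures.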
149
+
150
+ > Single-pass decoding fills the entire horizon in one forward pass and is recommended up to ~768 steps. Block decoding generates the horizon in 768-step segments conditioned on the previous segment's median (with KV caching); it is slower but more stable at long horizons. Both modes use the same checkpoint.
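
To make the block-decoding arithmetic concrete, here is a minimal sketch of how a horizon splits into 768-step segments. It shows only the segmentation, not the model call, the median conditioning, or the KV cache; `block_spans` is a hypothetical helper, not a `toto2` function.

```python
def block_spans(horizon, block=768):
    """Split a forecast horizon into consecutive [start, end) segments.

    Each segment has at most `block` steps; the last one may be shorter.
    """
    if horizon <= 0 or block <= 0:
        raise ValueError("horizon and block must be positive")
    return [
        (start, min(start + block, horizon))
        for start in range(0, horizon, block)
    ]


if __name__ == "__main__":
    # A 1,024-step horizon needs two passes: 768 steps, then the remaining 256.
    print(block_spans(1024))
```

A horizon of 768 steps or fewer yields a single segment, which matches the range where single-pass decoding is recommended.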
151
 
152
  ---
153
 
154
  ## 🔗 Additional Resources
155
 
156
+ - **Technical Report** *(coming soon)*
157
+ - [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
158
+ - [GitHub Repository](https://github.com/DataDog/toto)
159
+ - [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20): all five base checkpoints
160
+ - [BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM): Datadog's observability time-series benchmark
161
+ - [Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)
162
 
163
  ---
164
 
assets/architecture.png ADDED

Git LFS Details

  • SHA256: 973196289f6036b880ec7fdb00fe0b1078215232bf58f0bdd6a27eeebfca46ef
  • Pointer size: 131 Bytes
  • Size of remote file: 437 kB
assets/pareto.png ADDED

Git LFS Details

  • SHA256: 756a059027357cf224effc92995755b2c50ca8396b68918135ef0e5226798294
  • Pointer size: 131 Bytes
  • Size of remote file: 302 kB