Tighten spacing: drop section HRs; convert figure prose to captions

README.md (CHANGED)
@@ -68,15 +68,12 @@ Toto (Time Series Optimized Transformer for [Observability](https://www.datadogh
 
 The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
 
----
-
 ## 📊 Performance
 
-
-
-Every Toto 2.0 size sits on or near the Pareto frontier on both BOOM and GIFT-Eval. The three largest sizes rank first, second, and third among foundation models on GIFT-Eval CRPS rank. On TIME, Toto 2.0 sizes take the top three spots on every metric, ahead of every other external foundation model evaluated.
-
----
+<figure>
+<img src="assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">
+<figcaption>Every Toto 2.0 size sits on or near the Pareto frontier on both BOOM and GIFT-Eval. The three largest sizes rank first, second, and third among foundation models on GIFT-Eval CRPS rank. On TIME, Toto 2.0 sizes take the top three spots on every metric, ahead of every other external foundation model evaluated.</figcaption>
+</figure>
 
 ## ⚡ Quick Start
 
@@ -115,8 +112,6 @@ quantiles = model.forecast(
 
 For more examples, see the [Quick Start notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/quick_start.ipynb) and [GluonTS integration notebook](https://github.com/DataDog/toto/blob/main/toto2/notebooks/gluonts_integration.ipynb).
 
----
-
 ## 💾 Available Checkpoints
 
 All five Toto 2.0 sizes share the same training recipe; pick a size based on your accuracy/latency budget. Latency is forward-pass time for a 1,024-step single-pass forecast at batch size 8 on a single A100.
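The hunk context above references `model.forecast(`. For readers skimming the diff, here is a minimal sketch of that call pattern, assuming a Hugging Face-style `from_pretrained` loader; the import path and argument names are illustrative guesses, not confirmed by this diff, and the linked notebooks are authoritative.

```python
# Hypothetical sketch; class path and argument names are assumptions,
# not confirmed by this diff. See the Quick Start notebook for the real API.
import torch
from toto.model.toto import Toto  # import path assumed from the Toto 1.0 layout

model = Toto.from_pretrained("Datadog/Toto-2.0-1B").to("cuda").eval()

# Dummy input: batch of 1, 2 variates, 1,024 context steps.
context = torch.randn(1, 2, 1024, device="cuda")

with torch.no_grad():
    # `model.forecast(` comes from the hunk context above; the keyword
    # argument here is illustrative only.
    quantiles = model.forecast(context, prediction_length=1024)
```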
@@ -129,8 +124,6 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
 | [Toto‑2.0‑1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B | 3.9 GB | ~20.9 ms | Best quality / cost tradeoff for production workloads. |
 | [Toto‑2.0‑2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B | 9.1 GB | ~36.2 ms | Highest accuracy; #1 foundation model on every benchmark. |
 
----
-
 ## ✨ Key Features
 
 - **Zero-Shot Forecasting:** Forecast without fine-tuning on your specific time series.
@@ -139,15 +132,12 @@
 - **Decoder-Only Architecture:** Support for variable prediction horizons and context lengths.
 - **u-μP Scaling:** A single training recipe transfers cleanly across all five sizes (4m → 2.5B).
 
----
-
 ## 🏗️ Architecture
 
-
-
-A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds **contiguous patch masking (CPM)** for single-pass parallel decoding, a **quantile output head** trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the [technical report](#-additional-resources) for details.
-
----
+<figure>
+<img src="assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
+<figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="#-additional-resources">technical report</a> for details.</figcaption>
+</figure>
 
 ## 🔗 Additional Resources
 
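The new figcaption packs several techniques into one sentence. A few have standard, compact forms worth stating; the sketch below illustrates the generic techniques (pinball loss for quantile regression, arcsinh input squashing, and axis-alternating attention via batch folding) under assumed shapes and names, and is not a reconstruction of Toto 2.0's actual implementation.

```python
import torch

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, q: float) -> torch.Tensor:
    # Standard pinball (quantile) loss at level q in (0, 1): under-prediction
    # is weighted by q, over-prediction by 1 - q, so the minimizer is the
    # q-th conditional quantile of the target.
    err = target - pred
    return torch.mean(torch.maximum(q * err, (q - 1.0) * err))

def arcsinh_scale(x: torch.Tensor, center: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Generic robust arcsinh squashing of standardized inputs; which
    # center/scale statistics Toto 2.0 uses is not specified in this diff.
    return torch.asinh((x - center) / scale)

def alternate_axes(x: torch.Tensor, time_layer, variate_layer) -> torch.Tensor:
    # One time-then-variate step for x of shape (batch, variates, patches, dim).
    # Folding the non-attended axis into the batch is the usual trick for
    # axis-alternating attention; time_layer attends causally over patches,
    # variate_layer attends fully over variates.
    b, v, p, d = x.shape
    x = time_layer(x.reshape(b * v, p, d)).reshape(b, v, p, d)
    x = variate_layer(x.transpose(1, 2).reshape(b * p, v, d))
    return x.reshape(b, p, v, d).transpose(1, 2)
```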
@@ -158,8 +148,6 @@ A decoder-only patched transformer whose attention layers alternate between time
 - [BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM) — Datadog's observability time-series benchmark
 - [Toto 1.0 Weights](https://huggingface.co/Datadog/Toto-Open-Base-1.0)
 
----
-
 ## 📖 Citation
 
 ```bibtex