Add checkpoint size (fp32 weights) column to Available Checkpoints
README.md (changed)
@@ -121,13 +121,13 @@ For more examples, see the [Quick Start notebook](https://github.com/DataDog/tot

 All five Toto 2.0 sizes share the same training recipe; pick a size based on your accuracy/latency budget. Latency is forward-pass time for a 1,024-step single-pass forecast at batch size 8 on a single A100.

-| Model | Params | Latency | Recommended for |
-|---|---|---|---|
-| [Toto‑2.0‑4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4m | ~3.8 ms | Edge / CPU deployment; tightest latency or memory budgets. |
-| [Toto‑2.0‑22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22m | ~5.0 ms | Efficient default — matches or beats Toto 1.0 quality with ~7× fewer parameters. |
-| [Toto‑2.0‑313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313m | ~15.4 ms | Strong general-purpose checkpoint; top-3 foundation model on GIFT-Eval. |
-| [Toto‑2.0‑1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B | ~20.9 ms | Best quality / cost tradeoff for production workloads. |
-| [Toto‑2.0‑2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B | ~36.2 ms | Highest accuracy; #1 foundation model on every benchmark. |
+| Model | Params | Weights (fp32) | Latency | Recommended for |
+|---|---|---|---|---|
+| [Toto‑2.0‑4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4m | 16 MB | ~3.8 ms | Edge / CPU deployment; tightest latency or memory budgets. |
+| [Toto‑2.0‑22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22m | 84 MB | ~5.0 ms | Efficient default — matches or beats Toto 1.0 quality with ~7× fewer parameters. |
+| [Toto‑2.0‑313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313m | 1.2 GB | ~15.4 ms | Strong general-purpose checkpoint; top-3 foundation model on GIFT-Eval. |
+| [Toto‑2.0‑1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B | 3.9 GB | ~20.9 ms | Best quality / cost tradeoff for production workloads. |
+| [Toto‑2.0‑2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B | 9.1 GB | ~36.2 ms | Highest accuracy; #1 foundation model on every benchmark. |

 ---

@@ -137,7 +137,7 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
 - **Multi-Variate Support:** Efficiently process multiple variables using alternating time/variate attention.
 - **Probabilistic Predictions:** Generate point forecasts and uncertainty estimates via a quantile output head.
 - **Decoder-Only Architecture:** Support for variable prediction horizons and context lengths.
-- **u-μP Scaling:** A single training recipe transfers cleanly across all five sizes (
+- **u-μP Scaling:** A single training recipe transfers cleanly across all five sizes (4m → 2.5B).

 ---
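
As a sanity check on the new column, fp32 checkpoints weigh in at roughly 4 bytes per parameter; the small mismatches against the table values (for example 84 MB vs. a naive 88 MB for the 22m model) are plausibly due to rounded parameter counts and GB/GiB conventions. A minimal sketch of the rule of thumb, where the dictionary of rounded counts is made up for illustration and is not part of the repo:

```python
# Rule of thumb: an fp32 checkpoint stores ~4 bytes per parameter
# (plus a small amount of metadata). Parameter counts below are the
# rounded values from the table, so the estimates only roughly match
# the file sizes listed in the README.

ROUNDED_PARAM_COUNTS = {  # illustrative values, not exact tensor counts
    "Toto-2.0-4m": 4e6,
    "Toto-2.0-22m": 22e6,
    "Toto-2.0-313m": 313e6,
    "Toto-2.0-1B": 1e9,
    "Toto-2.0-2.5B": 2.5e9,
}

def fp32_size_bytes(num_params: float) -> float:
    """Approximate size of an fp32 weights file: 4 bytes per parameter."""
    return num_params * 4

for name, n in ROUNDED_PARAM_COUNTS.items():
    print(f"{name}: ~{fp32_size_bytes(n) / 1e9:.2f} GB")
```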
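
The "quantile output head" mentioned in the feature list is standard quantile regression; the sketch below is a generic PyTorch illustration of that idea (a linear head trained with pinball loss), not Toto's actual implementation, and the class name, dimensions, and quantile levels are invented for the example.

```python
import torch
import torch.nn as nn

# Generic quantile output head: one linear projection produces a forecast
# per requested quantile, trained with the pinball (quantile) loss.
# Names and sizes are illustrative only; this is not Toto's code.

QUANTILES = torch.tensor([0.1, 0.5, 0.9])

class QuantileHead(nn.Module):
    def __init__(self, d_model: int, num_quantiles: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_quantiles)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, time, d_model) -> (batch, time, num_quantiles)
        return self.proj(hidden)

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # pred: (..., num_quantiles); target: (...); q: (num_quantiles,)
    diff = target.unsqueeze(-1) - pred
    return torch.maximum(q * diff, (q - 1) * diff).mean()

# Toy usage: the 0.5-quantile column is the point forecast; the
# 0.1/0.9 columns give an 80% uncertainty band.
head = QuantileHead(d_model=64, num_quantiles=len(QUANTILES))
hidden = torch.randn(8, 1024, 64)
target = torch.randn(8, 1024)
loss = pinball_loss(head(hidden), target, QUANTILES)
```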