Emaad committed · verified
Commit 7b008f2 · Parent: 9302da0

Drop block-decoding column from Available Checkpoints

Files changed (1):
  1. README.md +9 -11
README.md CHANGED
```diff
@@ -119,17 +119,15 @@ For more examples, see the [Quick Start notebook](https://github.com/DataDog/tot
 
 ## 💾 Available Checkpoints
 
-All five Toto 2.0 sizes share the same training recipe; pick a size based on your accuracy/latency budget. Latencies are forward-pass time for a 1,024-step forecast at batch size 8 on a single A100.
-
-| Model | Params | Single-pass latency<br>(1,024 horizon) | Block decoding<br>(block=768) | Recommended for |
-|---|---|---|---|---|
-| [Toto‑2.0‑4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4m | ~3.8 ms | ~10.0 ms | Edge / CPU deployment; tightest latency or memory budgets. |
-| [Toto‑2.0‑22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22m | ~5.0 ms | ~12.8 ms | Efficient default — matches or beats Toto 1.0 quality with ~7× fewer parameters. |
-| [Toto‑2.0‑313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313m | ~15.4 ms | ~32.4 ms | Strong general-purpose checkpoint; top-3 foundation model on GIFT-Eval. |
-| [Toto‑2.0‑1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B | ~20.9 ms | ~46.3 ms | Best quality / cost tradeoff for production workloads. |
-| [Toto‑2.0‑2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B | ~36.2 ms | ~78.0 ms | Highest accuracy; #1 foundation model on every benchmark. |
-
-> Single-pass decoding fills the entire horizon in one forward pass and is recommended up to ~768 steps. Block decoding generates the horizon in 768-step segments conditioned on the previous segment's median (with KV caching); it is slower but more stable at long horizons. Both modes use the same checkpoint.
+All five Toto 2.0 sizes share the same training recipe; pick a size based on your accuracy/latency budget. Latency is forward-pass time for a 1,024-step single-pass forecast at batch size 8 on a single A100.
+
+| Model | Params | Latency | Recommended for |
+|---|---|---|---|
+| [Toto‑2.0‑4m](https://huggingface.co/Datadog/Toto-2.0-4m) | 4m | ~3.8 ms | Edge / CPU deployment; tightest latency or memory budgets. |
+| [Toto‑2.0‑22m](https://huggingface.co/Datadog/Toto-2.0-22m) | 22m | ~5.0 ms | Efficient default — matches or beats Toto 1.0 quality with ~7× fewer parameters. |
+| [Toto‑2.0‑313m](https://huggingface.co/Datadog/Toto-2.0-313m) | 313m | ~15.4 ms | Strong general-purpose checkpoint; top-3 foundation model on GIFT-Eval. |
+| [Toto‑2.0‑1B](https://huggingface.co/Datadog/Toto-2.0-1B) | 1B | ~20.9 ms | Best quality / cost tradeoff for production workloads. |
+| [Toto‑2.0‑2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) | 2.5B | ~36.2 ms | Highest accuracy; #1 foundation model on every benchmark. |
 
 ---
```
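The "pick a size based on your accuracy/latency budget" guidance in the table can be sketched as a simple lookup. This is illustrative only: the latency figures are copied from the table (single-pass, 1,024-step horizon, batch size 8, A100), and the `largest_within_budget` helper is a hypothetical name, not part of the Toto library.

```python
# Single-pass latency per checkpoint in ms, copied from the README table.
# Keys are the Hugging Face repo IDs; dict order runs smallest to largest model.
LATENCY_MS = {
    "Datadog/Toto-2.0-4m": 3.8,
    "Datadog/Toto-2.0-22m": 5.0,
    "Datadog/Toto-2.0-313m": 15.4,
    "Datadog/Toto-2.0-1B": 20.9,
    "Datadog/Toto-2.0-2.5B": 36.2,
}


def largest_within_budget(budget_ms: float):
    """Return the largest checkpoint whose benchmarked latency fits the budget,
    or None if even the smallest model exceeds it."""
    candidates = [repo for repo, ms in LATENCY_MS.items() if ms <= budget_ms]
    # Entries are ordered smallest-to-largest, so the last fit is the largest.
    return candidates[-1] if candidates else None


print(largest_within_budget(25.0))  # → Datadog/Toto-2.0-1B
print(largest_within_budget(1.0))   # → None
```

Measured latency on your own hardware will differ from the A100 figures, so treat this as a starting point rather than a deployment decision.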