---
tags:
- time-series-forecasting
- foundation-models
- finetuned
- time-series
- timeseries
- forecasting
- observability
- gift-eval
- safetensors
- pytorch_model_hub_mixin
license: apache-2.0
pipeline_tag: time-series-forecasting
thumbnail: https://corp.dd-static.net/img/about/presskit/kit/press_kit.png
base_model: Datadog/Toto-2.0-2.5B
model-index:
- name: Toto-2.0-2.5B-FT
  results:
    - task:
        type: time-series-forecasting
      dataset:
        name: GIFT-Eval
        type: GIFT-Eval
      metrics:
        - name: CRPS
          type: CRPS
          value: 0.463
        - name: MASE
          type: MASE
          value: 0.679
      source:
        name: GIFT-Eval Time Series Forecasting Leaderboard
        url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
---

# Toto-2.0-2.5B-FT

> [!WARNING]
> **This is a benchmarking checkpoint, not a general-purpose model.**
> Toto-2.0-2.5B-FT is the [Toto 2.0 2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) base model finetuned on a mix that includes the GIFT-Eval training split. It is our GIFT-Eval leaderboard submission (**#2 on the leaderboard**) and is released for reproducibility only.
>
> For real workloads, please use the base [Toto 2.0 collection](https://huggingface.co/collections/Datadog/toto-20). The base checkpoints are pretrained without any public data, generalize to every benchmark we have evaluated, and are what we recommend deploying.

## ✨ What is this?

A single Toto 2.0 2.5B base checkpoint finetuned on a mix that **includes the GIFT-Eval training split**, used to probe how far the base model can be pushed on a single in-distribution benchmark.

<figure>
<img src="assets/bar_metrics_gift_eval.png" alt="GIFT-Eval bar metrics, with Toto 2.0 2.5B-FT highlighted">
<figcaption>On the full GIFT-Eval leaderboard (foundation, finetuned, ensemble, and agentic models), Toto-2.0-2.5B-FT places <b>#2 by both CRPS rank and MASE rank</b>, behind only the <a href="https://huggingface.co/Datadog/Toto-2.0-Family-and-Friends">Toto 2.0 Family-and-Friends</a> ensemble, and <b>#3 on raw CRPS and MASE</b>.</figcaption>
</figure>
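
The caption ranks models on two metrics, CRPS and MASE, both as raw scores and as ranks. As a rough illustration only, the sketch below implements the textbook MASE (forecast MAE scaled by the in-sample MAE of a seasonal naive forecast) and an average-rank aggregation in which tied models share the mean of the ranks they span. It assumes the leaderboard's rank columns average per-configuration ranks; GIFT-Eval's exact normalization and aggregation may differ, and `mase` and `average_rank` are hypothetical helper names, not part of any Toto or GIFT-Eval API.

```python
def mase(y_train, y_true, y_pred, season=1):
    """Textbook MASE: forecast MAE divided by the in-sample MAE of a
    seasonal naive forecast (y_t predicted by y_{t-season}) on the history."""
    naive_errors = [
        abs(y_train[t] - y_train[t - season]) for t in range(season, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


def average_rank(scores):
    """Mean rank of each model across configurations (lower metric is better).

    `scores` maps model name -> list of metric values, one per configuration,
    in the same order for every model. Ties receive the mean of their ranks.
    """
    names = list(scores)
    n_configs = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for j in range(n_configs):
        column = [scores[name][j] for name in names]
        for name in names:
            value = scores[name][j]
            below = sum(1 for c in column if c < value)
            ties = sum(1 for c in column if c == value)
            # 1-based rank; tied entries share the midpoint of their rank span
            totals[name] += 1 + below + (ties - 1) / 2
    return {name: totals[name] / n_configs for name in names}
```

For example, `mase([1, 2, 3, 4, 5], [6, 7], [6.5, 6.5])` returns `0.5`: the one-step naive error on the history is 1 and the forecast MAE is 0.5.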

## πŸ” Finetuning recipe

Starting from a fully-decayed [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) base checkpoint, we finetuned for 10,000 steps on a mix designed to expose the model to in-distribution structure without overfitting to GIFT-Eval alone:

| Source | Share |
|---|---:|
| GIFT-Eval Pretrain | 45% |
| Datadog 5-minute+ observability metrics | 25% |
| GIFT-Eval train split | 15% |
| Synthetic (TempoPFN) | 10% |
| Datadog 10s observability metrics | 2.5% |
| Datadog 60s observability metrics | 2.5% |
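
One simple way to realize the shares in this table is to draw the source of each training window independently, with probability equal to its share. The sketch below assumes exactly that; the source keys and the helpers `sample_sources` and `empirical_shares` are hypothetical, and the actual Toto finetuning pipeline may batch, stratify, or weight sources differently.

```python
import random
from collections import Counter

# Finetuning mixture shares from the table (hypothetical keys; must sum to 1.0).
MIXTURE = {
    "gift_eval_pretrain": 0.45,
    "datadog_5min_plus": 0.25,
    "gift_eval_train": 0.15,
    "synthetic_tempopfn": 0.10,
    "datadog_10s": 0.025,
    "datadog_60s": 0.025,
}


def sample_sources(mixture, n, seed=0):
    """Draw the source of each of `n` training windows i.i.d. by mixture weight."""
    total = sum(mixture.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mixture shares sum to {total}, expected 1.0")
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


def empirical_shares(sources):
    """Observed fraction of draws per source, for checking against the table."""
    counts = Counter(sources)
    return {name: count / len(sources) for name, count in counts.items()}
```

With 100,000 draws, `empirical_shares(sample_sources(MIXTURE, 100_000))` lands close to the table's shares (about 0.45 for GIFT-Eval Pretrain); independent sampling keeps the sketch short, at the cost of per-batch variation in the mix.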

The GIFT-Eval Pretrain share (45%) is drawn from the Toto 1.0 mix of GIFT-Eval Pretrain and the Chronos pretraining corpus; it is non-leaking, meaning it does not overlap with the GIFT-Eval test split.

NorMuon and AdamW learning rates were both reduced by roughly an order of magnitude relative to pretraining (to 0.05 and 0.001, respectively). All other architecture and inference settings match the base 2.5B model.

## 🔗 Additional Resources

- [Technical Report](https://arxiv.org/abs/2605.20119)
- [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
- [Base model: Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B): the unfinetuned checkpoint, which is what we recommend deploying
- [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20): all five base sizes (4m → 2.5B)
- [Toto 2.0 Family-and-Friends](https://huggingface.co/Datadog/Toto-2.0-Family-and-Friends): companion FFORMA-ensemble submission, also benchmark-only
- [GIFT-Eval benchmark](https://huggingface.co/spaces/Salesforce/GIFT-Eval): leaderboard hosting this submission
- [GitHub Repository](https://github.com/DataDog/toto)

## 📖 Citation

```bibtex
@misc{khwaja2026toto20timeseries,
      title={Toto 2.0: Time Series Forecasting Enters the Scaling Era}, 
      author={Emaad Khwaja and Chris Lettieri and Gerald Woo and Eden Belouadah and Marc Cenac and Guillaume Jarry and Enguerrand Paquin and Xunyi Zhao and Viktoriya Zhukov and Othmane Abou-Amal and Chenghao Liu and Ameet Talwalkar and David Asker},
      year={2026},
      eprint={2605.20119},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.20119}, 
}
```