tags:
  - time-series-forecasting
  - foundation-models
  - finetuned
  - time-series
  - timeseries
  - forecasting
  - observability
  - gift-eval
  - safetensors
  - pytorch_model_hub_mixin
license: apache-2.0
pipeline_tag: time-series-forecasting
thumbnail: https://corp.dd-static.net/img/about/presskit/kit/press_kit.png
base_model: Datadog/Toto-2.0-2.5B
model-index:
  - name: Toto-2.0-2.5B-FT
    results:
      - task:
          type: time-series-forecasting
        dataset:
          name: GIFT-Eval
          type: GIFT-Eval
        metrics:
          - name: CRPS
            type: CRPS
            value: 0.463
          - name: MASE
            type: MASE
            value: 0.679
        source:
          name: GIFT-Eval Time Series Forecasting Leaderboard
          url: https://huggingface.co/spaces/Salesforce/GIFT-Eval

Toto-2.0-2.5B-FT

This is a benchmarking checkpoint, not a general-purpose model. Toto-2.0-2.5B-FT is the Toto 2.0 2.5B base model finetuned on the GIFT-Eval training split for our GIFT-Eval leaderboard submission, which places #2. It is released for reproducibility only.

For real workloads, please use the base Toto 2.0 collection. The base checkpoints are pretrained without any public data, generalize to every benchmark we have evaluated, and are what we recommend deploying.

✨ What this is

A single Toto 2.0 2.5B base checkpoint finetuned on a mix that includes the GIFT-Eval training split, used to probe how far the base model can be pushed on a single in-distribution benchmark.

Figure: GIFT-Eval bar metrics, with Toto 2.0 2.5B-FT highlighted.
On the full GIFT-Eval leaderboard (foundation models + finetuned + ensemble + agentic), Toto-2.0-2.5B-FT places #2 on both CRPS rank and MASE rank, and #3 on raw CRPS and raw MASE, behind only the Toto 2.0 Family-and-Friends ensemble.
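
For readers unfamiliar with MASE, the per-series quantity can be sketched as follows. This is the textbook definition (forecast mean absolute error divided by the in-sample error of a seasonal-naive forecast), not the leaderboard's evaluation code; the function name, the `season` argument, and the toy numbers are illustrative assumptions, and the leaderboard's aggregation across datasets is not reproduced here.

```python
def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error for a single series (textbook form).

    `season` is the lag used by the seasonal-naive baseline; it is a
    hypothetical parameter name chosen for this sketch.
    """
    if len(y_train) <= season:
        raise ValueError("training series must be longer than the season lag")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    # In-sample MAE of the seasonal-naive forecast y[t] = y[t - season].
    naive_errors = [
        abs(y_train[i] - y_train[i - season]) for i in range(season, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal-naive error is zero; MASE is undefined")

    # Out-of-sample MAE of the forecast being scored.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


# Toy data: the history rises by 1 per step, so the naive scale is 1.0.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
score = mase(y_true=[6.0, 7.0], y_pred=[6.5, 6.5], y_train=history)
print(f"MASE = {score:.3f}")
```

A value below 1 means the forecast beats the seasonal-naive baseline on that series; lower is better.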

Finetuning recipe

Starting from a fully decayed Toto-2.0-2.5B base checkpoint, we finetuned for 10,000 steps on a mix designed to expose the model to in-distribution structure without overfitting to GIFT-Eval alone:

| Source | Share |
| --- | --- |
| GIFT-Eval Pretrain | 45% |
| Datadog 5-minute+ observability metrics | 25% |
| GIFT-Eval train split | 15% |
| Synthetic (TempoPFN) | 10% |
| Datadog 10s observability metrics | 2.5% |
| Datadog 60s observability metrics | 2.5% |
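
As a rough illustration of what a mix like this means in practice, the sketch below draws a source for each training example in proportion to its share. The shares come from the table above; the dictionary keys, the `sample_sources` helper, and the choice to sample per example (rather than per batch or per token) are assumptions made for this example, not a description of the actual data loader.

```python
import random
from collections import Counter

# Shares copied from the table above; the keys are hypothetical labels.
MIX = {
    "gift_eval_pretrain": 0.45,
    "datadog_5min_plus": 0.25,
    "gift_eval_train": 0.15,
    "synthetic_tempopfn": 0.10,
    "datadog_10s": 0.025,
    "datadog_60s": 0.025,
}


def sample_sources(n, mix=MIX, seed=0):
    """Draw n source labels with probability proportional to each share."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


# The shares should account for the whole mix.
assert abs(sum(MIX.values()) - 1.0) < 1e-9

n_draws = 10_000
counts = Counter(sample_sources(n_draws))
for name, share in MIX.items():
    print(f"{name:20s} target={share:.3f} observed={counts[name] / n_draws:.3f}")
```

With enough draws, the observed fractions settle close to the target shares, which is a quick sanity check for any weighted sampler.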

The public portion (45% GIFT-Eval Pretrain) is drawn from the Toto 1.0 mix of GIFT-Eval Pretrain and the Chronos pretraining corpus, and is non-leaking with respect to the GIFT-Eval test split.

NorMuon and AdamW learning rates were both dropped by roughly an order of magnitude from pretraining (to 0.05 and 0.001 respectively). All other architecture and inference settings match the base 2.5B model.
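
The text above does not say which parameters each optimizer updates. A common convention with Muon-family optimizers is to apply them to 2-D hidden weight matrices and leave embeddings, output heads, norms, and biases to AdamW; the sketch below assumes that convention purely for illustration. Only the two learning rates come from this card; the routing rule, the `assign_optimizer` helper, and the parameter names and shapes are hypothetical.

```python
# Finetuning learning rates quoted in the text above.
NORMUON_LR = 0.05
ADAMW_LR = 0.001


def assign_optimizer(name, shape):
    """Hypothetical routing rule (an assumption, not the Toto recipe).

    2-D hidden weight matrices go to the Muon-style optimizer;
    everything else (embeddings, heads, norms, biases) goes to AdamW.
    """
    is_matrix = len(shape) == 2
    is_embedding_or_head = "embed" in name or "head" in name
    if is_matrix and not is_embedding_or_head:
        return "normuon", NORMUON_LR
    return "adamw", ADAMW_LR


# Made-up parameter names and shapes, for demonstration only.
example_params = [
    ("blocks.0.attn.qkv.weight", (3072, 1024)),
    ("blocks.0.norm.weight", (1024,)),
    ("patch_embed.weight", (1024, 64)),
    ("output_head.weight", (64, 1024)),
]

for name, shape in example_params:
    optimizer, lr = assign_optimizer(name, shape)
    print(f"{name:28s} -> {optimizer:8s} lr={lr}")
```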

⚡ Quick Start

import torch
from toto2 import Toto2Model

model = Toto2Model.from_pretrained("Datadog/Toto-2.0-2.5B-FT")
model = model.to("cuda").eval()

# Same forecast() interface as the base 2.5B model.

See the base Toto-2.0-2.5B model card for the full inference example.

🔗 Additional Resources

License

Apache 2.0.

📖 Citation

(citation coming soon)