Emaad committed on
Commit 72255eb · verified · 1 Parent(s): 8d97785

Tighten spacing: drop section HRs; turn post-image prose into figcaption

Files changed (1)
  1. README.md +17 -34
README.md CHANGED
@@ -41,49 +41,42 @@ model-index:
  >
  > For real workloads, please use the base [Toto 2.0 collection](https://huggingface.co/collections/Datadog/toto-20). The base checkpoints are pretrained without any public data, generalize to every benchmark we have evaluated, and are what we recommend deploying.
  
- ---
-
  ## ✨ What this is
  
  A per-`(frequency, term)` XGBoost gate over a pool of 10 foundation models (5 Toto 2.0 sizes + 5 external models). The meta-learner consumes lightweight tsfeatures from each forecast window and emits a softmax over the model pool; the final forecast is the weighted sum of the 10 base-model quantile predictions. Following the [FFORMA](https://www.sciencedirect.com/science/article/abs/pii/S0169207019300895) framework (Montero-Manso et al., 2020).
  
- ![GIFT-Eval bar metrics - Toto 2.0 FnF highlighted](assets/bar_metrics_gift_eval.png)
-
- On the full GIFT-Eval leaderboard (foundation + finetuned + ensemble + agentic), Toto-2.0-FnF takes **#1 on every metric** (tied for #1 on raw CRPS).
+ <figure>
+ <img src="assets/bar_metrics_gift_eval.png" alt="GIFT-Eval bar metrics - Toto 2.0 FnF highlighted">
+ <figcaption>On the full GIFT-Eval leaderboard (foundation + finetuned + ensemble + agentic), Toto-2.0-FnF takes <b>#1 on every metric</b> (tied for #1 on raw CRPS).</figcaption>
+ </figure>
  
  The replication notebook lives in the GIFT-Eval repo at [notebooks/toto_2_0_fnf.ipynb](https://github.com/SalesforceAIResearch/gift-eval/blob/main/notebooks/toto_2_0_fnf.ipynb).
  
- ---
-
  ## 🧩 What's in the ensemble
  
  The Toto 2.0 family accounts for **39% of the assigned weight** across all predictions - more than any other model in the pool, ahead of Chronos-2 (32%) and more than the four remaining external models combined.
  
- | # | Model | Family |
- | - | --------------------------------------------------------------------- | --------- |
- | 0 | [chronos-2](https://huggingface.co/amazon/chronos-2) | Chronos |
- | 1 | [timesfm-2.5](https://huggingface.co/google/timesfm-2.5-200m-pytorch) | TimesFM |
- | 2 | [flowstate](https://huggingface.co/ibm-research/flowstate) | FlowState |
- | 3 | [tirex](https://huggingface.co/NX-AI/TiRex-1.1-gifteval) | TiRex |
- | 4 | [patchtst-fm](https://huggingface.co/ibm-research/patchtst-fm-r1) | PatchTST |
- | 5 | [toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | Toto 2.0 |
- | 6 | [toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | Toto 2.0 |
- | 7 | [toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | Toto 2.0 |
- | 8 | [toto-2.0-1b](https://huggingface.co/Datadog/Toto-2.0-1B) | Toto 2.0 |
- | 9 | [toto-2.0-2.5b](https://huggingface.co/Datadog/Toto-2.0-2.5B) | Toto 2.0 |
+ | # | Model | Family |
+ |:---:|---|:---:|
+ | 0 | [chronos-2](https://huggingface.co/amazon/chronos-2) | Chronos |
+ | 1 | [timesfm-2.5](https://huggingface.co/google/timesfm-2.5-200m-pytorch) | TimesFM |
+ | 2 | [flowstate](https://huggingface.co/ibm-research/flowstate) | FlowState |
+ | 3 | [tirex](https://huggingface.co/NX-AI/TiRex-1.1-gifteval) | TiRex |
+ | 4 | [patchtst-fm](https://huggingface.co/ibm-research/patchtst-fm-r1) | PatchTST |
+ | 5 | [toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | Toto 2.0 |
+ | 6 | [toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | Toto 2.0 |
+ | 7 | [toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | Toto 2.0 |
+ | 8 | [toto-2.0-1b](https://huggingface.co/Datadog/Toto-2.0-1B) | Toto 2.0 |
+ | 9 | [toto-2.0-2.5b](https://huggingface.co/Datadog/Toto-2.0-2.5B) | Toto 2.0 |
  
  Column order matters - it is tied to the booster's class indices.
  
- ---
-
  ## ✨ Key Features
  
  - **Per-bucket gating:** Separate XGBoost head per `(frequency, term)` bucket - each bucket learns its own softmax over the model pool so the ensemble can specialize without one global gate trading off across regimes.
  - **No retraining at inference:** The bundle ships pre-computed base-model predictions and tsfeatures for the full GIFT-Eval test split, so replication needs neither GPUs nor the base-model libraries.
  - **No leakage:** tsfeatures are computed only on the lookback context preceding each forecast window; the bundle stores dataset metadata but not ground-truth labels.
  
- ---
-
  ## 📦 Bundle layout
  
  ```
@@ -101,8 +94,6 @@ test_predictions/<model>/<ds_dirname>/
  
  `ds_dirname` follows GIFT-Eval's canonical naming: `<pretty_name>_<freq>_<term>` (e.g. `m4_weekly_W_short`).
  
- ---
-
  ## ⚡ How the booster is used
  
  Per (dataset, term):
@@ -113,35 +104,27 @@ Per (dataset, term):
  4. Stack the 10 per-model `test_predictions.npz` arrays into a `(n_windows, 10, 9, prediction_length)` tensor; weight-sum across the model axis → final quantile forecast.
  5. Score with `gluonts.evaluate_model` using the same call shape every other GIFT-Eval submission uses (see `evaluate_dataset` in the notebook).
  
- ---
-
  ## 🔁 Reproducing from scratch
  
  Each base model's predictions were generated by running its standard GIFT-Eval notebook (`notebooks/chronos-2.ipynb`, etc.) with a wrapper that saves the per-window quantile forecasts to `test_predictions.npz` instead of going straight into `evaluate_model`. The notebook's "Optional B" section shows the wrapper for every pool member. Time-series features come from the [tsfeatures](https://github.com/Nixtla/tsfeatures) library; "Optional A" in the notebook shows the per-window extraction call. The meta-learner boosters were trained on the corresponding train-window predictions, which are not included in this bundle.
  
- ---
-
  ## 🔗 Additional Resources
  
  - **Technical Report** - *(coming soon)*
  - [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
- - [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20) - base Toto checkpoints (4M → 2.5B), which is what we recommend deploying
+ - [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20) - base Toto checkpoints (4m → 2.5B), which is what we recommend deploying
  - [Toto-2.0-2.5B-FT](https://huggingface.co/Datadog/Toto-2.0-2.5B-FT) - companion benchmark-only finetune
  - [GIFT-Eval benchmark](https://huggingface.co/spaces/Salesforce/GIFT-Eval) - leaderboard hosting this submission
  - [Replication notebook](https://github.com/SalesforceAIResearch/gift-eval/blob/main/notebooks/toto_2_0_fnf.ipynb) - fast-path scoring + optional regeneration of every artifact in this bundle
  - [GitHub Repository](https://github.com/DataDog/toto)
  - [BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)
  
- ---
-
  ## 📖 Citation
  
  ```bibtex
  (citation coming soon)
  ```
  
- ---
-
  ## 📝 License
  
  Apache 2.0. Each base model retains its original license - see the linked HF repos in the model pool table.
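
Reviewer note on step 4 of "How the booster is used": the weighted sum across the model axis can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the notebook's code. The helper names `softmax` and `combine_forecasts` are hypothetical, the inputs are random toy data rather than the bundle's `test_predictions.npz` arrays, and the only details taken from the README are the `(n_windows, 10, 9, prediction_length)` layout and the softmax weights over the model pool.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def combine_forecasts(weights: np.ndarray, preds: np.ndarray) -> np.ndarray:
    """Weighted sum of base-model quantile forecasts over the model axis.

    weights: (n_windows, n_models), one weight row per forecast window.
    preds:   (n_windows, n_models, n_quantiles, prediction_length).
    Returns: (n_windows, n_quantiles, prediction_length).
    """
    if weights.shape != preds.shape[:2]:
        raise ValueError("weights and preds disagree on (n_windows, n_models)")
    # Sum over the model axis `m`, keeping window, quantile and horizon axes.
    return np.einsum("wm,wmqh->wqh", weights, preds)


# Toy stand-ins (assumption): random gate scores and random base forecasts.
rng = np.random.default_rng(0)
n_windows, n_models, n_quantiles, horizon = 4, 10, 9, 12
weights = softmax(rng.normal(size=(n_windows, n_models)))
preds = rng.normal(size=(n_windows, n_models, n_quantiles, horizon))

forecast = combine_forecasts(weights, preds)
print(forecast.shape)  # (4, 9, 12)
```

Because each weight row sums to 1, a one-hot row reproduces the selected base model's forecast exactly, which makes a handy sanity check. It also shows why the order of models in the pool table has to match the booster's class indices.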