Tighten spacing: drop section HRs; turn post-image prose into figcaption
README.md
>
> For real workloads, please use the base [Toto 2.0 collection](https://huggingface.co/collections/Datadog/toto-20). The base checkpoints are pretrained without any public data, generalize to every benchmark we have evaluated, and are what we recommend deploying.

## ✨ What this is

A per-`(frequency, term)` XGBoost gate over a pool of 10 foundation models (5 Toto 2.0 sizes + 5 external models). The meta-learner consumes lightweight tsfeatures from each forecast window and emits a softmax over the model pool; the final forecast is the weighted sum of the 10 base-model quantile predictions. The design follows the [FFORMA](https://www.sciencedirect.com/science/article/abs/pii/S0169207019300895) framework (Montero-Manso et al., 2020).
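
The combination step can be sketched in plain Python. This is a minimal illustration, not the released code. It assumes the gate's raw per-model scores for one window are already available. In the real pipeline they would come from the bucket's XGBoost booster, but here they are hard-coded. `softmax` and `combine_quantiles` are hypothetical helper names, and the toy pool uses 3 models and 3 quantile levels instead of 10 and 9.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the model pool."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


def combine_quantiles(
    weights: Sequence[float],
    model_quantiles: Sequence[Sequence[float]],
) -> List[float]:
    """Weighted sum of per-model quantile forecasts.

    model_quantiles[i][q] is model i's forecast at quantile level q
    (a single time step, for brevity).
    """
    if len(weights) != len(model_quantiles):
        raise ValueError("one weight per model is required")
    n_q = len(model_quantiles[0])
    return [
        sum(w * preds[q] for w, preds in zip(weights, model_quantiles))
        for q in range(n_q)
    ]


# Toy example: hypothetical gate scores (logits) for one window.
scores = [2.0, 0.5, -1.0]
weights = softmax(scores)
forecast = combine_quantiles(
    weights,
    [[9.0, 10.0, 11.0],  # model 0: low / median / high quantile
     [8.0, 10.0, 12.0],  # model 1
     [5.0, 7.0, 9.0]],   # model 2
)
print([round(w, 3) for w in weights])
print([round(v, 3) for v in forecast])
```

The softmax weights are non-negative and sum to 1, so the result is a convex combination of the base forecasts. If every base model's quantiles are non-decreasing, the ensemble's quantiles are non-decreasing too.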

<figure>
<img src="assets/bar_metrics_gift_eval.png" alt="GIFT-Eval bar metrics with Toto 2.0 FnF highlighted">
<figcaption>On the full GIFT-Eval leaderboard (foundation + finetuned + ensemble + agentic), Toto-2.0-FnF takes <b>#1 on every metric</b> (tied for #1 on raw CRPS).</figcaption>
</figure>

The replication notebook lives in the GIFT-Eval repo at [notebooks/toto_2_0_fnf.ipynb](https://github.com/SalesforceAIResearch/gift-eval/blob/main/notebooks/toto_2_0_fnf.ipynb).

## 🧩 What's in the ensemble

The Toto 2.0 family accounts for **39% of the assigned weight** across all predictions. That is more than any other single model in the pool, ahead of Chronos-2 (32%), and more than the four remaining external models combined.

| # | Model | Family |
|:---:|---|:---:|
| 0 | [chronos-2](https://huggingface.co/amazon/chronos-2) | Chronos |
| 1 | [timesfm-2.5](https://huggingface.co/google/timesfm-2.5-200m-pytorch) | TimesFM |
| 2 | [flowstate](https://huggingface.co/ibm-research/flowstate) | FlowState |
| 3 | [tirex](https://huggingface.co/NX-AI/TiRex-1.1-gifteval) | TiRex |
| 4 | [patchtst-fm](https://huggingface.co/ibm-research/patchtst-fm-r1) | PatchTST |
| 5 | [toto-2.0-4m](https://huggingface.co/Datadog/Toto-2.0-4m) | Toto 2.0 |
| 6 | [toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) | Toto 2.0 |
| 7 | [toto-2.0-313m](https://huggingface.co/Datadog/Toto-2.0-313m) | Toto 2.0 |
| 8 | [toto-2.0-1b](https://huggingface.co/Datadog/Toto-2.0-1B) | Toto 2.0 |
| 9 | [toto-2.0-2.5b](https://huggingface.co/Datadog/Toto-2.0-2.5B) | Toto 2.0 |

The `#` order matters. It is tied to the booster's class indices, so per-model predictions must be stacked along the model axis in exactly this order.
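
Because position `i` in the pool is booster class `i`, per-model weights can be mapped back to names and aggregated by family. The sketch below shows one plausible way to compute a family-level share like the 39% figure above. It is not necessarily how that number was produced. `family_share` is a hypothetical helper, and the two weight rows are made-up toy values, not real gate outputs.

```python
from collections import defaultdict
from typing import Dict, Sequence

# Pool order from the table above; position i must match booster class i.
POOL = [
    ("chronos-2", "Chronos"),
    ("timesfm-2.5", "TimesFM"),
    ("flowstate", "FlowState"),
    ("tirex", "TiRex"),
    ("patchtst-fm", "PatchTST"),
    ("toto-2.0-4m", "Toto 2.0"),
    ("toto-2.0-22m", "Toto 2.0"),
    ("toto-2.0-313m", "Toto 2.0"),
    ("toto-2.0-1b", "Toto 2.0"),
    ("toto-2.0-2.5b", "Toto 2.0"),
]


def family_share(weights: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Average fraction of the total weight assigned to each model family."""
    totals: Dict[str, float] = defaultdict(float)
    for row in weights:
        if len(row) != len(POOL):
            raise ValueError(f"expected {len(POOL)} weights, got {len(row)}")
        for (_, family), w in zip(POOL, row):
            totals[family] += w
    n = len(weights)
    return {family: total / n for family, total in totals.items()}


# Two made-up softmax rows (each sums to 1), in pool order.
rows = [
    [0.30, 0.10, 0.05, 0.05, 0.00, 0.10, 0.10, 0.10, 0.10, 0.10],
    [0.40, 0.00, 0.00, 0.10, 0.10, 0.00, 0.10, 0.10, 0.10, 0.10],
]
shares = family_share(rows)
```

This sketch averages each prediction's softmax row equally. A share weighted by series length or by bucket would need a different normalization.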

## ✨ Key Features

- **Per-bucket gating:** A separate XGBoost head per `(frequency, term)` bucket. Each bucket learns its own softmax over the model pool, so the ensemble can specialize per regime instead of relying on one global gate that has to trade off across regimes.
- **No retraining at inference:** The bundle ships pre-computed base-model predictions and tsfeatures for the full GIFT-Eval test split, so replication needs neither GPUs nor the base-model libraries.
- **No leakage:** tsfeatures are computed only on the lookback context preceding each forecast window; the bundle stores dataset metadata but not ground-truth labels.
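
Per-bucket routing and the context-only feature rule can be sketched together. The sketch assumes the gates are held in a dict keyed by `(frequency, term)`. The gates here are toy stand-ins (plain functions returning weights), not XGBoost boosters. `context_features` is a hypothetical two-feature substitute for tsfeatures, and `route` is likewise illustrative. The feature function only ever receives the slice before the forecast window.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Gate = Callable[[List[float]], List[float]]  # features -> weights over the pool


def context_features(context: Sequence[float]) -> List[float]:
    """Toy stand-in for tsfeatures: mean and average per-step trend."""
    n = len(context)
    mean = sum(context) / n
    trend = (context[-1] - context[0]) / max(n - 1, 1)
    return [mean, trend]


def route(
    gates: Dict[Tuple[str, str], Gate],
    freq: str,
    term: str,
    series: Sequence[float],
    window_start: int,
) -> List[float]:
    """Pick the gate for this (frequency, term) bucket and score one window.

    Only series[:window_start] (the lookback context) reaches the feature
    function, so values inside the forecast window cannot leak into the gate.
    """
    gate = gates[(freq, term)]  # KeyError if the bucket has no trained gate
    return gate(context_features(series[:window_start]))


# Two hypothetical buckets with different preferences over a 2-model pool.
gates: Dict[Tuple[str, str], Gate] = {
    ("H", "short"): lambda f: [0.8, 0.2],
    ("D", "long"): lambda f: [0.3, 0.7] if f[1] > 0 else [0.6, 0.4],
}
series = [1.0, 2.0, 3.0, 4.0, 100.0, 100.0]
w_hourly = route(gates, "H", "short", series, window_start=4)
w_daily = route(gates, "D", "long", series, window_start=4)
```

Changing any value at or after `window_start` leaves the returned weights unchanged. This is the property the no-leakage bullet describes.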

## 📦 Bundle layout

```
…
test_predictions/<model>/<ds_dirname>/
…
```

`ds_dirname` follows GIFT-Eval's canonical naming: `<pretty_name>_<freq>_<term>` (e.g. `m4_weekly_W_short`).
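
A small helper for the naming scheme might look like this. It assumes neither `<freq>` nor `<term>` contains an underscore, which holds for the example above. `pretty_name` can contain underscores (`m4_weekly`), so the split is taken from the right. `DsDir`, `make_ds_dirname`, and `parse_ds_dirname` are hypothetical names, not part of the bundle.

```python
from typing import NamedTuple


class DsDir(NamedTuple):
    pretty_name: str
    freq: str
    term: str


def make_ds_dirname(pretty_name: str, freq: str, term: str) -> str:
    """Build `<pretty_name>_<freq>_<term>`."""
    return f"{pretty_name}_{freq}_{term}"


def parse_ds_dirname(ds_dirname: str) -> DsDir:
    """Split from the right, because pretty names may contain underscores."""
    parts = ds_dirname.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"not a <pretty_name>_<freq>_<term> name: {ds_dirname!r}")
    return DsDir(*parts)


parsed = parse_ds_dirname("m4_weekly_W_short")
```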

## ⚡ How the booster is used

Per (dataset, term):

…

4. Stack the 10 per-model `test_predictions.npz` arrays into a `(n_windows, 10, 9, prediction_length)` tensor, then take the weighted sum across the model axis to get the final quantile forecast.
5. Score with `gluonts.model.evaluate_model` using the same call shape every other GIFT-Eval submission uses (see `evaluate_dataset` in the notebook).
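
Step 4 can be sketched in NumPy with random stand-in data. The sketch assumes each per-model array is already `(n_windows, 9, prediction_length)` and listed in pool order. It also assumes `weights` is `(n_windows, 10)` with rows summing to 1. How the real `.npz` files are keyed, and how the weights are read off the booster, is not shown. `ensemble_forecast` is a hypothetical name.

```python
from typing import List

import numpy as np


def ensemble_forecast(per_model: List[np.ndarray], weights: np.ndarray) -> np.ndarray:
    """Stack per-model quantile forecasts and weight-sum over the model axis.

    per_model: n_models arrays, each (n_windows, n_quantiles, prediction_length),
               in pool order (index i == booster class i).
    weights:   (n_windows, n_models) softmax weights from the gate.
    returns:   (n_windows, n_quantiles, prediction_length)
    """
    stacked = np.stack(per_model, axis=1)  # (n_windows, n_models, n_q, h)
    if stacked.shape[:2] != weights.shape:
        raise ValueError(f"shape mismatch: {stacked.shape[:2]} vs {weights.shape}")
    # out[n, q, t] = sum over m of w[n, m] * x[n, m, q, t]
    return np.einsum("nm,nmqt->nqt", weights, stacked)


rng = np.random.default_rng(0)
n_windows, n_models, n_q, horizon = 4, 10, 9, 12
per_model = [rng.normal(size=(n_windows, n_q, horizon)) for _ in range(n_models)]
logits = rng.normal(size=(n_windows, n_models))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
forecast = ensemble_forecast(per_model, weights)
```

`np.einsum` performs the weighting and the sum over the model axis in one call. `(weights[:, :, None, None] * stacked).sum(axis=1)` is an equivalent broadcasting form.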

## Reproducing from scratch

Each base model's predictions were generated by running its standard GIFT-Eval notebook (`notebooks/chronos-2.ipynb`, etc.) with a wrapper that saves the per-window quantile forecasts to `test_predictions.npz` instead of going straight into `evaluate_model`. The notebook's "Optional B" section shows the wrapper for every pool member. Time-series features come from the [tsfeatures](https://github.com/Nixtla/tsfeatures) library; "Optional A" in the notebook shows the per-window extraction call. The meta-learner boosters were trained on the corresponding train-window predictions, which are not included in this bundle.
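
The actual wrapper is in the notebook's "Optional B" section and is not reproduced here. As a rough sketch only, a hypothetical `save_predictions` could collect per-window quantile arrays and write them with `np.savez_compressed`. The array key `predictions`, the helper names, and the example directory are assumptions, not the bundle's real schema.

```python
import os
import tempfile
from typing import Iterable

import numpy as np


def save_predictions(windows: Iterable[np.ndarray], out_dir: str) -> str:
    """Stack per-window (n_quantiles, prediction_length) forecasts and save them.

    The file name matches the bundle's `test_predictions.npz`; the key
    "predictions" is a hypothetical choice for this sketch.
    """
    arr = np.stack(list(windows), axis=0)  # (n_windows, n_quantiles, h)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "test_predictions.npz")
    np.savez_compressed(path, predictions=arr)
    return path


def load_predictions(path: str) -> np.ndarray:
    """Read the array back; the context manager closes the .npz file."""
    with np.load(path) as data:
        return data["predictions"]


with tempfile.TemporaryDirectory() as tmp:
    # Three stand-in windows of 9 quantiles x 6 steps (illustrative directory names).
    fake = [np.full((9, 6), float(i)) for i in range(3)]
    path = save_predictions(fake, os.path.join(tmp, "chronos-2", "m4_weekly_W_short"))
    restored = load_predictions(path)
```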

## Additional Resources

- **Technical Report**: *(coming soon)*
- [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
- [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20): base Toto checkpoints (4m to 2.5B), which are what we recommend deploying
- [Toto-2.0-2.5B-FT](https://huggingface.co/Datadog/Toto-2.0-2.5B-FT): companion benchmark-only finetune
- [GIFT-Eval benchmark](https://huggingface.co/spaces/Salesforce/GIFT-Eval): leaderboard hosting this submission
- [Replication notebook](https://github.com/SalesforceAIResearch/gift-eval/blob/main/notebooks/toto_2_0_fnf.ipynb): fast-path scoring plus optional regeneration of every artifact in this bundle
- [GitHub Repository](https://github.com/DataDog/toto)
- [BOOM Dataset](https://huggingface.co/datasets/Datadog/BOOM)

## Citation

```bibtex
(citation coming soon)
```

## License

Apache 2.0. Each base model retains its original license; see the linked HF repos in the model pool table.