Update README.md
README.md
CHANGED
@@ -42,7 +42,7 @@ model-index:
 >
 > For real workloads, please use the base [Toto 2.0 collection](https://huggingface.co/collections/Datadog/toto-20). The base checkpoints are pretrained without any public data, generalize to every benchmark we have evaluated, and are what we recommend deploying.
 
-## ✨ What this is
+## ✨ What this is?
 
 A single Toto 2.0 2.5B base checkpoint finetuned on a mix that **includes the GIFT-Eval training split**, used to probe how far the base model can be pushed on a single in-distribution benchmark.
 
@@ -68,20 +68,6 @@ The public portion (45% GIFT-Eval Pretrain) is drawn from the Toto 1.0 mix of GI
 
 NorMuon and AdamW learning rates were both dropped by roughly an order of magnitude from pretraining (to 0.05 and 0.001 respectively). All other architecture and inference settings match the base 2.5B model.
 
-## ⚡ Quick Start
-
-```python
-import torch
-from toto2 import Toto2Model
-
-model = Toto2Model.from_pretrained("Datadog/Toto-2.0-2.5B-FT")
-model = model.to("cuda").eval()
-
-# Same forecast() interface as the base 2.5B model.
-```
-
-See the base [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) model card for the full inference example.
-
 ## Additional Resources
 
 - **Technical Report**: *(coming soon)*
@@ -92,9 +78,6 @@ See the base [Toto-2.0-2.5B](https://huggingface.co/Datadog/Toto-2.0-2.5B) model
 - [GIFT-Eval benchmark](https://huggingface.co/spaces/Salesforce/GIFT-Eval): leaderboard hosting this submission
 - [GitHub Repository](https://github.com/DataDog/toto)
 
-## License
-
-Apache 2.0.
 
 ## Citation
 