Improve model card metadata and add paper reference

This PR improves the model card for `Toto-1.0-QA-Experimental` by:
- Updating the `pipeline_tag` to `image-text-to-text` for better discoverability.
- Adding `library_name: transformers` as the model is compatible with the Transformers library.
- Moving the paper reference from the YAML metadata to the Markdown section per Hugging Face recommendations.
- Adding the full list of authors and linking the official repository.
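
The metadata changes listed above can be sanity-checked locally before merging. The following is a minimal sketch, not part of this PR: it assumes the README front matter uses only flat `key: value` pairs and simple `- item` lists (as in this model card), and the helper names `parse_front_matter` and `check_card` are hypothetical. A real YAML parser would be the safer choice for anything more complex.

```python
import re


def parse_front_matter(readme_text):
    """Parse a simple YAML front matter block (scalars and flat lists only)."""
    match = re.match(r"^---[ \t]*\n(.*?)\n---[ \t]*(\n|$)", readme_text, re.DOTALL)
    if match is None:
        return {}
    metadata = {}
    current_key = None  # key whose list items are currently being collected
    for raw_line in match.group(1).splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("- ") and current_key is not None:
            metadata[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value
                current_key = None
            else:
                metadata[key] = []
                current_key = key
    return metadata


def check_card(readme_text):
    """Return a list of problems with respect to the changes described in this PR."""
    meta = parse_front_matter(readme_text)
    problems = []
    if meta.get("pipeline_tag") != "image-text-to-text":
        problems.append("pipeline_tag should be image-text-to-text")
    if meta.get("library_name") != "transformers":
        problems.append("library_name should be transformers")
    if "paper" in meta:
        problems.append("paper reference should live in the Markdown body")
    return problems


# Abbreviated copy of the updated front matter (assumption: trimmed for brevity).
NEW_README = """---
base_model:
- Qwen/Qwen3-VL-32B-Instruct
- Datadog/Toto-Open-Base-1.0
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---
# Toto-1.0-QA-Experimental
"""

for problem in check_card(NEW_README):
    print(problem)
```

Running the same check against the previous README (with `pipeline_tag: visual-question-answering`, no `library_name`, and a `paper:` key) would report all three problems.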

Files changed (1)

README.md +30 -35

@@ -1,5 +1,15 @@
 ---
-model_id: Toto-1.0-QA-Experimental
 tags:
 - visual-question-answering
 - time-series
@@ -9,56 +19,41 @@ tags:
 - anomaly-reasoning
 - arfbench
 - observability
-paper:
-- https://arxiv.org/abs/2604.21199
-datasets:
-- Datadog/ARFBench
 leaderboards:
 - ARFBench
-license: apache-2.0
-pipeline_tag: visual-question-answering
-metrics:
-- accuracy
-- f1
-base_model:
-- Qwen/Qwen3-VL-32B-Instruct
-- Datadog/Toto-Open-Base-1.0
 ---
 # Toto-1.0-QA-Experimental
-`Toto-1.0-QA-Experimental` is a hybrid time-series foundation model (TSFM) and vision-language model (VLM) for ARFBench. It achieves comparable macro F1 and accuracy to top frontier models on ARFBench:
-|![arfbench-accuracy-f1-combined](https://cdn-uploads.huggingface.co/production/uploads/681d68309722c5341cd3fa59/Fs1zeUOkZ6G_yPpOyvlYq.png)|
-|:-:|
-|Overall accuracy and F1 on the ARFBench time series question-answering benchmark, as of paper release. Toto-1.0-QA-Experimental achieves the top accuracy and comparable F1 to top frontier models.|
-It combines:
-- a vision-language backbone (`Qwen/Qwen3-VL-32B-Instruct`) for image-conditioned question answering,
-- Toto time-series representations (`Datadog/Toto-Open-Base-1.0`),
-- lightweight projection modules that inject time-series signals into VLM inference.
 |![toto-vlm-arch](https://cdn-uploads.huggingface.co/production/uploads/681d68309722c5341cd3fa59/VOihICj_-HTNdbNyNseD_.png)|
 |:-:|
 |Overview of the Toto-1.0-QA-Experimental Architecture.|
-This model repository stores inference artifacts, including:
-- `vlm/` (merged vision-language model weights),
-- `ts_modules.pt` (time-series modules),
-- `config.json` and processor files.
 ---
 ## Basic Inference Example
-The example below assumes you already have:
-- time-series tensors,
-- one or more image paths,
-- a text question.
 ```python
 import torch
@@ -168,10 +163,10 @@ Running Toto-1.0-QA-Experimental typically requires multi-GPU setup (tested on 4
 ## Resources
-- [ARFBench Paper](https://arxiv.org/abs/2604.21199)
-- [Dataset](https://huggingface.co/datasets/Datadog/ARFBench)
-- [Leaderboard](https://huggingface.co/spaces/Datadog/ARFBench)
-- [Code](https://github.com/DataDog/arfbench)
 ---

 ---
+base_model:
+- Qwen/Qwen3-VL-32B-Instruct
+- Datadog/Toto-Open-Base-1.0
+datasets:
+- Datadog/ARFBench
+license: apache-2.0
+metrics:
+- accuracy
+- f1
+pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - visual-question-answering
 - time-series
 - anomaly-reasoning
 - arfbench
 - observability
+model_id: Toto-1.0-QA-Experimental
 leaderboards:
 - ARFBench
 ---
 # Toto-1.0-QA-Experimental
+`Toto-1.0-QA-Experimental` is a hybrid time-series foundation model (TSFM) and vision-language model (VLM) for ARFBench.
+The model was introduced in the paper [ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response](https://arxiv.org/abs/2604.21199).
+**Authors:** Stephan Xie, Ben Cohen, Mononito Goswami, Junhong Shen, Emaad Khwaja, Chenghao Liu, David Asker, Othmane Abou-Amal, Ameet Talwalkar.
+## Model Description
+The model achieves macro F1 and accuracy comparable to those of top frontier models on ARFBench by combining:
+- A vision-language backbone (`Qwen/Qwen3-VL-32B-Instruct`) for image-conditioned question answering.
+- Toto time-series representations (`Datadog/Toto-Open-Base-1.0`).
+- Lightweight projection modules that inject time-series signals into VLM inference.
+|![arfbench-accuracy-f1-combined](https://cdn-uploads.huggingface.co/production/uploads/681d68309722c5341cd3fa59/Fs1zeUOkZ6G_yPpOyvlYq.png)|
+|:-:|
+|Overall accuracy and F1 on the ARFBench time series question-answering benchmark, as of paper release. Toto-1.0-QA-Experimental achieves the top accuracy and comparable F1 to top frontier models.|
 |![toto-vlm-arch](https://cdn-uploads.huggingface.co/production/uploads/681d68309722c5341cd3fa59/VOihICj_-HTNdbNyNseD_.png)|
 |:-:|
 |Overview of the Toto-1.0-QA-Experimental Architecture.|
+This model repository stores inference artifacts, including merged vision-language model weights, time-series modules, and configuration files.
 ---
 ## Basic Inference Example
+The example below assumes you already have time-series tensors, one or more image paths, and a text question. The required components are available in the [official GitHub repository](https://github.com/DataDog/arfbench).
 ```python
 import torch
 ## Resources
+- **Paper:** [ARFBench on arXiv](https://arxiv.org/abs/2604.21199)
+- **Code:** [GitHub - DataDog/arfbench](https://github.com/DataDog/arfbench)
+- **Dataset:** [Datadog/ARFBench](https://huggingface.co/datasets/Datadog/ARFBench)
+- **Leaderboard:** [ARFBench Space](https://huggingface.co/spaces/Datadog/ARFBench)
 ---