ChrisHarig committed
Commit 60c4f27 · verified · 1 Parent(s): 5a91d17

Update README.md

  1. README.md +109 -1
README.md CHANGED
@@ -7,4 +7,112 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ # EPI-Eval
+
+ A curated collection of large epidemiological datasets, normalized to a single
+ schema so they can be searched, joined, and benchmarked against each other.
+
+ ## What we track
+
+ Time-series surveillance data on infectious disease, primarily respiratory
+ viruses (flu, COVID-19, RSV) and arboviral disease (dengue, Zika,
+ chikungunya), with smaller coverage of notifiable, mortality, wastewater, and
+ behavioural / search signals. Sources come from CDC, WHO, ECDC, PAHO, OWID,
+ and national public-health agencies; we re-publish them as Parquet with a
+ consistent set of row-level columns (`date`, `location_id`, `location_level`,
+ optional `condition` / `case_status` / `as_of`) and a metadata header
+ describing pathogens, geography, cadence, and per-column units.
+
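As a rough illustration of the row-level contract described above, the following sketch checks a list of column names against the required and optional columns named in this README. The helper `check_row_columns` and the example column `hosp_rate` are hypothetical, and this is not the pipeline's actual validator; any column outside the named set is simply assumed to be a dataset-specific value column.

```python
# Column names come from the README; the helper itself is a hypothetical sketch.
REQUIRED_COLUMNS = ("date", "location_id", "location_level")
OPTIONAL_COLUMNS = ("condition", "case_status", "as_of")


def check_row_columns(columns):
    """Return (missing_required, optional_present, other_columns)."""
    present = set(columns)
    # Required columns that the dataset does not provide.
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    # Optional columns that the dataset does provide.
    optional = [c for c in OPTIONAL_COLUMNS if c in present]
    known = set(REQUIRED_COLUMNS) | set(OPTIONAL_COLUMNS)
    # Anything else is assumed to be a dataset-specific value column.
    other = [c for c in columns if c not in known]
    return missing, optional, other
```

For example, `check_row_columns(["date", "location_id", "location_level", "as_of", "hosp_rate"])` reports no missing required columns, `as_of` as the only optional column present, and `hosp_rate` as a value column.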
+ ## Why
+
+ Forecasting and modeling work routinely stalls on data plumbing: finding the
+ canonical version of a series, normalizing geography codes, reconciling
+ reporting cadences, and tracking when a source was last revised. The goal of
+ this org is to do that work once, in the open.
+
+ ## Schema
+
+ Every dataset card in this org uses the same frontmatter format
+ ([schema v0.1](https://github.com/ChrisHarig/apart-forecasting-tool/blob/main/upload_pipeline/schema/schema_v0.1.md)),
+ validated against a controlled vocabulary
+ ([`vocabularies.yaml`](https://github.com/ChrisHarig/apart-forecasting-tool/blob/main/upload_pipeline/schema/vocabularies.yaml)).
+ Curated metadata (pathogens, license, units) lives alongside computed metadata
+ (time coverage, row count, observed cadence) generated at ingest.
+
+ ## Contributing a dataset
+
+ The ingest pipeline is in
+ [`apart-forecasting-tool/upload_pipeline`](https://github.com/ChrisHarig/apart-forecasting-tool/tree/main/upload_pipeline).
+ A new dataset consists of one `ingest.py` and one `card.yaml` under
+ `upload_pipeline/sources/<source_id>/`; the validator confirms schema fit
+ before upload. Uploading a new truth dataset automatically creates an empty
+ `<id>-predictions` companion.
+
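The layout above can be sketched as two small helpers: one that lists the files a new source contributes, and one that derives the companion repo name. Both function names are hypothetical. Whether `<source_id>` and the dataset `<id>` are always the same string is not stated here, so they are kept as separate parameters.

```python
from pathlib import PurePosixPath

ORG = "EPI-Eval"  # organisation name as it appears in the dataset URLs


def source_files(source_id):
    """Paths of the two files a new source contributes, per the README."""
    base = PurePosixPath("upload_pipeline/sources") / source_id
    return [str(base / "ingest.py"), str(base / "card.yaml")]


def companion_repo(dataset_id):
    """Repo id of the predictions companion, following `<id>-predictions`."""
    return f"{ORG}/{dataset_id}-predictions"
```

With the existing `nhsn-hrd` dataset as input, `companion_repo("nhsn-hrd")` gives `EPI-Eval/nhsn-hrd-predictions`.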
+ ## Datasets (21)
+
+ ### Respiratory
+
+ | Dataset | Pathogens | Geography | Cadence |
+ | --- | --- | --- | --- |
+ | [CDC FluSurv-NET - weekly flu hospitalisation rates](https://huggingface.co/datasets/EPI-Eval/delphi-flusurv) | influenza | US | weekly |
+ | [CDC NHSN Hospital Respiratory Data (HRD)](https://huggingface.co/datasets/EPI-Eval/nhsn-hrd) | influenza, sars-cov-2, rsv | US | weekly |
+ | [CDC NREVSS - weekly RSV test specimens and positives](https://huggingface.co/datasets/EPI-Eval/cdc-nrevss-rsv) | rsv | US | weekly |
+ | [COVID Tracking Project - US states daily (archived)](https://huggingface.co/datasets/EPI-Eval/covid-tracking-project) | sars-cov-2 | US | daily |
+ | [COVID-19 Forecast Hub - hospital admissions target](https://huggingface.co/datasets/EPI-Eval/covid19-forecast-hub) | sars-cov-2 | US | weekly |
+ | [ECDC ERVISS - ILI/ARI primary-care consultation rates](https://huggingface.co/datasets/EPI-Eval/ecdc-erviss) | influenza, sars-cov-2, rsv | multiple (30 countries) | weekly |
+ | [Flu MetroCast Hub - sub-state flu hosp forecast target](https://huggingface.co/datasets/EPI-Eval/flu-metrocast-hub) | influenza | US | weekly |
+ | [FluSight Forecast Hub - flu hospital admission target](https://huggingface.co/datasets/EPI-Eval/flusight-forecast-hub) | influenza | US | weekly |
+ | [JHU CSSE COVID-19 - global daily (archived)](https://huggingface.co/datasets/EPI-Eval/jhu-csse-covid) | sars-cov-2 | multiple | daily |
+ | [NYT COVID-19 - US county daily](https://huggingface.co/datasets/EPI-Eval/nyt-covid) | sars-cov-2 | US | daily |
+ | [OWID COVID-19 - global daily compiled](https://huggingface.co/datasets/EPI-Eval/owid-covid) | sars-cov-2 | multiple | daily |
+ | [PHAC Respiratory Virus Detection Surveillance - Canada weekly](https://huggingface.co/datasets/EPI-Eval/canada-fluwatch) | influenza, influenza-a, influenza-b +7 | CA | weekly |
+ | [RSV Forecast Hub - RSV hospital admissions target](https://huggingface.co/datasets/EPI-Eval/rsv-forecast-hub) | rsv | US | weekly |
+ | [UKHSA Dashboard - England COVID-19 daily metrics](https://huggingface.co/datasets/EPI-Eval/ukhsa-covid-daily) | sars-cov-2 | GB | daily |
+ | [UKHSA Dashboard - England flu / COVID-19 / RSV weekly](https://huggingface.co/datasets/EPI-Eval/ukhsa-respiratory) | influenza, sars-cov-2, rsv | GB | weekly |
+
+ ### Syndromic / ED
+
+ | Dataset | Pathogens | Geography | Cadence |
+ | --- | --- | --- | --- |
+ | [CDC NSSP / ESSENCE - ED visits for ILI / COVID / RSV](https://huggingface.co/datasets/EPI-Eval/cdc-nssp) | influenza, sars-cov-2, rsv | US | weekly |
+
+ ### Arboviral
+
+ | Dataset | Pathogens | Geography | Cadence |
+ | --- | --- | --- | --- |
+ | [OpenDengue - national dengue case counts (V1.3)](https://huggingface.co/datasets/EPI-Eval/opendengue) | dengue | multiple | irregular |
+
+ ### Mobility & contact
+
+ | Dataset | Pathogens | Geography | Cadence |
+ | --- | --- | --- | --- |
+ | [Google Community Mobility Reports - global daily](https://huggingface.co/datasets/EPI-Eval/global-mobility) | n/a | multiple | daily |
+
+ ### Search & behavioural
+
+ | Dataset | Pathogens | Geography | Cadence |
+ | --- | --- | --- | --- |
+ | [Wikipedia pageviews - disease-article daily views](https://huggingface.co/datasets/EPI-Eval/wikipedia-pageviews) | influenza, sars-cov-2, rsv +6 | multiple | daily |
+
+ ### Notifiable / other
+
+ | Dataset | Pathogens | Geography | Cadence |
+ | --- | --- | --- | --- |
+ | [OWID Mpox - global daily compiled](https://huggingface.co/datasets/EPI-Eval/owid-mpox) | mpox | multiple | daily |
+ | [WHO Global TB - annual country estimates](https://huggingface.co/datasets/EPI-Eval/who-tb-burden) | tuberculosis | multiple | annual |
+
+ ## Predictions
+
+ Each truth dataset has a companion `EPI-Eval/<id>-predictions` repo that
+ accumulates community-submitted forecasts. The schema is long-format: one row
+ per `(target_date, [dim values…], quantile, value)`, with `quantile = NULL`
+ reserved for the point estimate. Forecasters submit through the
+ [EPI-Eval dashboard](https://github.com/ChrisHarig/apart-forecasting-tool);
+ a maintainer reviews each PR before merging, and merged predictions appear
+ under the corresponding truth dataset's *Show predictions* toggle in the
+ dashboard, with a per-submitter leaderboard (MAE / WIS / rWIS / coverage).
+
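To make the long-format rows and two of the leaderboard metrics concrete, here is a minimal sketch that scores the rows for a single target date. It assumes rows have been reduced to `(quantile, value)` pairs, with `None` standing in for `quantile = NULL`, and it uses the standard quantile-loss form of WIS (the mean of twice the pinball loss over a symmetric set of quantile levels that includes the median). The function names are hypothetical, and the dashboard's own scoring code may differ.

```python
def pinball_loss(observed, predicted, level):
    """Quantile (pinball) loss of one predicted quantile at the given level."""
    if observed >= predicted:
        return (observed - predicted) * level
    return (predicted - observed) * (1.0 - level)


def score_forecast(rows, observed):
    """Score one target's rows, given as (quantile, value) pairs.

    quantile is None for the point estimate, mirroring `quantile = NULL`.
    Returns (absolute_error, wis); either is None if its rows are absent.
    """
    point = [v for q, v in rows if q is None]
    quantiles = [(q, v) for q, v in rows if q is not None]
    # Absolute error of the point estimate; averaging it over targets gives MAE.
    abs_error = abs(observed - point[0]) if point else None
    wis = None
    if quantiles:
        # WIS as the mean of 2 * pinball loss over the submitted quantile levels.
        total = sum(2.0 * pinball_loss(observed, v, q) for q, v in quantiles)
        wis = total / len(quantiles)
    return abs_error, wis
```

For an observed value of 10 with a point estimate of 11 and quantiles 0.25, 0.5, and 0.75 at 8, 10, and 14, the absolute error is 1.0 and the WIS is 1.0.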
+ ## Status
+
+ Active. Coverage and the dataset list grow through PRs to the upload pipeline.
+