Section restructure, normal font, fact-check the explainers
Page section hierarchy:
- Live now: hero block
- Weekly forecast: main 4-panel chart
- Toto vs NWS: near-term forecast (was 'same hour, side-by-side')
- (NWS KOKX radar)
- Results: scoreboard table + residual chart
- How it's made: both explainer accordions
Every section is a top-level H2; the chart and scoreboard sub-headings
become H3 lines underneath. CSS bumps H1/H2/H3 sizes, adds a top
border between H2 sections, and bolds the accordion summary so it
reads like a section label instead of UI chrome.
Theme switches from Soft to Default with an Inter + system-font stack
for a more 'normal' typography baseline.
Rewrote both accordions against the current code:
- Scoreboard explainer now reflects fixed 1 h / 3 h / 12 h
lookaheads (not the obsolete 'latest pre-target' rule), pressure
dropped, residual pinned to 1 h-ahead.
- Forecast explainer now reflects Toto-2.0-22m (was 4 M), the SQLite
archive read path, 5-min display vs hourly inference split, 48 h
horizon (was 24), HF Dataset backup + GitHub Actions keep-warm
cron. Removes stale references to a 'past-forecast overlay' that
no longer renders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -275,13 +275,10 @@ def refresh():
         horizon_hours=1,
     )
     if "temp_f" in week["totos"]:
-        comparison_md = (
-            "
-
-
-                nws_temp=week["nws_aligned"].get("temp_f"),
-                tz=DISPLAY_TZ,
-            )
         )
     else:
         comparison_md = ""
@@ -322,7 +319,7 @@ def render_scoreboard(conn) -> str:
     )

     lines = [
-        "###
         (
             f"<span style='opacity:0.6'>**n** = number of past hours scored in the rolling 48 h window. "
             f"Scoreboard started {started_str}.</span>"
@@ -446,14 +443,51 @@ SUBTITLE = (
     "The scoreboard tracks who's been more accurate over the past 48 hours."
 )

-
     gr.Markdown("# Toto on my home weather station")
     gr.Markdown(HOOK)
     gr.Markdown(SUBTITLE)

     hero_md = gr.Markdown()
-
     week_plot = gr.Plot(label="Weekly")
     comparison_md = gr.Markdown()
     gr.HTML(
         # KOKX is the NWS radar site at Upton, NY - covers Long Island incl.
@@ -475,49 +509,58 @@ with gr.Blocks(title="Toto Weather Forecast", theme=gr.themes.Soft()) as demo:
         "<span style='opacity:0.55'>Live data + forecast auto-refresh every 15 minutes.</span>"
     )

-    gr.Markdown("##
     scoreboard_md = gr.Markdown()
     residual_plot = gr.Plot(label="Forecast residual")

     with gr.Accordion("How the scoreboard is calculated", open=False):
         gr.Markdown(
             "We score each model on **how close its prediction was to the actual Ecowitt reading** "
-            "for the same hour,
             "**Picking which forecast counts.** Every refresh logs both models' forecasts for the "
-            "next
-            "hour
-            "
-            "**The math.** For each metric, per source:\n\n"
             " `abs_err = |p50 - actual|`\n\n"
             " `MAE = mean(abs_err)` over target hours in the last 48 h\n\n"
-            " `n` = number of
-            "The lower MAE wins.
             "**What this is NOT.** We score the point prediction (p50) - which throws away Toto's "
             "uncertainty. A scoring rule like CRPS or pinball loss would credit a well-calibrated "
-            "10–90% band; MAE doesn't.
-            "call both contribute to the same number. Per-horizon breakdowns are a likely follow-up.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md#scoreboard--how-the-accuracy-is-calculated)."
         )

     with gr.Accordion("How the forecast is made", open=False):
         gr.Markdown(
             "**Model.** [Datadog/Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) "
-            "(~22 M params, CPU). Second-smallest variant of Toto 2.0; the larger
-            "
-            "
-            "
-            "
-            "
-            "
             "**Output.** `model.forecast(...)` returns 9 analytical quantiles "
-            "(`[0.1, 0.2, …, 0.9]`) for each future step - no Monte-Carlo sampling. "
-            "We plot the p10–p90 band and the p50 median.
-            "**Horizon.**
-            "
-            "
-            "
-            "
-            "
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md)."
         )


@@ -275,13 +275,10 @@ def refresh():
         horizon_hours=1,
     )
     if "temp_f" in week["totos"]:
+        comparison_md = aligned_comparison_markdown(
+            toto=week["totos"]["temp_f"],
+            nws_temp=week["nws_aligned"].get("temp_f"),
+            tz=DISPLAY_TZ,
         )
     else:
         comparison_md = ""
@@ -322,7 +319,7 @@ def render_scoreboard(conn) -> str:
     )

     lines = [
+        "### Rolling 48 h MAE - lower is better",
         (
             f"<span style='opacity:0.6'>**n** = number of past hours scored in the rolling 48 h window. "
             f"Scoreboard started {started_str}.</span>"
@@ -446,14 +443,51 @@ SUBTITLE = (
     "The scoreboard tracks who's been more accurate over the past 48 hours."
 )

+SYSTEM_FONT = [
+    gr.themes.GoogleFont("Inter"),
+    "system-ui", "-apple-system", "BlinkMacSystemFont",
+    "Segoe UI", "Roboto", "Helvetica", "Arial", "sans-serif",
+]
+HEADER_CSS = """
+.gradio-container h1 { font-size: 2.2rem !important; line-height: 1.1; margin-bottom: 0.3em; }
+.gradio-container h2 {
+  font-size: 1.65rem !important;
+  margin-top: 1.6em; margin-bottom: 0.4em;
+  padding-top: 0.6em; padding-bottom: 0.2em;
+  border-top: 1px solid #e3e3e3;
+}
+.gradio-container h3 {
+  font-size: 1.15rem !important;
+  margin-top: 0.8em; margin-bottom: 0.3em;
+  opacity: 0.9;
+}
+/* Make the two collapsible explainer titles read like real section
+   headings instead of small UI chrome. */
+.gradio-container .label-wrap > button,
+.gradio-container .accordion > .label-wrap,
+.gradio-container details > summary {
+  font-size: 1.1rem !important;
+  font-weight: 600 !important;
+}
+"""
+
+with gr.Blocks(
+    title="Toto Weather Forecast",
+    theme=gr.themes.Default(font=SYSTEM_FONT),
+    css=HEADER_CSS,
+) as demo:
     gr.Markdown("# Toto on my home weather station")
     gr.Markdown(HOOK)
     gr.Markdown(SUBTITLE)

+    gr.Markdown("## Live now")
     hero_md = gr.Markdown()
+
+    gr.Markdown("## Weekly forecast")
+    gr.Markdown(f"### {VIEW_WEEK['label']}")
     week_plot = gr.Plot(label="Weekly")
+
+    gr.Markdown("## Toto vs NWS - near-term forecast")
     comparison_md = gr.Markdown()
     gr.HTML(
         # KOKX is the NWS radar site at Upton, NY - covers Long Island incl.
@@ -475,49 +509,58 @@ with gr.Blocks(title="Toto Weather Forecast", theme=gr.themes.Soft()) as demo:
         "<span style='opacity:0.55'>Live data + forecast auto-refresh every 15 minutes.</span>"
     )

+    gr.Markdown("## Results")
     scoreboard_md = gr.Markdown()
     residual_plot = gr.Plot(label="Forecast residual")

+    gr.Markdown("## How it's made")
     with gr.Accordion("How the scoreboard is calculated", open=False):
         gr.Markdown(
             "We score each model on **how close its prediction was to the actual Ecowitt reading** "
+            "for the same hour, at three fixed forecast lookaheads - **1 h, 3 h, and 12 h ahead** "
+            "- averaged over the rolling last 48 hours.\n\n"
             "**Picking which forecast counts.** Every refresh logs both models' forecasts for the "
+            "next 48 hours along with `forecast_made_at` and `target_ts`. For each past target "
+            "hour and each lookahead N, we pick the forecast whose `forecast_made_at` is closest "
+            "to `target_ts - N hours`. Same lookahead for both models = fair comparison.\n\n"
+            "**The math.** For each metric (temperature, humidity), per source, per lookahead:\n\n"
             " `abs_err = |p50 - actual|`\n\n"
             " `MAE = mean(abs_err)` over target hours in the last 48 h\n\n"
+            " `n` = number of past target hours with both a forecast and an Ecowitt actual\n\n"
+            "The lower MAE wins. Barometric pressure is omitted from the scoreboard because NWS "
+            "doesn't expose a pressure forecast - there's nothing to compare against.\n\n"
+            "**Residual chart** (below the table). Same picking rule, pinned to the **1 h-ahead** "
+            "lookahead (first row of the table). Each point is `prediction - actual`; zero = perfect.\n\n"
             "**What this is NOT.** We score the point prediction (p50) - which throws away Toto's "
             "uncertainty. A scoring rule like CRPS or pinball loss would credit a well-calibrated "
+            "10–90% band; MAE doesn't.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md#scoreboard--how-the-accuracy-is-calculated)."
         )

     with gr.Accordion("How the forecast is made", open=False):
         gr.Markdown(
             "**Model.** [Datadog/Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) "
+            "(~22 M params, CPU). Second-smallest variant of Toto 2.0; the larger 313 M / 1 B / "
+            "2.5 B models would tighten the band further.\n\n"
+            "**Input.** Univariate per metric - temperature, humidity, pressure, rain rate run "
+            "independently. The Space pulls Ecowitt's `cycle_type=5min` history into a local "
+            "SQLite archive (`data/ecowitt.db`) every 15 min and accumulates true 5-min cadence "
+            "over time. The **chart** displays the 5-min series; **Toto** is fed the same series "
+            "downsampled to hourly so the input length stays in the model's sweet spot.\n\n"
+            "**Context length.** 7 days × hourly = up to 168 points. Toto requires the context "
+            "to be a multiple of its `patch_size` (read from `model.config`), so we truncate the "
+            "oldest points to the largest multiple that fits - or, if we have fewer points than "
+            "one patch, left-pad and set `target_mask=False` on the padding so the model ignores it.\n\n"
             "**Output.** `model.forecast(...)` returns 9 analytical quantiles "
+            "(`[0.1, 0.2, …, 0.9]`) for each future step - **no Monte-Carlo sampling**. "
+            "We plot the p10–p90 band and the p50 median.\n\n"
+            "**Horizon.** 48 hourly steps = 48 hours into the future. Per-metric inference takes "
+            "well under a second on the free-tier CPU.\n\n"
+            "**Cadence.** A daemon thread inside the Space runs sync → inference → push every "
+            "15 minutes; a 10-minute GitHub Actions cron pings the public URL to keep the Space "
+            "warm. Both DBs (forecasts + raw archive) are backed up to a private HF Dataset "
+            "(`bitsofchris/toto-weather-forecast-log`), so the scoreboard survives Space "
+            "rebuilds.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md)."
         )
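
The picking rule and MAE described in the scoreboard explainer can be sketched roughly as follows. This is an illustration only, not the Space's actual code: the `Forecast` tuple, `pick_forecast`, and `rolling_mae` are hypothetical names, and the real SQLite schema is not shown in this diff. Only the field names `forecast_made_at`, `target_ts`, and `p50` come from the explainer text.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import NamedTuple, Optional


class Forecast(NamedTuple):
    # Hypothetical row shape; field names follow the explainer text.
    forecast_made_at: datetime
    target_ts: datetime
    p50: float


def pick_forecast(rows, target_ts, lookahead_h) -> Optional[Forecast]:
    """Pick the forecast for target_ts made closest to target_ts - lookahead_h hours."""
    ideal = target_ts - timedelta(hours=lookahead_h)
    candidates = [r for r in rows if r.target_ts == target_ts]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: abs((r.forecast_made_at - ideal).total_seconds()),
    )


def rolling_mae(rows, actuals, lookahead_h, now, window_h=48):
    """Return (MAE, n) over target hours in the last window_h hours.

    actuals maps target_ts -> observed value. A target hour counts toward n
    only if it has both an actual and at least one forecast.
    """
    start = now - timedelta(hours=window_h)
    errs = []
    for target_ts, actual in actuals.items():
        if not (start <= target_ts <= now):
            continue
        row = pick_forecast(rows, target_ts, lookahead_h)
        if row is not None:
            errs.append(abs(row.p50 - actual))
    return (mean(errs) if errs else None, len(errs))
```

Calling `rolling_mae` once per source, per metric, and per lookahead (1, 3, 12) would yield one scoreboard cell each; whether the real implementation also caps how far a picked forecast may be from the ideal issue time is not stated in the source.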
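
The context-length rule from the forecast explainer (truncate the oldest points to the largest multiple of `patch_size`, or left-pad and mask when there is less than one patch) might look like the sketch below. `fit_context` is a hypothetical helper written for illustration; the patch size of 32 in the usage note is an assumed value, and the returned boolean list only stands in for the `target_mask` the explainer mentions, not for Toto's actual input API.

```python
def fit_context(values, patch_size, pad_value=0.0):
    """Return (series, mask) with len(series) a multiple of patch_size.

    mask[i] is False on left-padding and True on real observations.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    n = len(values)
    if n >= patch_size:
        # Drop the oldest points so the newest ones are kept.
        keep = (n // patch_size) * patch_size
        return list(values[n - keep:]), [True] * keep
    # Fewer points than one patch: left-pad up to a single patch.
    pad = patch_size - n
    return [pad_value] * pad + list(values), [False] * pad + [True] * n
```

With 168 hourly points and an assumed patch size of 32, this keeps the newest 160 points; with 3 points and a patch size of 8, it returns 5 masked pad values followed by the 3 observations.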