bitsofchris Claude Opus 4.7 (1M context) committed on
Commit 7c1a4d6 · 1 Parent(s): 3361f38

Section restructure, normal font, fact-check the explainers

Page section hierarchy:
- ☀️ Live now — hero block
- 📅 Weekly forecast — main 4-panel chart
- 🆚 Toto vs NWS — near-term forecast (was 'same hour, side-by-side')
- (NWS KOKX radar)
- 📊 Results — scoreboard table + residual chart
- 🔧 How it's made — both explainer accordions

Every section is a top-level H2; the chart and scoreboard sub-headings
become H3 lines underneath. CSS bumps H1/H2/H3 sizes, adds a top
border between H2 sections, and bolds the accordion summary so it
reads like a section label instead of UI chrome.

Theme switches from Soft to Default with an Inter + system-font stack
for a more 'normal' typography baseline.

Rewrote both accordions against the current code:
- Scoreboard explainer now reflects fixed 1 h / 3 h / 12 h
lookaheads (not the obsolete 'latest pre-target' rule), pressure
dropped, residual pinned to 1 h-ahead.
- Forecast explainer now reflects Toto-2.0-22m (was 4 M), the SQLite
archive read path, 5-min display vs hourly inference split, 48 h
horizon (was 24), HF Dataset backup + GitHub Actions keep-warm
cron. Removes stale references to a 'past-forecast overlay' that
no longer renders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
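
The fixed-lookahead scoring rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names `pick_forecast` and `mae_at_lookahead` and the dict-based row layout are hypothetical, and the sketch assumes the forecast rows belong to a single source and metric and that `actuals` has already been restricted to the rolling 48 h window. Only the field names `forecast_made_at`, `target_ts`, and `p50` come from the explainer text.

```python
from datetime import datetime, timedelta


def pick_forecast(forecasts, target_ts, lookahead_hours):
    """Pick the forecast for target_ts issued closest to target_ts - lookahead."""
    ideal_made_at = target_ts - timedelta(hours=lookahead_hours)
    candidates = [f for f in forecasts if f["target_ts"] == target_ts]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f["forecast_made_at"] - ideal_made_at))


def mae_at_lookahead(forecasts, actuals, lookahead_hours):
    """Return (MAE, n) of the p50 prediction against actuals at one lookahead."""
    errors = []
    for target_ts, actual in actuals.items():
        chosen = pick_forecast(forecasts, target_ts, lookahead_hours)
        if chosen is not None:
            errors.append(abs(chosen["p50"] - actual))
    n = len(errors)
    return (sum(errors) / n if n else None), n


# Hypothetical rows: three forecasts for the same target hour, issued
# 1 h, 3 h, and 12 h before it.
noon = datetime(2025, 1, 1, 12)
forecasts = [
    {"forecast_made_at": noon - timedelta(hours=1), "target_ts": noon, "p50": 50.0},
    {"forecast_made_at": noon - timedelta(hours=3), "target_ts": noon, "p50": 53.0},
    {"forecast_made_at": noon - timedelta(hours=12), "target_ts": noon, "p50": 40.0},
]
actuals = {noon: 51.0}

for lookahead in (1, 3, 12):
    print(lookahead, mae_at_lookahead(forecasts, actuals, lookahead))
```

Note that a pure "closest to" rule always picks something when any forecast exists for the target hour, even one issued far from the intended lookahead. Whether the real scoreboard applies a tolerance is not stated in this commit.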

Files changed (1)
  1. app.py +78 -35
app.py CHANGED
@@ -275,13 +275,10 @@ def refresh():
         horizon_hours=1,
     )
     if "temp_f" in week["totos"]:
-        comparison_md = (
-            "### 🆚 Toto vs NWS — same hour, side-by-side\n\n"
-            + aligned_comparison_markdown(
-                toto=week["totos"]["temp_f"],
-                nws_temp=week["nws_aligned"].get("temp_f"),
-                tz=DISPLAY_TZ,
-            )
         )
     else:
         comparison_md = ""
@@ -322,7 +319,7 @@ def render_scoreboard(conn) -> str:
     )

     lines = [
-        "### 📊 Forecast scoreboard (rolling 48 h MAE — lower is better)",
         (
             f"<span style='opacity:0.6'>**n** = number of past hours scored in the rolling 48 h window. "
             f"Scoreboard started {started_str}.</span>"
@@ -446,14 +443,51 @@ SUBTITLE = (
     "The scoreboard tracks who's been more accurate over the past 48 hours."
 )

-with gr.Blocks(title="Toto Weather Forecast", theme=gr.themes.Soft()) as demo:
     gr.Markdown("# Toto on my home weather station")
     gr.Markdown(HOOK)
     gr.Markdown(SUBTITLE)

     hero_md = gr.Markdown()
-    gr.Markdown(f"### 📅 {VIEW_WEEK['label']}")
     week_plot = gr.Plot(label="Weekly")
     comparison_md = gr.Markdown()
     gr.HTML(
         # KOKX is the NWS radar site at Upton, NY — covers Long Island incl.
@@ -475,49 +509,58 @@ with gr.Blocks(title="Toto Weather Forecast", theme=gr.themes.Soft()) as demo:
         "<span style='opacity:0.55'>🔄 Live data + forecast auto-refresh every 15 minutes.</span>"
     )

-    gr.Markdown("### 🏆 How has each model done so far?")
     scoreboard_md = gr.Markdown()
     residual_plot = gr.Plot(label="Forecast residual")

     with gr.Accordion("How the scoreboard is calculated", open=False):
         gr.Markdown(
             "We score each model on **how close its prediction was to the actual Ecowitt reading** "
-            "for the same hour, averaged over the last 48 hours.\n\n"
             "**Picking which forecast counts.** Every refresh logs both models' forecasts for the "
-            "next 24-72 hours along with `forecast_made_at` and `target_ts`. For each past target "
-            "hour we keep only the **most recent forecast issued *before* that hour** — so neither "
-            "model is allowed to peek at data it couldn't have seen at prediction time.\n\n"
-            "**The math.** For each metric, per source:\n\n"
             "&nbsp;&nbsp;`abs_err = |p50 − actual|`\n\n"
             "&nbsp;&nbsp;`MAE = mean(abs_err)` over target hours in the last 48 h\n\n"
-            "&nbsp;&nbsp;`n` = number of (target hour, source) pairs that had both a forecast and an Ecowitt actual\n\n"
-            "The lower MAE wins. NWS doesn't forecast barometric pressure, so the pressure row shows Toto only.\n\n"
             "**What this is NOT.** We score the point prediction (p50) — which throws away Toto's "
             "uncertainty. A scoring rule like CRPS or pinball loss would credit a well-calibrated "
-            "10–90% band; MAE doesn't. Folded across all horizons too — Toto's +6 h call and +24 h "
-            "call both contribute to the same number. Per-horizon breakdowns are a likely follow-up.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md#scoreboard--how-the-accuracy-is-calculated)."
         )

     with gr.Accordion("How the forecast is made", open=False):
         gr.Markdown(
             "**Model.** [Datadog/Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) "
-            "(~22 M params, CPU). Second-smallest variant of Toto 2.0; the larger ones (313 M / 1 B / 2.5 B) would tighten the band further.\n\n"
-            "**Input.** For each metric we feed Toto a univariate window of the most recent "
-            "Ecowitt history at the chosen display cadence (default 1 h spacing). "
-            "Toto requires the context length to be a multiple of its `patch_size=32`, so we "
-            "truncate the oldest points to the largest multiple of 32 we have — or, if we have "
-            "fewer than 32, left-pad to one patch and set `target_mask=False` on the padded "
-            "steps so the model ignores them.\n\n"
             "**Output.** `model.forecast(...)` returns 9 analytical quantiles "
-            "(`[0.1, 0.2, …, 0.9]`) for each future step — no Monte-Carlo sampling. "
-            "We plot the p10–p90 band and the p50 median. "
-            "**Horizon.** `horizon_steps = round(horizon_hours / step_hours)`; defaults give 24 hourly steps.\n\n"
-            "**Cadence.** A daemon thread inside the Space re-runs the whole pipeline every "
-            "15 minutes (cache TTL is 14 min, so each tick re-hits Ecowitt and NWS). Every "
-            "snapshot is persisted to SQLite and backed up to a private HF Dataset, which is "
-            "also what powers the side-by-side scoreboard and the past-forecast overlays "
-            "above.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md)."
         )

 
         horizon_hours=1,
     )
     if "temp_f" in week["totos"]:
+        comparison_md = aligned_comparison_markdown(
+            toto=week["totos"]["temp_f"],
+            nws_temp=week["nws_aligned"].get("temp_f"),
+            tz=DISPLAY_TZ,
         )
     else:
         comparison_md = ""
 
     )

     lines = [
+        "### Rolling 48 h MAE — lower is better",
         (
             f"<span style='opacity:0.6'>**n** = number of past hours scored in the rolling 48 h window. "
             f"Scoreboard started {started_str}.</span>"
 
     "The scoreboard tracks who's been more accurate over the past 48 hours."
 )

+SYSTEM_FONT = [
+    gr.themes.GoogleFont("Inter"),
+    "system-ui", "-apple-system", "BlinkMacSystemFont",
+    "Segoe UI", "Roboto", "Helvetica", "Arial", "sans-serif",
+]
+HEADER_CSS = """
+.gradio-container h1 { font-size: 2.2rem !important; line-height: 1.1; margin-bottom: 0.3em; }
+.gradio-container h2 {
+  font-size: 1.65rem !important;
+  margin-top: 1.6em; margin-bottom: 0.4em;
+  padding-top: 0.6em; padding-bottom: 0.2em;
+  border-top: 1px solid #e3e3e3;
+}
+.gradio-container h3 {
+  font-size: 1.15rem !important;
+  margin-top: 0.8em; margin-bottom: 0.3em;
+  opacity: 0.9;
+}
+/* Make the two collapsible explainer titles read like real section
+   headings instead of small UI chrome. */
+.gradio-container .label-wrap > button,
+.gradio-container .accordion > .label-wrap,
+.gradio-container details > summary {
+  font-size: 1.1rem !important;
+  font-weight: 600 !important;
+}
+"""
+
+with gr.Blocks(
+    title="Toto Weather Forecast",
+    theme=gr.themes.Default(font=SYSTEM_FONT),
+    css=HEADER_CSS,
+) as demo:
     gr.Markdown("# Toto on my home weather station")
     gr.Markdown(HOOK)
     gr.Markdown(SUBTITLE)

+    gr.Markdown("## ☀️ Live now")
     hero_md = gr.Markdown()
+
+    gr.Markdown("## 📅 Weekly forecast")
+    gr.Markdown(f"### {VIEW_WEEK['label']}")
     week_plot = gr.Plot(label="Weekly")
+
+    gr.Markdown("## 🆚 Toto vs NWS — near-term forecast")
     comparison_md = gr.Markdown()
     gr.HTML(
         # KOKX is the NWS radar site at Upton, NY — covers Long Island incl.
 
         "<span style='opacity:0.55'>🔄 Live data + forecast auto-refresh every 15 minutes.</span>"
     )

+    gr.Markdown("## 📊 Results")
     scoreboard_md = gr.Markdown()
     residual_plot = gr.Plot(label="Forecast residual")

+    gr.Markdown("## 🔧 How it's made")
     with gr.Accordion("How the scoreboard is calculated", open=False):
         gr.Markdown(
             "We score each model on **how close its prediction was to the actual Ecowitt reading** "
+            "for the same hour, at three fixed forecast lookaheads — **1 h, 3 h, and 12 h ahead** "
+            "— averaged over the rolling last 48 hours.\n\n"
             "**Picking which forecast counts.** Every refresh logs both models' forecasts for the "
+            "next 48 hours along with `forecast_made_at` and `target_ts`. For each past target "
+            "hour and each lookahead N, we pick the forecast whose `forecast_made_at` is closest "
+            "to `target_ts − N hours`. Same lookahead for both models = fair comparison.\n\n"
+            "**The math.** For each metric (temperature, humidity), per source, per lookahead:\n\n"
             "&nbsp;&nbsp;`abs_err = |p50 − actual|`\n\n"
             "&nbsp;&nbsp;`MAE = mean(abs_err)` over target hours in the last 48 h\n\n"
+            "&nbsp;&nbsp;`n` = number of past target hours with both a forecast and an Ecowitt actual\n\n"
+            "The lower MAE wins. Barometric pressure is omitted from the scoreboard because NWS "
+            "doesn't expose a pressure forecast — there's nothing to compare against.\n\n"
+            "**Residual chart** (below the table). Same picking rule, pinned to the **1 h-ahead** "
+            "lookahead (first row of the table). Each point is `prediction − actual`; zero = perfect.\n\n"
             "**What this is NOT.** We score the point prediction (p50) — which throws away Toto's "
             "uncertainty. A scoring rule like CRPS or pinball loss would credit a well-calibrated "
+            "10–90% band; MAE doesn't.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md#scoreboard--how-the-accuracy-is-calculated)."
         )

     with gr.Accordion("How the forecast is made", open=False):
         gr.Markdown(
             "**Model.** [Datadog/Toto-2.0-22m](https://huggingface.co/Datadog/Toto-2.0-22m) "
+            "(~22 M params, CPU). Second-smallest variant of Toto 2.0; the larger 313 M / 1 B / "
+            "2.5 B models would tighten the band further.\n\n"
+            "**Input.** Univariate per metric — temperature, humidity, pressure, rain rate run "
+            "independently. The Space pulls Ecowitt's `cycle_type=5min` history into a local "
+            "SQLite archive (`data/ecowitt.db`) every 15 min and accumulates true 5-min cadence "
+            "over time. The **chart** displays the 5-min series; **Toto** is fed the same series "
+            "downsampled to hourly so the input length stays in the model's sweet spot.\n\n"
+            "**Context length.** 7 days × hourly = up to 168 points. Toto requires the context "
+            "to be a multiple of its `patch_size` (read from `model.config`), so we truncate the "
+            "oldest points to the largest multiple that fits — or, if we have fewer points than "
+            "one patch, left-pad and set `target_mask=False` on the padding so the model ignores it.\n\n"
             "**Output.** `model.forecast(...)` returns 9 analytical quantiles "
+            "(`[0.1, 0.2, …, 0.9]`) for each future step — **no Monte-Carlo sampling**. "
+            "We plot the p10–p90 band and the p50 median.\n\n"
+            "**Horizon.** 48 hourly steps = 48 hours into the future. Per-metric inference takes "
+            "well under a second on the free-tier CPU.\n\n"
+            "**Cadence.** A daemon thread inside the Space runs sync → inference → push every "
+            "15 minutes; a 10-minute GitHub Actions cron pings the public URL to keep the Space "
+            "warm. Both DBs (forecasts + raw archive) are backed up to a private HF Dataset "
+            "(`bitsofchris/toto-weather-forecast-log`), so the scoreboard survives Space "
+            "rebuilds.\n\n"
             "Full spec: [`docs/toto-inference.md`](https://huggingface.co/spaces/bitsofchris/time-series-ai-weather-forecast/blob/main/docs/toto-inference.md)."
         )

566