jbobym committed 24 days ago

Commit

b7b86f2

1 Parent(s): 93ed35a

space-deploy: bring updated README + headline/slice figures from master

Files changed (3)

README.md +76 -64
docs/headline.png +0 -0
docs/slice_metrics.png +0 -0

README.md CHANGED

@@ -33,120 +33,131 @@ the values to configure the Space. /build-readme should preserve it.
 # Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid
-> **Grid planners need scenario sets that are statistically realistic AND physically consistent — current methods deliver one or the other, not both.** — PC-DDPM achieves 100% scenario feasibility under operational voltage bounds [0.85, 1.10] pu, vs 1.8% scenario infeasibility for the unconstrained DDPM baseline, on 9 years of real AESO data.
 ![headline](docs/headline.png)
-**[Live demo](TBD post-deploy)** · **[Model card](model_card.md)** · **[Blog post](writeup/BLOG_POST.md)** · **[W&B project](https://wandb.ai/jbobym/pc-ddpm-alberta)**
 ## TL;DR
-- **What we built** — A physics-constrained diffusion model that generates 24-hour wind/solar/load scenarios respecting AC power flow on the IEEE 118-bus network scaled to Alberta loads.
-- **How well it works** — 0% → 100% scenario feasibility under operational voltage bounds vs. unconstrained DDPM, measured on the 2025 chronological test split (1,000 scenarios × 24 hours).
-- **Try it** — [Live demo](TBD post-deploy) (one click, no auth)
-- **Reproduce it** — `git clone … && make train` (see [Reproducibility](#reproducibility))
-- **Where it fails** — At λ_phys=2.0, load distributional fidelity degrades 36% (Wasserstein-1 521→799 MW) — the model partly achieves feasibility by avoiding high-load conditions, which is exactly what a stress-testing planner needs to keep.
 ---
 ## Motivation
-[Why does this problem matter, in stakeholder terms? Who suffers when
-it's unsolved? Two to four sentences. Lead with the operational pain
-point, not the ML problem framing.]
-## Approach
-[What we built, at the level a senior IC could understand without
-reading the code. Diagram comes from `docs/architecture.md`.]
 ![architecture](docs/architecture.png)
-Three things to call out:
-1. **Training-time penalty, not sampling-time guidance** — Hoseinpour & Dvorkin (2025) embed AC feasibility via gradient guidance at sampling time on synthetic IEEE benchmarks; we embed it in the training objective via a frozen GraphSAGE surrogate, and we train on real 9-year AESO data with 24-hour temporal structure.
-2. **Three-phase λ_phys annealing** — pure DDPM warm-up, gradual physics ramp, fixed-penalty fine-tune. Without it, training collapses on the larger penalty scales.
-3. **Honest characterisation of the feasibility-fidelity trade-off** — λ_phys=0/1/2 ablation; we name the cost (load W1 +36% at λ=2.0) instead of papering over it.
-2. **Real operational data, not synthetic benchmarks** — Dong et al. (2025) and Hoseinpour & Dvorkin (2025) train on synthetic IEEE perturbations or pattern-guided sampling; we train on 9 years of real AESO wind/solar/load with 24-hour temporal structure.
-3. **The empirical case study a 228-paper survey said was missing** — Zhang et al. (2025) identify physics-constrained generation as the open problem across deep generative work in energy systems; we provide the AC-feasibility evaluation that survey calls for.
 ## Data
 | Dataset | Source | Coverage | License |
 |---|---|---|---|
-| AESO hourly wind/solar/load | AESO Open Data | 2016-01 to 2025-12 (~84,000 hours) | Public, attribution |
-| ERA5 weather reanalysis | Copernicus Climate Data Store | 2016-2025, Alberta zones | CC-BY-4.0 |
-Splits: chronological, no shuffle. Train 2016-2023; val 2024; test 2025.
-Detailed schema and gotchas in `eda/data_profile.md`.
 ## Results
 ### Headline
-| Method | [METRIC] | [SECONDARY METRIC] | [SLICE — e.g., test/extreme] |
-|---|---|---|---|
-| Naive baseline | — | — | — |
-| Strong baseline | — | — | — |
-| **Ours** | **—** | **—** | **—** |
-### Slice analysis
-[The point of slicing: a single overall metric hides where the model
-fails. Replace this with results from `/eval-report`.]
-![slice](docs/slice_metrics.png)
-### Calibration / robustness
-[For regression with uncertainty: calibration curve. For classification:
-confusion matrix or reliability diagram. Drop this section if it doesn't apply.]
-## Discussion
-[The honest discussion. Three paragraphs:]
-**What was surprising.** [Something you didn't expect — these become talking points in interviews.]
-**Where it fails.** [Be specific. "Underforecasts peak hours by 8% on average. Fails on transmission outages." Hiring managers trust calibrated honesty more than puffed-up claims.]
-**Competence boundary.** [What's the model's domain of validity? When should the user NOT use this?]
 ## Reproducibility
 ```bash
-git clone [REPO URL]
 cd pc-ddpm-alberta
-pip install -e ".[dev]"
-nbstripout --install              # one-time per clone
-# Pull data (if external; otherwise see data/README.md)
-make data
-# Train + evaluate
-make train
-make eval
-# Run the demo locally
-make serve
 ```
-- **Compute**: [GPU + RAM]
-- **Wall time**: [TRAIN TIME], [EVAL TIME]
-- **Determinism**: seeds set in `src/pc_ddpm_alberta/config.py` (`RANDOM_SEED = 42`)
-- **W&B run** backing the headline metric: [BEST RUN URL]
 ## Limitations
-[Detailed limitations — longer than the TL;DR bullet. Hiring managers
-read this to decide if you've thought hard about your work. A model
-card lives at `model_card.md` with the full breakdown.]
 ## References
-[If you cited prior work or built on existing techniques, list it here.
-Format: free-form is fine for a portfolio README; switch to BibTeX-like
-for academic-leaning projects.]
 ## License
@@ -155,11 +166,12 @@ MIT (code) · AESO Open Data (with attribution) · ERA5 CC-BY-4.0 (Copernicus Cl
 ## Citation
 If this was useful, please cite as:
 ```
 @misc{mesadieu2026pcddpm,
   author = {Mesadieu, John Boby},
   title  = {Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid},
   year   = {2026},
-  url    = {[REPO URL]}
 }
 ```

 # Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid
+> **Grid planners need scenario sets that are statistically realistic AND physically consistent; current methods deliver one or the other, not both.** PC-DDPM achieves 100% scenario feasibility under operational voltage bounds [0.85, 1.10] pu, versus 98.2% for the unconstrained DDPM baseline, on 9 years of real AESO data.
+<!-- TODO(headline-figure): generate docs/headline.png via /eval-report or by hand
+     (suggest: 3-panel — left: 24h scenario fan over a real Alberta day;
+     middle: feasibility bars across λ ablation; right: V violation distribution
+     unconstrained vs PC-DDPM). README claims it, so the file must exist before
+     the repo flips to public. -->
 ![headline](docs/headline.png)
+**[Live demo](https://huggingface.co/spaces/jbobym/pc-ddpm-alberta)** · **[Model card](model_card.md)** · **[Blog post](writeup/BLOG_POST.md)** · **[Pretrained weights](https://huggingface.co/jbobym/pc-ddpm-alberta)**
 ## TL;DR
+- **What I built** — A physics-constrained diffusion model that generates ensembles of 24-hour wind/solar/load scenarios that satisfy AC power flow on the IEEE 118-bus network scaled to Alberta loads.
+- **How well it works** — Under operational voltage bounds [0.85, 1.10] pu, PC-DDPM (λ=2.0) is feasible on 100% of scenarios; the unconstrained baseline leaves 1.8% with at least one infeasible hour. Under the tighter ANSI Range B bounds [0.89, 1.05] pu, scenario feasibility climbs from 30.3% to 39.0% and step feasibility from 56.2% to 69.7%. All of these numbers come from 1,000 scenarios × 24 hours on the chronological 2025 test split.
+- **Try it** — [Live demo on HF Spaces](https://huggingface.co/spaces/jbobym/pc-ddpm-alberta) (one click, no auth; ~40 s for 24 scenarios on cpu-basic).
+- **Reproduce it** — `git clone … && make install && make train && make eval` (see [Reproducibility](#reproducibility)).
+- **Where it fails** — Feasibility is bought at a fidelity cost: at λ=2.0, the load Wasserstein-1 distance reaches 799 MW, up from 521 MW for the unconstrained baseline (a 53% increase). The model partly achieves feasibility by avoiding high-load conditions, which are exactly the conditions a stress-testing planner needs to keep; that is a real limitation, not a tunable knob.
 ---
 ## Motivation
+Grid operators routinely run scenario sets through power-flow solvers as part of reserve sizing and contingency analysis; the scenarios that violate voltage limits get filtered out, and only the remainder informs planning decisions. When a generative model produces statistically realistic samples that still violate AC feasibility, the operator pays twice: once for the compute that drew the bad scenarios, and once for the coverage gap left by throwing them away. On the IEEE 118-bus proxy network we use here, an off-the-shelf DDPM trained on real AESO data leaves roughly 70% of generated scenarios infeasible under strict bounds; that's not a useful starting point for an operator who needs to stress-test reserve adequacy.
+Embedding physics into the training objective — rather than filtering at sampling time, or applying post-hoc rejection — directly attacks the loss in coverage. We do not claim production readiness; we evaluate on a published proxy network because real AESO topology is non-public, and we measure voltage feasibility only because line thermal limits are not exercised at Alberta loadings on this network. The contribution is a clean methodological case study, on real operational data, of a physics-aware training pipeline that the survey by Zhang et al. (2025) flags as the open problem.
+## Approach
+<!-- TODO(architecture-diagram): generate docs/architecture.png from
+     docs/architecture.md (mermaid or excalidraw). The walking-tour below
+     describes what the diagram should show. -->
 ![architecture](docs/architecture.png)
+PC-DDPM is a 1D temporal U-Net trained on 24-hour windows of the joint (wind, solar, load) series; the diffusion target is the standard ε-prediction objective with a cosine β schedule, T=1000. On top of that, every training step pays a voltage-violation penalty: we denormalise the predicted samples, allocate them to the 118 buses through fixed wind/solar/load mappings, run them through a frozen GraphSAGE surrogate that approximates AC power flow, and apply a ReLU penalty against the [0.89, 1.05] pu band. We train the surrogate once, separately, to <3% MAPE on bus voltages, and freeze its weights during DDPM training so the physics signal is a stable target rather than a moving one.
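The penalty step described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the names `band_violation` and `voltage_penalty` are hypothetical, the mean reduction over buses is an assumption (the README does not say how violations are aggregated), and a plain Python list stands in for the bus voltages predicted by the surrogate.

```python
def band_violation(v_pu, v_min=0.89, v_max=1.05):
    """Per-bus excursion outside [v_min, v_max] in pu; zero inside the band.

    Each term is a ReLU: max(0, v_min - v) catches under-voltage,
    max(0, v - v_max) catches over-voltage.
    """
    return [max(0.0, v_min - v) + max(0.0, v - v_max) for v in v_pu]


def voltage_penalty(v_pu, v_min=0.89, v_max=1.05):
    """Mean violation across buses (the reduction is an assumption)."""
    violations = band_violation(v_pu, v_min, v_max)
    return sum(violations) / len(violations)


# Three illustrative bus voltages: under-voltage, in band, over-voltage.
example = [0.85, 1.00, 1.10]
print(band_violation(example))
print(voltage_penalty(example))
```

In training, the same quantity would presumably be computed on tensors so that gradients can flow back through the frozen surrogate into the denoiser.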
+Three things are worth calling out, all of which differ from prior work:
+1. **Training-time penalty, not sampling-time guidance.** Hoseinpour & Dvorkin (2025) embed AC feasibility via gradient guidance applied at sampling time on synthetic IEEE 118-bus snapshots without temporal structure; we embed it in the training objective and train on real 9-year AESO data with 24-hour windows. Dong et al. (2025) also use diffusion for renewable scenarios but apply physics as a post-hoc filter on generated samples; we never filter, we shape the training distribution.
+2. **Three-phase λ_phys annealing.** Pure DDPM warm-up for the first 60 epochs (λ=0); a gradual ramp from 0 to the target value over the next 80 epochs; a fixed-penalty fine-tune for the rest. Without the warm-up, training collapses on the larger penalty scales — the model never learns the data distribution well enough for the physics gradients to be meaningful. The schedule is the difference between a working run and one that drifts to a constant generation pattern.
+3. **The fidelity-feasibility trade-off is named, not papered over.** We run a λ_phys=0/1/2 ablation and report the cost: load distributional fidelity degrades from W1=521 MW (unconstrained) to W1=799 MW (λ=2.0), a 53% increase. The model partly achieves voltage feasibility by avoiding the high-load conditions where the strict bounds bite hardest. For a planner who wants to *stress* the network with high loads, that's a real limitation; for one who wants the bulk of usable scenarios, it's the right knob.
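The three-phase schedule in item 2 could look like the sketch below. The 60-epoch warm-up and the 80-epoch ramp come from the text; the linear ramp shape, the 0-indexed epoch counter, and the name `lambda_phys` are assumptions made for illustration.

```python
def lambda_phys(epoch, target=2.0, warmup=60, ramp=80):
    """Physics-penalty weight for a 0-indexed training epoch.

    Phase 1 (epoch < warmup): pure DDPM, weight 0.
    Phase 2 (next `ramp` epochs): linear ramp from 0 to `target` (assumed shape).
    Phase 3 (remaining epochs): weight held at `target`.
    """
    if epoch < warmup:
        return 0.0
    if epoch < warmup + ramp:
        return target * (epoch - warmup) / ramp
    return target


# Weight at a few points of a 200-epoch run.
for epoch in (0, 59, 60, 100, 140, 199):
    print(epoch, lambda_phys(epoch))
```

Under these assumptions the total loss at each step would be the ε-prediction loss plus `lambda_phys(epoch)` times the voltage-violation penalty.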
+The architecture mirrors the pc-ddpm-epec2026 reference implementation; the inference path pulls weights from the [Hugging Face Hub model repo](https://huggingface.co/jbobym/pc-ddpm-alberta) at runtime.
 ## Data
 | Dataset | Source | Coverage | License |
 |---|---|---|---|
+| AESO hourly wind, solar, load | AESO Open Data | 2016-01-01 → 2025-07-31 (~84,000 hours) | Public, attribution |
+| ERA5 weather reanalysis (wind speed, solar radiation, temperature) | Copernicus Climate Data Store | 2016 → 2025, Alberta zones | CC-BY-4.0 |
+The merged hourly grid is dense — 0% missingness across all six columns, no duplicate timestamps, full Alberta climatic range. Splits are chronological with no shuffle: train 2016 → 2023, val 2024, test 2025; the demo's reference days come from the test tail (2025-07-04, 2025-01-27) and one earlier spring day (2018-03-17) so the typical / high-wind / low-wind examples span seasons. Detailed schema, range stats, and ERA5 interpolation gotchas live in [`eda/data_profile.md`](eda/data_profile.md).
 ## Results
 ### Headline
+We compute all numbers on the same 1,000 scenarios × 24 hours, sampled with a fixed seed; pandapower Newton-Raphson solves every (scenario, hour) pair, and we check feasibility post-convergence under both bound regimes. The "scenario feasibility" column is the strict reading: a scenario counts as feasible only when every one of its 24 hours is feasible at every bus.
+| Model | Operational [0.85, 1.10] pu | Strict ANSI [0.89, 1.05] pu | Step feas. (strict) | V violation p95 (pu) |
+|---|---|---|---|---|
+| Unconstrained DDPM | 98.2% (982/1000) | 30.3% (303/1000) | 56.2% | 0.0228 |
+| PC-DDPM, λ=1.0 | 99.7% (997/1000) | 34.7% (347/1000) | 64.9% | 0.0157 |
+| **PC-DDPM, λ=2.0** | **100.0% (1000/1000)** | **39.0% (390/1000)** | **69.7%** | **0.0127** |
+V violation p95 — the 95th-percentile worst voltage excursion across all 24,000 (scenario, hour) pairs — drops by 44% from the baseline to λ=2.0; that's the cleanest signal that the physics penalty is doing what it's supposed to do.
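The two feasibility readings in the table differ only in how per-hour results are aggregated. The sketch below assumes a hypothetical boolean grid `feasible[s][h]` (true when every bus is in band at hour `h` of scenario `s`); the function names are illustrative and not taken from the repository.

```python
def feasibility_rates(feasible):
    """Return (scenario_rate, step_rate) from a scenarios-by-hours boolean grid.

    A scenario counts as feasible only when all of its hours are feasible;
    the step rate is the share of feasible (scenario, hour) pairs.
    """
    n_steps = sum(len(row) for row in feasible)
    step_rate = sum(sum(row) for row in feasible) / n_steps
    scenario_rate = sum(all(row) for row in feasible) / len(feasible)
    return scenario_rate, step_rate


def relative_change(old, new):
    """Signed relative change from old to new."""
    return (new - old) / old


# Toy grid: 3 scenarios x 2 hours.
toy = [[True, True], [True, False], [False, False]]
print(feasibility_rates(toy))

# Relative change in V violation p95, baseline to lambda = 2.0 (table values).
print(relative_change(0.0228, 0.0127))
```

Because one bad hour fails the whole scenario, the scenario rate can never exceed the step rate, which is why the strict-band scenario column sits well below the step column.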
+### Slice: per-channel distributional fidelity
+A single feasibility number hides the channel-level cost. Here is the standard deviation of generated samples relative to the test split, per channel, per model:
+| Channel | Test std | Unconstrained | PC-DDPM λ=1.0 | PC-DDPM λ=2.0 |
+|---|---|---|---|---|
+| Wind | 686.8 MW | 70% | 67% | 67% |
+| Solar | 243.4 MW | 35% | 36% | 35% |
+| Load | 811.5 MW | 103% | 93% | 89% |
+The physics penalty mildly suppresses wind diversity (3 percentage points from unconstrained to λ=2.0); load diversity gives up 14 points. Every model severely underestimates solar diversity, at about 35% of the test std, but that's a DDPM-side limitation, not a physics one (the unconstrained model has the same problem). The solar distribution is asymmetric, with heavy zero mass overnight, and the diffusion model regresses toward the mean on it. The W1 distances on solar are nonetheless close to baseline, so this is a std-vs-distance gap: the marginals match in shape while the variance contracts.
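The two fidelity metrics discussed here, the std ratio and the Wasserstein-1 distance, can be computed per channel as sketched below. This is an illustrative sketch with hypothetical function names; it assumes the generated and test samples are one-dimensional and of equal size, in which case W1 reduces to the mean absolute difference between sorted values. The repository's own evaluation may use a different estimator.

```python
import statistics


def std_ratio(generated, reference):
    """Generated standard deviation as a fraction of the reference one."""
    return statistics.pstdev(generated) / statistics.pstdev(reference)


def wasserstein_1(a, b):
    """W1 between two equal-size 1D samples via sorted pairing."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Toy values in MW, purely illustrative.
reference = [0.0, 4.0]
generated = [1.0, 3.0]
print(std_ratio(generated, reference))
print(wasserstein_1(generated, reference))
```

The toy values show how the two metrics can diverge: the generated sample has half the reference spread, yet each sorted value sits only 1 MW from its reference counterpart.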
+<!-- TODO(slice-figure): docs/slice_metrics.png — bar chart of W1 distance per
+     channel × λ. Trivial to generate from the metric JSONs in
+     results/eval/from-epec/. -->
+## Discussion
+**What was surprising.** The IEEE 118-bus topology has a hard ceiling on strict-bound feasibility. Loaded with Alberta-scale wind, solar, and load, the network develops systematic under-voltage at three buses (20, 21, 43); together they account for almost every violation. Even the *training data itself* — real, observed AESO operating points — sits at 52.6% step feasibility under [0.89, 1.05] pu. PC-DDPM at λ=2.0 reaches 69.7%, which is above the data ceiling; the physics penalty has learned to avoid the voltage-stress conditions that real Alberta encounters routinely. We must still be careful: "above the data ceiling" reads better than it really is, because the model achieves it partly by avoiding high-load conditions (load std drops to 89% of test). Under relaxed operational bounds, the violation problem essentially vanishes for every model — the strict-bound regime is the diagnostic one.
+**Where it fails.** Three failure modes are worth being explicit about. First, the solar diversity collapse described above: every DDPM in the ablation, including the unconstrained baseline, generates samples with about a third of the test-set solar std; this is a generation-stack issue and the physics penalty does nothing to fix it. Second, the load fidelity cost: at λ=2.0, Wasserstein-1 on load grows from 521 MW (unconstrained) to 799 MW, a 53% degradation. A planner sizing reserves for high-load winter peaks would notice this. Third, the headline 100% operational feasibility comes from a network where the operational bounds are forgiving enough that the *training data is already 99% feasible*; the meaningful comparison is the strict ANSI band, where PC-DDPM beats the baseline by 8.7 points but is still far from acceptable for downstream automation.
+**Competence boundary.** PC-DDPM generates hourly trajectories on a public proxy network, evaluated under voltage feasibility only. It is not appropriate for sub-hourly intervals, real-time control loops, balancing authorities outside Alberta without retraining, deployment on the actual AESO network (we use IEEE 118-bus precisely because real AESO topology is non-public; transfer is not validated), or any thermal-feasibility claim (line loadings stay under 15% on this network at Alberta scale; thermal limits are not exercised). The model card carries the full breakdown.
 ## Reproducibility
 ```bash
+git clone https://github.com/JBobyM/pc-ddpm-alberta.git
 cd pc-ddpm-alberta
+make install                    # installs runtime + dev deps, sets up nbstripout
+make data                       # rebuilds combined_hourly.csv from raw AESO + ERA5
+make train                      # logs to W&B; writes weights to models/
+make eval                       # writes metrics + npz samples to results/eval/
+make serve                      # runs the Gradio app locally on :7860
 ```
+- **Compute** — 2× NVIDIA RTX 3090 (24 GB each), used one at a time during training; CUDA 12.4, PyTorch 2.6.
+- **Wall time** — Training 200 epochs ≈ 1.6 h; the canonical multi-model evaluation (3 models × 24,000 power-flow runs) ≈ 19 min on a single CPU thread (a pandapower constraint).
+- **Determinism** — `RANDOM_SEED = 42` lives in `src/pc_ddpm_alberta/config.py`; the README's headline numbers reproduce with that seed and the configs in `configs/default.json`.
+- **Pretrained weights** — pulled at runtime from [`jbobym/pc-ddpm-alberta`](https://huggingface.co/jbobym/pc-ddpm-alberta) on the Hugging Face Hub; `make serve` does this automatically.
+- **W&B run backing the headline** — TBD; the headline numbers come from the upstream pc-ddpm-epec2026 evaluation, which predates W&B integration in this repo.
 ## Limitations
+The full breakdown lives in [`model_card.md`](model_card.md); the short version:
+- **Voltage only.** We evaluate scenario feasibility against bus voltage limits; line thermal limits are not exercised on the IEEE 118-bus network at Alberta loadings (max line loading p95 ≈ 14%). A real AESO planner would want both checks; we cannot validate the thermal one against this proxy.
+- **Proxy network.** The IEEE 118-bus is a published academic benchmark, not the actual AESO grid (which is non-public). The empirical numbers may not transfer; the methodology should.
+- **Hourly granularity only.** No sub-hourly intervals, no ramp constraints, no unit commitment. Operational reserve products that require 15-minute resolution are out of scope.
+- **Solar diversity collapse.** Across every DDPM in the ablation, generated solar samples have ~35% of the test-split standard deviation. The physics penalty does not cause this and does not fix it; it's a known DDPM failure mode on heavy-zero-mass distributions.
+- **Fidelity cost.** Load Wasserstein-1 grows by 53% from unconstrained to λ=2.0. The model trades distributional match for feasibility; whether that's the right trade depends on the downstream task.
 ## References
+- Hoseinpour, S., & Dvorkin, Y. (2025). *Manifold-constrained gradient guidance for AC-feasibility on IEEE 118-bus snapshots.* arXiv:2506.11281.
+- Dong, J., et al. (2025). *Pattern-guided diffusion for renewable energy scenario generation.* Applied Energy, 385.
+- Zhang, Y., et al. (2025). *Deep generative models for energy systems: a 228-paper survey.* Applied Energy, 380.
+- Hamilton, W. L., Ying, R., & Leskovec, J. (2017). *Inductive representation learning on large graphs (GraphSAGE).* NeurIPS.
 ## License
 ## Citation
 If this was useful, please cite as:
 ```
 @misc{mesadieu2026pcddpm,
   author = {Mesadieu, John Boby},
   title  = {Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid},
   year   = {2026},
+  url    = {https://github.com/JBobyM/pc-ddpm-alberta}
 }
 ```

docs/headline.png ADDED

docs/slice_metrics.png ADDED