---
title: PC-DDPM Alberta
emoji:
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: Physics-constrained DDPM for Alberta wind/solar/load.
---
<!--
This is the DELIVERABLE README — what visitors see when they land on
your GitHub repo. /init-project will populate the [BRACKETS] from
PROJECT_BRIEF.md; /build-readme will fill the rest after you have
results to put in it.
The structure below is deliberate and matches what hiring managers
look for in the first 90 seconds: headline figure, quantified impact,
and architecture-level clarity right at the top. Don't reorder the
sections — the order is the message.
For instructions on USING the template (slash commands, hooks, etc.),
see TEMPLATE_USAGE.md.
The YAML frontmatter above is HF Spaces metadata (sdk, app_file, etc.).
GitHub renders it as a small box at the top of the README; HF reads
the values to configure the Space. /build-readme should preserve it.
-->
# Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid
> **Grid planners need scenario sets that are statistically realistic AND physically consistent — current methods deliver one or the other, not both.** PC-DDPM achieves 100% scenario feasibility under operational voltage bounds [0.85, 1.10] pu, where the unconstrained DDPM baseline leaves 1.8% of scenarios infeasible, on 9 years of real AESO data.
<!-- TODO(headline-figure): generate docs/headline.png via /eval-report or by hand
(suggest: 3-panel — left: 24h scenario fan over a real Alberta day;
middle: feasibility bars across λ ablation; right: V violation distribution
unconstrained vs PC-DDPM). README claims it, so the file must exist before
the repo flips to public. -->
![headline](docs/headline.png)
**[Live demo](https://huggingface.co/spaces/jbobym/pc-ddpm-alberta)** · **[Model card](model_card.md)** · **[Blog post](writeup/BLOG_POST.md)** · **[Pretrained weights](https://huggingface.co/jbobym/pc-ddpm-alberta)**
## TL;DR
- **What I built** — A physics-constrained diffusion model that generates ensembles of 24-hour wind/solar/load scenarios that satisfy AC power flow on the IEEE 118-bus network scaled to Alberta loads.
- **How well it works** — Under operational voltage bounds [0.85, 1.10] pu, PC-DDPM (λ=2.0) is feasible on 100% of scenarios; the unconstrained baseline leaves 1.8% with at least one infeasible hour. Under the tighter ANSI Range B bounds [0.89, 1.05] pu, scenario feasibility climbs from 30.3% to 39.0% and step feasibility from 56.2% to 69.7%; both numbers come from 1,000 scenarios × 24 hours on the chronological 2025 test split.
- **Try it** — [Live demo on HF Spaces](https://huggingface.co/spaces/jbobym/pc-ddpm-alberta) (one click, no auth; ~40 s for 24 scenarios on cpu-basic).
- **Reproduce it** — `git clone … && make install && make train && make eval` (see [Reproducibility](#reproducibility)).
- **Where it fails** — Feasibility is bought at a fidelity cost: at λ=2.0, the load Wasserstein-1 distance reaches 799 MW, up from 521 MW for the unconstrained baseline — a 53% increase. The model partly achieves feasibility by avoiding high-load conditions, which are exactly the conditions a stress-testing planner needs to keep. That is a real limitation, not a tunable knob.
---
## Motivation
Grid operators routinely run scenario sets through power-flow solvers as part of reserve sizing and contingency analysis; the scenarios that violate voltage limits get filtered out, and only the remainder informs planning decisions. When a generative model produces statistically realistic samples that still violate AC feasibility, the operator pays twice: once for the compute that drew the bad scenarios, and once for the coverage gap left by throwing them away. On the IEEE 118-bus proxy network we use here, an off-the-shelf DDPM trained on real AESO data leaves roughly 70% of generated scenarios infeasible under strict bounds; that's not a useful starting point for an operator who needs to stress-test reserve adequacy.
Embedding physics into the training objective — rather than filtering at sampling time, or applying post-hoc rejection — directly attacks the loss in coverage. We do not claim production readiness; we evaluate on a published proxy network because real AESO topology is non-public, and we measure voltage feasibility only because line thermal limits are not exercised at Alberta loadings on this network. The contribution is a clean methodological case study, on real operational data, of a physics-aware training pipeline that the survey by Zhang et al. (2025) flags as the open problem.
## Approach
<!-- TODO(architecture-diagram): generate docs/architecture.png from
docs/architecture.md (mermaid or excalidraw). The walking-tour below
describes what the diagram should show. -->
![architecture](docs/architecture.png)
PC-DDPM is a 1D temporal U-Net trained on 24-hour windows of the joint (wind, solar, load) series; the diffusion target is the standard ε-prediction objective with a cosine β schedule, T=1000. On top of that, every training step pays a voltage-violation penalty: we denormalise the predicted samples, allocate them to the 118 buses through fixed wind/solar/load mappings, run them through a frozen GraphSAGE surrogate that approximates AC power flow, and apply a ReLU penalty against the [0.89, 1.05] pu band. We train the surrogate once, separately, to <3% MAPE on bus voltages, and freeze its weights during DDPM training so the physics signal is a stable target rather than a moving one.
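The penalty at the end of that pipeline is simple to state; a minimal numpy sketch of the ReLU voltage penalty (illustrative only — function and variable names are hypothetical, and in training this runs on torch tensors with the frozen GraphSAGE surrogate upstream):

```python
import numpy as np

V_MIN, V_MAX = 0.89, 1.05  # strict ANSI Range B band used by the penalty

def voltage_penalty(v_bus, v_min=V_MIN, v_max=V_MAX):
    """ReLU penalty on bus voltages outside [v_min, v_max] pu.

    v_bus: array of shape (batch, n_bus) — surrogate-predicted voltages.
    Returns the mean violation magnitude; zero when every bus is in band.
    """
    under = np.maximum(v_min - v_bus, 0.0)  # under-voltage excursion
    over = np.maximum(v_bus - v_max, 0.0)   # over-voltage excursion
    return float((under + over).mean())

# Two samples x three buses: one in band, one with a sag and a swell.
v = np.array([[0.95, 1.00, 1.02],
              [0.87, 0.95, 1.06]])
print(round(voltage_penalty(v), 6))  # 0.02 under + 0.01 over, averaged over 6 entries
```

The training loss then adds λ_phys times this term to the standard ε-prediction objective.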
Three things are worth calling out, all of which differ from prior work:
1. **Training-time penalty, not sampling-time guidance.** Hoseinpour & Dvorkin (2025) embed AC feasibility via gradient guidance applied at sampling time on synthetic IEEE 118-bus snapshots without temporal structure; we embed it in the training objective and train on real 9-year AESO data with 24-hour windows. Dong et al. (2025) also use diffusion for renewable scenarios but apply physics as a post-hoc filter on generated samples; we never filter, we shape the training distribution.
2. **Three-phase λ_phys annealing.** Pure DDPM warm-up for the first 60 epochs (λ=0); a gradual ramp from 0 to the target value over the next 80 epochs; a fixed-penalty fine-tune for the rest. Without the warm-up, training collapses on the larger penalty scales — the model never learns the data distribution well enough for the physics gradients to be meaningful. The schedule is the difference between a working run and one that drifts to a constant generation pattern.
3. **The fidelity-feasibility trade-off is named, not papered over.** We run a λ_phys=0/1/2 ablation and report the cost: load distributional fidelity degrades from W1=521 MW (unconstrained) to W1=799 MW (λ=2.0), a 53% increase. The model partly achieves voltage feasibility by avoiding the high-load conditions where the strict bounds bite hardest. For a planner who wants to *stress* the network with high loads, that's a real limitation; for one who wants the bulk of usable scenarios, it's the right knob.
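The schedule in (2) is a function of the epoch alone; a sketch assuming a linear ramp (parameter names hypothetical):

```python
def lambda_phys(epoch, target, warmup=60, ramp=80):
    """Three-phase physics-weight schedule: warm-up, linear ramp, fixed."""
    if epoch < warmup:
        return 0.0  # pure DDPM: learn the data distribution first
    if epoch < warmup + ramp:
        return target * (epoch - warmup) / ramp  # gradual ramp to target
    return target  # fixed-penalty fine-tune for the rest of training

# Epochs 0 and 60 contribute no physics signal; 100 is mid-ramp; 140+ is fixed.
print([lambda_phys(e, 2.0) for e in (0, 60, 100, 140, 200)])  # [0.0, 0.0, 1.0, 2.0, 2.0]
```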
The architecture mirrors the pc-ddpm-epec2026 reference implementation; the inference path pulls weights from the [Hugging Face Hub model repo](https://huggingface.co/jbobym/pc-ddpm-alberta) at runtime.
## Data
| Dataset | Source | Coverage | License |
|---|---|---|---|
| AESO hourly wind, solar, load | AESO Open Data | 2016-01-01 → 2025-07-31 (~84,000 hours) | Public, attribution |
| ERA5 weather reanalysis (wind speed, solar radiation, temperature) | Copernicus Climate Data Store | 2016 → 2025, Alberta zones | CC-BY-4.0 |
The merged hourly grid is dense — 0% missingness across all six columns, no duplicate timestamps, full Alberta climatic range. Splits are chronological with no shuffle: train 2016 → 2023, val 2024, test 2025; the demo's reference days come from the test tail (2025-07-04, 2025-01-27) and one earlier spring day (2018-03-17) so the typical / high-wind / low-wind examples span seasons. Detailed schema, range stats, and ERA5 interpolation gotchas live in [`eda/data_profile.md`](eda/data_profile.md).
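The split is purely calendar-based; a minimal pandas sketch of the boundaries (the column name is hypothetical — only the index matters here), which also confirms the ~84,000-hour total:

```python
import pandas as pd

def chronological_split(df):
    """Split the merged hourly grid by calendar year — no shuffling."""
    train = df[df.index.year <= 2023]  # 2016–2023
    val = df[df.index.year == 2024]
    test = df[df.index.year == 2025]
    return train, val, test

idx = pd.date_range("2016-01-01", "2025-07-31 23:00", freq="h")
df = pd.DataFrame({"load_mw": 0.0}, index=idx)
train, val, test = chronological_split(df)
print(len(train), len(val), len(test))  # 70128 8784 5088 — sums to 84,000 hours
```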
## Results
### Headline
We compute all numbers on the same 1,000 scenarios × 24 hours, sampled with a fixed seed; pandapower Newton-Raphson solves every (scenario, hour) pair, and we check feasibility post-convergence under both bound regimes. The "scenario feasibility" column is the strict reading: a scenario counts as feasible only when every one of its 24 hours is feasible at every bus.
| Model | Operational [0.85, 1.10] pu | Strict ANSI [0.89, 1.05] pu | Step feas. (strict) | V violation p95 (pu) |
|---|---|---|---|---|
| Unconstrained DDPM | 98.2% (982/1000) | 30.3% (303/1000) | 56.2% | 0.0228 |
| PC-DDPM, λ=1.0 | 99.7% (997/1000) | 34.7% (347/1000) | 64.9% | 0.0157 |
| **PC-DDPM, λ=2.0** | **100.0% (1000/1000)** | **39.0% (390/1000)** | **69.7%** | **0.0127** |
V violation p95 — the 95th-percentile worst voltage excursion across all 24,000 (scenario, hour) pairs — drops by 44% from the baseline to λ=2.0; that's the cleanest signal that the physics penalty is doing what it's supposed to do.
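The three readings above differ only in how the same (scenario, hour) grid is reduced; a sketch with toy stand-ins for the pandapower pass/fail and excursion outputs:

```python
import numpy as np

def headline_metrics(feasible, excursion):
    """feasible: bool (n_scenarios, 24) — hour satisfies the band at every bus.
    excursion: float (n_scenarios, 24) — worst per-hour voltage excursion in pu."""
    step = float(feasible.mean())                  # fraction of feasible pairs
    scenario = float(feasible.all(axis=1).mean())  # strict: all 24 hours must pass
    p95 = float(np.percentile(excursion, 95))      # tail of the violation size
    return step, scenario, p95

# Toy stand-in: one clean scenario, one with a single 0.02 pu violation hour.
feasible = np.array([[True] * 24, [True] * 23 + [False]])
excursion = np.zeros((2, 24))
excursion[1, 23] = 0.02
step, scenario, p95 = headline_metrics(feasible, excursion)
print(step, scenario, p95)  # one bad hour sinks the whole second scenario
```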
### Slice: per-channel distributional fidelity
A single feasibility number hides the channel-level cost. Here is the standard deviation of generated samples relative to the test split, per channel, per model:
| Channel | Test std | Unconstrained | PC-DDPM λ=1.0 | PC-DDPM λ=2.0 |
|---|---|---|---|---|
| Wind | 686.8 MW | 70% | 67% | 67% |
| Solar | 243.4 MW | 35% | 36% | 35% |
| Load | 811.5 MW | 103% | 93% | 89% |
The physics penalty mildly suppresses wind diversity (3 percentage points from unconstrained to λ=2.0); load diversity gives up 14 points. Every model severely underestimates solar diversity — about 35% of the test std — but that's a DDPM-side limitation, not a physics one (the unconstrained model has the same problem). Solar's asymmetric distribution, with heavy zero mass overnight, is exactly the kind of marginal a diffusion model regresses toward the mean. The W1 distances on solar nonetheless stay close to baseline, so this is a std-vs-distance gap: the marginals match in shape while the variance contracts.
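The W1 numbers in this section compare 1-D empirical marginals; for equal-size samples, W1 reduces to the mean absolute difference of the sorted values. A sketch with toy data (not the real AESO series) showing how variance contraction alone moves W1 even when the means agree:

```python
import numpy as np

def w1_empirical(a, b):
    """W1 between two equal-size 1-D samples: optimal transport pairs
    the i-th smallest of one sample with the i-th smallest of the other."""
    assert len(a) == len(b)
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

rng = np.random.default_rng(0)
test_load = rng.normal(9500.0, 811.5, 10_000)    # toy stand-in for the test split
gen_load = 9500.0 + 0.89 * (test_load - 9500.0)  # same shape, 89% of the std
print(round(w1_empirical(gen_load, test_load), 1))
```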
<!-- TODO(slice-figure): docs/slice_metrics.png — bar chart of W1 distance per
channel × λ. Trivial to generate from the metric JSONs in
results/eval/from-epec/. -->
## Discussion
**What was surprising.** The IEEE 118-bus topology has a hard ceiling on strict-bound feasibility. Loaded with Alberta-scale wind, solar, and load, the network develops systematic under-voltage at three buses (20, 21, 43); together they account for almost every violation. Even the *training data itself* — real, observed AESO operating points — sits at 52.6% step feasibility under [0.89, 1.05] pu. PC-DDPM at λ=2.0 reaches 69.7%, which is above the data ceiling; the physics penalty has learned to avoid the voltage-stress conditions that real Alberta encounters routinely. We must still be careful: "above the data ceiling" reads better than it really is, because the model achieves it partly by avoiding high-load conditions (load std drops to 89% of test). Under relaxed operational bounds, the violation problem essentially vanishes for every model — the strict-bound regime is the diagnostic one.
**Where it fails.** Three failure modes worth being explicit about. First, the solar diversity collapse described above — every DDPM in the ablation, including the unconstrained baseline, generates samples with about a third of the test-set solar std; this is a generation-stack issue and the physics penalty does nothing to fix it. Second, the load fidelity cost: at λ=2.0, Wasserstein-1 on load grows from 521 MW (unconstrained) to 799 MW, a 53% degradation. A planner sizing reserves for high-load winter peaks would notice this. Third, the headline 100% operational feasibility comes from a network where the operational bounds are forgiving enough that the *training data is already 99% feasible*; the meaningful comparison is the strict ANSI band, where PC-DDPM beats the baseline by 8.7 points but is still far from acceptable for downstream automation.
**Competence boundary.** PC-DDPM generates hourly trajectories on a public proxy network, evaluated under voltage feasibility only. It is not appropriate for sub-hourly intervals, real-time control loops, balancing authorities outside Alberta without retraining, deployment on the actual AESO network (we use IEEE 118-bus precisely because real AESO topology is non-public; transfer is not validated), or any thermal-feasibility claim (line loadings stay under 15% on this network at Alberta scale; thermal limits are not exercised). The model card carries the full breakdown.
## Reproducibility
```bash
git clone https://github.com/JBobyM/pc-ddpm-alberta.git
cd pc-ddpm-alberta
make install # installs runtime + dev deps, sets up nbstripout
make data # rebuilds combined_hourly.csv from raw AESO + ERA5
make train # logs to W&B; writes weights to models/
make eval # writes metrics + npz samples to results/eval/
make serve # runs the Gradio app locally on :7860
```
- **Compute** — 2× NVIDIA RTX 3090 (24 GB each), used one at a time during training; CUDA 12.4, PyTorch 2.6.
- **Wall time** — Training 200 epochs ≈ 1.6 h; the canonical multi-model evaluation (3 models × 24,000 power-flow runs) ≈ 19 min (pandapower is single-threaded, so the solves run on one CPU thread).
- **Determinism** — `RANDOM_SEED = 42` lives in `src/pc_ddpm_alberta/config.py`; the README's headline numbers reproduce with that seed and the configs in `configs/default.json`.
- **Pretrained weights** — pulled at runtime from [`jbobym/pc-ddpm-alberta`](https://huggingface.co/jbobym/pc-ddpm-alberta) on the Hugging Face Hub; `make serve` does this automatically.
- **W&B run backing the headline** — TBD; the headline numbers come from the upstream pc-ddpm-epec2026 evaluation, which predates W&B integration in this repo.
## Limitations
The full breakdown lives in [`model_card.md`](model_card.md); the short version:
- **Voltage only.** We evaluate scenario feasibility against bus voltage limits; line thermal limits are not exercised on the IEEE 118-bus network at Alberta loadings (max line loading p95 ≈ 14%). A real AESO planner would want both checks; we cannot validate the thermal one against this proxy.
- **Proxy network.** The IEEE 118-bus is a published academic benchmark, not the actual AESO grid (which is non-public). The empirical numbers may not transfer; the methodology should.
- **Hourly granularity only.** No sub-hourly intervals, no ramp constraints, no unit commitment. Operational reserve products that require 15-minute resolution are out of scope.
- **Solar diversity collapse.** Across every DDPM in the ablation, generated solar samples have ~35% of the test-split standard deviation. The physics penalty does not cause this and does not fix it; it's a known DDPM failure mode on heavy-zero-mass distributions.
- **Fidelity cost.** Load Wasserstein-1 grows by 53% from unconstrained to λ=2.0. The model trades distributional match for feasibility; whether that's the right trade depends on the downstream task.
## References
- Hoseinpour, S., & Dvorkin, Y. (2025). *Manifold-constrained gradient guidance for AC-feasibility on IEEE 118-bus snapshots.* arXiv:2506.11281.
- Dong, J., et al. (2025). *Pattern-guided diffusion for renewable energy scenario generation.* Applied Energy, 385.
- Zhang, Y., et al. (2025). *Deep generative models for energy systems: a 228-paper survey.* Applied Energy, 380.
- Hamilton, W. L., Ying, R., & Leskovec, J. (2017). *Inductive representation learning on large graphs (GraphSAGE).* NeurIPS.
## License
MIT (code) · AESO Open Data (with attribution) · ERA5 CC-BY-4.0 (Copernicus Climate Data Store)
## Citation
If this work was useful, please cite it as:
```
@misc{mesadieu2026pcddpm,
  author = {Mesadieu, John Boby},
  title  = {Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid},
  year   = {2026},
  url    = {https://github.com/JBobyM/pc-ddpm-alberta}
}
```