jbobym committed on
Commit
b7b86f2
·
1 Parent(s): 93ed35a

space-deploy: bring updated README + headline/slice figures from master

Files changed (3)
  1. README.md +76 -64
  2. docs/headline.png +0 -0
  3. docs/slice_metrics.png +0 -0
README.md CHANGED
@@ -33,120 +33,131 @@ the values to configure the Space. /build-readme should preserve it.
33
 
34
  # Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid
35
 
36
- > **Grid planners need scenario sets that are statistically realistic AND physically consistent — current methods deliver one or the other, not both.** PC-DDPM achieves 100% scenario feasibility under operational voltage bounds [0.85, 1.10] pu, vs 1.8% scenario infeasibility for the unconstrained DDPM baseline, on 9 years of real AESO data.
37
 
 
 
 
 
 
38
  ![headline](docs/headline.png)
39
 
40
- **[Live demo](TBD post-deploy)** · **[Model card](model_card.md)** · **[Blog post](writeup/BLOG_POST.md)** · **[W&B project](https://wandb.ai/jbobym/pc-ddpm-alberta)**
41
 
42
  ## TL;DR
43
 
44
- - **What we built** — A physics-constrained diffusion model that generates 24-hour wind/solar/load scenarios respecting AC power flow on the IEEE 118-bus network scaled to Alberta loads.
45
- - **How well it works** — 0% 100% scenario feasibility under operational voltage bounds vs. unconstrained DDPM, measured on the 2025 chronological test split (1,000 scenarios × 24 hours).
46
- - **Try it** — [Live demo](TBD post-deploy) (one click, no auth)
47
- - **Reproduce it** — `git clone … && make train` (see [Reproducibility](#reproducibility))
48
- - **Where it fails** — At λ_phys=2.0, load distributional fidelity degrades 36% (Wasserstein-1 521→799 MW) the model partly achieves feasibility by avoiding high-load conditions, which is exactly what a stress-testing planner needs to keep.
49
 
50
  ---
51
 
52
  ## Motivation
53
 
54
- [Why does this problem matter, in stakeholder terms? Who suffers when
55
- it's unsolved? Two to four sentences. Lead with the operational pain
56
- point, not the ML problem framing.]
57
 
58
- ## Approach
59
 
60
- [What we built, at the level a senior IC could understand without
61
- reading the code. Diagram comes from `docs/architecture.md`.]
62
 
 
 
 
63
  ![architecture](docs/architecture.png)
64
 
65
- Three things to call out:
 
 
 
 
 
 
66
 
67
- 1. **Training-time penalty, not sampling-time guidance** — Hoseinpour & Dvorkin (2025) embed AC feasibility via gradient guidance at sampling time on synthetic IEEE benchmarks; we embed it in the training objective via a frozen GraphSAGE surrogate, and we train on real 9-year AESO data with 24-hour temporal structure.
68
- 2. **Three-phase λ_phys annealing** — pure DDPM warm-up, gradual physics ramp, fixed-penalty fine-tune. Without it, training collapses on the larger penalty scales.
69
- 3. **Honest characterisation of the feasibility-fidelity trade-off** — λ_phys=0/1/2 ablation; we name the cost (load W1 +36% at λ=2.0) instead of papering over it.
70
- 2. **Real operational data, not synthetic benchmarks** — Dong et al. (2025) and Hoseinpour & Dvorkin (2025) train on synthetic IEEE perturbations or pattern-guided sampling; we train on 9 years of real AESO wind/solar/load with 24-hour temporal structure.
71
- 3. **The empirical case study a 228-paper survey said was missing** — Zhang et al. (2025) identify physics-constrained generation as the open problem across deep generative work in energy systems; we provide the AC-feasibility evaluation that survey calls for.
72
 
73
  ## Data
74
 
75
  | Dataset | Source | Coverage | License |
76
  |---|---|---|---|
77
- | AESO hourly wind/solar/load | AESO Open Data | 2016-01 to 2025-12 (~84,000 hours) | Public, attribution |
78
- | ERA5 weather reanalysis | Copernicus Climate Data Store | 2016-2025, Alberta zones | CC-BY-4.0 |
79
 
80
- Splits: chronological, no shuffle. Train 2016-2023; val 2024; test 2025.
81
- Detailed schema and gotchas in `eda/data_profile.md`.
82
 
83
  ## Results
84
 
85
  ### Headline
86
 
87
- | Method | [METRIC] | [SECONDARY METRIC] | [SLICE e.g., test/extreme] |
88
- |---|---|---|---|
89
- | Naive baseline | — | — | — |
90
- | Strong baseline | — | — | — |
91
- | **Ours** | **—** | **—** | **—** |
92
 
93
- ### Slice analysis
 
 
 
 
94
 
95
- [The point of slicing: a single overall metric hides where the model
96
- fails. Replace this with results from `/eval-report`.]
97
 
98
- ![slice](docs/slice_metrics.png)
99
 
100
- ### Calibration / robustness
101
 
102
- [For regression with uncertainty: calibration curve. For classification:
103
- confusion matrix or reliability diagram. Drop this section if it doesn't apply.]
 
 
 
104
 
105
- ## Discussion
106
 
107
- [The honest discussion. Three paragraphs:]
 
 
108
 
109
- **What was surprising.** [Something you didn't expect — these become talking points in interviews.]
110
 
111
- **Where it fails.** [Be specific. "Underforecasts peak hours by 8% on average. Fails on transmission outages." Hiring managers trust calibrated honesty more than puffed-up claims.]
112
 
113
- **Competence boundary.** [What's the model's domain of validity? When should the user NOT use this?]
 
 
114
 
115
  ## Reproducibility
116
 
117
  ```bash
118
- git clone [REPO URL]
119
  cd pc-ddpm-alberta
120
- pip install -e ".[dev]"
121
- nbstripout --install # one-time per clone
122
-
123
- # Pull data (if external; otherwise see data/README.md)
124
- make data
125
-
126
- # Train + evaluate
127
- make train
128
- make eval
129
-
130
- # Run the demo locally
131
- make serve
132
  ```
133
 
134
- - **Compute**: [GPU + RAM]
135
- - **Wall time**: [TRAIN TIME], [EVAL TIME]
136
- - **Determinism**: seeds set in `src/pc_ddpm_alberta/config.py` (`RANDOM_SEED = 42`)
137
- - **W&B run** backing the headline metric: [BEST RUN URL]
 
138
 
139
  ## Limitations
140
 
141
- [Detailed limitations longer than the TL;DR bullet. Hiring managers
142
- read this to decide if you've thought hard about your work. A model
143
- card lives at `model_card.md` with the full breakdown.]
 
 
 
 
144
 
145
  ## References
146
 
147
- [If you cited prior work or built on existing techniques, list it here.
148
- Format: free-form is fine for a portfolio README; switch to BibTeX-like
149
- for academic-leaning projects.]
 
150
 
151
  ## License
152
 
@@ -155,11 +166,12 @@ MIT (code) · AESO Open Data (with attribution) · ERA5 CC-BY-4.0 (Copernicus Cl
155
  ## Citation
156
 
157
  If this was useful, please cite as:
 
158
  ```
159
  @misc{mesadieu2026pcddpm,
160
  author = {Mesadieu, John Boby},
161
  title = {Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid},
162
  year = {2026},
163
- url = {[REPO URL]}
164
  }
165
  ```
 
33
 
34
  # Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid
35
 
36
+ > **Grid planners need scenario sets that are statistically realistic AND physically consistent — current methods deliver one or the other, not both.** PC-DDPM achieves 100% scenario feasibility under operational voltage bounds [0.85, 1.10] pu, vs 1.8% scenario infeasibility for the unconstrained DDPM baseline, on 9 years of real AESO data.
37
 
38
+ <!-- TODO(headline-figure): generate docs/headline.png via /eval-report or by hand
39
+ (suggest: 3-panel — left: 24h scenario fan over a real Alberta day;
40
+ middle: feasibility bars across λ ablation; right: V violation distribution
41
+ unconstrained vs PC-DDPM). README claims it, so the file must exist before
42
+ the repo flips to public. -->
43
  ![headline](docs/headline.png)
44
 
45
+ **[Live demo](https://huggingface.co/spaces/jbobym/pc-ddpm-alberta)** · **[Model card](model_card.md)** · **[Blog post](writeup/BLOG_POST.md)** · **[Pretrained weights](https://huggingface.co/jbobym/pc-ddpm-alberta)**
46
 
47
  ## TL;DR
48
 
49
+ - **What I built** — A physics-constrained diffusion model that generates ensembles of 24-hour wind/solar/load scenarios that satisfy AC power flow on the IEEE 118-bus network scaled to Alberta loads.
50
+ - **How well it works** — Under operational voltage bounds [0.85, 1.10] pu, PC-DDPM (λ=2.0) is feasible on 100% of scenarios; the unconstrained baseline leaves 1.8% with at least one infeasible hour. Under the tighter ANSI Range B bounds [0.89, 1.05] pu, scenario feasibility climbs from 30.3% to 39.0% and step feasibility from 56.2% to 69.7%; both numbers come from 1,000 scenarios × 24 hours on the chronological 2025 test split.
51
+ - **Try it** — [Live demo on HF Spaces](https://huggingface.co/spaces/jbobym/pc-ddpm-alberta) (one click, no auth; ~40 s for 24 scenarios on cpu-basic).
52
+ - **Reproduce it** — `git clone … && make install && make train && make eval` (see [Reproducibility](#reproducibility)).
53
+ - **Where it fails** — Feasibility is bought at a fidelity cost: at λ=2.0, the load Wasserstein-1 distance reaches 799 MW, up from 521 MW for the unconstrained baseline — a 53% increase. The model partly achieves feasibility by avoiding high-load conditions, which is exactly what a stress-testing planner needs to keep — a real limitation, not a tunable knob.
54
 
55
  ---
56
 
57
  ## Motivation
58
 
59
+ Grid operators routinely run scenario sets through power-flow solvers as part of reserve sizing and contingency analysis; the scenarios that violate voltage limits get filtered out, and only the remainder informs planning decisions. When a generative model produces statistically realistic samples that still violate AC feasibility, the operator pays twice: once for the compute that drew the bad scenarios, and once for the coverage gap left by throwing them away. On the IEEE 118-bus proxy network we use here, an off-the-shelf DDPM trained on real AESO data leaves roughly 70% of generated scenarios infeasible under strict bounds; that's not a useful starting point for an operator who needs to stress-test reserve adequacy.
 
 
60
 
61
+ Embedding physics into the training objective — rather than filtering at sampling time, or applying post-hoc rejection — directly attacks the loss in coverage. We do not claim production readiness; we evaluate on a published proxy network because real AESO topology is non-public, and we measure voltage feasibility only because line thermal limits are not exercised at Alberta loadings on this network. The contribution is a clean methodological case study, on real operational data, of a physics-aware training pipeline that the survey by Zhang et al. (2025) flags as the open problem.
62
 
63
+ ## Approach
 
64
 
65
+ <!-- TODO(architecture-diagram): generate docs/architecture.png from
66
+ docs/architecture.md (mermaid or excalidraw). The walking-tour below
67
+ describes what the diagram should show. -->
68
  ![architecture](docs/architecture.png)
69
 
70
+ PC-DDPM is a 1D temporal U-Net trained on 24-hour windows of the joint (wind, solar, load) series; the diffusion target is the standard ε-prediction objective with a cosine β schedule, T=1000. On top of that, every training step pays a voltage-violation penalty: we denormalise the predicted samples, allocate them to the 118 buses through fixed wind/solar/load mappings, run them through a frozen GraphSAGE surrogate that approximates AC power flow, and apply a ReLU penalty against the [0.89, 1.05] pu band. We train the surrogate once, separately, to <3% MAPE on bus voltages, and freeze its weights during DDPM training so the physics signal is a stable target rather than a moving one.
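The ReLU band penalty described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function name `voltage_band_penalty`, the plain-Python lists, and the mean aggregation over buses are all assumptions made for the example. In training, the same expression would be applied with differentiable tensor operations to the surrogate's predicted voltages.

```python
def voltage_band_penalty(voltages_pu, v_min=0.89, v_max=1.05):
    """Mean ReLU excursion (in pu) of bus voltages outside [v_min, v_max].

    Hypothetical helper: the aggregation (mean over buses) is an assumption.
    """
    def relu(x):
        return x if x > 0.0 else 0.0

    # Under-voltage and over-voltage excursions are penalised symmetrically.
    excursions = [relu(v_min - v) + relu(v - v_max) for v in voltages_pu]
    return sum(excursions) / len(excursions)


# All buses inside the band: no penalty.
inside = voltage_band_penalty([0.95, 1.00, 1.04])
# One bus 0.02 pu too low and one 0.02 pu too high: the penalty is positive.
mixed = voltage_band_penalty([0.87, 1.00, 1.07])
print(inside, mixed)
```

Because the penalty is exactly zero inside the band, it only contributes gradient for samples that the surrogate predicts to be in violation.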
71
+
72
+ Three things are worth calling out, all of which differ from prior work:
73
+
74
+ 1. **Training-time penalty, not sampling-time guidance.** Hoseinpour & Dvorkin (2025) embed AC feasibility via gradient guidance applied at sampling time on synthetic IEEE 118-bus snapshots without temporal structure; we embed it in the training objective and train on real 9-year AESO data with 24-hour windows. Dong et al. (2025) also use diffusion for renewable scenarios but apply physics as a post-hoc filter on generated samples; we never filter, we shape the training distribution.
75
+ 2. **Three-phase λ_phys annealing.** Pure DDPM warm-up for the first 60 epochs (λ=0); a gradual ramp from 0 to the target value over the next 80 epochs; a fixed-penalty fine-tune for the rest. Without the warm-up, training collapses on the larger penalty scales — the model never learns the data distribution well enough for the physics gradients to be meaningful. The schedule is the difference between a working run and one that drifts to a constant generation pattern.
76
+ 3. **The fidelity-feasibility trade-off is named, not papered over.** We run a λ_phys=0/1/2 ablation and report the cost: load distributional fidelity degrades from W1=521 MW (unconstrained) to W1=799 MW (λ=2.0), a 53% increase. The model partly achieves voltage feasibility by avoiding the high-load conditions where the strict bounds bite hardest. For a planner who wants to *stress* the network with high loads, that's a real limitation; for one who wants the bulk of usable scenarios, it's the right knob.
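The three-phase λ_phys schedule from item 2 can be written as a small function of the epoch index. This is a sketch under stated assumptions: the name `lambda_phys`, zero-based epoch counting, and a linear ramp are illustrative choices; the README only says the ramp is gradual.

```python
def lambda_phys(epoch, target=2.0, warmup_epochs=60, ramp_epochs=80):
    """Physics-penalty weight for a given zero-based epoch (hypothetical helper).

    Phase 1: pure DDPM warm-up, weight 0.
    Phase 2: ramp from 0 to `target` (assumed linear).
    Phase 3: fixed-penalty fine-tune at `target`.
    """
    if epoch < warmup_epochs:
        return 0.0
    ramp_progress = (epoch - warmup_epochs) / ramp_epochs
    return target * min(ramp_progress, 1.0)


# Weight at a few points of a 200-epoch run.
schedule = {epoch: lambda_phys(epoch) for epoch in (0, 59, 60, 100, 140, 199)}
print(schedule)
```

With the default arguments, the weight stays at zero through epoch 60, reaches half the target at epoch 100, and holds the target from epoch 140 onward.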
77
 
78
+ The architecture mirrors the pc-ddpm-epec2026 reference implementation; the inference path pulls weights from the [Hugging Face Hub model repo](https://huggingface.co/jbobym/pc-ddpm-alberta) at runtime.
 
 
 
 
79
 
80
  ## Data
81
 
82
  | Dataset | Source | Coverage | License |
83
  |---|---|---|---|
84
+ | AESO hourly wind, solar, load | AESO Open Data | 2016-01-01 to 2025-07-31 (~84,000 hours) | Public, attribution |
85
+ | ERA5 weather reanalysis (wind speed, solar radiation, temperature) | Copernicus Climate Data Store | 2016-2025, Alberta zones | CC-BY-4.0 |
86
 
87
+ The merged hourly grid is dense: 0% missingness across all six columns, no duplicate timestamps, full Alberta climatic range. Splits are chronological with no shuffle: train 2016-2023, val 2024, test 2025; the demo's reference days come from the test tail (2025-07-04, 2025-01-27) and one earlier spring day (2018-03-17) so the typical / high-wind / low-wind examples span seasons. Detailed schema, range stats, and ERA5 interpolation gotchas live in [`eda/data_profile.md`](eda/data_profile.md).
 
88
 
89
  ## Results
90
 
91
  ### Headline
92
 
93
+ We compute all numbers on the same 1,000 scenarios × 24 hours, sampled with a fixed seed; pandapower Newton-Raphson solves every (scenario, hour) pair, and we check feasibility post-convergence under both bound regimes. The "scenario feasibility" column is the strict reading: a scenario counts as feasible only when every one of its 24 hours is feasible at every bus.
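The difference between scenario feasibility and step feasibility is easy to see on a toy example. The sketch below assumes the power-flow results have already been reduced to one boolean per (scenario, hour) pair, true when every bus is within bounds; the function name `feasibility_rates` and the toy data are hypothetical.

```python
def feasibility_rates(step_feasible):
    """Return (scenario_rate, step_rate) from per-hour feasibility flags.

    step_feasible[s][h] is True when hour h of scenario s is within the
    voltage bounds at every bus (assumed to be computed upstream).
    """
    n_steps = sum(len(hours) for hours in step_feasible)
    step_rate = sum(sum(hours) for hours in step_feasible) / n_steps
    # Strict reading: a scenario is feasible only if all of its hours are.
    scenario_rate = sum(all(hours) for hours in step_feasible) / len(step_feasible)
    return scenario_rate, step_rate


# Four toy scenarios of three hours each (the real evaluation uses 1,000 x 24).
toy = [
    [True, True, True],
    [True, False, True],
    [False, False, True],
    [True, True, True],
]
scenario_rate, step_rate = feasibility_rates(toy)
print(scenario_rate, step_rate)
```

A single infeasible hour disqualifies the whole scenario, which is why the scenario rate is always at or below the step rate.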
 
 
 
 
94
 
95
+ | Model | Operational [0.85, 1.10] pu | Strict ANSI [0.89, 1.05] pu | Step feas. (strict) | V violation p95 (pu) |
96
+ |---|---|---|---|---|
97
+ | Unconstrained DDPM | 98.2% (982/1000) | 30.3% (303/1000) | 56.2% | 0.0228 |
98
+ | PC-DDPM, λ=1.0 | 99.7% (997/1000) | 34.7% (347/1000) | 64.9% | 0.0157 |
99
+ | **PC-DDPM, λ=2.0** | **100.0% (1000/1000)** | **39.0% (390/1000)** | **69.7%** | **0.0127** |
100
 
101
+ V violation p95, the 95th-percentile worst voltage excursion across all 24,000 (scenario, hour) pairs, drops by 44% from the baseline to λ=2.0; that's the cleanest signal that the physics penalty is doing what it's supposed to do.
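The 44% figure is a relative change between two table entries, and can be checked directly. The variable names below are illustrative; the values are the V violation p95 entries for the unconstrained DDPM and for PC-DDPM at λ=2.0.

```python
# V violation p95 (pu) taken from the headline table.
baseline_p95 = 0.0228   # unconstrained DDPM
pc_ddpm_p95 = 0.0127    # PC-DDPM, lambda = 2.0

# Relative reduction with respect to the baseline.
relative_drop = (baseline_p95 - pc_ddpm_p95) / baseline_p95
print(f"{relative_drop:.1%}")
```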
 
102
 
103
+ ### Slice: per-channel distributional fidelity
104
 
105
+ A single feasibility number hides the channel-level cost. Here is the standard deviation of generated samples relative to the test split, per channel, per model:
106
 
107
+ | Channel | Test std | Unconstrained | PC-DDPM λ=1.0 | PC-DDPM λ=2.0 |
108
+ |---|---|---|---|---|
109
+ | Wind | 686.8 MW | 70% | 67% | 67% |
110
+ | Solar | 243.4 MW | 35% | 36% | 35% |
111
+ | Load | 811.5 MW | 103% | 93% | 89% |
112
 
113
+ The physics penalty mildly suppresses wind diversity (3 percentage points from unconstrained to λ=2.0); load diversity gives up 14 points. Every model severely underestimates solar diversity — about 35% of the test std — but that's a DDPM-side limitation, not a physics one (the unconstrained model has the same problem). The asymmetric solar distribution, with heavy zero mass overnight, is what the diffusion model regresses toward the mean on. The W1 distances on solar are nonetheless close to baseline, so this is a std-vs-distance gap; the marginals match in shape, the variance contracts.
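The std-vs-distance gap mentioned above is easier to read with both quantities written out. The sketch below is illustrative only: `std_ratio` and `wasserstein_1` are hypothetical helpers, the population standard deviation is an assumption, and the W1 shortcut (mean absolute difference of sorted samples) holds only for two one-dimensional samples of equal size.

```python
import statistics


def std_ratio(generated, reference):
    """Generated std as a fraction of the reference std (population std assumed)."""
    return statistics.pstdev(generated) / statistics.pstdev(reference)


def wasserstein_1(generated, reference):
    """Empirical 1-D Wasserstein-1 distance for equal-size samples."""
    if len(generated) != len(reference):
        raise ValueError("this sketch assumes equal sample counts")
    pairs = zip(sorted(generated), sorted(reference))
    return sum(abs(g - r) for g, r in pairs) / len(generated)


# Toy MW values: same mean, but the generated sample has a contracted spread.
reference = [0.0, 0.0, 100.0, 300.0]
generated = [50.0, 50.0, 100.0, 200.0]

ratio = std_ratio(generated, reference)
w1 = wasserstein_1(generated, reference)
print(ratio, w1)
```

In this toy case the generated sample keeps the reference mean while its standard deviation is halved, so the two metrics capture different aspects of the mismatch.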
114
 
115
+ <!-- TODO(slice-figure): docs/slice_metrics.png bar chart of W1 distance per
116
+ channel × λ. Trivial to generate from the metric JSONs in
117
+ results/eval/from-epec/. -->
118
 
119
+ ## Discussion
120
 
121
+ **What was surprising.** The IEEE 118-bus topology has a hard ceiling on strict-bound feasibility. Loaded with Alberta-scale wind, solar, and load, the network develops systematic under-voltage at three buses (20, 21, 43); together they account for almost every violation. Even the *training data itself* — real, observed AESO operating points — sits at 52.6% step feasibility under [0.89, 1.05] pu. PC-DDPM at λ=2.0 reaches 69.7%, which is above the data ceiling; the physics penalty has learned to avoid the voltage-stress conditions that real Alberta encounters routinely. We must still be careful: "above the data ceiling" reads better than it really is, because the model achieves it partly by avoiding high-load conditions (load std drops to 89% of test). Under relaxed operational bounds, the violation problem essentially vanishes for every model — the strict-bound regime is the diagnostic one.
122
 
123
+ **Where it fails.** Three failure modes worth being explicit about. First, the solar diversity collapse described above — every DDPM in the ablation, including the unconstrained baseline, generates samples with about a third of the test-set solar std; this is a generation-stack issue and the physics penalty does nothing to fix it. Second, the load fidelity cost: at λ=2.0, Wasserstein-1 on load grows from 521 MW (unconstrained) to 799 MW, a 53% degradation. A planner sizing reserves for high-load winter peaks would notice this. Third, the headline 100% operational feasibility comes from a network where the operational bounds are forgiving enough that the *training data is already 99% feasible*; the meaningful comparison is the strict ANSI band, where PC-DDPM beats the baseline by 8.7 points but is still far from acceptable for downstream automation.
124
+
125
+ **Competence boundary.** PC-DDPM generates hourly trajectories on a public proxy network, evaluated under voltage feasibility only. It is not appropriate for sub-hourly intervals, real-time control loops, balancing authorities outside Alberta without retraining, deployment on the actual AESO network (we use IEEE 118-bus precisely because real AESO topology is non-public; transfer is not validated), or any thermal-feasibility claim (line loadings stay under 15% on this network at Alberta scale; thermal limits are not exercised). The model card carries the full breakdown.
126
 
127
  ## Reproducibility
128
 
129
  ```bash
130
+ git clone https://github.com/JBobyM/pc-ddpm-alberta.git
131
  cd pc-ddpm-alberta
132
+ make install # installs runtime + dev deps, sets up nbstripout
133
+ make data # rebuilds combined_hourly.csv from raw AESO + ERA5
134
+ make train # logs to W&B; writes weights to models/
135
+ make eval # writes metrics + npz samples to results/eval/
136
+ make serve # runs the Gradio app locally on :7860
 
 
 
 
 
 
 
137
  ```
138
 
139
+ - **Compute**: NVIDIA RTX 3090 GPUs (24 GB each), used one at a time during training; CUDA 12.4, PyTorch 2.6.
140
+ - **Wall time**: Training 200 epochs ≈ 1.6 h; the canonical multi-model evaluation (3 models × 24,000 power-flow runs) ≈ 19 min on a single CPU thread (a pandapower constraint).
141
+ - **Determinism**: `RANDOM_SEED = 42` lives in `src/pc_ddpm_alberta/config.py`; the README's headline numbers reproduce with that seed and the configs in `configs/default.json`.
142
+ - **Pretrained weights**: pulled at runtime from [`jbobym/pc-ddpm-alberta`](https://huggingface.co/jbobym/pc-ddpm-alberta) on the Hugging Face Hub; `make serve` does this automatically.
143
+ - **W&B run backing the headline** — TBD; the headline numbers come from the upstream pc-ddpm-epec2026 evaluation, which predates W&B integration in this repo.
144
 
145
  ## Limitations
146
 
147
+ The full breakdown lives in [`model_card.md`](model_card.md); the short version:
148
+
149
+ - **Voltage only.** We evaluate scenario feasibility against bus voltage limits; line thermal limits are not exercised on the IEEE 118-bus network at Alberta loadings (max line loading p95 ≈ 14%). A real AESO planner would want both checks; we cannot validate the thermal one against this proxy.
150
+ - **Proxy network.** The IEEE 118-bus is a published academic benchmark, not the actual AESO grid (which is non-public). The empirical numbers may not transfer; the methodology should.
151
+ - **Hourly granularity only.** No sub-hourly intervals, no ramp constraints, no unit commitment. Operational reserve products that require 15-minute resolution are out of scope.
152
+ - **Solar diversity collapse.** Across every DDPM in the ablation, generated solar samples have ~35% of the test-split standard deviation. The physics penalty does not cause this and does not fix it; it's a known DDPM failure mode on heavy-zero-mass distributions.
153
+ - **Fidelity cost.** Load Wasserstein-1 grows by 53% from unconstrained to λ=2.0. The model trades distributional match for feasibility; whether that's the right trade depends on the downstream task.
154
 
155
  ## References
156
 
157
+ - Hoseinpour, S., & Dvorkin, Y. (2025). *Manifold-constrained gradient guidance for AC-feasibility on IEEE 118-bus snapshots.* arXiv:2506.11281.
158
+ - Dong, J., et al. (2025). *Pattern-guided diffusion for renewable energy scenario generation.* Applied Energy, 385.
159
+ - Zhang, Y., et al. (2025). *Deep generative models for energy systems: a 228-paper survey.* Applied Energy, 380.
160
+ - Hamilton, W. L., Ying, R., & Leskovec, J. (2017). *Inductive representation learning on large graphs (GraphSAGE).* NeurIPS.
161
 
162
  ## License
163
 
 
166
  ## Citation
167
 
168
  If this was useful, please cite as:
169
+
170
  ```
171
  @misc{mesadieu2026pcddpm,
172
  author = {Mesadieu, John Boby},
173
  title = {Physics-Constrained Diffusion for Renewable Energy Scenario Generation: Alberta Grid},
174
  year = {2026},
175
+ url = {https://github.com/JBobyM/pc-ddpm-alberta}
176
  }
177
  ```
docs/headline.png ADDED
docs/slice_metrics.png ADDED