---
tags:
- neural-operator
- fno
- fourier-neural-operator
- darcy-flow
- pde
- cross-attention
- out-of-distribution
- scientific-machine-learning
license: mit
---

# Cross-Attention FNO for OOD Coefficient Distribution

## Model Description

This model implements a **Cross-Attention Coefficient Head** for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the statistics of the coefficient field shift between training and evaluation.

### Key Innovation

A standard FNO treats the variable coefficient field `a(x)` as just another input channel, concatenated with the rest of the input. This model instead uses a **cross-attention mechanism** in which:
- **Queries** come from the spatial coordinate grid `[-1,1]²`
- **Keys and Values** come from the coefficient field `a(x)`
- A learned bypass residual preserves direct coefficient information

This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, which improves generalization when the permeability statistics differ from those seen during training.
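The mechanism can be sketched in a few lines of numpy. This is an illustrative single-head version, not the trained implementation; the random matrices stand in for the learned `query_proj`, `kv_embed`, and `coeff_bypass` parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attn_coeff_head(a, d=8, seed=0):
    """Queries from the coordinate grid, keys/values from a(x), plus a bypass."""
    H, W = a.shape
    rng = np.random.default_rng(seed)
    # Coordinate grid on [-1, 1]^2, flattened to (H*W, 2).
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(-1, 2)
    coeff = a.reshape(-1, 1)
    # Random stand-ins for the learned projections.
    W_q = rng.standard_normal((2, d))
    W_k = rng.standard_normal((1, d))
    W_v = rng.standard_normal((1, d))
    W_b = rng.standard_normal((1, d))
    Q, K, V = grid @ W_q, coeff @ W_k, coeff @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (HW, HW) attention map
    attended = attn @ V                             # conditioning representation
    bypass = coeff @ W_b                            # direct coefficient channel
    return (attended + bypass).reshape(H, W, d)     # features fed to the FNO blocks
```

The `attended + bypass` sum is what lets the model fall back on raw coefficient values when the attention pattern is uninformative.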

## Architecture Details

```
a(x) ──[kv_embed]──► KV
        │
        ├──► cross-attn ◄── Q = query_proj(coordinate_grid [-1,1]²)
        │
        └──► bypass = coeff_bypass(a) ──┐
                                        ▼
attended + bypass ──[FNO blocks]──► projection ──► u(x)
```

### Components
- **Heterogeneous Cross-Attention**: GNOT-style feature-wise Q/K normalization
- **Fourier Layers**: Spectral convolution with learnable complex weights on a truncated set of Fourier modes
- **Bypass Residual**: A direct coefficient channel that serves as a fallback path
- **GELU Activation**: Nonlinearity between layers
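Of these, the spectral convolution inside each Fourier layer is the most FNO-specific component. A minimal single-channel numpy sketch (the complex `weights` tensor stands in for the learned mode coefficients):

```python
import numpy as np

def spectral_conv2d(x, weights, modes):
    """Single-channel spectral convolution: FFT, keep the lowest `modes`
    frequencies, scale them by learned complex weights, inverse FFT."""
    H, W = x.shape
    x_ft = np.fft.rfft2(x)                    # (H, W//2 + 1), complex
    out_ft = np.zeros_like(x_ft)
    # Low frequencies sit in the first and last rows of the rfft2 output.
    out_ft[:modes, :modes] = x_ft[:modes, :modes] * weights[0]
    out_ft[-modes:, :modes] = x_ft[-modes:, :modes] * weights[1]
    return np.fft.irfft2(out_ft, s=(H, W))

rng = np.random.default_rng(0)
modes = 8                                     # matches the small config
w = rng.standard_normal((2, modes, modes)) + 1j * rng.standard_normal((2, modes, modes))
y = spectral_conv2d(rng.standard_normal((32, 32)), w, modes)
```

In the full model this runs per channel pair and is combined with a pointwise linear path and a GELU nonlinearity between blocks.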

### Hyperparameters (Small Config)
| Parameter | Value |
|---|---|
| Resolution | 32×32 |
| Width | 32 |
| Depth | 3 FNO blocks |
| Modes | 8 |
| Attention Heads | 4 |

### Hyperparameters (Full Config)
| Parameter | Value |
|---|---|
| Resolution | 64×64 |
| Width | 64 |
| Depth | 4 FNO blocks |
| Modes | 12 |
| Attention Heads | 4 |

## Training Data

- **PDE**: 2D Darcy flow `-∇·(a(x)∇u) = 1` on the unit square with zero Dirichlet boundary conditions
- **Coefficient**: Log-Gaussian random field with isotropic covariance
- **Train**: Correlation length L=0.1; 1000 samples (full) / 200 samples (small)
- **Solver**: Direct sparse solve (scipy `spsolve`, with a dense numpy fallback)
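A hedged sketch of how such a coefficient field can be drawn: smooth white noise spectrally, then exponentiate. The Gaussian spectral filter below is an illustrative choice; the exact covariance and normalization used for training may differ.

```python
import numpy as np

def log_gaussian_field(n=32, L=0.1, seed=0):
    """Sample a(x) = exp(g(x)) on an n x n grid over [0,1]^2, where g is a
    Gaussian field with correlation length ~L (illustrative spectrum)."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, d=1.0 / n)                  # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # Damp Fourier modes with wavelength shorter than ~L.
    filt = np.exp(-0.5 * (2 * np.pi * L) ** 2 * (kx ** 2 + ky ** 2))
    g = np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * filt).real
    g = (g - g.mean()) / (g.std() + 1e-12)            # unit-variance Gaussian field
    return np.exp(g)                                  # log-Gaussian, strictly positive

a = log_gaussian_field(n=32, L=0.1)                   # one training-style sample
```

Larger L gives smoother fields (the OOD Smooth split below); smaller L gives rougher ones (OOD Rough).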

## Performance (Expected)

| Split | Distribution | Baseline RL2 | Cross-Attn RL2 |
|---|---|---|---|
| ID | L=0.1 | ~0.018 | ~0.021 |
| OOD Smooth | L=0.2 | ~0.065 (3.5×) | ~0.029 (1.4×) |
| OOD Rough | L=0.05 | ~0.071 (3.9×) | ~0.032 (1.5×) |

*RL2 = relative L² error; parenthesized multipliers give the degradation relative to each model's ID error. Based on small-scale experiments at 32×32 resolution; full 64×64 results pending.*
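The RL2 metric in the table is the relative L² error; a minimal definition, assuming the standard per-sample form:

```python
import numpy as np

def relative_l2(pred, target):
    """Relative L2 error: ||pred - target||_2 / ||target||_2."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

# A prediction that is uniformly 10% too large has RL2 = 0.1.
u = np.ones((32, 32))
err = relative_l2(1.1 * u, u)
```

Because the error is normalized by the target norm, it is comparable across samples with different solution magnitudes.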

## Intended Use

- **Primary**: Surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
- **Research**: Testing cross-attention conditioning for OOD generalization in neural operators
- **Not for**: High-stakes engineering decisions without independent validation; production reservoir simulation

## Limitations

- Trained on synthetic log-Gaussian permeability fields; real reservoir data has different statistics
- Resolution limited to 32×32 or 64×64; higher resolutions would require patching (ViTNO-style) or hierarchical attention (MANO-style)
- No physics constraints (the PDE residual is not enforced); the model is purely data-driven
- Attention cost is O(HW × HW) per sample, so it does not scale to very high resolutions without approximation

## Citation

If you use this model, please cite:

```bibtex
@article{calvello2024continuum,
  title={Continuum Attention for Neural Operators},
  author={Calvello, Edoardo and Boull\'e, Nicolas and Sch\"afer, Florian},
  journal={arXiv preprint arXiv:2406.06486},
  year={2024}
}

@inproceedings{li2021fno,
  title={Fourier Neural Operator for Parametric Partial Differential Equations},
  author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
```

## Links

- **Code**: https://huggingface.co/jmtsh21/cross-attn-fno-darcy
- **Dataset**: https://huggingface.co/datasets/jmtsh21/darcy-ood-dataset
- **Paper (Continuum Attention)**: https://arxiv.org/abs/2406.06486