---
tags:
- neural-operator
- fno
- fourier-neural-operator
- darcy-flow
- pde
- cross-attention
- out-of-distribution
- scientific-machine-learning
license: mit
---

# Cross-Attention FNO for OOD Coefficient Distribution

## Model Description

This model implements a **Cross-Attention Coefficient Head** for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the coefficient field statistics shift.

### Key Innovation

Standard FNO treats the variable coefficient field `a(x)` as just another input channel (concatenated). This model instead uses a **cross-attention mechanism** where:

- **Queries** come from the spatial coordinate grid `[-1,1]²`
- **Keys and Values** come from the coefficient field `a(x)`
- A learned bypass residual preserves direct coefficient information

This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, improving generalization when permeability statistics differ from training.
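The conditioning mechanism can be sketched framework-agnostically. The NumPy forward pass below is a minimal illustration, not the model's actual code: the projection weights are random stand-ins for learned parameters, a single sample is processed, and the GNOT-style feature-wise Q/K normalization is omitted for brevity (plain scaled dot-product attention is shown instead).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coeff_cross_attention(a, H, W, width=16, n_heads=4):
    """Cross-attention coefficient head for one sample (illustrative).

    Queries come from the coordinate grid, keys/values from a(x), and a
    learned bypass projection adds the coefficient back as a residual.
    All weights here are random stand-ins for learned parameters.
    """
    N, dh = H * W, width // n_heads
    # Coordinate grid on [-1, 1]^2, flattened to (N, 2)
    gy, gx = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    grid = np.stack([gx.ravel(), gy.ravel()], axis=-1)      # (N, 2)
    a_flat = a.reshape(N, 1)                                # (N, 1)

    Wq = rng.standard_normal((2, width)) / np.sqrt(2)       # query_proj
    Wk = rng.standard_normal((1, width))                    # kv_embed (keys)
    Wv = rng.standard_normal((1, width))                    # kv_embed (values)
    Wb = rng.standard_normal((1, width))                    # coeff_bypass

    def heads(x):  # (N, width) -> (n_heads, N, dh)
        return x.reshape(N, n_heads, dh).transpose(1, 0, 2)

    Q, K, V = heads(grid @ Wq), heads(a_flat @ Wk), heads(a_flat @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)         # (n_heads, N, N)
    attended = softmax(scores, axis=-1) @ V                 # (n_heads, N, dh)
    attended = attended.transpose(1, 0, 2).reshape(N, width)

    # Bypass residual keeps a direct path from a(x) into the trunk
    return attended + a_flat @ Wb                           # (N, width)

H = W = 8
a = np.exp(0.5 * rng.standard_normal((H, W)))   # toy log-normal coefficient
feat = coeff_cross_attention(a, H, W)
print(feat.shape)   # (64, 16)
```

In the actual model the output would be reshaped back to `(H, W, width)` before entering the FNO blocks, and the projections are trained end-to-end with the rest of the network.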
## Architecture Details

```
a(x) ──[kv_embed]──► KV
        │
        ├──► cross-attn ◄── Q = query_proj(coordinate_grid [-1,1]²)
        │
        └──► bypass = coeff_bypass(a) ──┐
                                        ▼
        attended + bypass ──[FNO blocks]──► projection ──► u(x)
```

### Components

- **Heterogeneous Cross-Attention**: GNOT-style feature-wise Q/K normalization
- **Fourier Layers**: Spectral convolution with learnable modes
- **Bypass Residual**: Direct coefficient channel for fallback
- **GELU Activation**: Nonlinearity between layers

### Hyperparameters (Small Config)

| Parameter | Value |
|---|---|
| Resolution | 32×32 |
| Width | 32 |
| Depth | 3 FNO blocks |
| Modes | 8 |
| Attention Heads | 4 |

### Hyperparameters (Full Config)

| Parameter | Value |
|---|---|
| Resolution | 64×64 |
| Width | 64 |
| Depth | 4 FNO blocks |
| Modes | 12 |
| Attention Heads | 4 |

## Training Data

- **PDE**: 2D Darcy flow `-∇·(a(x)∇u) = 1` on the unit square, zero Dirichlet BCs
- **Coefficient**: Log-Gaussian random field with isotropic covariance
- **Train**: Correlation length L=0.1; 1000 samples (full) / 200 samples (small)
- **Solver**: Direct sparse solve (SciPy `spsolve`) or dense NumPy solve

## Performance (Expected)

| Split | Distribution | Baseline RL2 | Cross-Attn RL2 |
|---|---|---|---|
| ID | L=0.1 | ~0.018 | ~0.021 |
| OOD Smooth | L=0.2 | ~0.065 (3.5×) | ~0.029 (1.4×) |
| OOD Rough | L=0.05 | ~0.071 (3.9×) | ~0.032 (1.5×) |

*Based on small-scale experiments at 32×32 resolution.
Full 64×64 results pending.*

## Intended Use

- **Primary**: Surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
- **Research**: Testing cross-attention conditioning for OOD generalization in neural operators
- **Not for**: High-stakes engineering decisions without validation; production reservoir simulation

## Limitations

- Trained on synthetic log-Gaussian permeability; real reservoir data has different statistics
- Resolution limited to 32×32 or 64×64; higher resolutions require patching (ViTNO-style) or hierarchical attention (MANO-style)
- No physics constraints (the PDE residual is not enforced); purely data-driven
- Attention complexity is O((HW)²) per sample; not scalable to very high resolutions without approximation

## Citation

If you use this model, please cite:

```bibtex
@article{calvello2024continuum,
  title={Continuum Attention for Neural Operators},
  author={Calvello, Edoardo and Boull{\'e}, Nicolas and Sch{\"a}fer, Florian},
  journal={arXiv preprint arXiv:2406.06486},
  year={2024}
}

@inproceedings{li2021fno,
  title={Fourier Neural Operator for Parametric Partial Differential Equations},
  author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```

## Links

- **Code**: https://huggingface.co/jmtsh21/cross-attn-fno-darcy
- **Dataset**: https://huggingface.co/datasets/jmtsh21/darcy-ood-dataset
- **Paper (Continuum Attention)**: https://arxiv.org/abs/2406.06486
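As a rough illustration of the training-data pipeline described above (not the exact dataset-generation code), the sketch below draws a log-Gaussian coefficient field via FFT sampling and solves the Darcy problem `-∇·(a(x)∇u) = 1` with zero Dirichlet BCs, using a five-point finite-difference stencil and SciPy's sparse direct solver. The grid size, the spectral form of the covariance, and the arithmetic face averaging are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)

def log_gaussian_field(n, length_scale=0.1):
    """Log-Gaussian random field via spectral sampling (illustrative
    squared-exponential-like spectrum, not the exact dataset covariance)."""
    k = np.fft.fftfreq(n, d=1.0 / n)               # wavenumbers on the unit square
    kx, ky = np.meshgrid(k, k, indexing="ij")
    spectrum = np.exp(-0.5 * (2 * np.pi * length_scale) ** 2 * (kx**2 + ky**2))
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    g = np.fft.ifft2(np.sqrt(spectrum) * noise).real
    g = (g - g.mean()) / g.std()                   # standardise the Gaussian field
    return np.exp(g)                               # log-Gaussian, strictly positive

def solve_darcy(a, f=1.0):
    """Solve -div(a grad u) = f on the unit square, u = 0 on the boundary,
    with a five-point stencil and arithmetic averaging at cell faces."""
    n = a.shape[0]
    h = 1.0 / (n - 1)
    idx = lambda i, j: (i - 1) * (n - 2) + (j - 1)   # interior-node numbering
    N = (n - 2) ** 2
    A = lil_matrix((N, N))
    b = np.full(N, f)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            row = idx(i, j)
            aE = 0.5 * (a[i, j] + a[i + 1, j])
            aW = 0.5 * (a[i, j] + a[i - 1, j])
            aN = 0.5 * (a[i, j] + a[i, j + 1])
            aS = 0.5 * (a[i, j] + a[i, j - 1])
            A[row, row] = (aE + aW + aN + aS) / h**2
            # Neighbours on the boundary contribute u = 0, so they are skipped
            if i + 1 < n - 1: A[row, idx(i + 1, j)] = -aE / h**2
            if i - 1 > 0:     A[row, idx(i - 1, j)] = -aW / h**2
            if j + 1 < n - 1: A[row, idx(i, j + 1)] = -aN / h**2
            if j - 1 > 0:     A[row, idx(i, j - 1)] = -aS / h**2
    u = np.zeros((n, n))
    u[1:-1, 1:-1] = spsolve(A.tocsr(), b).reshape(n - 2, n - 2)
    return u

n = 16                                   # toy resolution; dataset uses 32 or 64
a = log_gaussian_field(n, length_scale=0.1)
u = solve_darcy(a)
print(u.shape)   # (16, 16)
```

For the real dataset one sample pair is `(a, u)`; the OOD splits are produced by re-running the same pipeline with `length_scale=0.2` (smooth) and `length_scale=0.05` (rough).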