---
tags:
- neural-operator
- fno
- fourier-neural-operator
- darcy-flow
- pde
- cross-attention
- out-of-distribution
- scientific-machine-learning
license: mit
---
# Cross-Attention FNO for OOD Coefficient Distribution

## Model Description
This model implements a Cross-Attention Coefficient Head for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the coefficient field statistics shift.
### Key Innovation

Standard FNO treats the variable coefficient field a(x) as just another input channel (concatenated). This model instead uses a cross-attention mechanism where:

- **Queries** come from the spatial coordinate grid [-1, 1]²
- **Keys and Values** come from the coefficient field a(x)
- A learned **bypass residual** preserves direct coefficient information
This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, improving generalization when permeability statistics differ from training.
## Architecture Details

```
a(x) ──[kv_embed]──▶ KV
  │
  ├──▶ cross-attn ◀── Q = query_proj(coordinate_grid [-1,1]²)
  │
  └──▶ bypass = coeff_bypass(a) ──┐
                                  ▼
         attended + bypass ──[FNO blocks]──▶ projection ──▶ u(x)
```
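As a concrete illustration, here is a minimal single-head NumPy sketch of this coefficient head. The projection matrices `Wq`, `Wk`, `Wv`, and `Wb` are random stand-ins for the learned `query_proj`, `kv_embed`, and `coeff_bypass` weights; multi-head structure and GNOT-style normalization are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attn_coeff_head(a, d_model=8, seed=0):
    """Toy single-head sketch: queries from the coordinate grid,
    keys/values from the coefficient field a(x), plus a bypass residual."""
    h, w = a.shape
    rng = np.random.default_rng(seed)
    # Coordinate grid on [-1, 1]^2, flattened to (HW, 2)
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(-1, 2)
    coeff = a.reshape(-1, 1)                        # (HW, 1)
    # Hypothetical learned projections (random stand-ins here)
    Wq = rng.standard_normal((2, d_model))
    Wk = rng.standard_normal((1, d_model))
    Wv = rng.standard_normal((1, d_model))
    Wb = rng.standard_normal((1, d_model))          # bypass projection
    Q, K, V = grid @ Wq, coeff @ Wk, coeff @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_model), axis=-1)  # (HW, HW)
    out = attn @ V + coeff @ Wb                     # attended + bypass
    return out.reshape(h, w, d_model)

feat = cross_attn_coeff_head(np.random.rand(8, 8))
print(feat.shape)  # (8, 8, 8)
```

The conditioning features produced here would then feed the FNO blocks in place of the usual concatenated coefficient channel.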
### Components

- **Heterogeneous Cross-Attention**: GNOT-style feature-wise Q/K normalization
- **Fourier Layers**: spectral convolution with learnable modes
- **Bypass Residual**: direct coefficient channel for fallback
- **GELU Activation**: nonlinearity between layers
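The spectral convolution inside each Fourier layer can be sketched in NumPy as follows (single channel, random stand-in weights; a real FNO uses learnable complex weights with channel mixing):

```python
import numpy as np

def spectral_conv2d(x, weights, modes):
    """One FNO spectral convolution on a single-channel field x of shape (H, W):
    FFT -> keep only the lowest `modes` frequencies -> multiply by complex
    weights -> inverse FFT. Channel mixing is omitted for brevity."""
    h, w = x.shape
    x_ft = np.fft.rfft2(x)                    # (H, W//2 + 1), complex
    out_ft = np.zeros_like(x_ft)
    # Low-frequency corners: rows [0:modes] and [-modes:], cols [0:modes]
    out_ft[:modes, :modes] = x_ft[:modes, :modes] * weights[0]
    out_ft[-modes:, :modes] = x_ft[-modes:, :modes] * weights[1]
    return np.fft.irfft2(out_ft, s=(h, w))

rng = np.random.default_rng(0)
modes = 8  # matches the small config above
weights = (rng.standard_normal((2, modes, modes))
           + 1j * rng.standard_normal((2, modes, modes)))
y = spectral_conv2d(rng.standard_normal((32, 32)), weights, modes)
print(y.shape)  # (32, 32)
```

Truncating to a fixed number of modes is what makes the layer resolution-independent: the same weights apply at any grid size.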
### Hyperparameters (Small Config)

| Parameter | Value |
|---|---|
| Resolution | 32×32 |
| Width | 32 |
| Depth | 3 FNO blocks |
| Modes | 8 |
| Attention Heads | 4 |
### Hyperparameters (Full Config)

| Parameter | Value |
|---|---|
| Resolution | 64×64 |
| Width | 64 |
| Depth | 4 FNO blocks |
| Modes | 12 |
| Attention Heads | 4 |
## Training Data

- **PDE**: 2D Darcy flow, −∇·(a(x)∇u) = 1 on the unit square, zero Dirichlet BCs
- **Coefficient**: log-Gaussian random field with isotropic covariance
- **Train**: correlation length L = 0.1; 1000 samples (full) / 200 samples (small)
- **Solver**: direct sparse solve (scipy `spsolve` or dense numpy)
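A minimal sketch of the data generation, assuming a 5-point finite-difference discretization with face-averaged coefficients, and using Gaussian-smoothed noise as a cheap stand-in for the log-Gaussian covariance model (the actual dataset code may differ):

```python
import numpy as np
import scipy.sparse as sp
from scipy.ndimage import gaussian_filter
from scipy.sparse.linalg import spsolve

def sample_log_coeff(n, corr_len, seed=0):
    """Smoothed white noise, exponentiated; corr_len sets the smoothing scale.
    A stand-in for a proper log-Gaussian field with isotropic covariance."""
    rng = np.random.default_rng(seed)
    z = gaussian_filter(rng.standard_normal((n, n)), sigma=corr_len * n, mode="wrap")
    return np.exp(z / (z.std() + 1e-12))

def solve_darcy(a, f=1.0):
    """Solve -div(a grad u) = f on the unit square, u = 0 on the boundary."""
    n = a.shape[0]
    h = 1.0 / (n - 1)
    idx = -np.ones((n, n), dtype=int)          # -1 marks boundary nodes
    idx[1:-1, 1:-1] = np.arange((n - 2) ** 2).reshape(n - 2, n - 2)
    rows, cols, vals = [], [], []
    b = np.full((n - 2) ** 2, f)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            k = idx[i, j]
            # Coefficient at the four cell faces (arithmetic average)
            aN = 0.5 * (a[i, j] + a[i - 1, j]); aS = 0.5 * (a[i, j] + a[i + 1, j])
            aW = 0.5 * (a[i, j] + a[i, j - 1]); aE = 0.5 * (a[i, j] + a[i, j + 1])
            rows.append(k); cols.append(k); vals.append((aN + aS + aW + aE) / h**2)
            for ii, jj, af in [(i - 1, j, aN), (i + 1, j, aS),
                               (i, j - 1, aW), (i, j + 1, aE)]:
                if idx[ii, jj] >= 0:           # boundary neighbors contribute 0
                    rows.append(k); cols.append(idx[ii, jj]); vals.append(-af / h**2)
    A = sp.csr_matrix((vals, (rows, cols)))
    u = np.zeros((n, n))
    u[1:-1, 1:-1] = spsolve(A, b).reshape(n - 2, n - 2)
    return u

a = sample_log_coeff(32, corr_len=0.1)
u = solve_darcy(a)
```

Since a(x) > 0 everywhere and f = 1, the system matrix is an M-matrix and the solution is strictly positive in the interior.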
## Performance (Expected)

| Split | Distribution | Baseline rel. L² | Cross-Attn rel. L² |
|---|---|---|---|
| ID | L = 0.1 | ~0.018 | ~0.021 |
| OOD Smooth | L = 0.2 | ~0.065 (3.5×) | ~0.029 (1.4×) |
| OOD Rough | L = 0.05 | ~0.071 (3.9×) | ~0.032 (1.5×) |

Multipliers in parentheses are the error relative to the corresponding ID error. Based on small-scale experiments at 32×32 resolution; full 64×64 results pending.
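Assuming the reported metric is the standard relative L² error, it is computed as:

```python
import numpy as np

def rel_l2(pred, true):
    """Relative L2 error: ||pred - true|| / ||true||."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

u = np.ones((32, 32))
print(rel_l2(1.02 * u, u))  # ≈ 0.02, i.e. a 2% relative error
```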
## Intended Use

- **Primary**: surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
- **Research**: testing cross-attention conditioning for OOD generalization in neural operators
- **Not for**: high-stakes engineering decisions without validation; production reservoir simulation
## Limitations

- Trained on synthetic log-Gaussian permeability; real reservoir data has different statistics
- Resolution limited to 32×32 or 64×64; higher resolution requires patching (ViTNO-style) or hierarchical attention (MANO-style)
- No physics constraints (PDE residual not enforced); purely data-driven
- Attention cost is O((HW)²) per sample; not scalable to very high resolution without approximation
## Citation

If you use this model, please cite:

```bibtex
@article{calvello2024continuum,
  title={Continuum Attention for Neural Operators},
  author={Calvello, Edoardo and Boull\'e, Nicolas and Sch\"afer, Florian},
  journal={arXiv preprint arXiv:2406.06486},
  year={2024}
}

@inproceedings{li2021fno,
  title={Fourier Neural Operator for Parametric Partial Differential Equations},
  author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
  booktitle={ICLR},
  year={2021}
}
```
## Links

- Code: https://huggingface.co/jmtsh21/cross-attn-fno-darcy
- Dataset: https://huggingface.co/datasets/jmtsh21/darcy-ood-dataset
- Paper (Continuum Attention): https://arxiv.org/abs/2406.06486