Cross-Attention FNO for OOD Coefficient Distribution

Model Description

This model implements a Cross-Attention Coefficient Head for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the coefficient field statistics shift.

Key Innovation

Standard FNO treats the variable coefficient field a(x) as just another input channel (concatenated). This model instead uses a cross-attention mechanism where:

  • Queries come from the spatial coordinate grid [-1,1]²
  • Keys and Values come from the coefficient field a(x)
  • A learned bypass residual preserves direct coefficient information

This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, improving generalization when permeability statistics differ from training.

Architecture Details

a(x) ──[kv_embed]──► KV
         │
         ├──► cross-attn ◄── Q = query_proj(coordinate_grid [-1,1]²)
         │
         └──► bypass = coeff_bypass(a) ──┐
                                         ▼
              attended + bypass ──[FNO blocks]──► projection ──► u(x)
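The flow above can be sketched as a small PyTorch module. This is a minimal illustration, not the released implementation: the module and projection names follow the diagram (`kv_embed`, `query_proj`, `coeff_bypass`), but the internal layer choices are assumptions, and standard softmax attention is used in place of the GNOT-style normalized attention described below.

```python
import torch
import torch.nn as nn

class CrossAttnCoeffHead(nn.Module):
    """Sketch of the cross-attention coefficient head (illustrative only)."""

    def __init__(self, width=32, heads=4):
        super().__init__()
        self.kv_embed = nn.Linear(1, width)      # a(x) -> keys/values
        self.query_proj = nn.Linear(2, width)    # (x, y) coords -> queries
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.coeff_bypass = nn.Linear(1, width)  # learned bypass residual

    def forward(self, a):
        # a: (batch, H, W, 1) coefficient field sampled on a regular grid
        b, h, w, _ = a.shape
        ys = torch.linspace(-1, 1, h)
        xs = torch.linspace(-1, 1, w)
        grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
        grid = grid.reshape(1, h * w, 2).expand(b, -1, -1)

        q = self.query_proj(grid)                   # queries from coordinates
        kv = self.kv_embed(a.reshape(b, h * w, 1))  # keys/values from a(x)
        attended, _ = self.attn(q, kv, kv)
        out = attended + self.coeff_bypass(a.reshape(b, h * w, 1))
        return out.reshape(b, h, w, -1)  # lifted features for the FNO blocks
```

The output is the lifted feature grid that the subsequent FNO blocks consume; the bypass term guarantees the coefficient reaches them even if attention learns nothing useful.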

Components

  • Heterogeneous Cross-Attention: GNOT-style feature-wise Q/K normalization
  • Fourier Layers: Spectral convolution with learnable modes
  • Bypass Residual: Direct coefficient channel for fallback
  • GELU Activation: Nonlinearity between layers
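For reference, the Fourier layer component is the standard spectral convolution of Li et al. (2021): transform to Fourier space, multiply the lowest `modes` frequencies by learnable complex weights, and transform back. A minimal sketch (weight initialization scale is an assumption):

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """Standard FNO spectral convolution with truncated learnable modes."""

    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        # complex weights for the low-frequency corners of the rfft2 spectrum
        self.w1 = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat))
        self.w2 = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat))

    def forward(self, x):
        # x: (batch, channels, H, W)
        b, c, h, w = x.shape
        xf = torch.fft.rfft2(x)  # (b, c, H, W//2 + 1), complex
        out = torch.zeros(b, c, h, w // 2 + 1, dtype=torch.cfloat, device=x.device)
        m = self.modes
        # keep only the lowest m frequencies (positive and negative in dim -2)
        out[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, :m, :m], self.w1)
        out[:, :, -m:, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, -m:, :m], self.w2)
        return torch.fft.irfft2(out, s=(h, w))
```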

Hyperparameters (Small Config)

Parameter         Value
Resolution        32×32
Width             32
Depth             3 FNO blocks
Modes             8
Attention Heads   4

Hyperparameters (Full Config)

Parameter         Value
Resolution        64×64
Width             64
Depth             4 FNO blocks
Modes             12
Attention Heads   4
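As plain Python, the two configurations above could be captured as dictionaries (the key names here are illustrative, not the repository's actual config schema):

```python
# Hypothetical config dicts mirroring the hyperparameter tables above
SMALL = dict(resolution=32, width=32, depth=3, modes=8, heads=4)
FULL = dict(resolution=64, width=64, depth=4, modes=12, heads=4)
```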

Training Data

  • PDE: 2D Darcy flow -∇·(a(x)∇u) = 1 on unit square, zero Dirichlet BCs
  • Coefficient: Log-Gaussian random field with isotropic covariance
  • Train: Correlation length L=0.1, 1000 samples (full) / 200 samples (small)
  • Solver: Direct sparse solve (scipy spsolve or numpy dense)
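A data-generation pipeline along these lines can be sketched as follows. This is an assumption-laden illustration, not the actual generator: the covariance is approximated by a Gaussian-shaped spectral filter, and face coefficients are averaged arithmetically in the finite-difference stencil (the training code may use different choices).

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix
from scipy.sparse.linalg import spsolve

def log_gaussian_field(n, length_scale, rng):
    """Sample a(x) = exp(g), g a Gaussian field made by spectrally
    smoothing white noise (filter shape is an illustrative assumption)."""
    noise = rng.standard_normal((n, n))
    k2 = np.fft.fftfreq(n)[:, None] ** 2 + np.fft.fftfreq(n)[None, :] ** 2
    filt = np.exp(-k2 * (length_scale * n) ** 2)
    g = np.fft.ifft2(np.fft.fft2(noise) * filt).real
    g /= g.std() + 1e-12
    return np.exp(g)

def solve_darcy(a, f=1.0):
    """Solve -div(a grad u) = f on the unit square, u = 0 on the boundary,
    with a 5-point finite-difference stencil and a direct sparse solve."""
    n = a.shape[0]
    h = 1.0 / (n + 1)
    ae = np.pad(a, 1, mode="edge")  # extend a across the boundary
    A = lil_matrix((n * n, n * n))
    b = np.full(n * n, f)
    idx = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            c = ae[i + 1, j + 1]
            a_w = 0.5 * (c + ae[i, j + 1])      # west face coefficient
            a_e = 0.5 * (c + ae[i + 2, j + 1])  # east face
            a_s = 0.5 * (c + ae[i + 1, j])      # south face
            a_n = 0.5 * (c + ae[i + 1, j + 2])  # north face
            A[idx(i, j), idx(i, j)] = (a_w + a_e + a_s + a_n) / h**2
            if i > 0:
                A[idx(i, j), idx(i - 1, j)] = -a_w / h**2
            if i < n - 1:
                A[idx(i, j), idx(i + 1, j)] = -a_e / h**2
            if j > 0:
                A[idx(i, j), idx(i, j - 1)] = -a_s / h**2
            if j < n - 1:
                A[idx(i, j), idx(i, j + 1)] = -a_n / h**2
    u = spsolve(csr_matrix(A), b)  # direct sparse solve, as listed above
    return u.reshape(n, n)
```

For example, `solve_darcy(log_gaussian_field(32, 0.1, np.random.default_rng(0)))` produces one (coefficient, solution) training pair at the small-config resolution.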

Performance (Expected)

Split        Distribution  Baseline RL2   Cross-Attn RL2
ID           L=0.1         ~0.018         ~0.021
OOD Smooth   L=0.2         ~0.065 (3.5×)  ~0.029 (1.4×)
OOD Rough    L=0.05        ~0.071 (3.9×)  ~0.032 (1.5×)

Based on small-scale experiments at 32×32 resolution. Full 64×64 results pending.
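The RL2 numbers above are relative L2 errors, the standard neural-operator metric:

```python
import numpy as np

def relative_l2(pred, true):
    """Relative L2 error ||pred - true|| / ||true|| (the RL2 metric)."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)
```

The parenthesized factors in the OOD rows are the ratio of each model's OOD error to its own ID error, so lower factors mean less degradation under distribution shift.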

Intended Use

  • Primary: Surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
  • Research: Testing cross-attention conditioning for OOD generalization in neural operators
  • Not for: High-stakes engineering decisions without validation; production reservoir simulation

Limitations

  • Trained on synthetic log-Gaussian permeability; real reservoir data has different statistics
  • Resolution limited to 32×32 or 64×64; high-resolution requires patching (ViTNO-style) or hierarchical attention (MANO-style)
  • No physics constraints (PDE residual not enforced); purely data-driven
  • Attention complexity is O(HW × HW) per sample; not scalable to very high resolution without approximation

Citation

If you use this model, please cite:

@article{calvello2024continuum,
  title={Continuum Attention for Neural Operators},
  author={Calvello, Edoardo and Boull\'e, Nicolas and Schäfer, Florian},
  journal={arXiv preprint arXiv:2406.06486},
  year={2024}
}

@inproceedings{li2021fno,
  title={Fourier Neural Operator for Parametric Partial Differential Equations},
  author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
