Cross-Attention FNO for OOD Coefficient Distribution

Model Description

This model implements a Cross-Attention Coefficient Head for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the coefficient field statistics shift.

Key Innovation

Standard FNO treats the variable coefficient field a(x) as just another input channel (concatenated). This model instead uses a cross-attention mechanism where:

  • Queries come from the spatial coordinate grid [-1,1]²
  • Keys and Values come from the coefficient field a(x)
  • A learned bypass residual preserves direct coefficient information

This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, improving generalization when permeability statistics differ from training.

Architecture Details

a(x) ──[kv_embed]──► KV
         │
         ├──► cross-attn ◄── Q = query_proj(coordinate_grid [-1,1]²)
         │
         └──► bypass = coeff_bypass(a) ──┐
                                         ▼
              attended + bypass ──[FNO blocks]──► projection ──► u(x)
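The flow above can be sketched as a small PyTorch module. This is a minimal illustration, not the released implementation: the module and projection names follow the diagram (`kv_embed`, `query_proj`, `coeff_bypass`), but the internal layer choices are assumptions, and standard softmax attention is used in place of the GNOT-style normalized attention described below.

```python
import torch
import torch.nn as nn

class CrossAttnCoeffHead(nn.Module):
    """Sketch of the cross-attention coefficient head (illustrative only)."""

    def __init__(self, width=32, heads=4):
        super().__init__()
        self.kv_embed = nn.Linear(1, width)      # a(x) -> keys/values
        self.query_proj = nn.Linear(2, width)    # (x, y) coords -> queries
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.coeff_bypass = nn.Linear(1, width)  # learned bypass residual

    def forward(self, a):
        # a: (batch, H, W, 1) coefficient field sampled on a regular grid
        b, h, w, _ = a.shape
        ys = torch.linspace(-1, 1, h)
        xs = torch.linspace(-1, 1, w)
        grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
        grid = grid.reshape(1, h * w, 2).expand(b, -1, -1)

        q = self.query_proj(grid)                   # queries from coordinates
        kv = self.kv_embed(a.reshape(b, h * w, 1))  # keys/values from a(x)
        attended, _ = self.attn(q, kv, kv)
        out = attended + self.coeff_bypass(a.reshape(b, h * w, 1))
        return out.reshape(b, h, w, -1)  # lifted features for the FNO blocks
```

The output is the lifted feature grid that the subsequent FNO blocks consume; the bypass term guarantees the coefficient reaches them even if attention learns nothing useful.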

Components

  • Heterogeneous Cross-Attention: GNOT-style feature-wise Q/K normalization
  • Fourier Layers: Spectral convolution with learnable modes
  • Bypass Residual: Direct coefficient channel for fallback
  • GELU Activation: Nonlinearity between layers
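For reference, the Fourier layer component is the standard spectral convolution of Li et al. (2021): transform to Fourier space, multiply the lowest `modes` frequencies by learnable complex weights, and transform back. A minimal sketch (weight initialization scale is an assumption):

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """Standard FNO spectral convolution with truncated learnable modes."""

    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        # complex weights for the low-frequency corners of the rfft2 spectrum
        self.w1 = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat))
        self.w2 = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat))

    def forward(self, x):
        # x: (batch, channels, H, W)
        b, c, h, w = x.shape
        xf = torch.fft.rfft2(x)  # (b, c, H, W//2 + 1), complex
        out = torch.zeros(b, c, h, w // 2 + 1, dtype=torch.cfloat, device=x.device)
        m = self.modes
        # keep only the lowest m frequencies (positive and negative in dim -2)
        out[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, :m, :m], self.w1)
        out[:, :, -m:, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, -m:, :m], self.w2)
        return torch.fft.irfft2(out, s=(h, w))
```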

Hyperparameters (Small Config)

Parameter         Value
Resolution        32×32
Width             32
Depth             3 FNO blocks
Modes             8
Attention Heads   4

Hyperparameters (Full Config)

Parameter         Value
Resolution        64×64
Width             64
Depth             4 FNO blocks
Modes             12
Attention Heads   4
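As plain Python, the two configurations above could be captured as dictionaries (the key names here are illustrative, not the repository's actual config schema):

```python
# Hypothetical config dicts mirroring the hyperparameter tables above
SMALL = dict(resolution=32, width=32, depth=3, modes=8, heads=4)
FULL = dict(resolution=64, width=64, depth=4, modes=12, heads=4)
```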

Training Data

  • PDE: 2D Darcy flow -∇·(a(x)∇u) = 1 on unit square, zero Dirichlet BCs
  • Coefficient: Log-Gaussian random field with isotropic covariance
  • Train: Correlation length L=0.1, 1000 samples (full) / 200 samples (small)
  • Solver: Direct sparse solve (scipy spsolve or numpy dense)
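A data-generation pipeline along these lines can be sketched as follows. This is an assumption-laden illustration, not the actual generator: the covariance is approximated by a Gaussian-shaped spectral filter, and face coefficients are averaged arithmetically in the finite-difference stencil (the training code may use different choices).

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix
from scipy.sparse.linalg import spsolve

def log_gaussian_field(n, length_scale, rng):
    """Sample a(x) = exp(g), g a Gaussian field made by spectrally
    smoothing white noise (filter shape is an illustrative assumption)."""
    noise = rng.standard_normal((n, n))
    k2 = np.fft.fftfreq(n)[:, None] ** 2 + np.fft.fftfreq(n)[None, :] ** 2
    filt = np.exp(-k2 * (length_scale * n) ** 2)
    g = np.fft.ifft2(np.fft.fft2(noise) * filt).real
    g /= g.std() + 1e-12
    return np.exp(g)

def solve_darcy(a, f=1.0):
    """Solve -div(a grad u) = f on the unit square, u = 0 on the boundary,
    with a 5-point finite-difference stencil and a direct sparse solve."""
    n = a.shape[0]
    h = 1.0 / (n + 1)
    ae = np.pad(a, 1, mode="edge")  # extend a across the boundary
    A = lil_matrix((n * n, n * n))
    b = np.full(n * n, f)
    idx = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            c = ae[i + 1, j + 1]
            a_w = 0.5 * (c + ae[i, j + 1])      # west face coefficient
            a_e = 0.5 * (c + ae[i + 2, j + 1])  # east face
            a_s = 0.5 * (c + ae[i + 1, j])      # south face
            a_n = 0.5 * (c + ae[i + 1, j + 2])  # north face
            A[idx(i, j), idx(i, j)] = (a_w + a_e + a_s + a_n) / h**2
            if i > 0:
                A[idx(i, j), idx(i - 1, j)] = -a_w / h**2
            if i < n - 1:
                A[idx(i, j), idx(i + 1, j)] = -a_e / h**2
            if j > 0:
                A[idx(i, j), idx(i, j - 1)] = -a_s / h**2
            if j < n - 1:
                A[idx(i, j), idx(i, j + 1)] = -a_n / h**2
    u = spsolve(csr_matrix(A), b)  # direct sparse solve, as listed above
    return u.reshape(n, n)
```

For example, `solve_darcy(log_gaussian_field(32, 0.1, np.random.default_rng(0)))` produces one (coefficient, solution) training pair at the small-config resolution.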

Performance (Expected)

Split        Distribution  Baseline RL2   Cross-Attn RL2
ID           L=0.1         ~0.018         ~0.021
OOD Smooth   L=0.2         ~0.065 (3.5×)  ~0.029 (1.4×)
OOD Rough    L=0.05        ~0.071 (3.9×)  ~0.032 (1.5×)

Based on small-scale experiments at 32×32 resolution. Full 64×64 results pending.
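The RL2 numbers above are relative L2 errors, the standard neural-operator metric:

```python
import numpy as np

def relative_l2(pred, true):
    """Relative L2 error ||pred - true|| / ||true|| (the RL2 metric)."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)
```

The parenthesized factors in the OOD rows are the ratio of each model's OOD error to its own ID error, so lower factors mean less degradation under distribution shift.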

Intended Use

  • Primary: Surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
  • Research: Testing cross-attention conditioning for OOD generalization in neural operators
  • Not for: High-stakes engineering decisions without validation; production reservoir simulation

Limitations

  • Trained on synthetic log-Gaussian permeability; real reservoir data has different statistics
  • Resolution limited to 32×32 or 64×64; high-resolution requires patching (ViTNO-style) or hierarchical attention (MANO-style)
  • No physics constraints (PDE residual not enforced); purely data-driven
  • Attention complexity is O(HW × HW) per sample; not scalable to very high resolution without approximation

Citation

If you use this model, please cite:

@article{calvello2024continuum,
  title={Continuum Attention for Neural Operators},
  author={Calvello, Edoardo and Boull\'e, Nicolas and Schäfer, Florian},
  journal={arXiv preprint arXiv:2406.06486},
  year={2024}
}

@inproceedings{li2021fno,
  title={Fourier Neural Operator for Parametric Partial Differential Equations},
  author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}
