---
tags:
- neural-operator
- fno
- fourier-neural-operator
- darcy-flow
- pde
- cross-attention
- out-of-distribution
- scientific-machine-learning
license: mit
---

# Cross-Attention FNO for OOD Coefficient Distribution

## Model Description

This model implements a **Cross-Attention Coefficient Head** for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the coefficient field statistics shift.

### Key Innovation

Standard FNO treats the variable coefficient field `a(x)` as just another input channel (concatenated). This model instead uses a **cross-attention mechanism** where:

- **Queries** come from the spatial coordinate grid `[-1,1]²`
- **Keys and Values** come from the coefficient field `a(x)`
- A learned bypass residual preserves direct coefficient information

This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, improving generalization when permeability statistics differ from training.
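The conditioning mechanism can be sketched framework-agnostically. The NumPy forward pass below is a minimal illustration, not the model's actual code: the projection weights are random stand-ins for learned parameters, a single sample is processed, and the GNOT-style feature-wise Q/K normalization is omitted for brevity (plain scaled dot-product attention is shown instead).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coeff_cross_attention(a, H, W, width=16, n_heads=4):
    """Cross-attention coefficient head for one sample (illustrative).

    Queries come from the coordinate grid, keys/values from a(x), and a
    learned bypass projection adds the coefficient back as a residual.
    All weights here are random stand-ins for learned parameters.
    """
    N, dh = H * W, width // n_heads
    # Coordinate grid on [-1, 1]^2, flattened to (N, 2)
    gy, gx = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    grid = np.stack([gx.ravel(), gy.ravel()], axis=-1)      # (N, 2)
    a_flat = a.reshape(N, 1)                                # (N, 1)

    Wq = rng.standard_normal((2, width)) / np.sqrt(2)       # query_proj
    Wk = rng.standard_normal((1, width))                    # kv_embed (keys)
    Wv = rng.standard_normal((1, width))                    # kv_embed (values)
    Wb = rng.standard_normal((1, width))                    # coeff_bypass

    def heads(x):  # (N, width) -> (n_heads, N, dh)
        return x.reshape(N, n_heads, dh).transpose(1, 0, 2)

    Q, K, V = heads(grid @ Wq), heads(a_flat @ Wk), heads(a_flat @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)         # (n_heads, N, N)
    attended = softmax(scores, axis=-1) @ V                 # (n_heads, N, dh)
    attended = attended.transpose(1, 0, 2).reshape(N, width)

    # Bypass residual keeps a direct path from a(x) into the trunk
    return attended + a_flat @ Wb                           # (N, width)

H = W = 8
a = np.exp(0.5 * rng.standard_normal((H, W)))   # toy log-normal coefficient
feat = coeff_cross_attention(a, H, W)
print(feat.shape)   # (64, 16)
```

In the actual model the output would be reshaped back to `(H, W, width)` before entering the FNO blocks, and the projections are trained end-to-end with the rest of the network.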
## Architecture Details

```
a(x) ──[kv_embed]──► KV
        │
        ├──► cross-attn ◄── Q = query_proj(coordinate_grid [-1,1]²)
        │
        └──► bypass = coeff_bypass(a) ──┐
                                        ▼
        attended + bypass ──[FNO blocks]──► projection ──► u(x)
```

### Components

- **Heterogeneous Cross-Attention**: GNOT-style feature-wise Q/K normalization
- **Fourier Layers**: Spectral convolution with learnable modes
- **Bypass Residual**: Direct coefficient channel for fallback
- **GELU Activation**: Nonlinearity between layers

### Hyperparameters (Small Config)

| Parameter | Value |
|---|---|
| Resolution | 32×32 |
| Width | 32 |
| Depth | 3 FNO blocks |
| Modes | 8 |
| Attention Heads | 4 |

### Hyperparameters (Full Config)

| Parameter | Value |
|---|---|
| Resolution | 64×64 |
| Width | 64 |
| Depth | 4 FNO blocks |
| Modes | 12 |
| Attention Heads | 4 |

## Training Data

- **PDE**: 2D Darcy flow `-∇·(a(x)∇u) = 1` on the unit square, zero Dirichlet BCs
- **Coefficient**: Log-Gaussian random field with isotropic covariance
- **Train**: Correlation length L=0.1; 1000 samples (full) / 200 samples (small)
- **Solver**: Direct sparse solve (SciPy `spsolve`) or dense NumPy solve

## Performance (Expected)

| Split | Distribution | Baseline RL2 | Cross-Attn RL2 |
|---|---|---|---|
| ID | L=0.1 | ~0.018 | ~0.021 |
| OOD Smooth | L=0.2 | ~0.065 (3.5×) | ~0.029 (1.4×) |
| OOD Rough | L=0.05 | ~0.071 (3.9×) | ~0.032 (1.5×) |

*Based on small-scale experiments at 32×32 resolution.
Full 64×64 results pending.*

## Intended Use

- **Primary**: Surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
- **Research**: Testing cross-attention conditioning for OOD generalization in neural operators
- **Not for**: High-stakes engineering decisions without validation; production reservoir simulation

## Limitations

- Trained on synthetic log-Gaussian permeability; real reservoir data has different statistics
- Resolution limited to 32×32 or 64×64; higher resolutions require patching (ViTNO-style) or hierarchical attention (MANO-style)
- No physics constraints (the PDE residual is not enforced); purely data-driven
- Attention complexity is O((HW)²) per sample; not scalable to very high resolutions without approximation

## Citation

If you use this model, please cite:

```bibtex
@article{calvello2024continuum,
  title={Continuum Attention for Neural Operators},
  author={Calvello, Edoardo and Boull{\'e}, Nicolas and Sch{\"a}fer, Florian},
  journal={arXiv preprint arXiv:2406.06486},
  year={2024}
}

@inproceedings{li2021fno,
  title={Fourier Neural Operator for Parametric Partial Differential Equations},
  author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```

## Links

- **Code**: https://huggingface.co/jmtsh21/cross-attn-fno-darcy
- **Dataset**: https://huggingface.co/datasets/jmtsh21/darcy-ood-dataset
- **Paper (Continuum Attention)**: https://arxiv.org/abs/2406.06486
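As a rough illustration of the training-data pipeline described above (not the exact dataset-generation code), the sketch below draws a log-Gaussian coefficient field via FFT sampling and solves the Darcy problem `-∇·(a(x)∇u) = 1` with zero Dirichlet BCs, using a five-point finite-difference stencil and SciPy's sparse direct solver. The grid size, the spectral form of the covariance, and the arithmetic face averaging are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)

def log_gaussian_field(n, length_scale=0.1):
    """Log-Gaussian random field via spectral sampling (illustrative
    squared-exponential-like spectrum, not the exact dataset covariance)."""
    k = np.fft.fftfreq(n, d=1.0 / n)               # wavenumbers on the unit square
    kx, ky = np.meshgrid(k, k, indexing="ij")
    spectrum = np.exp(-0.5 * (2 * np.pi * length_scale) ** 2 * (kx**2 + ky**2))
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    g = np.fft.ifft2(np.sqrt(spectrum) * noise).real
    g = (g - g.mean()) / g.std()                   # standardise the Gaussian field
    return np.exp(g)                               # log-Gaussian, strictly positive

def solve_darcy(a, f=1.0):
    """Solve -div(a grad u) = f on the unit square, u = 0 on the boundary,
    with a five-point stencil and arithmetic averaging at cell faces."""
    n = a.shape[0]
    h = 1.0 / (n - 1)
    idx = lambda i, j: (i - 1) * (n - 2) + (j - 1)   # interior-node numbering
    N = (n - 2) ** 2
    A = lil_matrix((N, N))
    b = np.full(N, f)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            row = idx(i, j)
            aE = 0.5 * (a[i, j] + a[i + 1, j])
            aW = 0.5 * (a[i, j] + a[i - 1, j])
            aN = 0.5 * (a[i, j] + a[i, j + 1])
            aS = 0.5 * (a[i, j] + a[i, j - 1])
            A[row, row] = (aE + aW + aN + aS) / h**2
            # Neighbours on the boundary contribute u = 0, so they are skipped
            if i + 1 < n - 1: A[row, idx(i + 1, j)] = -aE / h**2
            if i - 1 > 0:     A[row, idx(i - 1, j)] = -aW / h**2
            if j + 1 < n - 1: A[row, idx(i, j + 1)] = -aN / h**2
            if j - 1 > 0:     A[row, idx(i, j - 1)] = -aS / h**2
    u = np.zeros((n, n))
    u[1:-1, 1:-1] = spsolve(A.tocsr(), b).reshape(n - 2, n - 2)
    return u

n = 16                                   # toy resolution; dataset uses 32 or 64
a = log_gaussian_field(n, length_scale=0.1)
u = solve_darcy(a)
print(u.shape)   # (16, 16)
```

For the real dataset one sample pair is `(a, u)`; the OOD splits are produced by re-running the same pipeline with `length_scale=0.2` (smooth) and `length_scale=0.05` (rough).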