---
tags:
- neural-operator
- fno
- fourier-neural-operator
- darcy-flow
- pde
- cross-attention
- out-of-distribution
- scientific-machine-learning
license: mit
---
# Cross-Attention FNO for OOD Coefficient Distribution
## Model Description
This model implements a **Cross-Attention Coefficient Head** for the Fourier Neural Operator (FNO) architecture, designed to improve out-of-distribution (OOD) generalization when the coefficient field statistics shift.
### Key Innovation
Standard FNO treats the variable coefficient field `a(x)` as just another input channel (concatenated). This model instead uses a **cross-attention mechanism** where:
- **Queries** come from the spatial coordinate grid `[-1,1]²`
- **Keys and Values** come from the coefficient field `a(x)`
- A learned bypass residual preserves direct coefficient information
This forces the model to build a conditioning representation of the coefficient field rather than treating it as a fixed input feature, improving generalization when permeability statistics differ from training.
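A minimal PyTorch sketch of this conditioning head, assuming the shapes described above (queries from a normalized coordinate grid, keys/values from `a(x)`, plus a learned bypass). Class and layer names (`CoeffCrossAttention`, `query_proj`, `kv_embed`, `coeff_bypass`) are illustrative, not the repository's actual API, and a standard `nn.MultiheadAttention` stands in for the GNOT-style normalized attention:

```python
import torch
import torch.nn as nn

class CoeffCrossAttention(nn.Module):
    """Cross-attention conditioning head (illustrative sketch):
    queries from the coordinate grid, keys/values from the
    coefficient field, plus a direct bypass residual."""
    def __init__(self, width=32, heads=4):
        super().__init__()
        self.query_proj = nn.Linear(2, width)    # (x, y) coords -> queries
        self.kv_embed = nn.Linear(1, width)      # a(x) -> keys/values
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.coeff_bypass = nn.Linear(1, width)  # direct coefficient channel

    def forward(self, a):                        # a: (B, H, W)
        B, H, W = a.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).reshape(1, H * W, 2).expand(B, -1, -1)
        q = self.query_proj(grid)                    # (B, HW, width)
        kv = self.kv_embed(a.reshape(B, H * W, 1))   # (B, HW, width)
        attended, _ = self.attn(q, kv, kv)
        out = attended + self.coeff_bypass(a.reshape(B, H * W, 1))
        return out.reshape(B, H, W, -1)              # conditioning features

head = CoeffCrossAttention(width=32, heads=4)
feats = head(torch.randn(2, 16, 16))
print(feats.shape)  # torch.Size([2, 16, 16, 32])
```

The output features then feed the FNO trunk in place of the usual channel-concatenated coefficient input.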
## Architecture Details
```
a(x) ──[kv_embed]──► KV
  │
  ├──► cross-attn ◄── Q = query_proj(coordinate_grid [-1,1]²)
  │
  └──► bypass = coeff_bypass(a) ──┐
                                  ▼
      attended + bypass ──[FNO blocks]──► projection ──► u(x)
```
### Components
- **Heterogeneous Cross-Attention**: GNOT-style feature-wise Q/K normalization
- **Fourier Layers**: Spectral convolution with learnable modes
- **Bypass Residual**: Direct coefficient channel for fallback
- **GELU Activation**: Nonlinearity between layers
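The Fourier layer above is the standard FNO spectral convolution. A self-contained sketch under the usual formulation (FFT, truncate to the lowest modes, multiply by learned complex weights, inverse FFT); the class name and weight layout are assumptions, not the repo's code:

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """FNO-style spectral convolution: transform to Fourier space,
    keep only the lowest `modes` frequencies, apply a learned
    complex linear map per retained mode, transform back."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.w = nn.Parameter(scale * torch.randn(
            2, channels, channels, modes, modes, dtype=torch.cfloat))

    def forward(self, x):                          # x: (B, C, H, W)
        B, C, H, W = x.shape
        xf = torch.fft.rfft2(x)                    # (B, C, H, W//2 + 1)
        out = torch.zeros(B, C, H, W // 2 + 1, dtype=torch.cfloat)
        m = self.modes
        # mix channels on the low-frequency corners of the spectrum
        out[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, :m, :m], self.w[0])
        out[:, :, -m:, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, -m:, :m], self.w[1])
        return torch.fft.irfft2(out, s=(H, W))

layer = SpectralConv2d(channels=32, modes=8)
y = layer(torch.randn(2, 32, 32, 32))
print(y.shape)  # torch.Size([2, 32, 32, 32])
```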
### Hyperparameters (Small Config)
| Parameter | Value |
|---|---|
| Resolution | 32×32 |
| Width | 32 |
| Depth | 3 FNO blocks |
| Modes | 8 |
| Attention Heads | 4 |
### Hyperparameters (Full Config)
| Parameter | Value |
|---|---|
| Resolution | 64×64 |
| Width | 64 |
| Depth | 4 FNO blocks |
| Modes | 12 |
| Attention Heads | 4 |
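The two configurations in the tables above can be captured in a small dataclass; the field names here are illustrative, not the repository's config schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FNOConfig:
    resolution: int  # grid size (resolution x resolution)
    width: int       # channel width of the FNO trunk
    depth: int       # number of FNO blocks
    modes: int       # retained Fourier modes per dimension
    heads: int = 4   # cross-attention heads

SMALL = FNOConfig(resolution=32, width=32, depth=3, modes=8)
FULL = FNOConfig(resolution=64, width=64, depth=4, modes=12)
```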
## Training Data
- **PDE**: 2D Darcy flow `-∇·(a(x)∇u) = 1` on the unit square, zero Dirichlet BCs
- **Coefficient**: Log-Gaussian random field with isotropic covariance
- **Train**: Correlation length L=0.1, 1000 samples (full) / 200 samples (small)
- **Solver**: Direct sparse solve (scipy `spsolve` or numpy dense)
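A sketch of how such training pairs can be generated with a direct sparse solve, as the list above describes. This uses a 5-point finite-difference scheme with arithmetic averaging of `a` at cell faces; the actual data-generation code may discretize differently (e.g. finite elements or harmonic averaging):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def solve_darcy(a, f=1.0):
    """Solve -div(a grad u) = f on the unit square with u = 0 on the
    boundary, via a 5-point stencil on the n x n grid of `a`."""
    n = a.shape[0]
    h = 1.0 / (n - 1)
    idx = lambda i, j: i * n + j
    A = sp.lil_matrix((n * n, n * n))
    b = np.full(n * n, float(f))
    for i in range(n):
        for j in range(n):
            k = idx(i, j)
            if i in (0, n - 1) or j in (0, n - 1):
                A[k, k] = 1.0          # Dirichlet boundary: u = 0
                b[k] = 0.0
                continue
            # face coefficients: arithmetic mean of neighboring values
            an = 0.5 * (a[i, j] + a[i - 1, j])
            as_ = 0.5 * (a[i, j] + a[i + 1, j])
            aw = 0.5 * (a[i, j] + a[i, j - 1])
            ae = 0.5 * (a[i, j] + a[i, j + 1])
            A[k, k] = (an + as_ + aw + ae) / h**2
            A[k, idx(i - 1, j)] = -an / h**2
            A[k, idx(i + 1, j)] = -as_ / h**2
            A[k, idx(i, j - 1)] = -aw / h**2
            A[k, idx(i, j + 1)] = -ae / h**2
    return spsolve(A.tocsr(), b).reshape(n, n)

u = solve_darcy(np.ones((17, 17)))   # constant coefficient sanity check
```

For log-Gaussian coefficients, `a` would be `exp` of a Gaussian random field sampled with the stated correlation length before being passed to the solver.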
## Performance (Expected)
| Split | Distribution | Baseline RL2 | Cross-Attn RL2 |
|---|---|---|---|
| ID | L=0.1 | ~0.018 | ~0.021 |
| OOD Smooth | L=0.2 | ~0.065 (3.5×) | ~0.029 (1.4×) |
| OOD Rough | L=0.05 | ~0.071 (3.9×) | ~0.032 (1.5×) |
*Based on small-scale experiments at 32×32 resolution. Full 64×64 results pending.*
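Assuming RL2 in the table denotes the relative L2 error standard in the neural-operator literature, the metric can be computed as follows (the function name is illustrative):

```python
import torch

def relative_l2(pred, target):
    """Per-sample relative L2 error ||pred - target||_2 / ||target||_2,
    averaged over the batch."""
    diff = (pred - target).flatten(1).norm(dim=1)
    ref = target.flatten(1).norm(dim=1)
    return (diff / ref).mean()

t = torch.ones(4, 32, 32)
err = relative_l2(1.1 * t, t)   # uniform 10% overshoot -> RL2 of 0.1
```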
## Intended Use
- **Primary**: Surrogate modeling for variable-coefficient elliptic PDEs (Darcy flow, electrostatics, heat conduction)
- **Research**: Testing cross-attention conditioning for OOD generalization in neural operators
- **Not for**: High-stakes engineering decisions without validation; production reservoir simulation
## Limitations
- Trained on synthetic log-Gaussian permeability; real reservoir data has different statistics
- Resolution limited to 32×32 or 64×64; high-resolution requires patching (ViTNO-style) or hierarchical attention (MANO-style)
- No physics constraints (PDE residual not enforced); purely data-driven
- Attention complexity is O(HW × HW) per sample; not scalable to very high resolution without approximation
## Citation
If you use this model, please cite:
```bibtex
@article{calvello2024continuum,
title={Continuum Attention for Neural Operators},
author={Calvello, Edoardo and Boull\'e, Nicolas and Sch\"afer, Florian},
journal={arXiv preprint arXiv:2406.06486},
year={2024}
}
@inproceedings{li2021fno,
title={Fourier Neural Operator for Parametric Partial Differential Equations},
author={Li, Zongyi and Kovachki, Nikola and Azizzadenesheli, Kamyar and others},
booktitle={International Conference on Learning Representations (ICLR)},
year={2021}
}
```
## Links
- **Code**: https://huggingface.co/jmtsh21/cross-attn-fno-darcy
- **Dataset**: https://huggingface.co/datasets/jmtsh21/darcy-ood-dataset
- **Paper (Continuum Attention)**: https://arxiv.org/abs/2406.06486