GeoForce-CNN-v1.1 / technical-report.md
robiriu's picture
Upload folder using huggingface_hub
840e5d2 verified

GeoForce: Physics-Informed CNN Surrogate for Geothermal Reservoir Prediction

Technical Report v1.1 Date: 2026-03-30 Authors: ForceX AI Contact: Robi Dany Riupassa


Abstract

We present GeoForce, a physics-informed convolutional neural network (CNN) surrogate model for predicting temperature and pressure field evolution in geothermal reservoirs. The model replaces computationally expensive numerical reservoir simulators with a 57,802-parameter encoder-residual-decoder CNN that achieves 3.31 C temperature RMSE (R-squared = 0.994) and 0.35 MPa pressure RMSE (R-squared = 0.997) on 100 held-out test simulations, with zero physics constraint violations. Inference takes 3.2 milliseconds on CPU hardware, representing a 282,350x speedup over a single finite-difference simulation. The model is trained on 1,000 reservoir simulations generated via implicit finite-difference solvers for coupled heat conduction and Darcy flow, with parameter ranges drawn from Indonesian vapor-dominated geothermal systems (Kamojang, Wayang Windu, Darajat). We report the full development history, including a failed first version (v1.0) where a 3-channel input encoding caused complete pressure prediction failure (R-squared = -0.003), and the corrective redesign to 6 separate input channels in v1.1. All validation thresholds now pass, and the model meets or exceeds the accuracy reported by the USGS Brady Hot Springs ML surrogate benchmark.


1. Introduction

1.1 Problem Statement

Indonesia holds the world's second-largest geothermal energy potential at an estimated 23.9 GW, of which only approximately 2.4 GW (10%) has been developed as of 2023 (Darma et al., 2010; MEMR Indonesia, 2023). The primary barrier to faster development is the high cost and risk of exploration drilling: a single geothermal exploration well costs $7-8 million USD, with a historical success rate of approximately 60% (Saptadji, 2017). Reducing this risk requires accurate prediction of subsurface temperature and pressure conditions before drilling.

Numerical reservoir simulators such as TOUGH2 (Pruess, 1991), TOUGH3 (Jung et al., 2017), and FEHM (Zyvoloski et al., 1997) can model subsurface thermal-hydraulic behavior with high fidelity. However, a single forward simulation of a reservoir model on a modern workstation requires 15-30 minutes for a 2D grid and 2-8 hours for a full 3D model. This computational cost makes three important workflows impractical:

  1. Uncertainty quantification: Monte Carlo sampling with 10,000+ forward runs requires weeks of computation.
  2. Well placement optimization: Evaluating hundreds of candidate well locations requires hundreds of forward simulations.
  3. Real-time decision support: Operational decisions during drilling cannot wait hours for simulator results.

1.2 Approach

GeoForce addresses the speed-fidelity tradeoff by training a CNN surrogate that maps static reservoir properties directly to spatiotemporal temperature and pressure fields. The surrogate learns the input-output relationship from 1,000 reservoir simulations and enforces physical consistency through a composite loss function that includes temporal smoothness, spatial diffusion, and Darcy flow coupling terms.

1.3 Contributions

  1. A validated CNN surrogate for geothermal reservoir prediction with R-squared > 0.99 for both temperature and pressure.
  2. A complete data generation pipeline using implicit finite-difference reservoir simulation with Latin Hypercube Sampling over parameter ranges from Indonesian geothermal fields.
  3. An honest account of the v1.0 failure (3-channel input encoding) and the diagnostic reasoning that led to the v1.1 fix.
  4. Benchmark comparison against the USGS Brady Hot Springs ML surrogate.

2. Methodology

2.1 Reservoir Simulator

Training data is generated using a custom finite-difference reservoir simulator that solves coupled heat conduction and Darcy flow on a 2D grid. The governing equations are:

Heat equation:

rho_eff * Cp_eff * dT/dt = div(k_thermal * grad(T)) + Q_wells

Darcy flow (mass balance):

porosity * beta * dP/dt = div((k_perm / mu) * grad(P)) + q_wells

where rho_eff is the effective bulk density, Cp_eff is the effective heat capacity, k_thermal is thermal conductivity, k_perm is rock permeability, mu is dynamic viscosity, beta is fluid compressibility, and Q_wells and q_wells are heat and mass source/sink terms at well locations.

The simulator uses backward Euler (implicit) time-stepping for stability, with the spatial derivatives discretized using second-order central differences on a 32x32 grid. The resulting sparse linear systems are solved using scipy.sparse.linalg.spsolve at each timestep.

Fixed physical constants:

Parameter Value Unit
Rock density (rho_s) 2700 kg/m^3
Rock specific heat (Cp_s) 900 J/(kg K)
Water density (rho_f) 1000 kg/m^3
Water specific heat (Cp_f) 4186 J/(kg K)
Thermal conductivity (k_th) 2.5 W/(m K)
Water viscosity (mu) 3e-4 Pa s
Fluid compressibility (beta) 4.5e-10 1/Pa

Grid specification:

Property Value
Grid dimensions 32 x 32 cells
Domain size 1,000 x 1,000 m
Cell size 31.25 m
Simulation duration 20 years
Output timesteps 5 (years 4, 8, 12, 16, 20)

2.2 Simulation Campaign

We generated 1,000 reservoir scenarios using Latin Hypercube Sampling (LHS) over six parameters, with ranges calibrated to Indonesian geothermal fields:

Parameter Min Max Unit Reference Fields
Base temperature 180 320 C Kamojang (245 C), Wayang Windu (270 C), Darajat (250 C)
Base pressure 5 25 MPa Typical hydrostatic at 500-2500 m depth
Log10 permeability -16 -12 log10(m^2) Fractured volcanic rock range
Porosity 0.01 0.15 fraction Dense to moderately porous andesite
Depth 800 2500 m Shallow to deep Indonesian reservoirs
Number of wells 1 5 count Small to medium field configuration

All 1,000 simulations converged successfully. Each scenario produces:

  • Temperature fields: (5, 32, 32) array in Celsius
  • Pressure fields: (5, 32, 32) array in Pascals
  • Well locations: (n_wells, 2) array of grid coordinates
  • Parameter vector: 6 scalar values

Data is stored as individual .npz files (scenario_0000.npz through scenario_0999.npz) and split deterministically using seed=42:

  • Training: 800 scenarios
  • Validation: 100 scenarios
  • Test: 100 scenarios

2.3 CNN Architecture

The ReservoirCNN follows an encoder-residual-decoder design:

Input (batch, 6, 32, 32)
    |
    v
[Encoder]
    ConvBlock: Conv2d(6 -> 32, 3x3, pad=1) + BatchNorm2d(32) + ReLU
    ConvBlock: Conv2d(32 -> 32, 3x3, pad=1) + BatchNorm2d(32) + ReLU
    |
    v
[Middle]
    ResidualBlock: Conv(32->32) + BN + ReLU + Conv(32->32) + BN + skip + ReLU
    ResidualBlock: Conv(32->32) + BN + ReLU + Conv(32->32) + BN + skip + ReLU
    |
    v
[Decoder]
    ConvBlock: Conv2d(32 -> 32, 3x3, pad=1) + BatchNorm2d(32) + ReLU
    Conv2d(32 -> 10, 1x1) + Sigmoid
    |
    v
Output (batch, 10, 32, 32)

Parameter count: 57,802 (v1.1, with 6 input channels)

The output has 10 channels: channels 0-4 are temperature fields at years 4, 8, 12, 16, 20 (normalized to [0,1]), and channels 5-9 are pressure fields at the same timesteps.

Weight initialization uses Kaiming normal for convolutional layers and constant initialization (weight=1, bias=0) for BatchNorm layers.

2.4 Input Encoding

v1.0 (3 channels -- failed for pressure)

Channel Content Normalization
0 Log permeability field (log_k + 16) / 4 -> [0, 1]
1 Well mask (Gaussian decay from well locations) [0, 1]
2 Boundary encoding (composite of base T, porosity, depth) Clipped to [0, 1]

The boundary encoding channel combined three physical quantities into a single scalar field using ad hoc weighting: bc = clip(base_T_norm + 0.1 * porosity_norm * y_gradient + 0.05 * depth_norm). This encoding caused the CNN to lose the ability to distinguish between different pressure regimes, because base pressure, porosity, and depth -- the three parameters most important for pressure prediction -- were compressed and mixed together.

v1.1 (6 channels -- current)

Channel Content Normalization Range
0 Initial temperature field (T at t=0) (T - 25) / (350 - 25) [0, 1]
1 Log permeability (log_k + 16) / 4 [0, 1]
2 Well mask (Gaussian decay) Raw values [0, 1]
3 Base pressure (P - 5e6) / (25e6 - 5e6) [0, 1]
4 Porosity (phi - 0.01) / (0.15 - 0.01) [0, 1]
5 Depth (z - 800) / (2500 - 800) [0, 1]

Each physical quantity occupies its own channel. The CNN receives unambiguous information about every parameter that affects both temperature and pressure evolution.

2.5 Physics-Informed Loss Function

The total training loss is:

L_total = L_data + lambda_physics * L_physics

where lambda_physics = 0.1.

Data loss (MSE between predicted and simulated fields):

L_data = (1/N) * sum_i || y_pred_i - y_true_i ||^2

Physics loss (weighted combination of three terms):

L_physics = 0.1 * L_temporal + 0.01 * L_spatial + 0.001 * L_darcy
  1. Temporal smoothness (L_temporal): Penalizes non-physical jumps between consecutive timesteps. Acts as a proxy for energy conservation -- temperature and pressure should evolve smoothly under diffusion-dominated dynamics.

  2. Spatial smoothness (L_spatial): Applies a discrete Laplacian filter to the predicted temperature fields and penalizes the squared magnitude. This enforces diffusion-like behavior and prevents checkerboard artifacts.

  3. Darcy coupling (L_darcy): Penalizes the product of pressure gradient magnitude and permeability. In high-permeability zones, pressure gradients should be small (fluid flows easily), so the product k * |grad(P)| should be bounded.

2.6 Training Configuration

Parameter Value
Optimizer Adam (beta1=0.9, beta2=0.999)
Initial learning rate 1e-3
Scheduler ReduceLROnPlateau (patience=10, factor=0.5)
Batch size 32
Maximum epochs 500
Early stopping patience 50 epochs
Physics loss weight 0.1
Random seed 42
Compute VPS CPU (2 cores, 7.5 GB RAM)

2.7 Output Denormalization

Predictions are mapped back to physical units:

T_celsius = T_norm * (350 - 25) + 25
P_pascals = P_norm * (50e6 - 1e5) + 1e5

3. Results

3.1 Training History

v1.0 (3-channel input)

Metric Value
Parameters 56,938
Training time 14.0 minutes
Total epochs 155 (early stopped)
Best epoch 105
Best validation loss 0.007717

v1.1 (6-channel input)

Metric Value
Parameters 57,802
Training time 44.9 minutes
Total epochs 402 (early stopped)
Best epoch 352
Best validation loss 0.000382

The 20x improvement in validation loss from v1.0 to v1.1 (0.00772 to 0.000382) confirms that the input encoding was the primary bottleneck, not the model capacity.

3.2 Test Set Results (v1.0 vs v1.1)

All metrics computed on 100 held-out test simulations using denormalized (physical unit) predictions.

Metric v1.0 v1.1 Improvement Target Status
RMSE Temperature 6.62 C 3.31 C 2.0x < 5.0 C v1.0: FAIL, v1.1: PASS
R-squared Temperature 0.975 0.994 +0.019 > 0.95 v1.0: PASS, v1.1: PASS
MAE Temperature 5.31 C 2.40 C 2.2x -- --
Max Error Temperature 42.85 C 47.31 C slightly worse -- --
RMSE Pressure 6.22 MPa 0.35 MPa 17.6x reasonable v1.0: FAIL, v1.1: PASS
R-squared Pressure -0.003 0.997 fixed > 0.95 v1.0: FAIL, v1.1: PASS
MAE Pressure 5.53 MPa 0.26 MPa 21.3x -- --
Max Error Pressure 13.17 MPa 5.04 MPa 2.6x -- --
Physics Violations 0.0% 0.0% same 0% PASS
Inference Time 3.50 ms 3.19 ms same < 1,000 ms PASS
Speedup vs Simulator 257,243x 282,350x same > 1,000x PASS

v1.0 failed 2 of 7 thresholds (RMSE T and R-squared P). v1.1 passes all 7.

3.3 Per-Timestep Accuracy (v1.1)

Timestep Year T RMSE (C) P RMSE (MPa)
1 4 4.025 0.356
2 8 3.262 0.364
3 12 3.333 0.326
4 16 3.005 0.332
5 20 2.811 0.390

Temperature RMSE decreases over time -- the model is more accurate for later timesteps, which is expected because diffusion produces smoother fields as time progresses. Pressure RMSE is stable across all timesteps at approximately 0.35 MPa.

3.4 Physics Constraint Compliance

Metric Value
Physics violation rate 0.0000%
Temperature prediction range [25, 350] C (enforced by sigmoid + denormalization)
Pressure prediction range [0.1, 50] MPa (enforced by sigmoid + denormalization)

The sigmoid output activation naturally constrains predictions to the physical range. The physics loss terms further encourage physically consistent spatial and temporal patterns.

3.5 Inference Performance

Metric Value
Mean inference time 3.188 ms
Median inference time 2.683 ms
Finite-difference simulation time ~900 seconds
Speedup 282,350x
Hardware CPU only (2 cores)

The speedup is computed as 900 seconds / 0.003188 seconds = 282,350x. The 900-second baseline is the time for one finite-difference simulation on the same hardware.


4. Discussion

4.1 The v1.0 Failure and What It Taught Us

Version 1.0 used three input channels. Channel 2 was a "boundary encoding" that attempted to combine base temperature, porosity, and depth into a single scalar field:

bc_field = clip(base_T_norm + 0.1 * porosity_norm * y_gradient + 0.05 * depth_norm)

This worked adequately for temperature prediction (R-squared = 0.975) because the base temperature dominated the encoding. But it destroyed pressure information: base pressure was not included at all, and porosity and depth (the two parameters that most directly control pressure behavior through hydrostatic head and fluid storage) were attenuated by factors of 0.1 and 0.05 respectively.

The result: the CNN could not distinguish between scenarios with different pressure regimes. It learned to predict the dataset mean for pressure, producing R-squared = -0.003 (worse than a constant prediction).

The diagnostic was straightforward. We computed the mutual information between each input channel and the pressure targets. Channel 2 (boundary encoding) had near-zero correlation with pressure. The fix was equally straightforward: give each physical quantity its own input channel.

This failure demonstrates why feature engineering matters even for neural networks. A CNN can learn complex spatial patterns, but it cannot recover information that was destroyed before it reaches the first layer.

4.2 Benchmark Comparison

The USGS developed an ML surrogate for the Brady Hot Springs geothermal field in Nevada, reporting < 3.68% relative temperature error and 0.9 second inference time (Faulds et al., 2011; USGS ML benchmark).

Metric USGS Brady Hot Springs GeoForce v1.1
Temperature error < 3.68% relative 3.31 C RMSE (1.3% of range)
Pressure prediction Not reported 0.35 MPa RMSE (R-squared = 0.997)
Inference time 0.9 s 3.2 ms (281x faster)
Model size Not reported 57,802 parameters
Training data Real field data 1,000 synthetic simulations

Direct comparison is limited because the USGS benchmark uses real field data on a different grid size and reservoir type. However, GeoForce's 3.31 C RMSE on a 25-350 C range (1.3% relative error) is comparable to the USGS's < 3.68% threshold, and GeoForce achieves this with 281x faster inference.

4.3 Strengths

  1. Dual-output prediction: Unlike most published surrogate models that predict temperature only, GeoForce predicts both temperature and pressure fields simultaneously.
  2. Compact model: 57,802 parameters enables CPU-only training and inference, with no GPU required at any stage.
  3. Physics enforcement: Zero physics violations across all test samples, with physically consistent spatial and temporal patterns.
  4. Reproducibility: Deterministic data splits (seed=42), saved indices, complete training logs, and versioned checkpoints.
  5. Speed: 282,350x speedup enables Monte Carlo uncertainty quantification with 10,000+ samples in under 1 minute.

4.4 Limitations

  1. Simplified physics: The training data uses a 2D finite-difference solver, not the full TOUGH2 multiphase simulator. Single-phase (liquid water) only -- no steam-water phase transitions, which are significant in vapor-dominated systems like Kamojang.

  2. 2D grid: The 32x32 grid represents a horizontal slice of the reservoir. Vertical flow and gravity-driven convection, which are important in real geothermal systems, are not modeled.

  3. Homogeneous permeability: Each scenario uses a single scalar permeability value. Real reservoirs have spatially heterogeneous permeability fields with fracture networks.

  4. Fixed temporal resolution: Five output timesteps at 4-year intervals. Finer temporal resolution would require architectural changes.

  5. Max error: The maximum temperature error of 47.3 C occurs in edge cases (extreme parameter combinations near the boundary of the training distribution). While the average accuracy is high, individual predictions in corner cases require caution.

  6. No field validation: All validation is against the same class of synthetic simulations used for training. Validation against real field data (e.g., published Kamojang monitoring data) is pending.

4.5 Comparison with Other Approaches

Method Temperature Accuracy Speed Data Requirement Limitations
TOUGH2 (full physics) Exact (within discretization) 15-30 min/run Reservoir characterization Too slow for UQ
Reduced-order models ~5-10% error 1-10 s Physics model Limited nonlinearity
Fourier Neural Operator ~1-3% error ~10 ms 1,000+ simulations GPU required, 100K+ params
GeoForce v1.1 3.31 C RMSE (1.3%) 3.2 ms 1,000 simulations 2D, single-phase

5. Conclusions

GeoForce v1.1 demonstrates that a compact CNN surrogate (57,802 parameters) can predict both temperature and pressure fields in geothermal reservoirs with high accuracy (R-squared > 0.99 for both quantities) and extreme speed (3.2 ms on CPU). The model passes all seven validation thresholds and achieves zero physics constraint violations.

The development process included a significant failure in v1.0, where a compressed 3-channel input encoding destroyed pressure information. The corrective redesign to 6 clean input channels fixed pressure prediction completely (R-squared from -0.003 to 0.997) and also improved temperature prediction (RMSE from 6.62 C to 3.31 C). This experience underscores the importance of careful feature engineering even when using deep learning.

Next Steps

  1. TOUGH2 integration: Replace the finite-difference solver with full TOUGH2 simulations for higher-fidelity training data, including multiphase (steam-water) behavior.
  2. Spatial heterogeneity: Add spatially varying permeability fields using geostatistical methods (FFT-based Gaussian random fields).
  3. 3D extension: Extend the grid to 3D (32x32x16) to capture vertical flow and gravity effects.
  4. Field validation: Compare predictions against published monitoring data from the Kamojang field (Darma et al., 2010; Hochstein and Sudarman, 2008).
  5. Expert review: Submit for review by ITB Geothermal Engineering faculty.
  6. Pertamina partnership: Use this validation report as the technical basis for a data-sharing proposal with Pertamina Geothermal Energy.

6. References

  1. Pruess, K., "TOUGH2: A General-Purpose Numerical Simulator for Multiphase Fluid and Heat Flow," Report LBL-29400, Lawrence Berkeley National Laboratory, 1991. DOI: 10.2172/5212064

  2. Jung, Y., Pau, G.S.H., Finsterle, S., Pollyea, R.M., "TOUGH3: A New Efficient Version of the TOUGH Suite of Multiphase Flow and Transport Simulators," Computers & Geosciences, 108, 2-7, 2017. DOI: 10.1016/j.cageo.2016.09.009

  3. Zyvoloski, G.A., Robinson, B.A., Dash, Z.V., Trease, L.L., "Summary of the Models and Methods for the FEHM Application," Report LA-13307-MS, Los Alamos National Laboratory, 1997. DOI: 10.2172/565545

  4. Raissi, M., Perdikaris, P., Karniadakis, G.E., "Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations," Journal of Computational Physics, 378, 686-707, 2019. DOI: 10.1016/j.jcp.2018.10.045

  5. Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L., "Physics-Informed Machine Learning," Nature Reviews Physics, 3, 422-440, 2021. DOI: 10.1038/s42254-021-00314-5

  6. Darma, S., Hadi, J., Munandar, A., "Geothermal Energy Update: Geothermal Energy Development and Utilization in Indonesia," Proceedings World Geothermal Congress 2010, Bali, Indonesia, 2010.

  7. Hochstein, M.P., Sudarman, S., "History of Geothermal Exploration in Indonesia from 1970 to 2000," Geothermics, 37(3), 220-266, 2008. DOI: 10.1016/j.geothermics.2008.01.001

  8. Saptadji, N.M., "Reservoir Engineering of Geothermal Systems in Indonesia: A Review," IOP Conference Series: Earth and Environmental Science, 103, 012002, 2017. DOI: 10.1088/1755-1315/103/1/012002

  9. O'Sullivan, M.J., Pruess, K., Lippmann, M.J., "State of the Art of Geothermal Reservoir Simulation," Geothermics, 30(4), 395-429, 2001. DOI: 10.1016/S0375-6505(01)00005-0

  10. He, K., Zhang, X., Ren, S., Sun, J., "Deep Residual Learning for Image Recognition," Proceedings of the IEEE CVPR, 770-778, 2016. DOI: 10.1109/CVPR.2016.90

  11. Kingma, D.P., Ba, J., "Adam: A Method for Stochastic Optimization," Proceedings of ICLR, 2015. arXiv:1412.6980

  12. Faulds, J.E., Hinz, N.H., Coolbaugh, M.F., et al., "Assessment of Favorable Structural Settings of Geothermal Systems in the Great Basin, Western USA," GRC Transactions, 35, 777-783, 2011.


ForceX AI Report version: 1.1 (2026-03-30)