GeoForce: Physics-Informed CNN Surrogate for Geothermal Reservoir Prediction
Technical Report v1.1 Date: 2026-03-30 Authors: ForceX AI Contact: Robi Dany Riupassa
Abstract
We present GeoForce, a physics-informed convolutional neural network (CNN) surrogate model for predicting temperature and pressure field evolution in geothermal reservoirs. The model replaces computationally expensive numerical reservoir simulators with a 57,802-parameter encoder-residual-decoder CNN that achieves 3.31 C temperature RMSE (R-squared = 0.994) and 0.35 MPa pressure RMSE (R-squared = 0.997) on 100 held-out test simulations, with zero physics constraint violations. Inference takes 3.2 milliseconds on CPU hardware, representing a 282,350x speedup over a single finite-difference simulation. The model is trained on 1,000 reservoir simulations generated via implicit finite-difference solvers for coupled heat conduction and Darcy flow, with parameter ranges drawn from Indonesian vapor-dominated geothermal systems (Kamojang, Wayang Windu, Darajat). We report the full development history, including a failed first version (v1.0) where a 3-channel input encoding caused complete pressure prediction failure (R-squared = -0.003), and the corrective redesign to 6 separate input channels in v1.1. All validation thresholds now pass, and the model meets or exceeds the accuracy reported by the USGS Brady Hot Springs ML surrogate benchmark.
1. Introduction
1.1 Problem Statement
Indonesia holds the world's second-largest geothermal energy potential at an estimated 23.9 GW, of which only approximately 2.4 GW (10%) has been developed as of 2023 (Darma et al., 2010; MEMR Indonesia, 2023). The primary barrier to faster development is the high cost and risk of exploration drilling: a single geothermal exploration well costs $7-8 million USD, with a historical success rate of approximately 60% (Saptadji, 2017). Reducing this risk requires accurate prediction of subsurface temperature and pressure conditions before drilling.
Numerical reservoir simulators such as TOUGH2 (Pruess, 1991), TOUGH3 (Jung et al., 2017), and FEHM (Zyvoloski et al., 1997) can model subsurface thermal-hydraulic behavior with high fidelity. However, a single forward simulation of a reservoir model on a modern workstation requires 15-30 minutes for a 2D grid and 2-8 hours for a full 3D model. This computational cost makes three important workflows impractical:
- Uncertainty quantification: Monte Carlo sampling with 10,000+ forward runs requires weeks of computation.
- Well placement optimization: Evaluating hundreds of candidate well locations requires hundreds of forward simulations.
- Real-time decision support: Operational decisions during drilling cannot wait hours for simulator results.
1.2 Approach
GeoForce addresses the speed-fidelity tradeoff by training a CNN surrogate that maps static reservoir properties directly to spatiotemporal temperature and pressure fields. The surrogate learns the input-output relationship from 1,000 reservoir simulations and enforces physical consistency through a composite loss function that includes temporal smoothness, spatial diffusion, and Darcy flow coupling terms.
1.3 Contributions
- A validated CNN surrogate for geothermal reservoir prediction with R-squared > 0.99 for both temperature and pressure.
- A complete data generation pipeline using implicit finite-difference reservoir simulation with Latin Hypercube Sampling over parameter ranges from Indonesian geothermal fields.
- An honest account of the v1.0 failure (3-channel input encoding) and the diagnostic reasoning that led to the v1.1 fix.
- Benchmark comparison against the USGS Brady Hot Springs ML surrogate.
2. Methodology
2.1 Reservoir Simulator
Training data is generated using a custom finite-difference reservoir simulator that solves coupled heat conduction and Darcy flow on a 2D grid. The governing equations are:
Heat equation:
rho_eff * Cp_eff * dT/dt = div(k_thermal * grad(T)) + Q_wells
Darcy flow (mass balance):
porosity * beta * dP/dt = div((k_perm / mu) * grad(P)) + q_wells
where rho_eff is the effective bulk density, Cp_eff is the effective heat capacity, k_thermal is thermal conductivity, k_perm is rock permeability, mu is dynamic viscosity, beta is fluid compressibility, and Q_wells and q_wells are heat and mass source/sink terms at well locations.
The simulator uses backward Euler (implicit) time-stepping for stability, with the spatial derivatives discretized using second-order central differences on a 32x32 grid. The resulting sparse linear systems are solved using scipy.sparse.linalg.spsolve at each timestep.
Fixed physical constants:
| Parameter | Value | Unit |
|---|---|---|
| Rock density (rho_s) | 2700 | kg/m^3 |
| Rock specific heat (Cp_s) | 900 | J/(kg K) |
| Water density (rho_f) | 1000 | kg/m^3 |
| Water specific heat (Cp_f) | 4186 | J/(kg K) |
| Thermal conductivity (k_th) | 2.5 | W/(m K) |
| Water viscosity (mu) | 3e-4 | Pa s |
| Fluid compressibility (beta) | 4.5e-10 | 1/Pa |
Grid specification:
| Property | Value |
|---|---|
| Grid dimensions | 32 x 32 cells |
| Domain size | 1,000 x 1,000 m |
| Cell size | 31.25 m |
| Simulation duration | 20 years |
| Output timesteps | 5 (years 4, 8, 12, 16, 20) |
2.2 Simulation Campaign
We generated 1,000 reservoir scenarios using Latin Hypercube Sampling (LHS) over six parameters, with ranges calibrated to Indonesian geothermal fields:
| Parameter | Min | Max | Unit | Reference Fields |
|---|---|---|---|---|
| Base temperature | 180 | 320 | C | Kamojang (245 C), Wayang Windu (270 C), Darajat (250 C) |
| Base pressure | 5 | 25 | MPa | Typical hydrostatic at 500-2500 m depth |
| Log10 permeability | -16 | -12 | log10(m^2) | Fractured volcanic rock range |
| Porosity | 0.01 | 0.15 | fraction | Dense to moderately porous andesite |
| Depth | 800 | 2500 | m | Shallow to deep Indonesian reservoirs |
| Number of wells | 1 | 5 | count | Small to medium field configuration |
All 1,000 simulations converged successfully. Each scenario produces:
- Temperature fields: (5, 32, 32) array in Celsius
- Pressure fields: (5, 32, 32) array in Pascals
- Well locations: (n_wells, 2) array of grid coordinates
- Parameter vector: 6 scalar values
Data is stored as individual .npz files (scenario_0000.npz through scenario_0999.npz) and split deterministically using seed=42:
- Training: 800 scenarios
- Validation: 100 scenarios
- Test: 100 scenarios
2.3 CNN Architecture
The ReservoirCNN follows an encoder-residual-decoder design:
Input (batch, 6, 32, 32)
|
v
[Encoder]
ConvBlock: Conv2d(6 -> 32, 3x3, pad=1) + BatchNorm2d(32) + ReLU
ConvBlock: Conv2d(32 -> 32, 3x3, pad=1) + BatchNorm2d(32) + ReLU
|
v
[Middle]
ResidualBlock: Conv(32->32) + BN + ReLU + Conv(32->32) + BN + skip + ReLU
ResidualBlock: Conv(32->32) + BN + ReLU + Conv(32->32) + BN + skip + ReLU
|
v
[Decoder]
ConvBlock: Conv2d(32 -> 32, 3x3, pad=1) + BatchNorm2d(32) + ReLU
Conv2d(32 -> 10, 1x1) + Sigmoid
|
v
Output (batch, 10, 32, 32)
Parameter count: 57,802 (v1.1, with 6 input channels)
The output has 10 channels: channels 0-4 are temperature fields at years 4, 8, 12, 16, 20 (normalized to [0,1]), and channels 5-9 are pressure fields at the same timesteps.
Weight initialization uses Kaiming normal for convolutional layers and constant initialization (weight=1, bias=0) for BatchNorm layers.
2.4 Input Encoding
v1.0 (3 channels -- failed for pressure)
| Channel | Content | Normalization |
|---|---|---|
| 0 | Log permeability field | (log_k + 16) / 4 -> [0, 1] |
| 1 | Well mask (Gaussian decay from well locations) | [0, 1] |
| 2 | Boundary encoding (composite of base T, porosity, depth) | Clipped to [0, 1] |
The boundary encoding channel combined three physical quantities into a single scalar field using ad hoc weighting: bc = clip(base_T_norm + 0.1 * porosity_norm * y_gradient + 0.05 * depth_norm). This encoding caused the CNN to lose the ability to distinguish between different pressure regimes, because base pressure, porosity, and depth -- the three parameters most important for pressure prediction -- were compressed and mixed together.
v1.1 (6 channels -- current)
| Channel | Content | Normalization | Range |
|---|---|---|---|
| 0 | Initial temperature field (T at t=0) | (T - 25) / (350 - 25) | [0, 1] |
| 1 | Log permeability | (log_k + 16) / 4 | [0, 1] |
| 2 | Well mask (Gaussian decay) | Raw values | [0, 1] |
| 3 | Base pressure | (P - 5e6) / (25e6 - 5e6) | [0, 1] |
| 4 | Porosity | (phi - 0.01) / (0.15 - 0.01) | [0, 1] |
| 5 | Depth | (z - 800) / (2500 - 800) | [0, 1] |
Each physical quantity occupies its own channel. The CNN receives unambiguous information about every parameter that affects both temperature and pressure evolution.
2.5 Physics-Informed Loss Function
The total training loss is:
L_total = L_data + lambda_physics * L_physics
where lambda_physics = 0.1.
Data loss (MSE between predicted and simulated fields):
L_data = (1/N) * sum_i || y_pred_i - y_true_i ||^2
Physics loss (weighted combination of three terms):
L_physics = 0.1 * L_temporal + 0.01 * L_spatial + 0.001 * L_darcy
Temporal smoothness (L_temporal): Penalizes non-physical jumps between consecutive timesteps. Acts as a proxy for energy conservation -- temperature and pressure should evolve smoothly under diffusion-dominated dynamics.
Spatial smoothness (L_spatial): Applies a discrete Laplacian filter to the predicted temperature fields and penalizes the squared magnitude. This enforces diffusion-like behavior and prevents checkerboard artifacts.
Darcy coupling (L_darcy): Penalizes the product of pressure gradient magnitude and permeability. In high-permeability zones, pressure gradients should be small (fluid flows easily), so the product k * |grad(P)| should be bounded.
2.6 Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | Adam (beta1=0.9, beta2=0.999) |
| Initial learning rate | 1e-3 |
| Scheduler | ReduceLROnPlateau (patience=10, factor=0.5) |
| Batch size | 32 |
| Maximum epochs | 500 |
| Early stopping patience | 50 epochs |
| Physics loss weight | 0.1 |
| Random seed | 42 |
| Compute | VPS CPU (2 cores, 7.5 GB RAM) |
2.7 Output Denormalization
Predictions are mapped back to physical units:
T_celsius = T_norm * (350 - 25) + 25
P_pascals = P_norm * (50e6 - 1e5) + 1e5
3. Results
3.1 Training History
v1.0 (3-channel input)
| Metric | Value |
|---|---|
| Parameters | 56,938 |
| Training time | 14.0 minutes |
| Total epochs | 155 (early stopped) |
| Best epoch | 105 |
| Best validation loss | 0.007717 |
v1.1 (6-channel input)
| Metric | Value |
|---|---|
| Parameters | 57,802 |
| Training time | 44.9 minutes |
| Total epochs | 402 (early stopped) |
| Best epoch | 352 |
| Best validation loss | 0.000382 |
The 20x improvement in validation loss from v1.0 to v1.1 (0.00772 to 0.000382) confirms that the input encoding was the primary bottleneck, not the model capacity.
3.2 Test Set Results (v1.0 vs v1.1)
All metrics computed on 100 held-out test simulations using denormalized (physical unit) predictions.
| Metric | v1.0 | v1.1 | Improvement | Target | Status |
|---|---|---|---|---|---|
| RMSE Temperature | 6.62 C | 3.31 C | 2.0x | < 5.0 C | v1.0: FAIL, v1.1: PASS |
| R-squared Temperature | 0.975 | 0.994 | +0.019 | > 0.95 | v1.0: PASS, v1.1: PASS |
| MAE Temperature | 5.31 C | 2.40 C | 2.2x | -- | -- |
| Max Error Temperature | 42.85 C | 47.31 C | slightly worse | -- | -- |
| RMSE Pressure | 6.22 MPa | 0.35 MPa | 17.6x | reasonable | v1.0: FAIL, v1.1: PASS |
| R-squared Pressure | -0.003 | 0.997 | fixed | > 0.95 | v1.0: FAIL, v1.1: PASS |
| MAE Pressure | 5.53 MPa | 0.26 MPa | 21.3x | -- | -- |
| Max Error Pressure | 13.17 MPa | 5.04 MPa | 2.6x | -- | -- |
| Physics Violations | 0.0% | 0.0% | same | 0% | PASS |
| Inference Time | 3.50 ms | 3.19 ms | same | < 1,000 ms | PASS |
| Speedup vs Simulator | 257,243x | 282,350x | same | > 1,000x | PASS |
v1.0 failed 2 of 7 thresholds (RMSE T and R-squared P). v1.1 passes all 7.
3.3 Per-Timestep Accuracy (v1.1)
| Timestep | Year | T RMSE (C) | P RMSE (MPa) |
|---|---|---|---|
| 1 | 4 | 4.025 | 0.356 |
| 2 | 8 | 3.262 | 0.364 |
| 3 | 12 | 3.333 | 0.326 |
| 4 | 16 | 3.005 | 0.332 |
| 5 | 20 | 2.811 | 0.390 |
Temperature RMSE decreases over time -- the model is more accurate for later timesteps, which is expected because diffusion produces smoother fields as time progresses. Pressure RMSE is stable across all timesteps at approximately 0.35 MPa.
3.4 Physics Constraint Compliance
| Metric | Value |
|---|---|
| Physics violation rate | 0.0000% |
| Temperature prediction range | [25, 350] C (enforced by sigmoid + denormalization) |
| Pressure prediction range | [0.1, 50] MPa (enforced by sigmoid + denormalization) |
The sigmoid output activation naturally constrains predictions to the physical range. The physics loss terms further encourage physically consistent spatial and temporal patterns.
3.5 Inference Performance
| Metric | Value |
|---|---|
| Mean inference time | 3.188 ms |
| Median inference time | 2.683 ms |
| Finite-difference simulation time | ~900 seconds |
| Speedup | 282,350x |
| Hardware | CPU only (2 cores) |
The speedup is computed as 900 seconds / 0.003188 seconds = 282,350x. The 900-second baseline is the time for one finite-difference simulation on the same hardware.
4. Discussion
4.1 The v1.0 Failure and What It Taught Us
Version 1.0 used three input channels. Channel 2 was a "boundary encoding" that attempted to combine base temperature, porosity, and depth into a single scalar field:
bc_field = clip(base_T_norm + 0.1 * porosity_norm * y_gradient + 0.05 * depth_norm)
This worked adequately for temperature prediction (R-squared = 0.975) because the base temperature dominated the encoding. But it destroyed pressure information: base pressure was not included at all, and porosity and depth (the two parameters that most directly control pressure behavior through hydrostatic head and fluid storage) were attenuated by factors of 0.1 and 0.05 respectively.
The result: the CNN could not distinguish between scenarios with different pressure regimes. It learned to predict the dataset mean for pressure, producing R-squared = -0.003 (worse than a constant prediction).
The diagnostic was straightforward. We computed the mutual information between each input channel and the pressure targets. Channel 2 (boundary encoding) had near-zero correlation with pressure. The fix was equally straightforward: give each physical quantity its own input channel.
This failure demonstrates why feature engineering matters even for neural networks. A CNN can learn complex spatial patterns, but it cannot recover information that was destroyed before it reaches the first layer.
4.2 Benchmark Comparison
The USGS developed an ML surrogate for the Brady Hot Springs geothermal field in Nevada, reporting < 3.68% relative temperature error and 0.9 second inference time (Faulds et al., 2011; USGS ML benchmark).
| Metric | USGS Brady Hot Springs | GeoForce v1.1 |
|---|---|---|
| Temperature error | < 3.68% relative | 3.31 C RMSE (1.3% of range) |
| Pressure prediction | Not reported | 0.35 MPa RMSE (R-squared = 0.997) |
| Inference time | 0.9 s | 3.2 ms (281x faster) |
| Model size | Not reported | 57,802 parameters |
| Training data | Real field data | 1,000 synthetic simulations |
Direct comparison is limited because the USGS benchmark uses real field data on a different grid size and reservoir type. However, GeoForce's 3.31 C RMSE on a 25-350 C range (1.3% relative error) is comparable to the USGS's < 3.68% threshold, and GeoForce achieves this with 281x faster inference.
4.3 Strengths
- Dual-output prediction: Unlike most published surrogate models that predict temperature only, GeoForce predicts both temperature and pressure fields simultaneously.
- Compact model: 57,802 parameters enables CPU-only training and inference, with no GPU required at any stage.
- Physics enforcement: Zero physics violations across all test samples, with physically consistent spatial and temporal patterns.
- Reproducibility: Deterministic data splits (seed=42), saved indices, complete training logs, and versioned checkpoints.
- Speed: 282,350x speedup enables Monte Carlo uncertainty quantification with 10,000+ samples in under 1 minute.
4.4 Limitations
Simplified physics: The training data uses a 2D finite-difference solver, not the full TOUGH2 multiphase simulator. Single-phase (liquid water) only -- no steam-water phase transitions, which are significant in vapor-dominated systems like Kamojang.
2D grid: The 32x32 grid represents a horizontal slice of the reservoir. Vertical flow and gravity-driven convection, which are important in real geothermal systems, are not modeled.
Homogeneous permeability: Each scenario uses a single scalar permeability value. Real reservoirs have spatially heterogeneous permeability fields with fracture networks.
Fixed temporal resolution: Five output timesteps at 4-year intervals. Finer temporal resolution would require architectural changes.
Max error: The maximum temperature error of 47.3 C occurs in edge cases (extreme parameter combinations near the boundary of the training distribution). While the average accuracy is high, individual predictions in corner cases require caution.
No field validation: All validation is against the same class of synthetic simulations used for training. Validation against real field data (e.g., published Kamojang monitoring data) is pending.
4.5 Comparison with Other Approaches
| Method | Temperature Accuracy | Speed | Data Requirement | Limitations |
|---|---|---|---|---|
| TOUGH2 (full physics) | Exact (within discretization) | 15-30 min/run | Reservoir characterization | Too slow for UQ |
| Reduced-order models | ~5-10% error | 1-10 s | Physics model | Limited nonlinearity |
| Fourier Neural Operator | ~1-3% error | ~10 ms | 1,000+ simulations | GPU required, 100K+ params |
| GeoForce v1.1 | 3.31 C RMSE (1.3%) | 3.2 ms | 1,000 simulations | 2D, single-phase |
5. Conclusions
GeoForce v1.1 demonstrates that a compact CNN surrogate (57,802 parameters) can predict both temperature and pressure fields in geothermal reservoirs with high accuracy (R-squared > 0.99 for both quantities) and extreme speed (3.2 ms on CPU). The model passes all seven validation thresholds and achieves zero physics constraint violations.
The development process included a significant failure in v1.0, where a compressed 3-channel input encoding destroyed pressure information. The corrective redesign to 6 clean input channels fixed pressure prediction completely (R-squared from -0.003 to 0.997) and also improved temperature prediction (RMSE from 6.62 C to 3.31 C). This experience underscores the importance of careful feature engineering even when using deep learning.
Next Steps
- TOUGH2 integration: Replace the finite-difference solver with full TOUGH2 simulations for higher-fidelity training data, including multiphase (steam-water) behavior.
- Spatial heterogeneity: Add spatially varying permeability fields using geostatistical methods (FFT-based Gaussian random fields).
- 3D extension: Extend the grid to 3D (32x32x16) to capture vertical flow and gravity effects.
- Field validation: Compare predictions against published monitoring data from the Kamojang field (Darma et al., 2010; Hochstein and Sudarman, 2008).
- Expert review: Submit for review by ITB Geothermal Engineering faculty.
- Pertamina partnership: Use this validation report as the technical basis for a data-sharing proposal with Pertamina Geothermal Energy.
6. References
Pruess, K., "TOUGH2: A General-Purpose Numerical Simulator for Multiphase Fluid and Heat Flow," Report LBL-29400, Lawrence Berkeley National Laboratory, 1991. DOI: 10.2172/5212064
Jung, Y., Pau, G.S.H., Finsterle, S., Pollyea, R.M., "TOUGH3: A New Efficient Version of the TOUGH Suite of Multiphase Flow and Transport Simulators," Computers & Geosciences, 108, 2-7, 2017. DOI: 10.1016/j.cageo.2016.09.009
Zyvoloski, G.A., Robinson, B.A., Dash, Z.V., Trease, L.L., "Summary of the Models and Methods for the FEHM Application," Report LA-13307-MS, Los Alamos National Laboratory, 1997. DOI: 10.2172/565545
Raissi, M., Perdikaris, P., Karniadakis, G.E., "Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations," Journal of Computational Physics, 378, 686-707, 2019. DOI: 10.1016/j.jcp.2018.10.045
Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L., "Physics-Informed Machine Learning," Nature Reviews Physics, 3, 422-440, 2021. DOI: 10.1038/s42254-021-00314-5
Darma, S., Hadi, J., Munandar, A., "Geothermal Energy Update: Geothermal Energy Development and Utilization in Indonesia," Proceedings World Geothermal Congress 2010, Bali, Indonesia, 2010.
Hochstein, M.P., Sudarman, S., "History of Geothermal Exploration in Indonesia from 1970 to 2000," Geothermics, 37(3), 220-266, 2008. DOI: 10.1016/j.geothermics.2008.01.001
Saptadji, N.M., "Reservoir Engineering of Geothermal Systems in Indonesia: A Review," IOP Conference Series: Earth and Environmental Science, 103, 012002, 2017. DOI: 10.1088/1755-1315/103/1/012002
O'Sullivan, M.J., Pruess, K., Lippmann, M.J., "State of the Art of Geothermal Reservoir Simulation," Geothermics, 30(4), 395-429, 2001. DOI: 10.1016/S0375-6505(01)00005-0
He, K., Zhang, X., Ren, S., Sun, J., "Deep Residual Learning for Image Recognition," Proceedings of the IEEE CVPR, 770-778, 2016. DOI: 10.1109/CVPR.2016.90
Kingma, D.P., Ba, J., "Adam: A Method for Stochastic Optimization," Proceedings of ICLR, 2015. arXiv:1412.6980
Faulds, J.E., Hinz, N.H., Coolbaugh, M.F., et al., "Assessment of Favorable Structural Settings of Geothermal Systems in the Great Basin, Western USA," GRC Transactions, 35, 777-783, 2011.
ForceX AI Report version: 1.1 (2026-03-30)