Black-Oil Reservoir Simulation Compute Sizing Worksheet

Black-oil simulation is usually dominated by fully implicit nonlinear solves rather than explicit stencil propagations as in seismic processing. Compute sizing is therefore driven by active cells, unknown count, adaptive time-step behavior, Newton iterations, linear solver iterations, Jacobian sparsity, well-model overhead, memory footprint, and benchmark-derived effective throughput.

The most important practical point: unlike RTM there is no single clean law like \(f_{\max}^4\). Black-oil runtime is mostly an implicit-solver problem. The best end-to-end estimate combines analytic sizing with one representative benchmark-derived effective throughput.

Section 1 — Model and Physics Parameters

Fill these in.

| Parameter | Symbol | Example |
| --- | --- | --- |
| Total cells in grid | \(N_{\mathrm{cell}}\) | 120,000,000 |
| Active cells | \(N_{\mathrm{act}}\) | 85,000,000 |
| Average connected neighbors per active cell | \(N_{\mathrm{conn}}\) | 6 |
| Black-oil primary variables per active cell | \(N_v\) | 3 |
| Number of wells | \(N_{\mathrm{well}}\) | 400 |
| Well unknowns per well / control block | \(N_{\mathrm{wu}}\) | 2 |
| Total simulated time | \(T_{\mathrm{sim}}\) | 20 years |
| Number of report steps | \(N_{\mathrm{rpt}}\) | 240 |
| Average nonlinear time steps per report step | \(N_{\Delta t/\mathrm{rpt}}\) | 4 |
| Average Newton iterations per time step | \(N_{\mathrm{Nwt}}\) | 5 |
| Average linear iterations per Newton step | \(N_{\mathrm{Lin}}\) | 18 |

Section 2 — Hardware + Calibrated Throughput

| Parameter | Symbol | Example |
| --- | --- | --- |
| Nodes per run | \(N_{\mathrm{node}}\) | 64 |
| Usable memory per node | \(M_{\mathrm{node}}\) | 480 GB |
| Effective sustained simulator throughput | \(P_{\mathrm{node}}\) | \(4.5\times 10^9\) \(N_{\mathrm{nz}}\)-iter/s per node |
| Parallel efficiency at chosen node count | \(\eta_{\mathrm{scale}}\) | 0.72 |
| Cluster utilization | \(U\) | 0.85 |
| Target wall time for the case | \(H_{\mathrm{wall}}\) | 8 hours |
| Number of cases in campaign | \(N_{\mathrm{case}}\) | 120 |

\(P_{\mathrm{node}}\) must be derived empirically from a benchmark or bounded from memory bandwidth.

\[ P_{\mathrm{node}}= \frac{N_{\mathrm{nz}}\,N_{\mathrm{step}}\,N_{\mathrm{Nwt}}\,N_{\mathrm{Lin}}}{t\,N_{\mathrm{node}}} \]

All quantities in this calibration must correspond to totals over the full simulation run.

Important: throughput is not a true input parameter in the way RTM cell-update throughput often is. For reservoir simulation, it is normally a calibrated quantity.
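The calibration formula above can be sketched as a short helper. This is an illustrative example only: the benchmark numbers below are assumptions taken from the Section 1 example, not measurements, and the resulting value will differ from case to case.

```python
# Hypothetical sketch: deriving P_node from one benchmark run, per the
# calibration formula above. All input numbers are illustrative assumptions.

def calibrate_p_node(n_nz, n_step, n_newton, n_lin, wall_seconds, n_node):
    """Effective per-node throughput in N_nz-iterations per second."""
    total_nz_iters = n_nz * n_step * n_newton * n_lin
    return total_nz_iters / (wall_seconds * n_node)

# Assumed benchmark: 63 scalar nonzeros per active cell on 85 M active cells,
# 960 accepted time steps, 5 Newton/step, 18 linear iters/Newton,
# measured as an 8-hour run on 64 nodes.
p_node = calibrate_p_node(
    n_nz=63 * 85_000_000,
    n_step=240 * 4,
    n_newton=5,
    n_lin=18,
    wall_seconds=8 * 3600,
    n_node=64,
)
print(f"P_node ≈ {p_node:.2e} N_nz-iter/s per node")
```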

Section 3 — What “Black-Oil” Means for Sizing

A standard three-phase black-oil model tracks water, oil, and gas flow with phase appearance and disappearance handled through formation-volume factors, solution gas–oil ratio, and optionally oil vaporization in gas. Production simulators commonly solve these equations in a fully implicit way, so each time step requires assembly of a nonlinear residual and Jacobian, followed by Newton updates and repeated sparse linear solves.

For compute planning, the physics detail that matters most is not the exact constitutive formula but how many unknowns and couplings it produces. In a standard black-oil formulation, the cell unknown count is usually treated as three primary variables per active cell.

Section 4 — Active Cells, Not Gross Cells

The first quantity that matters is the number of active simulation cells:

\[ N_{\mathrm{act}} = f_{\mathrm{act}}N_{\mathrm{cell}} \]

where \(f_{\mathrm{act}}\) is the active-cell fraction after removing inactive cells, null regions, pinch-outs handled out of system, and any cells excluded from the flow solve.

Reservoir simulators scale primarily with \(N_{\mathrm{act}}\), not total geometric cells. If local grid refinement, near-well refinement, or fault handling increases the active count or the average neighbor count, cost rises accordingly.

Section 5 — Unknown Count

For black-oil, the first-order unknown count is:

\[ N_{\mathrm{unk,cell}} = N_vN_{\mathrm{act}} \]

and the total unknown count including wells is:

\[ N_{\mathrm{unk}} = N_vN_{\mathrm{act}} + N_{\mathrm{wu}}N_{\mathrm{well}} \]

In many field cases, the well-unknown contribution is numerically much smaller than the cell-unknown count, but wells can still have a large impact on nonlinear difficulty and timestep chopping.
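A worked instance of the unknown counts, using the Section 1 example values, shows how small the well contribution typically is:

```python
# Unknown counts for the Section 1 example case.
n_act, n_v = 85_000_000, 3       # active cells, primary variables per cell
n_well, n_wu = 400, 2            # wells, unknowns per well/control block

n_unk_cell = n_v * n_act                  # cell unknowns
n_unk = n_unk_cell + n_wu * n_well        # wells add only 800 unknowns
print(n_unk)  # 255000800
```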

Section 6 — Jacobian Sparsity and Nonzeros

The dominant sparse matrix is the Jacobian assembled at each Newton step. For planning, approximate the matrix connectivity using an average number of connected neighboring cells \(N_{\mathrm{conn}}\).

Define the coupled block count per cell as

\[ N_{\mathrm{blk}} = 1 + N_{\mathrm{conn}} \]

Then a useful scalar nonzero estimate for the cell-cell Jacobian block is

\[ N_{\mathrm{nz,cell}} \approx N_{\mathrm{act}}N_v^2N_{\mathrm{blk}} \]

Adding well couplings gives

\[ N_{\mathrm{nz}} \approx N_{\mathrm{nz,cell}} + N_{\mathrm{nz,well}} \]

with

\[ N_{\mathrm{nz,well}} \approx \alpha_{\mathrm{well,nz}}N_{\mathrm{perf}}N_vN_{\mathrm{wu}} \]

where \(N_{\mathrm{perf}}\) is the total number of perforations and \(\alpha_{\mathrm{well,nz}}\) is a small formulation-dependent constant. For most large field cases, the cell-cell block still dominates the matrix size.

Rule-of-thumb interpretation

For a 3-variable black-oil system with approximately 6 neighbors per active cell, \(N_{\mathrm{blk}}\approx 7\) and therefore

\[ N_{\mathrm{nz,cell}} \approx 63N_{\mathrm{act}} \]

scalar nonzeros before accounting for wells and extra coupling structure. Corner-point transmissibility patterns, NNCs, faults, and local refinement can raise this noticeably.
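The nonzero estimate can be checked numerically. The perforation count and \(\alpha_{\mathrm{well,nz}}\) below are assumptions (20 perforations per well, \(\alpha=2\)) chosen only to show that the cell-cell block dominates:

```python
# Jacobian nonzero estimate for the Section 1 example case.
n_act, n_v, n_conn = 85_000_000, 3, 6
n_blk = 1 + n_conn                        # 7 coupled blocks per cell

n_nz_cell = n_act * n_v**2 * n_blk        # = 63 * N_act
n_perf, n_wu, alpha_well = 8_000, 2, 2    # assumed: 20 perfs/well, alpha = 2
n_nz_well = alpha_well * n_perf * n_v * n_wu
n_nz = n_nz_cell + n_nz_well
print(f"{n_nz:.3e} scalar nonzeros; well share = {n_nz_well / n_nz:.2e}")
```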

These effects can easily push the nonzero count 2–3× higher than the rule-of-thumb value.

Section 7 — Number of Time Steps

Unlike RTM, black-oil simulation is not usually sized by a CFL-driven fixed time step. Modern simulators use adaptive time stepping, so the right planning quantity is the actual number of accepted nonlinear time steps.

\[ N_{\mathrm{step}} = N_{\mathrm{rpt}}N_{\Delta t/\mathrm{rpt}} \]

This is one of the most important user inputs. It depends heavily on controls, well events, saturation fronts, compressibility, schedule changes, convergence tolerances, timestep chop/retry logic, and whether the case includes difficult phases such as gas liberation. Two cases on the same grid can differ severalfold in runtime simply because one schedule converges with large stable time steps while the other repeatedly chops time steps near control changes, breakthroughs, or phase transitions.

Section 8 — Nonlinear and Linear Solver Work

Total Newton solves:

\[ N_{\mathrm{Nwt,tot}} = N_{\mathrm{step}}N_{\mathrm{Nwt}} \]

Total linear solves:

\[ N_{\mathrm{Lin,tot}} = N_{\mathrm{Nwt,tot}}N_{\mathrm{Lin}} \]

where \(N_{\mathrm{Lin}}\) is the average linear iterations per Newton step and

\[ N_{\mathrm{Krylov}} \equiv N_{\mathrm{Lin,tot}} \]

for the number of Krylov iterations.

A useful first-order work model is to count sparse-matrix operations as nonzeros touched per linear iteration. Then total linear-algebra work is

\[ W_{\mathrm{lin}} \approx \alpha_{\mathrm{lin}}N_{\mathrm{nz}}N_{\mathrm{step}}N_{\mathrm{Nwt}}N_{\mathrm{Lin}} \]

where \(\alpha_{\mathrm{lin}}\) absorbs how much work is done per Krylov iteration including sparse matrix-vector multiply, preconditioner application, vector updates, reductions, and residual evaluation.

If you use CPR-type preconditioning, \(\alpha_{\mathrm{lin}}\) is often much larger than a bare SpMV count because the pressure stage, restriction/prolongation, smoothers, and auxiliary solves all add work; in practice it can be 5–20× a bare SpMV, dominated by preconditioning cost.
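Putting the counting identities and the work model together, a minimal sketch (with an assumed CPR-style multiplier \(\alpha_{\mathrm{lin}}=10\), purely illustrative):

```python
# First-order linear-algebra work model from Section 8.
# alpha_lin is an assumed CPR-style multiplier (here 10x a bare SpMV).
n_nz = 63 * 85_000_000                 # scalar nonzeros (Section 6 estimate)
n_step, n_newton, n_lin = 960, 5, 18
alpha_lin = 10

n_krylov = n_step * n_newton * n_lin   # total Krylov iterations over the run
w_lin = alpha_lin * n_nz * n_krylov    # work in nonzeros-touched units
print(f"{n_krylov} Krylov iterations, W_lin ≈ {w_lin:.2e}")
```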

Section 9 — Assembly and Property Evaluation Work

Reservoir simulation cost is not only the linear solver. Each Newton step also includes flux assembly, accumulation terms, relative permeability, capillary pressure, PVT lookups, well equations, and automatic-differentiation/Jacobian construction if used.

Model this separately as

\[ W_{\mathrm{asm}} \approx \alpha_{\mathrm{asm}}N_{\mathrm{act}}N_{\mathrm{step}}N_{\mathrm{Nwt}} \]

where \(\alpha_{\mathrm{asm}}\) is an empirical per-cell assembly cost.

Total end-to-end work becomes

\[ W_{\mathrm{tot}} \approx W_{\mathrm{asm}} + W_{\mathrm{lin}} + W_{\mathrm{io}} + W_{\mathrm{sched}} \]

with \(W_{\mathrm{io}}\) capturing restart/output work and \(W_{\mathrm{sched}}\) capturing schedule parsing, well-control logic, and other non-kernel overheads.
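The split into assembly, linear, and overhead work can be sketched as follows. The per-cell assembly cost \(\alpha_{\mathrm{asm}}\) and the I/O/schedule fractions are illustrative placeholders, not measured values:

```python
# Total-work sketch for Section 9, in the same nonzeros-touched work units.
n_act, n_step, n_newton, n_lin = 85_000_000, 960, 5, 18
alpha_asm = 200                        # assumed per-cell assembly cost
alpha_lin = 10                         # assumed Krylov-iteration multiplier

w_asm = alpha_asm * n_act * n_step * n_newton
w_lin = alpha_lin * (63 * n_act) * (n_step * n_newton * n_lin)
w_io, w_sched = 0.02 * w_lin, 0.01 * w_lin   # assumed small overheads
w_tot = w_asm + w_lin + w_io + w_sched
print(f"W_asm ≈ {w_asm:.2e}, W_lin ≈ {w_lin:.2e}, W_tot ≈ {w_tot:.2e}")
```

With these assumptions the linear solver dominates, which matches the usual profile of fully implicit black-oil runs.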

Section 10 — A More Practical Single-Multiplier Form

In planning work, the cleanest method is often to replace the detailed constants above with a measured black-oil campaign multiplier.

\[ \alpha_{\mathrm{BO}} = \frac{\text{measured end-to-end work for representative case}}{N_{\mathrm{act}}N_{\mathrm{step}}N_{\mathrm{Nwt}}N_{\mathrm{Lin}}} \]
\[ \alpha_{\mathrm{BO}} \sim \text{effective work per cell linear iteration} \sim \frac{\text{work}}{N_{\mathrm{act}}N_{\mathrm{Lin,tot}}} \]

Then define the normalized case work as

\[ W_{\mathrm{BO}} = \alpha_{\mathrm{BO}}N_{\mathrm{act}}N_{\mathrm{step}}N_{\mathrm{Nwt}}N_{\mathrm{Lin}} \]

This collapses assembly, well overhead, preconditioner cost, and implementation details into one calibration constant. It is the reservoir-simulation equivalent of using an end-to-end RTM throughput instead of trying to model every stencil and overhead term analytically.

Important: if your chosen throughput \(P_{\mathrm{node}}\) already comes from measured end-to-end black-oil runtime, do not apply both \(\alpha_{\mathrm{BO}}\) and separate assembly/linear-work constants, or you will double count cost. Use either the analytic model \(\left(W_{\mathrm{asm}}+W_{\mathrm{lin}}+\cdots\right)\) or the calibrated model \((\alpha_{\mathrm{BO}})\), never both.
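The calibrate-then-predict pattern looks like this in practice. The measured work value here is an assumption standing in for one representative benchmark:

```python
# Sketch: derive alpha_BO from one measured run, then reuse it to predict
# a scaled-up case. measured_work is an assumed benchmark value.

def alpha_bo(measured_work, n_act, n_step, n_newton, n_lin):
    return measured_work / (n_act * n_step * n_newton * n_lin)

def case_work(alpha, n_act, n_step, n_newton, n_lin):
    return alpha * n_act * n_step * n_newton * n_lin

a = alpha_bo(measured_work=4.8e15, n_act=85e6,
             n_step=960, n_newton=5, n_lin=18)
# Predict a case with 1.5x the active cells and (assumed) identical
# convergence behavior; the predicted work scales by exactly 1.5x.
w_big = case_work(a, n_act=127.5e6, n_step=960, n_newton=5, n_lin=18)
print(f"alpha_BO ≈ {a:.1f}, predicted W ≈ {w_big:.2e}")
```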

Section 11 — Runtime Model

If you size from an analytic nonzero-based throughput, runtime is

\[ t_{\mathrm{case}} = \frac{W_{\mathrm{tot}}}{N_{\mathrm{node}}P_{\mathrm{node,eff}}} \]

where

\[ P_{\mathrm{node,eff}} = P_{\mathrm{node}}\eta_{\mathrm{scale}} \]

and \(P_{\mathrm{node}}\) is the sustained per-node effective throughput for the chosen work unit. That work unit must match the numerator. If your work is \(N_{\mathrm{nz}}\)-based, your throughput must also be \(N_{\mathrm{nz}}\)-based. Mixing units, such as \(N_{\mathrm{nz}}\)-based work with FLOP-based throughput, invalidates the estimate.

Common valid choices are \(N_{\mathrm{nz}}\)-based throughput (nonzeros touched per second, as calibrated in Section 2) or cell-linear-iteration throughput (matching \(\alpha_{\mathrm{BO}}\) from Section 10); either works as long as work and throughput use the same unit.

If instead you benchmark end-to-end directly, use

\[ t_{\mathrm{case}} = \frac{\alpha_{\mathrm{BO}}N_{\mathrm{act}}N_{\mathrm{step}}N_{\mathrm{Nwt}}N_{\mathrm{Lin}}}{N_{\mathrm{node}}P_{\mathrm{node,eff}}} \]
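The calibrated runtime model, end to end, with assumed values for \(\alpha_{\mathrm{BO}}\) and \(P_{\mathrm{node}}\) (both would come from your own benchmark in a real sizing exercise):

```python
# Runtime model sketch for Section 11, calibrated path. All inputs assumed.
n_act, n_step, n_newton, n_lin = 85e6, 960, 5, 18
alpha_bo = 650          # assumed calibrated multiplier (work per cell-iter)
p_node = 4.5e9          # assumed throughput, in matching work units per second
eta, n_node = 0.72, 64

p_eff = p_node * eta
t_case = alpha_bo * n_act * n_step * n_newton * n_lin / (n_node * p_eff)
print(f"t_case ≈ {t_case:.0f} s ≈ {t_case / 3600:.1f} h")
```

The units cancel only because \(\alpha_{\mathrm{BO}}\) and \(P_{\mathrm{node}}\) were defined against the same work unit; that is the consistency requirement stated above.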

Section 12 — Node-hours for One Case

The accounting quantity needed for campaign planning is

\[ \mathrm{Nodeh}_{\mathrm{case}} = \frac{N_{\mathrm{node}}t_{\mathrm{case}}}{3600} \]

Section 13 — Campaign Sizing

\[ \mathrm{Nodeh}_{\mathrm{tot}} = N_{\mathrm{case}}\,\mathrm{Nodeh}_{\mathrm{case}} \]
\[ N_{\mathrm{cluster}} = \frac{\mathrm{Nodeh}_{\mathrm{tot}}}{UH_{\mathrm{wall}}} \]

where \(N_{\mathrm{cluster}}\) is the number of nodes required in the cluster to finish the total campaign inside the target wall-clock window \(H_{\mathrm{wall}}\).
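Sections 12 and 13 combine into a few lines of arithmetic. The per-case runtime below is an assumed input, not a value carried over from a real run:

```python
# Campaign sizing sketch for Sections 12-13, with the worksheet's examples.
n_node, t_case_s = 64, 6.4 * 3600     # assumed per-case runtime of 6.4 h
n_case, util, h_wall = 120, 0.85, 8

nodeh_case = n_node * t_case_s / 3600          # node-hours per case
nodeh_tot = n_case * nodeh_case
n_cluster = nodeh_tot / (util * h_wall)        # nodes to finish inside H_wall
print(f"{nodeh_case:.0f} node-h/case, {nodeh_tot:.0f} node-h total, "
      f"{n_cluster:.0f} nodes required")
```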

Section 14 — Memory Sizing

Memory can be the dominant feasibility constraint for large black-oil cases. You need enough memory for grid/rock/fluid state, Jacobian, Krylov vectors, preconditioner data structures, halo buffers, wells, and output buffering.

State and property memory

\[ M_{\mathrm{state}} \approx N_{\mathrm{act}}b_{\mathrm{state}} \]

where \(b_{\mathrm{state}}\) includes primary variables, saturations, pressures, mobilities, transmissibilities, PVT properties, relperm/capillary auxiliary arrays, and temporary work arrays.

Jacobian memory

\[ M_{\mathrm{jac}} \approx N_{\mathrm{nz}}(b_{\mathrm{val}}+b_{\mathrm{col}})+(N_{\mathrm{unk}}+1)b_{\mathrm{ptr}} \]

For example, in double precision with 8-byte values and 4-byte column indices, the \(\approx 63N_{\mathrm{act}}\) nonzeros of the Section 1 example case already require roughly \(12\times 5.4\times 10^9 \approx 64\) GB for values and column indices alone, before any preconditioner storage.

Krylov and preconditioner memory

\[ M_{\mathrm{vec}} \approx N_{\mathrm{vec}}N_{\mathrm{unk}}b_{\mathrm{fp}} \]
\[ M_{\mathrm{prec}} \approx \alpha_{\mathrm{prec}}\left(M_{\mathrm{jac}}+M_{\mathrm{coarse}}+M_{\mathrm{aux}}\right) \]

Here \(N_{\mathrm{vec}}\) is the number of simultaneously resident Krylov/work vectors and \(\alpha_{\mathrm{prec}}\) is an empirical multiplier for the chosen preconditioner hierarchy.

Total memory

\[ M_{\mathrm{tot}} \approx \gamma_{\mathrm{mem}}\left(M_{\mathrm{state}}+M_{\mathrm{jac}}+M_{\mathrm{vec}}+M_{\mathrm{prec}}+M_{\mathrm{well}}+M_{\mathrm{io}}\right) \]

where \(\gamma_{\mathrm{mem}}\) is a safety factor for fragmentation, transient peaks during assembly/preconditioning, and implementation overhead. A prudent planning value is usually greater than 1.1 and often closer to 1.2–1.4 for production sizing.

The per-node memory requirement is

\[ M_{\mathrm{node,req}} \approx \frac{M_{\mathrm{tot}}}{N_{\mathrm{node}}} \]

Feasibility requires

\[ M_{\mathrm{node,req}} < M_{\mathrm{node}} \]
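The memory terms above can be totalled in a short feasibility check. The per-cell state size, vector count, preconditioner multiplier, and well/I-O terms below are all assumptions; substitute values measured from your own simulator:

```python
# Memory sizing sketch for Section 14. Byte sizes, vector count, and the
# preconditioner/well/IO terms are illustrative assumptions.
n_act = 85_000_000
n_unk = 3 * n_act + 2 * 400            # cell + well unknowns (Section 5)
n_nz = 63 * n_act                      # scalar Jacobian nonzeros (Section 6)

GiB = 1024**3
m_state = n_act * 400                        # assumed ~400 B/cell state+props
m_jac = n_nz * (8 + 4) + (n_unk + 1) * 8     # 8 B vals, 4 B cols, 8 B ptrs
m_vec = 30 * n_unk * 8                       # assumed 30 resident vectors
m_prec = 1.5 * m_jac                         # assumed CPR hierarchy factor
m_well, m_io = 2 * GiB, 4 * GiB              # assumed small terms
gamma = 1.3                                  # safety factor

m_tot = gamma * (m_state + m_jac + m_vec + m_prec + m_well + m_io)
m_node_req = m_tot / 64                      # spread across 64 nodes
print(f"M_tot ≈ {m_tot / GiB:.0f} GiB, per node ≈ {m_node_req / GiB:.1f} GiB")
```

Under these assumptions the run fits comfortably in the 480 GB/node example budget; memory pressure grows quickly if \(N_{\mathrm{act}}\), the vector count, or the preconditioner multiplier rises.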

Section 15 — Parallel Scaling and Communication

Black-oil simulators do not strong-scale indefinitely. Communication and coarse-grid / pressure-solve behavior eventually dominate. A useful planning view is

\[ P_{\mathrm{node,eff}} \approx \min\left(P_{\mathrm{mem}},P_{\mathrm{comp}},P_{\mathrm{comm}},P_{\mathrm{global}}\right) \]

where \(P_{\mathrm{mem}}\) is the memory-bandwidth-limited throughput, \(P_{\mathrm{comp}}\) the compute-limited throughput, \(P_{\mathrm{comm}}\) the halo-exchange/communication-limited throughput, and \(P_{\mathrm{global}}\) the bound imposed by global reductions and coarse-grid solves.

Krylov methods and CPR-style preconditioners can suffer from global synchronizations and coarse-stage imbalance, so reservoir simulation often loses efficiency earlier than embarrassingly parallel shot-based seismic workloads.

Communication intuition

As subdomains become smaller, each node owns fewer cells but must still exchange boundary values and participate in global reductions. That causes the surface-to-volume ratio and synchronization cost to rise. The result is that runtime usually stops improving long before memory is exhausted.
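The min-of-bounds view is trivial to encode; the four bound values below are assumed illustrative numbers in a common work-unit-per-second basis:

```python
# Min-of-bounds effective throughput sketch for Section 15.
# All four bounds are assumed illustrative values in the same work units/s.
p_mem, p_comp = 4.5e9, 9.0e9       # memory-bandwidth vs compute bound
p_comm, p_global = 3.8e9, 3.2e9    # halo-exchange vs global-reduction bound

p_eff = min(p_mem, p_comp, p_comm, p_global)
print(f"P_node_eff ≈ {p_eff:.2e} (limited here by the global/coarse stage)")
```

In this assumed configuration the global-reduction/coarse-stage bound governs, which is the typical failure mode of strong scaling for CPR-preconditioned black-oil runs.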

Section 16 — What Actually Changes Runtime the Most

  1. Active-cell count — directly increases unknowns, Jacobian size, memory, and total work.
  2. Average connectivity — increases Jacobian nonzeros and communication.
  3. Number of accepted time steps — often the largest schedule-driven multiplier.
  4. Newton iterations per step — sensitive to phase transitions, controls, tolerances, and nonlinear difficulty.
  5. Linear iterations per Newton step — sensitive to heterogeneity, preconditioner quality, faults, anisotropy, and scaling.
  6. Well count / perforation count — usually modest in raw unknown count, but can strongly affect convergence and chopping.
  7. I/O and restart frequency — important for large ensemble studies and history matching campaigns.

Section 17 — The Most Useful Practical Workflow

  1. Count \(N_{\mathrm{act}}\), \(N_{\mathrm{conn}}\), \(N_{\mathrm{well}}\), and total perforations from the actual deck/model.
  2. Estimate or measure \(N_{\mathrm{step}}\), \(N_{\mathrm{Nwt}}\), and \(N_{\mathrm{Lin}}\) from a representative case, not from optimism.
  3. Compute \(N_{\mathrm{unk}}\), \(N_{\mathrm{nz}}\), and \(M_{\mathrm{tot}}\) to confirm the run fits in node memory.
  4. Benchmark one representative case at the intended node count and derive \(P_{\mathrm{node,eff}}\) or \(\alpha_{\mathrm{BO}}\).
  5. Use that calibrated value to scale campaign node-hours and required cluster size.

For procurement or architecture work, the benchmark-derived effective throughput is usually more trustworthy than any fully analytic estimate, because it captures real preconditioner behavior, software overhead, communication, and timestep acceptance patterns.

Section 18 — Back-of-the-Envelope Formulas

If you need a compact planning view, use the following:

\[ N_{\mathrm{unk}} \approx 3N_{\mathrm{act}} \]
\[ N_{\mathrm{nz}} \approx 63N_{\mathrm{act}} \qquad\text{for a 3-variable system with about 6 neighbors per cell} \]
\[ W_{\mathrm{BO}} \propto N_{\mathrm{act}}N_{\mathrm{step}}N_{\mathrm{Nwt}}N_{\mathrm{Lin}} \]
\[ t_{\mathrm{case}} \propto \frac{N_{\mathrm{act}}N_{\mathrm{step}}N_{\mathrm{Nwt}}N_{\mathrm{Lin}}}{N_{\mathrm{node}}\eta_{\mathrm{scale}}} \]
\[ M_{\mathrm{tot}} \propto N_{\mathrm{unk}} + N_{\mathrm{nz}} \]

Those are the black-oil equivalents of the simple seismic sizing laws. They are not exact, but they identify the dominant growth terms correctly.

Section 19 — Interpretation Compared with RTM

RTM sizing is dominated by explicit wave-propagation work over a fixed spatial-temporal discretization. Black-oil sizing is dominated by repeated implicit nonlinear solves whose cost depends on convergence behavior. In practice this means: