Seismic HPC FWI Sizing Worksheet

Section 1 — Model Parameters

| Parameter | Symbol | Example |
|---|---|---|
| Model length X | \(L_x\) | 40000 m |
| Model length Y | \(L_y\) | 40000 m |
| Model depth | \(L_z\) | 12000 m |
| Minimum velocity | \(v_{\min}\) | 1500 m/s |
| Maximum velocity | \(v_{\max}\) | 4500 m/s |
| Maximum frequency | \(f_{\max}\) | 15 Hz |
| Grid points per wavelength | \(n_\lambda\) | 10 |
| CFL constant | \(C_{\mathrm{CFL}}\) | 0.45 |
| Propagation time | \(T_{\mathrm{prop}}\) | 8 s |
| Number of shots | \(N_{\mathrm{shot}}\) | 20000 |

Section 2 — Hardware Parameters

| Parameter | Symbol | Example |
|---|---|---|
| GPU effective throughput (cell updates/s) | \(P_{\mathrm{gpu}}\) | \(6\times10^9\) |
| Scaling efficiency | \(\eta\) | 0.75 |
| GPUs per shot domain | \(G_{\mathrm{shot}}\) | 64 |
| Cluster utilization | \(U\) | 0.85 |
| Target campaign time | \(H_{\mathrm{wall}}\) | 120 hours |

Section 3 — FWI-Specific Parameters

| Parameter | Symbol | Example |
|---|---|---|
| Shots per FWI iteration | \(N_{\mathrm{shot,iter}}\) | 500 |
| FWI iterations | \(N_{\mathrm{iter}}\) | 8 |
| Frequency bands | \(N_{\mathrm{band}}\) | 4 |

Repeat from RTM sizing for work per propagation

CFL constant

\[ C_{\mathrm{CFL}} = \frac{v_{\max}\,\Delta t}{\Delta x} \]

where \(v_{\max}\) is the maximum propagation velocity, \(\Delta t\) is the time step, and \(\Delta x\) is the spatial grid spacing.

Section 4 — Spatial Grid

\[ \Delta x \approx \frac{v_{\min}}{n_\lambda f_{\max}} \]

where \(n_\lambda\) is the number of grid points per wavelength and \(v_{\min}\) is the minimum velocity in the model.

This places \(n_\lambda\) grid points on the shortest wavelength in the model, \(\lambda_{\min}=v_{\min}/f_{\max}\), and so sets the grid spacing.
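As a quick check of the grid-spacing rule, the following Python sketch evaluates it for the Section 1 example values. The function name `grid_spacing` is illustrative and not taken from any particular solver.

```python
def grid_spacing(v_min: float, n_lambda: float, f_max: float) -> float:
    """Return grid spacing in metres: shortest wavelength / points per wavelength."""
    return v_min / (n_lambda * f_max)


# Section 1 example: v_min = 1500 m/s, n_lambda = 10, f_max = 15 Hz
dx = grid_spacing(v_min=1500.0, n_lambda=10.0, f_max=15.0)
print(f"dx = {dx:.1f} m")  # dx = 10.0 m
```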

Section 5 — Number of Cells

The number of grid cells is determined by the physical model dimensions and the spatial grid spacing.

\[ N_{\mathrm{cell}}=\frac{L_xL_yL_z}{\Delta x^3} \]

where \(L_x\), \(L_y\), and \(L_z\) are the physical model dimensions and \(\Delta x\) is the spatial grid spacing.

Frequency scaling

Grid spacing is chosen to resolve the shortest wavelength in the model:

\[ \Delta x \approx \frac{v_{\min}}{n_\lambda f_{\max}} \]

where \(v_{\min}\) is minimum velocity in the model, \(f_{\max}\) is maximum propagated frequency, and \(n_\lambda\) is grid points per wavelength.

Substituting this into the cell-count expression gives

\[ N_{\mathrm{cell}}=K_x f_{\max}^3 \qquad\text{where}\qquad K_x=\frac{L_xL_yL_zn_\lambda^3}{v_{\min}^3} \]

So \(N_{\mathrm{cell}}\propto f_{\max}^3\) for a fixed physical model size.
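A minimal Python sketch of the interior cell count, using the Section 1 example values. The helper name `cell_count` is hypothetical; the second call illustrates the cubic scaling by doubling the frequency.

```python
def cell_count(lx, ly, lz, v_min, n_lambda, f_max):
    """Interior cell count N_cell = K_x * f_max**3 (no boundary padding)."""
    k_x = lx * ly * lz * n_lambda**3 / v_min**3  # units: s^3
    return k_x * f_max**3


# Section 1 example model at 15 Hz, then the same model at 30 Hz
n_cell_15 = cell_count(40000.0, 40000.0, 12000.0, 1500.0, 10.0, 15.0)
n_cell_30 = cell_count(40000.0, 40000.0, 12000.0, 1500.0, 10.0, 30.0)

print(f"N_cell(15 Hz) = {n_cell_15:.3e}")          # N_cell(15 Hz) = 1.920e+10
print(f"ratio 30/15 Hz = {n_cell_30 / n_cell_15:.1f}")  # ratio 30/15 Hz = 8.0
```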

Padded computational domain

In practice, the solver operates on a larger computational grid than the physical model. Padding is required for several reasons, chief among them the absorbing boundary layers (PML) placed on the model faces.

Let the padded computational dimensions be

\[ L_x^{\mathrm{comp}} = L_x + 2L_{\mathrm{PML},x},\qquad L_y^{\mathrm{comp}} = L_y + 2L_{\mathrm{PML},y},\qquad L_z^{\mathrm{comp}} = L_z + L_{\mathrm{PML,top}} + L_{\mathrm{PML,bot}} \]

The true number of grid cells used by the simulation is then

\[ N_{\mathrm{cell,true}} = \frac{L_x^{\mathrm{comp}}L_y^{\mathrm{comp}}L_z^{\mathrm{comp}}}{\Delta x^3} \]

Determining absorber thickness

Absorbing boundary layers are usually specified in grid points rather than physical distance.

\[ L_{\mathrm{PML}} = N_{\mathrm{PML}}\Delta x \]

Typical values in production wave-equation solvers are on the order of 20–40 grid cells per absorbing boundary, depending on absorber formulation and reflection tolerance requirements.

The \(f_{\max}^3\) scaling law describes the interior model scaling. Boundary padding slightly increases the total cell count beyond the interior \(f_{\max}^3\) scaling, so actual simulations should compute the cell count using the padded dimensions above rather than the asymptotic scaling rule alone.
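The padded count can be sketched in Python as below. The absorber width of 30 cells on every face is an assumption picked from the 20–40 range quoted above, the same width is used in X and Y for brevity, and the function name `padded_cell_count` is illustrative. A free-surface top boundary would set `n_pml_top` to zero.

```python
def padded_cell_count(lx, ly, lz, dx, n_pml_xy, n_pml_top, n_pml_bot):
    """Cell count on the padded computational domain (uniform spacing dx)."""
    lx_comp = lx + 2 * n_pml_xy * dx          # absorber on both X faces
    ly_comp = ly + 2 * n_pml_xy * dx          # absorber on both Y faces
    lz_comp = lz + (n_pml_top + n_pml_bot) * dx
    return lx_comp * ly_comp * lz_comp / dx**3


dx = 10.0  # m, from the Section 1 example values
n_interior = 40000.0 * 40000.0 * 12000.0 / dx**3

# ASSUMPTION: 30 absorber cells on all six faces
n_true = padded_cell_count(40000.0, 40000.0, 12000.0, dx, 30, 30, 30)
padding_ratio = n_true / n_interior

print(f"N_cell,true = {n_true:.3e}")
print(f"padding ratio = {padding_ratio:.3f}")  # padding ratio = 1.082
```

For a model this large the assumed absorbers add under ten percent; the same absorber width costs proportionally more on smaller models or coarser grids.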

One-line practical formula

For a uniform grid, if you know the model volume, minimum velocity, target frequency, and points per wavelength, collapse the spatial constant and work in km\(^3\):

\[ N_{\mathrm{cell}} \approx \left(\frac{n_\lambda}{v_{\min}}\right)^3 \left(10^9\right) V_{\mathrm{km}^3}f_{\max}^3 \]

Here \(V_{\mathrm{km}^3}\) is the model volume in km\(^3\), \(v_{\min}\) is in m/s, and \(f_{\max}\) is in Hz. The factor \(10^9\) converts the volume to m\(^3\), since \(1~\mathrm{km}^3 = 10^9~\mathrm{m}^3\).

Back-of-the-napkin size and padding correction shortcuts, assuming \(n_\lambda\approx 10\) and \(v_{\min}\approx 1500~\mathrm{m/s}\):

\[ N_{\mathrm{cell,true}}\approx (1.15\text{ to }1.30)N_{\mathrm{cell}} \qquad\text{where}\qquad N_{\mathrm{cell}}\approx 300V_{\mathrm{km}^3}f_{\max}^3 \]
\[ \text{and, since } N_t\propto f_{\max} \text{ (Sections 7 and 8)},\qquad W_{\mathrm{prop}}\propto N_{\mathrm{cell}}f_{\max}\propto f_{\max}^4 \]
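The coefficient of roughly 300 can be checked numerically. This Python sketch evaluates \((n_\lambda/v_{\min})^3\times10^9\) for the stated shortcut assumptions and applies it to the Section 1 model; the function name `shortcut_coefficient` is hypothetical.

```python
def shortcut_coefficient(n_lambda: float, v_min: float) -> float:
    """Cells per km^3 per Hz^3: (n_lambda / v_min)^3 * 1e9, v_min in m/s."""
    return (n_lambda / v_min) ** 3 * 1e9


coeff = shortcut_coefficient(n_lambda=10.0, v_min=1500.0)

# Section 1 model: 40 km x 40 km x 12 km at 15 Hz
v_km3 = 40.0 * 40.0 * 12.0
n_cell = coeff * v_km3 * 15.0**3

print(f"coefficient = {coeff:.1f}")  # coefficient = 296.3
print(f"N_cell = {n_cell:.2e}")      # N_cell = 1.92e+10
```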

Section 6 — Time Step

CFL constant:

\[ C_{\mathrm{CFL}} = \frac{v_{\max}\Delta t}{\Delta x} \]

where \(v_{\max}\) is maximum propagation velocity, \(\Delta t\) is time step, and \(\Delta x\) is spatial grid spacing.

\[ \Delta t \le \frac{C_{\mathrm{CFL}}\Delta x}{v_{\max}} \]

Production codes usually choose \(\Delta t\) close to this limit for efficiency:

\[ \Delta t \approx \frac{C_{\mathrm{CFL}}\Delta x}{v_{\max}} \]

Because the time step scales with grid spacing, \(\Delta t\propto \Delta x\), and since grid spacing decreases as maximum frequency increases, \(\Delta x\propto 1/f_{\max}\), the number of time steps required for a fixed propagation time scales as \(N_t\propto f_{\max}\).
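A short Python sketch of the time-step choice at the CFL limit, using the Section 1 example values and the 10 m grid spacing they imply. The function name `cfl_time_step` is illustrative.

```python
def cfl_time_step(c_cfl: float, dx: float, v_max: float) -> float:
    """Time step in seconds at the CFL limit: dt = C_CFL * dx / v_max."""
    return c_cfl * dx / v_max


# C_CFL = 0.45, dx = 10 m, v_max = 4500 m/s
dt = cfl_time_step(c_cfl=0.45, dx=10.0, v_max=4500.0)
print(f"dt = {dt * 1e3:.3f} ms")  # dt = 1.000 ms
```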

Section 7 — Number of Time Steps

\[ N_t = \frac{T_{\mathrm{prop}}}{\Delta t} \]

Scaling:

\[ N_t \propto f_{\max} \]

This holds because \(\Delta x\propto 1/f_{\max}\) and \(\Delta t\propto \Delta x\) from the CFL condition above.

The slope of this linear relation is worth writing out explicitly:

\[ N_t= \frac{T_{\mathrm{prop}}n_\lambda v_{\max}}{C_{\mathrm{CFL}}v_{\min}}f_{\max} \Rightarrow N_t = K_t f_{\max} \]
\[ K_t = \frac{T_{\mathrm{prop}}n_\lambda v_{\max}}{C_{\mathrm{CFL}}v_{\min}} \]

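So the number of time steps grows linearly with frequency, with a slope set by the propagation time, the velocity contrast \(v_{\max}/v_{\min}\), and the numerical parameters \(n_\lambda\) and \(C_{\mathrm{CFL}}\).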
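The slope \(K_t\) and the resulting step count can be evaluated with a few lines of Python. The helper name `time_step_count` is hypothetical; the result is left as a float here, whereas a real code would round up to a whole number of steps.

```python
def time_step_count(t_prop, n_lambda, v_max, v_min, c_cfl, f_max):
    """Number of time steps N_t = K_t * f_max at the CFL limit."""
    k_t = t_prop * n_lambda * v_max / (c_cfl * v_min)  # units: s
    return k_t * f_max


# Section 1 example values
n_t = time_step_count(t_prop=8.0, n_lambda=10.0, v_max=4500.0,
                      v_min=1500.0, c_cfl=0.45, f_max=15.0)
print(f"N_t = {round(n_t)}")  # N_t = 8000
```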

Section 8 — Wavefield Work per Propagation

\[ W_{\mathrm{prop}} = N_{\mathrm{cell}}N_t \]

This counts cell updates, not FLOPs.

Spatial discretization: \(N_{\mathrm{cell}}\propto f_{\max}^3\)

Temporal discretization: \(N_t\propto f_{\max}\)

Scaling:

\[ W_{\mathrm{prop}}\propto f_{\max}^4 \]

This is the core seismic HPC scaling law for uniform 3D discretization with fixed physical model size.

\[ W_{\mathrm{prop}} = K_xK_tf_{\max}^4 \]
\[ K_x=\frac{L_xL_yL_zn_\lambda^3}{v_{\min}^3}, \qquad K_t=\frac{T_{\mathrm{prop}}n_\lambda v_{\max}}{C_{\mathrm{CFL}}v_{\min}} \]
\[ \Rightarrow W_{\mathrm{prop}}= \frac{L_xL_yL_zT_{\mathrm{prop}}n_\lambda^4v_{\max}}{C_{\mathrm{CFL}}v_{\min}^4} f_{\max}^4 \]

\(W_{\mathrm{prop}}\) counts total cell updates and is therefore dimensionless; the apparent time units in the constants cancel when combined with \(f_{\max}^4\).

These constants assume explicit time stepping and a uniform grid spacing determined by the minimum velocity and a fixed points-per-wavelength rule. Adaptive or anisotropic grids violate these assumptions, so \(K_x\) and \(K_t\) no longer apply as written.
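A minimal Python sketch of the closed-form work per propagation, evaluated for the Section 1 example values and again at double the frequency to show the fourth-power growth. The function name `propagation_work` is illustrative, and boundary padding is ignored.

```python
def propagation_work(lx, ly, lz, t_prop, n_lambda, v_min, v_max, c_cfl, f_max):
    """Cell updates for one propagation: W_prop = K_x * K_t * f_max**4."""
    k_x = lx * ly * lz * n_lambda**3 / v_min**3         # units: s^3
    k_t = t_prop * n_lambda * v_max / (c_cfl * v_min)   # units: s
    return k_x * k_t * f_max**4


args = dict(lx=40000.0, ly=40000.0, lz=12000.0, t_prop=8.0,
            n_lambda=10.0, v_min=1500.0, v_max=4500.0, c_cfl=0.45)

w_15 = propagation_work(f_max=15.0, **args)
w_30 = propagation_work(f_max=30.0, **args)

print(f"W_prop(15 Hz) = {w_15:.3e} cell updates")  # W_prop(15 Hz) = 1.536e+14 cell updates
print(f"ratio 30/15 Hz = {w_30 / w_15:.1f}")       # ratio 30/15 Hz = 16.0
```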

FWI-Specific Sections

Section 9 — Full-Waveform Inversion (FWI) Cost Model

FWI builds directly on the same wave-equation propagation cost used in RTM. The key difference is that FWI requires multiple forward-equivalent propagations per shot per iteration, and repeats this across many iterations and frequency bands.

The correct way to think about FWI sizing is:

RTM gives you the cost of one propagation.
FWI multiplies that cost by how many forward-equivalent propagations are required by the inversion algorithm.

9.1 Baseline propagation work (from RTM)

From Section 8, the work for one propagation is

\[ W_{\mathrm{fwd}} = W_{\mathrm{prop}} = N_{\mathrm{cell}}N_t \]

This represents one forward wave-equation solve (cell updates, not FLOPs).

All FWI cost builds on this baseline.

9.2 Decomposing FWI cost correctly

FWI cost must be separated into three independent components:

(1) Propagation count (algorithmic work)

How many forward-equivalent propagations are required per shot per iteration.

\[ \alpha_{\mathrm{prop}} = 2 + \alpha_{\mathrm{replay}} + N_{\mathrm{trial}} \]

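Where the constant 2 counts the forward and adjoint propagations, \(\alpha_{\mathrm{replay}}\) is the replay overhead from recomputing the forward wavefield (see Section 11.2), and \(N_{\mathrm{trial}}\) is the number of additional forward-equivalent propagations spent on trial models.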

(2) Kernel overhead (per time step cost)

Additional cost per time step relative to a baseline forward solve:

\[ \alpha_{\mathrm{kernel}} \ge 1 \]

Captures:

Typical planning range: \(\alpha_{\mathrm{kernel}}\approx 1.0\) to \(1.2\) (see Section 9.4).

(3) Campaign structure (problem size)

FWI is repeated across shots, iterations, and frequency bands.

This is not a multiplier, but a summation over work:

\[ W_{\mathrm{campaign}}\propto \sum_b N_{\mathrm{shot},b}N_{\mathrm{iter},b}W_{\mathrm{shot,iter},b} \]

9.3 Per-shot, per-iteration cost

Combining the above:

\[ W_{\mathrm{shot,iter}}\propto \alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t \]

where:

\[ \alpha_{\mathrm{FWI,iter}} = \alpha_{\mathrm{prop}}\alpha_{\mathrm{kernel}} \]

9.4 Practical planning values (acoustic FWI)

For standard adjoint-state acoustic FWI:

| Component | Typical Value |
|---|---|
| Forward + adjoint | 2.0 |
| Replay overhead | 0.2 – 0.5 |
| Trial models | 0 – 1.0 |
| Kernel overhead | 1.0 – 1.2 |

Resulting in:

\[ \alpha_{\mathrm{FWI,iter}}\approx 2.5 \text{ to } 4.0 \]
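The multiplier can be assembled from its components with a short Python sketch. The function name `fwi_multiplier` is hypothetical. Taking every table entry at its extreme gives 2.2 to 4.2, slightly wider than the rounded planning range quoted above.

```python
def fwi_multiplier(alpha_replay: float, n_trial: float, alpha_kernel: float) -> float:
    """alpha_FWI,iter = (2 + alpha_replay + N_trial) * alpha_kernel."""
    alpha_prop = 2.0 + alpha_replay + n_trial  # forward + adjoint + extras
    return alpha_prop * alpha_kernel


# Low and high ends of the component ranges in the table above
alpha_low = fwi_multiplier(alpha_replay=0.2, n_trial=0.0, alpha_kernel=1.0)
alpha_high = fwi_multiplier(alpha_replay=0.5, n_trial=1.0, alpha_kernel=1.2)

print(f"alpha_FWI,iter: {alpha_low:.1f} to {alpha_high:.1f}")  # alpha_FWI,iter: 2.2 to 4.2
```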

Recommended planning values:

This is the single most useful number for early-stage compute sizing.

9.5 Frequency-band staging (multi-scale FWI)

FWI is typically performed in multiple frequency bands:

Each band \(b\) has its own shot count, iteration count, FWI multiplier, cell count, and number of time steps.

Total work:

\[ W_{\mathrm{campaign}}\propto \sum_b N_{\mathrm{shot},b}N_{\mathrm{iter},b}\alpha_{\mathrm{FWI,iter},b}N_{\mathrm{cell},b}N_{t,b} \]
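The band summation can be sketched in Python as follows. The band frequencies (4, 7, 10, 15 Hz), the even split of 2 iterations per band, and the multiplier of 3.0 are all hypothetical values chosen for illustration; the sketch also assumes each band is re-gridded to its own maximum frequency, so that per-band work follows the \(f^4\) law.

```python
def band_work(n_shot, n_iter, alpha_fwi, k_x, k_t, f_band):
    """Work for one frequency band, with grid and time step sized for f_band."""
    n_cell = k_x * f_band**3   # cells at this band's grid spacing
    n_t = k_t * f_band         # time steps at this band's CFL limit
    return n_shot * n_iter * alpha_fwi * n_cell * n_t


# Constants from the Section 1 example values
k_x = 40000.0 * 40000.0 * 12000.0 * 10.0**3 / 1500.0**3
k_t = 8.0 * 10.0 * 4500.0 / (0.45 * 1500.0)

# ASSUMPTION: hypothetical band schedule, 500 shots and 2 iterations per band
bands_hz = [4.0, 7.0, 10.0, 15.0]
work = [band_work(500, 2, 3.0, k_x, k_t, f) for f in bands_hz]

total_work = sum(work)
top_share = work[-1] / total_work
print(f"total = {total_work:.3e} cell updates")
print(f"highest band share = {top_share:.2f}")  # highest band share = 0.80
```

Under these assumptions the highest band accounts for about four fifths of the campaign work, which is why sizing is usually driven by the top frequency.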

Important:

9.6 Simplified uniform approximation

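If \(N_{\mathrm{cell}}\), \(N_t\), \(N_{\mathrm{shot,iter}}\), and \(\alpha_{\mathrm{FWI,iter}}\) are treated as uniform across the campaign, then: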

\[ W_{\mathrm{campaign}}\propto N_{\mathrm{shot,iter}}N_{\mathrm{iter}}\alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t \]

This is the most useful form for back-of-the-envelope cluster sizing.
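A back-of-the-envelope evaluation in Python, using the Section 1 and Section 3 example values. The multiplier of 3.0 is an assumed mid-range value, \(N_{\mathrm{iter}}=8\) is treated as the total iteration count, and the function name `campaign_work` is illustrative.

```python
def campaign_work(n_shot_iter, n_iter, alpha_fwi, n_cell, n_t):
    """Total cell updates under the uniform approximation."""
    return n_shot_iter * n_iter * alpha_fwi * n_cell * n_t


# N_cell = 1.92e10 and N_t = 8000 follow from the Section 1 example at 15 Hz
w_campaign = campaign_work(n_shot_iter=500, n_iter=8, alpha_fwi=3.0,  # alpha: ASSUMPTION
                           n_cell=1.92e10, n_t=8000)
print(f"W_campaign = {w_campaign:.3e} cell updates")  # W_campaign = 1.843e+18 cell updates
```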

9.7 What the FWI multiplier does (and does NOT do)

The FWI multiplier applies to propagation count, not discretization.

Correct:

\[ W \propto \alpha_{\mathrm{FWI}}\left(N_{\mathrm{cell}}N_t\right) \]

Incorrect (common mistake):

FWI increases how many propagations you run, not how large each propagation is.

9.8 Runtime model (HPC interpretation)

Per shot, per iteration runtime:

\[ t_{\mathrm{shot,iter}}= \frac{\alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t}{P_{\mathrm{gpu,eff}}G_{\mathrm{shot}}} \]

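Where \(P_{\mathrm{gpu,eff}}\) is the effective per-GPU throughput in cell updates per second and \(G_{\mathrm{shot}}\) is the number of GPUs per shot domain (Section 2).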

Total campaign wall-clock depends on:

Section 10 — Extension to Elastic and Advanced Physics

FWI cost increases significantly with more complex physics.

10.1 Elastic modeling multiplier

Elastic wave equations introduce:

Planning estimate:

\[ \beta_{\mathrm{elastic}}\sim 4 \text{ to } 10 \]

Relative to acoustic propagation.

Example:

10.2 Combined cost

\[ W_{\mathrm{FWI,elastic}}\propto \beta_{\mathrm{elastic}}\alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t \]

This is why elastic FWI is often an order of magnitude more expensive than acoustic.

10.3 Shot Concurrency

In a cluster environment, shots run concurrently, so total wall time is not the sum of the per-shot runtimes. Wall-time scaling is better represented as:

\[ \frac{N_{\mathrm{shots,total}}}{N_{\mathrm{shots,concurrent}}} \]

multiplied by per-shot runtime.
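The runtime model of Section 9.8 and the concurrency ratio above can be combined in a short Python sketch. Several choices here are assumptions rather than definitions from this worksheet: \(P_{\mathrm{gpu,eff}}\) is taken as \(P_{\mathrm{gpu}}\,\eta\), the cluster utilization \(U\) is applied as a derating of the wall-clock budget, the multiplier is 3.0, and the total shot count is taken as shots per iteration times iterations. The function names are hypothetical.

```python
import math


def shot_iter_runtime(alpha_fwi, n_cell, n_t, p_gpu_eff, g_shot):
    """Seconds per shot per iteration on a g_shot-GPU domain."""
    return alpha_fwi * n_cell * n_t / (p_gpu_eff * g_shot)


def campaign_wall_time(n_shots_total, n_shots_concurrent, t_shot):
    """Wall time in seconds: (total shots / concurrent shots) * per-shot runtime."""
    return n_shots_total / n_shots_concurrent * t_shot


# ASSUMPTION: effective throughput = P_gpu * eta (Section 2 example values)
p_gpu_eff = 6e9 * 0.75
t_shot = shot_iter_runtime(alpha_fwi=3.0, n_cell=1.92e10, n_t=8000,
                           p_gpu_eff=p_gpu_eff, g_shot=64)

n_shots_total = 500 * 8              # shot-iterations in the campaign
budget_s = 120 * 3600 * 0.85         # ASSUMPTION: H_wall derated by utilization U
n_concurrent = math.ceil(n_shots_total * t_shot / budget_s)
n_gpus = n_concurrent * 64
wall_h = campaign_wall_time(n_shots_total, n_concurrent, t_shot) / 3600

print(f"t_shot,iter = {t_shot:.0f} s")          # t_shot,iter = 1600 s
print(f"concurrent shots = {n_concurrent}")     # concurrent shots = 18
print(f"GPUs = {n_gpus}, wall time = {wall_h:.1f} h")
```

Working backwards from the wall-clock target to a concurrency count, as done here, is one way to turn the ratio into a cluster size; the ratio itself ignores scheduling gaps and I/O.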

Section 11 — Memory Considerations for FWI

Memory is often the limiting factor in FWI, not compute.

11.1 First-order estimate

\[ \mathrm{Mem}\approx N_{\mathrm{cell}}\,n_{\mathrm{field}}\times(\text{bytes per value}\times\text{overhead}) \]
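A first-order memory estimate in Python. The field count of 4, single-precision storage (4 bytes per value), and an overhead factor of 1.2 are assumptions for illustration only, as is the function name `fwi_memory_bytes`. The per-GPU figure simply divides by the 64 GPUs per shot domain from Section 2 and ignores halo regions and stored checkpoints.

```python
def fwi_memory_bytes(n_cell, n_field, bytes_per_value=4, overhead=1.2):
    """First-order memory estimate: N_cell * n_field * bytes * overhead."""
    return n_cell * n_field * bytes_per_value * overhead


# N_cell = 1.92e10 from the Section 1 example; n_field = 4 is an ASSUMPTION
mem_total = fwi_memory_bytes(n_cell=1.92e10, n_field=4)
mem_per_gpu_gb = mem_total / 64 / 1e9

print(f"{mem_total / 1e9:.1f} GB total, {mem_per_gpu_gb:.2f} GB per GPU")
# 368.6 GB total, 5.76 GB per GPU
```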

11.2 What actually consumes memory

FWI memory includes:

Checkpointing strategy directly affects:

This creates a fundamental trade-off between memory and compute in FWI. Relying more heavily on checkpointing (storing fewer wavefield snapshots and recomputing the rest) reduces memory usage but increases \(\alpha_{\mathrm{replay}}\), directly increasing total compute cost.

11.3 Elastic memory scaling

Elastic models require more stored fields:

\[ \mathrm{Mem}_{\mathrm{elastic}}\approx 2\text{–}4\times \mathrm{Mem}_{\mathrm{acoustic}} \]

depending on formulation and implementation.

Bottom line

\[ \alpha_{\mathrm{FWI,iter}}\approx 2.5\text{–}4.0\quad (\text{acoustic}) \]