Seismic HPC FWI Sizing Worksheet

Section 1 — Model Parameters

Parameter	Symbol	Example
Model length X	\(L_x\)	40000 m
Model length Y	\(L_y\)	40000 m
Model depth	\(L_z\)	12000 m
Minimum velocity	\(v_{\min}\)	1500 m/s
Maximum velocity	\(v_{\max}\)	4500 m/s
Maximum frequency	\(f_{\max}\)	15 Hz
Grid points per wavelength	\(n_\lambda\)	10
CFL constant	\(C_{\mathrm{CFL}}\)	0.45
Propagation time	\(T_{\mathrm{prop}}\)	8 s
Number of shots	\(N_{\mathrm{shot}}\)	20000

Section 2 — Hardware Parameters

Parameter	Symbol	Example
GPU effective throughput (cell updates/s)	\(P_{\mathrm{gpu}}\)	\(6\times10^9\)
Scaling efficiency	\(\eta\)	0.75
GPUs per shot domain	\(G_{\mathrm{shot}}\)	64
Cluster utilization	\(U\)	0.85
Target campaign time	\(H_{\mathrm{wall}}\)	120 hours

Section 3 — FWI-Specific Parameters

Parameter	Symbol	Example
Shots per FWI iteration	\(N_{\mathrm{shot,iter}}\)	500
FWI iterations	\(N_{\mathrm{iter}}\)	8
Frequency bands	\(N_{\mathrm{band}}\)	4

Sections 4–8 repeat the RTM sizing derivation for the work per propagation.

Section 4 — Spatial Grid

\[ \Delta x \approx \frac{v_{\min}}{n_\lambda f_{\max}} \]

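where \(v_{\min}\) is the minimum velocity in the model, \(f_{\max}\) is the maximum frequency, and \(n_\lambda\) is the number of grid points per wavelength.

The shortest wavelength in the model is \(v_{\min}/f_{\max}\); sampling it with \(n_\lambda\) points fixes the grid spacing.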

Section 5 — Number of Cells

The number of grid cells is determined by the physical model dimensions and the spatial grid spacing.

\[ N_{\mathrm{cell}}=\frac{L_xL_yL_z}{\Delta x^3} \]

where \(L_x\), \(L_y\), and \(L_z\) are the physical model dimensions and \(\Delta x\) is the spatial grid spacing.

Frequency scaling

Grid spacing is chosen to resolve the shortest wavelength in the model:

\[ \Delta x \approx \frac{v_{\min}}{n_\lambda f_{\max}} \]

where \(v_{\min}\) is minimum velocity in the model, \(f_{\max}\) is maximum propagated frequency, and \(n_\lambda\) is grid points per wavelength.

Substituting this into the cell-count expression gives

\[ N_{\mathrm{cell}}=K_x f_{\max}^3 \qquad\text{where}\qquad K_x=\frac{L_xL_yL_zn_\lambda^3}{v_{\min}^3} \]

So \(N_{\mathrm{cell}}\propto f_{\max}^3\) for a fixed physical model size.
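
As a quick numerical check, the sketch below evaluates the cell-count expression for the Section 1 example values and confirms the cubic dependence on frequency. The function name `cell_count` is illustrative only and does not come from any particular solver.

```python
def cell_count(lx, ly, lz, v_min, n_lambda, f_max):
    """Interior cell count for a uniform grid (no boundary padding)."""
    dx = v_min / (n_lambda * f_max)  # grid spacing in metres
    return lx * ly * lz / dx**3

# Section 1 example values, evaluated at 15 Hz and at double that frequency
n_15hz = cell_count(40_000.0, 40_000.0, 12_000.0, 1500.0, 10, 15.0)
n_30hz = cell_count(40_000.0, 40_000.0, 12_000.0, 1500.0, 10, 30.0)

print(f"N_cell at 15 Hz: {n_15hz:.3e}")               # 1.920e+10
print(f"30 Hz / 15 Hz ratio: {n_30hz / n_15hz:.1f}")  # 8.0
```

Doubling the frequency halves the grid spacing in each of three dimensions, hence the factor of eight.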

Padded computational domain

In practice, the solver operates on a larger computational grid than the physical model. Padding is required for several reasons:

PML absorbing boundary layers
finite-difference stencil halo regions
domain decomposition alignment or rounding

Let the padded computational dimensions be

\[ L_x^{\mathrm{comp}} = L_x + 2L_{\mathrm{PML},x},\qquad L_y^{\mathrm{comp}} = L_y + 2L_{\mathrm{PML},y},\qquad L_z^{\mathrm{comp}} = L_z + L_{\mathrm{PML,top}} + L_{\mathrm{PML,bot}} \]

The true number of grid cells used by the simulation is then

\[ N_{\mathrm{cell,true}} = \frac{L_x^{\mathrm{comp}}L_y^{\mathrm{comp}}L_z^{\mathrm{comp}}}{\Delta x^3} \]

Determining absorber thickness

Absorbing boundary layers are usually specified in grid points rather than physical distance.

\[ L_{\mathrm{PML}} = N_{\mathrm{PML}}\Delta x \]

Typical values in production wave-equation solvers are on the order of 20–40 grid cells per absorbing boundary, depending on absorber formulation and reflection tolerance requirements.

The \(f_{\max}^3\) scaling law describes the interior of the model. Boundary padding adds cells on top of that, and because absorber thickness is fixed in grid points rather than in metres, the padded total does not follow a pure \(f_{\max}^3\) law. Actual simulations should therefore compute the cell count from the padded dimensions above rather than from the asymptotic scaling rule alone.
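
The padded cell count can be sketched as below for the Section 1 example model. The absorber thickness of 30 grid points is an assumed value picked from the 20–40 range quoted above, and the helper `padded_cell_count` is hypothetical. It counts only absorber layers, so stencil halos and decomposition rounding would come on top.

```python
def padded_cell_count(lx, ly, lz, dx, n_pml, n_pml_top=None):
    """Cell count including absorber layers specified in grid points.

    n_pml applies to both x faces, both y faces, and the bottom face.
    n_pml_top defaults to n_pml; pass 0 for a free surface.
    """
    if n_pml_top is None:
        n_pml_top = n_pml
    nx = round(lx / dx) + 2 * n_pml
    ny = round(ly / dx) + 2 * n_pml
    nz = round(lz / dx) + n_pml_top + n_pml
    return nx * ny * nz

# Interior grid for the example model at dx = 10 m
interior = round(40_000 / 10) * round(40_000 / 10) * round(12_000 / 10)
padded = padded_cell_count(40_000, 40_000, 12_000, dx=10.0, n_pml=30)

print(padded)                                       # 20769336000
print(f"padding factor: {padded / interior:.3f}")   # 1.082
```

For a model this large the absorber alone adds about 8%. Smaller models, or coarser grids at low frequency, see a larger relative overhead because the layer thickness in cells stays fixed while the interior shrinks.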

One-line practical formula

For a uniform grid, if you know the model volume, minimum velocity, target frequency, and points per wavelength, collapse the spatial constant and work in km\(^3\):

\[ N_{\mathrm{cell}} \approx \left(\frac{n_\lambda}{v_{\min}}\right)^3 \left(10^9\right) V_{\mathrm{km}^3}f_{\max}^3 \]

The factor \(10^9\) comes from \(1~\mathrm{km}^3 = 10^9~\mathrm{m}^3\). Here \(V_{\mathrm{km}^3}\) is the model volume in km\(^3\), \(v_{\min}\) is in m/s, and \(f_{\max}\) is in Hz.

Back-of-the-napkin size and padding correction shortcuts, assuming \(n_\lambda\approx 10\) and \(v_{\min}\approx 1500~\mathrm{m/s}\):

\[ N_{\mathrm{cell,true}}\approx (1.15\text{ to }1.30)N_{\mathrm{cell}} \qquad\text{where}\qquad N_{\mathrm{cell}}\approx 300V_{\mathrm{km}^3}f_{\max}^3 \]

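Because the number of time steps grows linearly with \(f_{\max}\) (Section 7), the work per propagation defined in Section 8 follows directly from the cell count:

\[ W_{\mathrm{prop}} = N_{\mathrm{cell}}N_t \propto N_{\mathrm{cell}}f_{\max} \propto f_{\max}^4 \]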
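
To see where the rounded coefficient of 300 comes from, the following sketch evaluates the one-line formula with \(n_\lambda = 10\) and \(v_{\min} = 1500\) m/s and applies it to the Section 1 model volume. The function name `cells_from_volume` is illustrative.

```python
def cells_from_volume(v_km3, f_max, n_lambda=10, v_min=1500.0):
    """One-line estimate: volume in km^3, f_max in Hz, v_min in m/s."""
    return (n_lambda / v_min) ** 3 * 1e9 * v_km3 * f_max**3

# Coefficient multiplying V_km3 * f_max^3 for the default assumptions
coefficient = (10 / 1500.0) ** 3 * 1e9
print(f"coefficient: {coefficient:.1f}")  # 296.3, rounded to 300 in the shortcut

# Section 1 example: 40 km x 40 km x 12 km at 15 Hz
v_km3 = 40 * 40 * 12
print(f"N_cell: {cells_from_volume(v_km3, 15.0):.3e}")  # 1.920e+10
```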

Section 6 — Time Step

CFL constant:

\[ C_{\mathrm{CFL}} = \frac{v_{\max}\Delta t}{\Delta x} \]

where \(v_{\max}\) is maximum propagation velocity, \(\Delta t\) is time step, and \(\Delta x\) is spatial grid spacing.

Intuitively, it controls how small the time step must be relative to the grid spacing and maximum wave speed so information does not travel farther than one grid cell per time step.
The numerical domain of dependence must contain the physical domain of dependence, and the stable CFL limit depends on dimension, stencil, and discretization.
If CFL forces a smaller \(\Delta t\), timesteps, FLOPs, and runtime all increase.

\[ \Delta t \le \frac{C_{\mathrm{CFL}}\Delta x}{v_{\max}} \]

Production codes usually choose close to the limit for efficiency:

\[ \Delta t \approx \frac{C_{\mathrm{CFL}}\Delta x}{v_{\max}} \]

Because the time step scales with grid spacing, \(\Delta t\propto \Delta x\), and since grid spacing decreases as maximum frequency increases, \(\Delta x\propto 1/f_{\max}\), the number of time steps required for a fixed propagation time scales as \(N_t\propto f_{\max}\).

Section 7 — Number of Time Steps

\[ N_t = \frac{T_{\mathrm{prop}}}{\Delta t} \]

Scaling:

\[ N_t \propto f_{\max} \]

This is correct only because \(\Delta x\propto 1/f_{\max}\) and \(\Delta t\propto \Delta x\) from the CFL condition above.

Written out explicitly, because the slope of this linear relation is not a trivial constant:

\[ N_t= \frac{T_{\mathrm{prop}}n_\lambda v_{\max}}{C_{\mathrm{CFL}}v_{\min}}f_{\max} \Rightarrow N_t = K_t f_{\max} \]

\[ K_t = \frac{T_{\mathrm{prop}}n_\lambda v_{\max}}{C_{\mathrm{CFL}}v_{\min}} \]

So higher frequency yields more time steps linearly, but the slope depends on model physics and numerical methods.
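
A minimal sketch of the time-step arithmetic for the Section 1 example values follows. The function names are illustrative, and a production code would round the step count up to an integer.

```python
def time_step(c_cfl, dx, v_max):
    """Largest time step (s) allowed by the CFL constant."""
    return c_cfl * dx / v_max

def n_time_steps(t_prop, n_lambda, v_max, c_cfl, v_min, f_max):
    """N_t = K_t * f_max, with K_t built from physics and numerics."""
    k_t = t_prop * n_lambda * v_max / (c_cfl * v_min)
    return k_t * f_max

dt = time_step(c_cfl=0.45, dx=10.0, v_max=4500.0)
n_t = n_time_steps(t_prop=8.0, n_lambda=10, v_max=4500.0,
                   c_cfl=0.45, v_min=1500.0, f_max=15.0)

print(f"dt  = {dt * 1e3:.2f} ms")  # 1.00 ms
print(f"N_t = {n_t:.0f}")          # 8000
```

The ratio \(v_{\max}/v_{\min}\) enters the slope directly: a model with a 3:1 velocity contrast needs three times as many steps as a homogeneous model at the same frequency.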

Section 8 — Wavefield Work per Propagation

\[ W_{\mathrm{prop}} = N_{\mathrm{cell}}N_t \]

This counts cell updates, not FLOPs.

Spatial discretization: \(N_{\mathrm{cell}}\propto f_{\max}^3\)

Temporal discretization: \(N_t\propto f_{\max}\)

Scaling:

\[ W_{\mathrm{prop}}\propto f_{\max}^4 \]

This is the core seismic HPC scaling law for uniform 3D discretization with fixed physical model size.

\[ W_{\mathrm{prop}} = K_xK_tf_{\max}^4 \]

\[ K_x=\frac{L_xL_yL_zn_\lambda^3}{v_{\min}^3}, \qquad K_t=\frac{T_{\mathrm{prop}}n_\lambda v_{\max}}{C_{\mathrm{CFL}}v_{\min}} \]

\[ \Rightarrow W_{\mathrm{prop}}= \frac{L_xL_yL_zT_{\mathrm{prop}}n_\lambda^4v_{\max}}{C_{\mathrm{CFL}}v_{\min}^4} f_{\max}^4 \]

\(W_{\mathrm{prop}}\) counts total cell updates and is therefore dimensionless; the apparent time units in the constants cancel when combined with \(f_{\max}^4\).

These constants assume explicit time stepping and uniform grid spacing determined by minimum velocity and a fixed points-per-wavelength rule. Adaptive or anisotropic grids invalidate these constants.
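
The closed-form expression can be checked numerically with the Section 1 example values. This is a sketch under the same uniform-grid assumptions, and the function name is illustrative.

```python
def work_per_propagation(lx, ly, lz, t_prop, n_lambda, v_min, v_max, c_cfl, f_max):
    """Cell updates for one propagation: W_prop = K_x * K_t * f_max^4."""
    k_x = lx * ly * lz * n_lambda**3 / v_min**3
    k_t = t_prop * n_lambda * v_max / (c_cfl * v_min)
    return k_x * k_t * f_max**4

params = dict(lx=40_000.0, ly=40_000.0, lz=12_000.0, t_prop=8.0,
              n_lambda=10, v_min=1500.0, v_max=4500.0, c_cfl=0.45)

w_15 = work_per_propagation(f_max=15.0, **params)
w_30 = work_per_propagation(f_max=30.0, **params)

print(f"W_prop at 15 Hz: {w_15:.3e} cell updates")  # 1.536e+14
print(f"30 Hz / 15 Hz ratio: {w_30 / w_15:.1f}")    # 16.0
```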

FWI-Specific Sections

Section 9 — Full-Waveform Inversion (FWI) Cost Model

FWI builds directly on the same wave-equation propagation cost used in RTM. The key difference is that FWI requires multiple forward-equivalent propagations per shot per iteration, and repeats this across many iterations and frequency bands.

The correct way to think about FWI sizing is:

RTM gives you the cost of one propagation.
FWI multiplies that cost by how many forward-equivalent propagations are required by the inversion algorithm.

9.1 Baseline propagation work (from RTM)

From earlier sections:

\[ W_{\mathrm{fwd}} \equiv W_{\mathrm{prop}} = N_{\mathrm{cell}}N_t \]

This represents one forward wave-equation solve (cell updates, not FLOPs).

All FWI cost builds on this baseline.

9.2 Decomposing FWI cost correctly

FWI cost must be separated into three independent components:

(1) Propagation count (algorithmic work)

How many forward-equivalent propagations are required per shot per iteration.

\[ \alpha_{\mathrm{prop}} = 2 + \alpha_{\mathrm{replay}} + N_{\mathrm{trial}} \]

Where:

\(2\) = one forward solve + one adjoint solve (mandatory)
\(\alpha_{\mathrm{replay}}\) = checkpoint/reconstruction overhead, typically 0.2–0.5 forward-equivalents
\(N_{\mathrm{trial}}\) = extra forward solves for line search / trial models: 0–0.5 for a well-behaved line search, up to 1–2 for unstable or early iterations

(2) Kernel overhead (per time step cost)

Additional cost per time step relative to a baseline forward solve:

\[ \alpha_{\mathrm{kernel}} \ge 1 \]

Captures:

higher-order stencils
additional filtering or stabilization
preconditioning / regularization
multi-parameter updates
implementation inefficiencies

Typical planning range:

Acoustic: 1.0–1.2
More complex formulations: higher
Example: 8th-order stencil versus 4th-order → ~1.1–1.2×
Example: additional filtering / gradient conditioning → +5–10%

(3) Campaign structure (problem size)

FWI is repeated across:

shots
iterations
frequency bands

This is not a multiplier, but a summation over work:

\[ W_{\mathrm{campaign}}\propto \sum_b N_{\mathrm{shot},b}N_{\mathrm{iter},b}W_{\mathrm{shot,iter},b} \]

9.3 Per-shot, per-iteration cost

Combining the above:

\[ W_{\mathrm{shot,iter}}\propto \alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t \]

where:

\[ \alpha_{\mathrm{FWI,iter}} = \alpha_{\mathrm{prop}}\alpha_{\mathrm{kernel}} \]

9.4 Practical planning values (acoustic FWI)

For standard adjoint-state acoustic FWI:

Component	Typical Value
Forward + adjoint	2.0
Replay overhead	0.2 – 0.5
Trial models	0 – 1.0
Kernel overhead	1.0 – 1.2

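Combining the extremes of these ranges gives \((2+0.2+0)\times 1.0 = 2.2\) up to \((2+0.5+1.0)\times 1.2 = 4.2\). Rounded for planning:

\[ \alpha_{\mathrm{FWI,iter}}\approx 2.5 \text{ to } 4.0 \]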

Recommended planning values:

Optimistic: 2.5×
Realistic: 3.0×
Conservative: 3.5–4.0×

This is the single most useful number for early-stage compute sizing.
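
The sketch below shows how component choices combine into the multiplier. The three input combinations are hypothetical picks from the ranges in the table, chosen only to illustrate how values near the recommended planning numbers can arise. They are not measured data.

```python
def alpha_fwi_iter(alpha_replay, n_trial, alpha_kernel=1.0):
    """Forward-equivalent cost multiplier per shot per iteration."""
    alpha_prop = 2.0 + alpha_replay + n_trial  # forward + adjoint + extras
    return alpha_prop * alpha_kernel

# (alpha_replay, n_trial, alpha_kernel): hypothetical scenarios
scenarios = {
    "optimistic":   (0.3, 0.2, 1.0),
    "realistic":    (0.4, 0.3, 1.1),
    "conservative": (0.5, 0.8, 1.2),
}

for name, args in scenarios.items():
    print(f"{name:>12}: {alpha_fwi_iter(*args):.2f}")
# optimistic 2.50, realistic 2.97, conservative 3.96
```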

9.5 Frequency-band staging (multi-scale FWI)

FWI is typically performed in multiple frequency bands:

low frequency → coarse grid, long wavelengths
high frequency → fine grid, short wavelengths

Each band has its own:

\(f_{\max,b}\)
\(N_{\mathrm{cell},b}\)
\(N_{t,b}\)
\(N_{\mathrm{iter},b}\)
\(N_{\mathrm{shot},b}\)

Total work:

\[ W_{\mathrm{campaign}}\propto \sum_b N_{\mathrm{shot},b}N_{\mathrm{iter},b}\alpha_{\mathrm{FWI,iter},b}N_{\mathrm{cell},b}N_{t,b} \]

Important:

Banding is not a simple multiplier.
It changes the grid and timestep, so it changes base cost itself.
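
A sketch of the per-band summation follows. The four band frequencies are a hypothetical schedule, and applying the Section 3 values (500 shots per iteration, 8 iterations) to every band is an assumption made for illustration. The multiplier 3.0 is the "realistic" planning value, and each band is regridded for its own \(f_{\max,b}\).

```python
def band_work(f_max, n_shot, n_iter, alpha, k_x, k_t):
    """Cell updates for one frequency band, regridded for its own f_max."""
    n_cell = k_x * f_max**3   # spatial discretization
    n_t = k_t * f_max         # temporal discretization
    return n_shot * n_iter * alpha * n_cell * n_t

# Constants from the Section 1 example values
k_x = 40_000.0 * 40_000.0 * 12_000.0 * 10**3 / 1500.0**3
k_t = 8.0 * 10 * 4500.0 / (0.45 * 1500.0)

bands_hz = [4.0, 7.0, 10.0, 15.0]  # hypothetical band schedule
work = [band_work(f, n_shot=500, n_iter=8, alpha=3.0, k_x=k_x, k_t=k_t)
        for f in bands_hz]
total = sum(work)

for f, w in zip(bands_hz, work):
    print(f"{f:5.1f} Hz: {w:.2e} cell updates ({w / total:.1%} of campaign)")
```

With this schedule the 15 Hz band accounts for about 80% of the campaign work, which is why the highest band usually drives cluster sizing.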

9.6 Simplified uniform approximation

If:

grid is fixed
propagation time is fixed
shot count per iteration is constant
method is unchanged

Then:

\[ W_{\mathrm{campaign}}\propto N_{\mathrm{shot,iter}}N_{\mathrm{iter}}\alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t \]

This is the most useful form for back-of-the-envelope cluster sizing.

9.7 What the FWI multiplier does (and does NOT do)

The FWI multiplier applies to propagation count, not discretization.

Correct:

\[ W \propto \alpha_{\mathrm{FWI,iter}}\left(N_{\mathrm{cell}}N_t\right) \]

Incorrect (common mistake):

scaling \(N_{\mathrm{cell}}\) or \(N_t\) with FWI factors
treating \(\alpha_{\mathrm{FWI,iter}}\) as changing physics resolution

FWI increases how many propagations you run, not how large each propagation is.

9.8 Runtime model (HPC interpretation)

Per shot, per iteration runtime:

\[ t_{\mathrm{shot,iter}}= \frac{\alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t}{P_{\mathrm{gpu,eff}}G_{\mathrm{shot}}} \]

Where:

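\(P_{\mathrm{gpu,eff}}\) = effective throughput per GPU in cell updates/s, i.e. the Section 2 value \(P_{\mathrm{gpu}}\) after any multi-GPU scaling loss \(\eta\)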
\(G_{\mathrm{shot}}\) = GPUs per shot domain

Total campaign wall-clock depends on:

how many shots run concurrently
cluster size
scheduling efficiency
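
The per-shot runtime can be sketched with the Section 1 and Section 2 example values and \(\alpha_{\mathrm{FWI,iter}} = 3.0\). Treating the scaling efficiency \(\eta\) as a simple multiplicative derating of \(P_{\mathrm{gpu}}\) is an assumption of this example, not something the formula above prescribes.

```python
def shot_iter_runtime(alpha, n_cell, n_t, p_gpu_eff, g_shot):
    """Seconds per shot per iteration on g_shot GPUs."""
    return alpha * n_cell * n_t / (p_gpu_eff * g_shot)

n_cell = 1.92e10   # interior cells at 15 Hz (padding ignored here)
n_t = 8000         # time steps for 8 s of propagation
p_gpu = 6e9        # cell updates per second per GPU
eta = 0.75         # scaling efficiency, applied as a derating (assumption)

t_ideal = shot_iter_runtime(3.0, n_cell, n_t, p_gpu, 64)
t_derated = shot_iter_runtime(3.0, n_cell, n_t, eta * p_gpu, 64)

print(f"ideal scaling: {t_ideal:.0f} s")    # 1200 s
print(f"with eta=0.75: {t_derated:.0f} s")  # 1600 s
```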

Section 10 — Extension to Elastic and Advanced Physics

FWI cost increases significantly with more complex physics.

10.1 Elastic modeling multiplier

Elastic wave equations introduce:

vector wavefields (multiple components)
stress/strain tensors
additional material parameters
higher memory traffic

Planning estimate:

\[ \beta_{\mathrm{elastic}}\sim 4 \text{ to } 10 \]

Relative to acoustic propagation.

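Example contributions:

2–3× from additional wavefield components
1.5–2× from extra memory traffic / reduced efficiency

These two factors alone multiply to about 3–6×; the upper part of the planning range allows for the remaining items listed above.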

10.2 Combined cost

\[ W_{\mathrm{FWI,elastic}}\propto \beta_{\mathrm{elastic}}\alpha_{\mathrm{FWI,iter}}N_{\mathrm{cell}}N_t \]

This is why elastic FWI is often an order of magnitude more expensive than acoustic.

10.3 Shot Concurrency

In a cluster environment, shots run concurrently, so wall time is not the sum of per-shot runtimes. A better estimate is the number of sequential shot batches multiplied by the per-shot runtime:

\[ t_{\mathrm{wall}}\approx \frac{N_{\mathrm{shots,total}}}{N_{\mathrm{shots,concurrent}}}\,t_{\mathrm{shot}} \]
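
A minimal sketch of the concurrency estimate is given below. Rounding up to whole batches is an added assumption (a partially filled final batch still takes a full shot runtime), and the concurrency of 16 shot domains is a hypothetical cluster size.

```python
import math

def wall_time_s(n_shots_total, n_shots_concurrent, t_shot_s):
    """Wall time when shots run in batches of n_shots_concurrent."""
    batches = math.ceil(n_shots_total / n_shots_concurrent)
    return batches * t_shot_s

# 500 shots in one iteration, 16 concurrent shot domains, 1200 s per shot
t = wall_time_s(500, 16, 1200.0)
print(f"{t / 3600:.2f} h per iteration")  # 10.67 h
```

This ignores scheduling gaps and the gradient reduction between iterations, so it should be read as a lower bound.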

Section 11 — Memory Considerations for FWI

Memory is often the limiting factor in FWI, not compute.

11.1 First-order estimate

\[ \mathrm{Mem}\approx N_{\mathrm{cell}}\,n_{\mathrm{field}}\times(\text{bytes per value}\times\text{overhead}) \]

where \(n_{\mathrm{field}}\) is the number of field values stored per grid cell.
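
As a rough illustration, the estimate can be evaluated for the Section 1 example grid. The values of five stored fields per cell, single precision (4 bytes), and a 1.2 overhead factor are hypothetical inputs, and dividing evenly across the 64 GPUs of a shot domain ignores halo duplication.

```python
def memory_bytes(n_cell, n_field, bytes_per_value=4, overhead=1.0):
    """First-order memory footprint in bytes."""
    return n_cell * n_field * bytes_per_value * overhead

total = memory_bytes(1.92e10, n_field=5, bytes_per_value=4, overhead=1.2)
per_gpu = total / 64  # even split across one shot domain (assumption)

print(f"total:   {total / 1e9:.1f} GB")    # 460.8 GB
print(f"per GPU: {per_gpu / 1e9:.1f} GB")  # 7.2 GB
```

Checkpoint storage is not included in this figure and can dominate it, depending on the strategy chosen.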

11.2 What actually consumes memory

FWI memory includes:

wavefield state (current + previous time steps)
material properties (velocity, density, etc.)
absorbing boundary regions
halo/ghost cells
gradient accumulation arrays
illumination / preconditioning terms
checkpoint storage (or recomputation buffers)

Checkpointing strategy directly affects:

\(\alpha_{\mathrm{replay}}\) (compute cost)
total memory footprint

This creates a fundamental trade-off between memory and compute in FWI. Storing fewer wavefield snapshots and recomputing more reduces memory usage but increases \(\alpha_{\mathrm{replay}}\), directly increasing total compute cost.

11.3 Elastic memory scaling

Elastic models require more stored fields:

\[ \mathrm{Mem}_{\mathrm{elastic}}\approx 2\text{–}4\times \mathrm{Mem}_{\mathrm{acoustic}} \]

depending on formulation and implementation.

Bottom line

RTM defines the cost of one propagation
FWI multiplies that cost by propagation count and campaign structure
The most important sizing parameter is:

\[ \alpha_{\mathrm{FWI,iter}}\approx 2.5\text{–}4.0\quad (\text{acoustic}) \]