Input Parameters

SpecFWAT uses a YAML file to define the parameters for forward simulation, adjoint simulation, post-processing, and optimization. This file is typically named fwat_params.yml and is located in the DATA directory of your SpecFWAT project. The sections of the fwat_params.yml file are described below.

Download a full template of fwat_params.yml here.

`NOISE` Section

This section defines the noise parameters for the forward simulation and measurements of adjoint sources.

YAML


NOISE:
  MESH_PAR_FILE: DATA/meshfem3D_files/Mesh_Par_file # Mesh file
  RCOMPS: ['Z'] # Components of the receiver
  CH_CODE: BX # Channel code
  NSTEP: 4500 # Number of time steps
  DT: 0.06 # Time step for the noise data
  IMEAS: 5
  SHORT_P: [6, 10, 20] # Short period of filters
  LONG_P: [15, 25, 40] # Long period of filters
  GROUPVEL_MIN: [2.3, 2.3, 2.5] # Approximate minimum group velocity
  GROUPVEL_MAX: [3.2, 3.5, 4.0] # Approximate maximum group velocity
  ADJ_SRC_NORM: False # Set to True to normalize adjoint sources across different bands
  USE_NEAR_OFFSET: False # Set to False to use only data > 1 average wavelength
  SUPPRESS_EGF: False # Set to False when the data are cross-correlation functions
  PRECOND_TYPE: 1 # 1: inner product of acceleration
  SIGMA_H: 5000
  SIGMA_V: 5000

Mesh Parameters

MESH_PAR_FILE: Path to the mesh parameter file, in the same format as that of meshfem3D. SpecFWAT uses the internal mesh generator of Specfem3D to create the mesh based on the model parameters, but allows users to specify the mesh file rather than relying on the fixed mesh file DATA/meshfem3D_files/Mesh_Par_file of Specfem3D.

Solver Parameters

RCOMPS: Components of the receiver. For tomography of isotropic media, it is usually set to ['Z'] for the vertical component.
CH_CODE: Channel code, which is used to identify the channel in the data. It is usually set to BX for broadband data.
NSTEP: Number of time steps for the forward/adjoint simulation. It overrides the NSTEP in DATA/Par_file, which allows users to specify a different number of time steps for each data type.
DT: Time step for the noise data. It overrides the DT in DATA/Par_file, which allows users to specify a different time step for each data type.

Adjoint source

IMEAS: Type of adjoint source measurement, which determines the objective function. See the measure_adj manual for more details on adjoint source measurements. The default value is 5, which means the adjoint source is based on the cross-correlation time shift.
SHORT_P: Short cut-off period of filters for the adjoint source measurements. It is a list of values, which are used to filter the data in different frequency bands.
LONG_P: Long cut-off period of filters for the adjoint source measurements. It is a list of values with the same length as SHORT_P.
GROUPVEL_MIN: Approximate minimum group velocity to determine the time window for the adjoint source measurements. It is a list of values with the same length as SHORT_P.
GROUPVEL_MAX: Approximate maximum group velocity to determine the time window for the adjoint source measurements. It is a list of values with the same length as SHORT_P.
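
The per-band lists above must stay aligned with SHORT_P. The following is a minimal sketch of a consistency check, not part of SpecFWAT: the function name `check_noise_bands` is hypothetical, the NOISE section is represented as a plain Python dict, and the ordering checks (short period below long period, minimum velocity below maximum velocity) are sanity assumptions rather than documented requirements.

```python
def check_noise_bands(noise):
    """Return a list of problems found in the per-band lists of a NOISE section."""
    problems = []
    n_bands = len(noise["SHORT_P"])
    # Every per-band list should have the same length as SHORT_P.
    for key in ("LONG_P", "GROUPVEL_MIN", "GROUPVEL_MAX"):
        if len(noise[key]) != n_bands:
            problems.append(f"{key} has {len(noise[key])} entries, expected {n_bands}")
    if problems:
        return problems
    # Sanity checks on ordering (an assumption, not a documented rule).
    for i in range(n_bands):
        if noise["SHORT_P"][i] >= noise["LONG_P"][i]:
            problems.append(f"band {i}: SHORT_P is not shorter than LONG_P")
        if noise["GROUPVEL_MIN"][i] >= noise["GROUPVEL_MAX"][i]:
            problems.append(f"band {i}: GROUPVEL_MIN is not below GROUPVEL_MAX")
    return problems


# Values copied from the NOISE listing above.
noise_section = {
    "SHORT_P": [6, 10, 20],
    "LONG_P": [15, 25, 40],
    "GROUPVEL_MIN": [2.3, 2.3, 2.5],
    "GROUPVEL_MAX": [3.2, 3.5, 4.0],
}
print(check_noise_bands(noise_section))
```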

💡

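The time window for the adjoint source measurements is determined by the approximate group velocities and the long cut-off period. For a source-receiver distance $\Delta$, with $U_{min}$ and $U_{max}$ given by GROUPVEL_MIN and GROUPVEL_MAX and $T_{long}$ given by LONG_P, the time window $[T_{start}, T_{end}]$ is calculated as follows:

$$T_{start} = \frac{\Delta}{U_{max}}-0.5 T_{long}$$

$$T_{end} = \frac{\Delta}{U_{min}}+0.5 T_{long}$$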
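
As a worked illustration of the two formulas above, the sketch below evaluates the window for each period band of the NOISE listing. It assumes the distance is given in km and the group velocities in km/s; the function name `noise_time_window` and the 100 km distance are hypothetical and chosen only for the example.

```python
def noise_time_window(dist_km, u_min, u_max, t_long):
    """Return (t_start, t_end) in seconds for one period band.

    dist_km: source-receiver distance in km (assumed unit)
    u_min, u_max: approximate group velocity bounds in km/s (assumed unit)
    t_long: long cut-off period of the band in seconds
    """
    t_start = dist_km / u_max - 0.5 * t_long
    t_end = dist_km / u_min + 0.5 * t_long
    return t_start, t_end


# Band parameters copied from the NOISE listing; the distance is illustrative.
long_p = [15, 25, 40]
groupvel_min = [2.3, 2.3, 2.5]
groupvel_max = [3.2, 3.5, 4.0]
for t_long, u_min, u_max in zip(long_p, groupvel_min, groupvel_max):
    t_start, t_end = noise_time_window(100.0, u_min, u_max, t_long)
    print(f"T_long={t_long:>2d} s: window = [{t_start:.2f}, {t_end:.2f}] s")
```

Note that for long periods and short distances $T_{start}$ can become small or negative, since half of the long period is subtracted from the earliest expected arrival time.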

ADJ_SRC_NORM: Set to True to normalize the adjoint sources across different bands.
USE_NEAR_OFFSET: Set to False if you want to use only data with distance greater than 0.5 * wavelength in each period band. It is usually set to False in practice.
SUPPRESS_EGF: Whether to suppress the differencing step that converts the data into empirical Green’s functions (EGFs). Set to False if the data are cross-correlation functions.

💡

For checkerboard tests, note that the data are already EGFs, so SUPPRESS_EGF should be set to True.

Post-processing parameters of noise data

PRECOND_TYPE: Type of preconditioner applied to the gradient. For noise data, the accepted values are 0 ( $P_0$ ) and 1 ( $P_1$ ), both based on the inner product of acceleration. The default value is 1.

$$P_0 = \left| \sum_{i=1}^{N} \int \partial_t^2 u(x, t) \partial_t^2 u^{\dag}(x, T-t) dt \right|$$

$$P_1 = \sum_{i=1}^{N} \left| \int \partial_t^2 u(x, t) \partial_t^2 u^{\dag}(x, T-t) dt \right|$$
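
The only difference between $P_0$ and $P_1$ is whether the absolute value is taken after or before the sum. The sketch below illustrates this at a single grid point, assuming the time integrals have already been evaluated for each term of the sum (interpreted here as one value per event, which is an assumption). The function names and the sample values are hypothetical.

```python
def precond_p0(integrals):
    """P_0: absolute value of the summed integrals."""
    return abs(sum(integrals))


def precond_p1(integrals):
    """P_1: sum of the absolute values of the integrals."""
    return sum(abs(v) for v in integrals)


# Hypothetical integral values at one grid point, one per term of the sum.
integrals = [2.0, -1.5, 0.5]
print(precond_p0(integrals), precond_p1(integrals))
```

Because of the triangle inequality, $P_1 \ge P_0$ at every point: contributions of opposite sign cancel in $P_0$ but accumulate in $P_1$.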

SIGMA_H: Horizontal smoothing length of the gradient, in meters.
SIGMA_V: Vertical smoothing length of the gradient, in meters.

`TELE` Section

This section defines the parameters for the teleseismic FWAT.

YAML


TELE:
  MESH_PAR_FILE: DATA/meshfem3D_files/Mesh_Par_file # Mesh file
  TELE_TYPE: 2 # 1: teleseismic data, 2: receiver function, 3: Teleseismic cross-convolution
  RCOMPS: ['Z', 'R'] # Components of the receiver
  CH_CODE: BX # Channel code
  SAVE_FK: True # Save the FK wavefield
  COMPRESS_LEVEL: 0 # Compression level of the saved FK wavefield in hdf5 format
  SUPPRESS_STF: True # Whether to convolve source time function when forward simulating teleseismic data
  NSTEP: 2500 # Number of time steps
  DT: 0.025 # Time step for the teleseismic data
  SHORT_P: [1] # Short period of filters
  LONG_P: [20] # Long period of filters
  TIME_WIN: [-5, 25] # Time window for the teleseismic data
  PRECOND_TYPE: 3 # 2: abs(Z); 3: sqrt(abs(Z))
  SIGMA_H: 5000
  SIGMA_V: 5000
  RF:
    F0: [1.5] # Gaussian width for the RF
    MAXIT: 200 # Maximum number of iterations
    MINDERR: 0.001 # Minimum residual error when the RF converges
    TSHIFT: 5.0 # Time shift before P

Adjoint Source

Parameters in the TELE section that share a name with parameters in the NOISE section have the same meaning, although their values usually differ. The following parameters are specific to the TELE section:

TELE_TYPE: Type of objective function for teleseismic data. It can be
- 1 for teleseismic waveform difference (Wang et al., 2021).
- 2 for receiver function difference (Xu et al., 2023).
- 3 for teleseismic cross-convolution difference.
SAVE_FK: Set to True to save the FK wavefield for teleseismic data. Before running the forward simulation, SpecFWAT checks for the FK wavefield in the LOCAL_PATH (set in DATA/Par_file). If the FK wavefield is not found, it is generated and saved in the LOCAL_PATH/FK_{event_name} directory; otherwise, it is loaded from the existing FK wavefield directory to save computation time. This is useful for GPU acceleration.
COMPRESS_LEVEL: Compression level of the saved FK wavefield in HDF5 format, equivalent to the gzip compression_opts setting in h5py. It can be set to an integer from 0 (no compression) to 9 (best compression), with 1 being the fastest. The default value is 0.

💡

The FK wavefield takes up a large amount of disk space. Please check the available disk space before running the forward simulation.
With CPU parallelization, the FK simulation is very fast because many processors are used. It is therefore recommended to set SAVE_FK to False for CPU parallelization.

SUPPRESS_STF: Whether to skip the convolution with the source time function (STF) when forward simulating teleseismic data. If set to False, the synthetics are convolved with the STF, and the STF files named STF_{event_name}.sac should be prepared in the src_rec directory.

💡

The time 0 in the STF should correspond to the onset time of the earthquake.

[Figure: example source time function. Source: SCARDEC Source Time Functions Database]

TIME_WIN: Time window for the teleseismic data. It is a list of two values, which are the start and end times relative to the direct P arrival (a negative value is before the arrival).
PRECOND_TYPE: Type of preconditioner applied to the gradient. For teleseismic data, the accepted values are 2 ( $P_2$ ) and 3 ( $P_3$ ), both based on the Z coordinate. The default value is 3.

$$P_2 = |x_z|$$

$$P_3 = |x_z|^{1/2}$$
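
The two Z-based preconditioners can be compared with a short sketch. It assumes $x_z$ is the vertical coordinate of a grid point in meters; the function name `z_precond` and the sample coordinates are hypothetical.

```python
import math


def z_precond(x_z, precond_type):
    """Evaluate the Z-based preconditioner at vertical coordinate x_z."""
    if precond_type == 2:
        return abs(x_z)             # P_2 = |x_z|
    if precond_type == 3:
        return math.sqrt(abs(x_z))  # P_3 = |x_z|^(1/2)
    raise ValueError("only PRECOND_TYPE 2 or 3 is handled in this sketch")


# Hypothetical vertical coordinates in meters (negative below the surface).
for x_z in (-1000.0, -10000.0, -80000.0):
    print(x_z, z_precond(x_z, 2), z_precond(x_z, 3))
```

$P_3$ grows more slowly with depth than $P_2$, so it varies less strongly between shallow and deep parts of the model.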

Receiver function parameters

For receiver function adjoint tomography, the synthetic receiver function is calculated with the iterative deconvolution method. The following parameters control the receiver function calculation:

RF.F0: Gaussian width for the receiver function. It is a list of values, which are used to filter the data in different frequency bands.
RF.MAXIT: Maximum number of iterations for the receiver function inversion.
RF.MINDERR: Minimum residual error when the receiver function converges.
RF.TSHIFT: Time shift before P arrival for the receiver function inversion. It is used to align the receiver function with the direct P arrival.

`ADJOINT_SOURCE` Section

SpecFWAT employs ForAdjoint to measure adjoint sources of ambient noise and local earthquakes. ForAdjoint provides built-in measurement methods including:

Cross-correlation time shift and amplitude ratio
Waveform difference
Multi-taper phase shift and amplitude ratio
Cross-convolution waveform difference
Receiver function difference
Exponentiated phase misfit

YAML


ADJOINT_SOURCE:
  ITAPER_TYPE: 1 # 1: Hanning, 2: Hamming, 3: Cosine, 4: Cosine P10
  TAPER_PERCENTAGE: 0.3
  CC:
    TSHIFT_LIM: [-5.0, 5.0]
    DLNA_LIM: [-1.5, 1.5]
    CC_MIN: 0.7
    DT_SIGMA_MIN: 1.0
    DLNA_SIGMA_MIN: 0.5
  MT:
    NUM_TAPER: 5
    MT_NW: 4.0
    PHASE_STEP: 1.5
    TRANSFUNC_WATERLEVEL: 1e-10
    WATER_THRESHOLD: 0.02
    DT_FAC: 2.0
    ERR_FAC: 2.5
    DT_MAX_SCALE: 3.5
    MIN_CYCLE_IN_WINDOW: 3
    USE_MT_ERROR: False
  ENV:
    WTR_ENV: 0.2

Adjoint Source Tapering

ITAPER_TYPE: Type of tapering window for the adjoint source measurements. It can be
- 1 for Hanning window.
- 2 for Hamming window.
- 3 for Cosine window.
- 4 for Cosine P10 window.
TAPER_PERCENTAGE: Taper length as a fraction of the total length of the time window (for example, 0.3).

Cross-correlation (CC) Measurement Parameters

CC.TSHIFT_LIM: Time shift limits in seconds for cross-correlation measurement.
CC.DLNA_LIM: Logarithmic amplitude ratio limits for cross-correlation measurement.
CC.CC_MIN: Minimum cross-correlation coefficient to accept a measurement.
CC.DT_SIGMA_MIN: Minimum standard deviation of time shift measurement in seconds.
CC.DLNA_SIGMA_MIN: Minimum standard deviation of logarithmic amplitude ratio.
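
One plausible reading of the CC parameters above is sketched below: a measurement is kept only if it satisfies the limits, and the uncertainties are floored at the minimum values. This is an assumption about how the parameters interact, not a description of the ForAdjoint implementation, and the function names are hypothetical.

```python
# Values copied from the ADJOINT_SOURCE listing above.
CC = {
    "TSHIFT_LIM": [-5.0, 5.0],
    "DLNA_LIM": [-1.5, 1.5],
    "CC_MIN": 0.7,
    "DT_SIGMA_MIN": 1.0,
    "DLNA_SIGMA_MIN": 0.5,
}


def accept_cc_measurement(tshift, dlna, cc_max, cc=CC):
    """Return True if a measurement passes all three limits (assumed logic)."""
    t_lo, t_hi = cc["TSHIFT_LIM"]
    a_lo, a_hi = cc["DLNA_LIM"]
    return (cc_max >= cc["CC_MIN"]
            and t_lo <= tshift <= t_hi
            and a_lo <= dlna <= a_hi)


def floor_sigmas(dt_sigma, dlna_sigma, cc=CC):
    """Apply the minimum standard deviations as lower bounds (assumed logic)."""
    return max(dt_sigma, cc["DT_SIGMA_MIN"]), max(dlna_sigma, cc["DLNA_SIGMA_MIN"])


print(accept_cc_measurement(tshift=1.2, dlna=0.1, cc_max=0.85))
print(floor_sigmas(dt_sigma=0.3, dlna_sigma=0.8))
```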

Multi-taper (MT) Measurement Parameters

MT.NUM_TAPER: Number of tapers to use in multi-taper measurement.
MT.MT_NW: Bin width of the multitapers (nw*df is the half bandwidth of the multitapers in the frequency domain; typical values are 2.5, 3.0, 3.5, and 4.0).
MT.PHASE_STEP: Maximum step for cycle-skip correction.
MT.TRANSFUNC_WATERLEVEL: Waterlevel for the transfer function in multi-taper measurement.
MT.WATER_THRESHOLD: The triggering value to stop the search. If the spectrum is larger than 10*water_threshold, the search is triggered again, much like a heating thermostat.
MT.DT_FAC: Percentage of the wave period at which the measurement range is too large and MTM reverts to the CCTM misfit.
MT.ERR_FAC: Percentage of error at which the error is considered too large.
MT.DT_MAX_SCALE: Used to calculate the maximum allowable time shift.
MT.MIN_CYCLE_IN_WINDOW: Minimum number of cycles in the time window for multi-taper measurement.
MT.USE_MT_ERROR: Whether to use multi-taper error for normalization.

Envelope (ENV) Measurement Parameters

ENV.WTR_ENV: Waterlevel for envelope measurement.

`MODEL_GRID` Section

SpecFWAT updates model parameters on a regular grid, which makes it easy to sum the gradients of different data sets computed on different meshes. The grid interval should be small enough to preserve the accuracy of the gradient information. We recommend a grid interval of no more than half the element size of the mesh.

YAML


MODEL_GRID:
  REGULAR_GRID_MIN_COORD: [833950, -44274.0, -80000]
  REGULAR_GRID_INTERVAL: [2000, 2000, 1000]
  REGULAR_GRID_SIZE: [104, 48, 81]

REGULAR_GRID_MIN_COORD: Minimum coordinate of the regular grid in meters. It is a list of three values, which are the minimum x, y, and z coordinates of the regular grid.
REGULAR_GRID_INTERVAL: Interval of the regular grid in meters. It is a list of three values, which are the intervals in the x, y, and z directions.
REGULAR_GRID_SIZE: Number of grid points. It is a list of three values, which are the numbers of grid points in the x, y, and z directions.

💡

The region of the regular grid must cover the whole mesh region. The minimum coordinate of the regular grid should be less than the minimum coordinate of the mesh, and the maximum coordinate of the regular grid should be greater than the maximum coordinate of the mesh.
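
The coverage requirement can be checked before a run with a few lines of arithmetic. The sketch below assumes the maximum coordinate along each axis is `min + interval * (size - 1)`, since REGULAR_GRID_SIZE counts grid points. The function names and the mesh extents are hypothetical, and touching boundaries are treated as covered, which is an assumption.

```python
def regular_grid_max_coord(min_coord, interval, size):
    """Maximum x, y, z coordinates of the regular grid."""
    return [m + d * (n - 1) for m, d, n in zip(min_coord, interval, size)]


def grid_covers_mesh(min_coord, interval, size, mesh_min, mesh_max):
    """Return True if the regular grid encloses the mesh along every axis."""
    grid_max = regular_grid_max_coord(min_coord, interval, size)
    return all(g_lo <= m_lo and g_hi >= m_hi
               for g_lo, g_hi, m_lo, m_hi
               in zip(min_coord, grid_max, mesh_min, mesh_max))


# Values copied from the MODEL_GRID listing above.
min_coord = [833950, -44274.0, -80000]
interval = [2000, 2000, 1000]
size = [104, 48, 81]
print(regular_grid_max_coord(min_coord, interval, size))

# Hypothetical mesh extents in meters, for illustration only.
mesh_min = [840000.0, -40000.0, -78000.0]
mesh_max = [1030000.0, 45000.0, 0.0]
print(grid_covers_mesh(min_coord, interval, size, mesh_min, mesh_max))
```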

`POSTPROC` Section

This section defines the common parameters for the post-processing of the gradient after the adjoint simulation.

YAML


POSTPROC:
  INV_TYPE: [False, True] # Inversion type of noise and teleseismic data
  JOINT_WEIGHT: [0.5, 0.5]
  TAPER_H_SUPPRESS: 5000
  TAPER_H_BUFFER: 10000
  TAPER_V_SUPPRESS: 0
  TAPER_V_BUFFER: 0
  IS_PRECOND: True

INV_TYPE: Inversion type of ambient noise and teleseismic data. It is a list of two boolean values, which are used to determine whether to perform inversion for noise and teleseismic data, respectively. The first value is for noise data, and the second value is for teleseismic data.
JOINT_WEIGHT: Weight between ambient noise and teleseismic data. It is a list of two values, which are used to weight the gradients of ambient noise and teleseismic data, respectively.
TAPER_H_SUPPRESS: Horizontal tapering length in meters to suppress the gradient at the margin of the mesh.
TAPER_H_BUFFER: Horizontal tapering length in meters to buffer the gradient at the margin of the mesh.
TAPER_V_SUPPRESS: Vertical tapering length in meters to suppress the gradient at the margin of the mesh.
TAPER_V_BUFFER: Vertical tapering length in meters to buffer the gradient at the margin of the mesh.

💡

A taper on the gradient is necessary for teleseismic FWAT to avoid updating the margin of the model. It ensures that the SEM domain of the updated model stays coupled with the FK domain across iterations.

IS_PRECOND: Whether to apply the preconditioner directly to the gradient. If set to False, the preconditioner is saved and applied in L-BFGS as a rescaling vector (Modrak and Tromp, 2016).
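
To make the roles of INV_TYPE and JOINT_WEIGHT concrete, the sketch below combines two gradients as a weighted sum over the enabled data types. This is an assumed reading of the parameter descriptions: SpecFWAT may normalize or precondition each gradient before weighting, and the function name `combine_gradients` and the gradient values are hypothetical.

```python
def combine_gradients(gradients, inv_type, joint_weight):
    """Weighted sum of the gradients whose INV_TYPE flag is True.

    gradients: [noise_gradient, tele_gradient], each a list of equal length
    inv_type: [use_noise, use_tele] booleans
    joint_weight: [noise_weight, tele_weight]
    """
    active = [(w, g) for g, on, w in zip(gradients, inv_type, joint_weight) if on]
    if not active:
        raise ValueError("at least one data type must be enabled in INV_TYPE")
    total = [0.0] * len(active[0][1])
    for weight, grad in active:
        for k, value in enumerate(grad):
            total[k] += weight * value
    return total


# Hypothetical gradients on the same regular grid, flattened to 1-D lists.
noise_grad = [1.0, 2.0]
tele_grad = [3.0, 4.0]
print(combine_gradients([noise_grad, tele_grad], [True, True], [0.5, 0.5]))
print(combine_gradients([noise_grad, tele_grad], [False, True], [0.5, 0.5]))
```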

`MODEL_UPDATE` Section

This section defines the parameters for the model update based on the gradient and optimization method.

YAML


MODEL_UPDATE:
  INIT_MODEL_PATH: initial_model.h5
  MODEL_TYPE: 1 # 1: vp,vs,rho; 2: L,Gc,Gs
  OPT_METHOD: 2 # Optimization method, 1: SD; 2: LBFGS 
  ITER_START: 0
  LBFGS_M_STORE: 5
  MAX_SLEN: 0.02 # Maximum step length
  MAX_SHRINK: 0.618 # Maximum shrink factor
  MAX_SUB_ITER: 6
  DO_LS: True # Do line search
  C1: 0.1
  VPVS_RATIO_RANGE: [1.3, 2.5] # Min and max limitation of Vp/Vs ratio

INIT_MODEL_PATH: Path to the initial model file. It should be an HDF5 file with the same format as DATA/tomo_files/tomography_model.h5, but its grid size may differ from that of the MODEL_GRID section. INIT_MODEL_PATH is interpolated to the MODEL_GRID size and saved to optimize/model_M00.h5.
MODEL_TYPE: Type of model parameters to be updated. It can be
- 1 for updating $Vp$ , $Vs$ , and $\rho$ .
- 2 for updating azimuthal anisotropic parameters ( $Vp$ , $Vs$ , $\rho$ , $Gc'$ , and $Gs'$ ).
OPT_METHOD: Optimization method for model update. It can be
- 1 for steepest descent (SD).
- 2 for L-BFGS.
- 3 for conjugate gradient (CG).
ITER_START: Starting iteration number of L-BFGS optimization.
LBFGS_M_STORE: Number of previous iterations to store in L-BFGS optimization.
MAX_SLEN: Maximum step length for model update.
MAX_SHRINK: Maximum shrink factor for model update. It controls how the step length is reduced when the model update does not decrease the objective function.
MAX_SUB_ITER: Maximum number of sub-iterations for model update.
DO_LS: Whether to perform a line search for model update. If set to True, a line search is performed to find the optimal step length for the model update; otherwise, the step length is fixed to MAX_SLEN.
C1: Constant for the Armijo condition in the line search. It is used to determine whether the step length decreases the objective function sufficiently.
VPVS_RATIO_RANGE: Minimum and maximum limits of the Vp/Vs ratio.
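
The interplay of MAX_SLEN, MAX_SHRINK, MAX_SUB_ITER, and C1 can be illustrated with a generic backtracking line search that tests the Armijo condition $f(m + \alpha d) \le f(m) + c_1 \alpha \, g \cdot d$. This is a textbook sketch, not the SpecFWAT implementation: it assumes MAX_SLEN is the first trial step and that each failed sub-iteration multiplies the step by MAX_SHRINK. The function name and the quadratic test misfit are hypothetical.

```python
def backtracking_line_search(misfit, model, direction, gradient,
                             max_slen=0.02, max_shrink=0.618,
                             max_sub_iter=6, c1=0.1):
    """Return (step, new_model), or (None, model) if no trial step is accepted."""
    f0 = misfit(model)
    # Directional derivative g . d; negative for a descent direction.
    slope = sum(g * d for g, d in zip(gradient, direction))
    step = max_slen
    for _ in range(max_sub_iter):
        trial = [m + step * d for m, d in zip(model, direction)]
        if misfit(trial) <= f0 + c1 * step * slope:  # Armijo condition
            return step, trial
        step *= max_shrink  # shrink the step and try again
    return None, model


def quadratic_misfit(m):
    """Toy objective function used only for this illustration."""
    return sum(v * v for v in m)


model = [1.0, -2.0]
gradient = [2.0 * v for v in model]      # gradient of the toy misfit
direction = [-100.0 * g for g in gradient]  # scaled steepest-descent direction
step, new_model = backtracking_line_search(quadratic_misfit, model, direction, gradient)
print(step, quadratic_misfit(new_model))
```

A larger C1 demands a bigger misfit reduction before a step is accepted, while MAX_SUB_ITER bounds the number of trial forward simulations spent on one iteration.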

`OUTPUT` Section

This section controls the verbose output of the SpecFWAT workflow.

YAML


OUTPUT:
  IS_OUTPUT_PREPROC: True # Output preprocessed data
  IS_OUTPUT_ADJ_SRC: False # Output adjoint sources
  IS_OUTPUT_EVENT_KERNEL: True # Output kernels
  IS_OUTPUT_SUM_KERNEL: True # Output sum of kernels
  IS_OUTPUT_HESS_INV: False # Output inverse Hessian when precond_type = 1
  IS_OUTPUT_DIRECTION: False

IS_OUTPUT_PREPROC: Whether to output the preprocessed data in SAC format. If set to True, the preprocessed data will be saved in the solver/{model}.{simu_type}/{event_name}/OUTPUT_FILES/ directory.
IS_OUTPUT_ADJ_SRC: Whether to output the adjoint sources in SAC format. If set to True, the adjoint sources will be saved in the solver/{model}.{simu_type}/{event_name}/SEM/ directory.
IS_OUTPUT_EVENT_KERNEL: Whether to output the kernels for each event. If set to True, the kernels will be kept in the solver/{model}.{simu_type}/{event_name}/EKERNEL/ directory, otherwise, the kernels will be deleted after the post-processing step.
IS_OUTPUT_SUM_KERNEL: Whether to output the sum of kernels after post-processing. If set to True, the sum of kernels will be saved in the optimize/SUM_KERNEL_{model}/ directory.
IS_OUTPUT_HESS_INV: Whether to output the inverse Hessian. If set to True, the inverse Hessian will be saved in the optimize/SUM_KERNEL_{model}/ directory.
IS_OUTPUT_DIRECTION: Whether to output the final descent direction of each iteration. If set to True, the direction will be saved in the optimize/ directory.
