Interactive comment on “ Wave-equation based traveltime seismic tomography – Part 1 : Method ”

In this paper, we propose a wave-equation-based travel-time seismic tomography method with a detailed description of its step-by-step process. First, a linear relationship between the travel-time residual 1t = T − T syn and the relative velocity perturbation δc(x)/c(x) connected by a finite-frequency travel-time sensitivity kernel K(x) is theoretically derived using the adjoint method. To accurately calculate the travel-time residual 1t , two automatic arrivaltime picking techniques including the envelop energy ratio method and the combined ray and cross-correlation method are then developed to compute the arrival times T syn for synthetic seismograms. The arrival times T obs of observed seismograms are usually determined by manual hand picking in real applications. Travel-time sensitivity kernel K(x) is constructed by convolving a forward wavefield u(t,x) with an adjoint wavefield q(t,x). The calculations of synthetic seismograms and sensitivity kernels rely on forward modeling. To make it computationally feasible for tomographic problems involving a large number of seismic records, the forward problem is solved in the two-dimensional (2-D) vertical plane passing through the source and the receiver by a high-order central difference method. The final model is parameterized on 3-D regular grid (inversion) nodes with variable spacings, while model values on each 2-D forward modeling node are linearly interpolated by the values at its eight surrounding 3-D inversion grid nodes. Finally, the tomographic inverse problem is formulated as a regularized optimization problem, which can be iteratively solved by either the LSQR solver or a nonlinear conjugate-gradient method. To provide some insights into future 3-D tomographic inversions, Fréchet kernels for different seismic phases are also demonstrated in this study.


Introduction
Seismic tomography is one of the core methodologies for imaging the structural heterogeneity of the Earth's interior at a variety of scales.Ever since the pioneering works of Aki and Lee (1976) and Dziewonski et al. (1977), tomographic images have provided crucial information for the understanding of plate tectonics, volcanism, and geodynamics (e.g., Romanowicz, 1991;Liu and Gu, 2012;Zhao, 2012).Seismic tomography itself has also gone through significant development over the last 3 decades, including advances in both methodology and data usage.
In the first 2 decades of its history, seismic tomography was mainly based on the ray theory which assumes that seismic travel-time is determined by the structure along the infinitely thin ray path only.However, because of scattering, wave front healing, and other finite-frequency effects, seismic measurements (such as travel-time and amplitude), especially those made on broadband recordings, are sensitive to three-dimensional (3-D) structures off the ray path (e.g., Marquering et al., 1999;Dahlen et al., 2000;Tape et al., 2007).Ray theory is actually only valid when the scale length of the variation of material properties is much larger than the seismic wavelength (Rawlinson et al., 2010).To take into account the sensitivity to off-ray structures, finite-frequency tomography methods that construct 2-D or 3-D travel-time and P. Tong et al.: Part 1: Method amplitude sensitivity kernels are proposed, including those based on the paraxial approximation and dynamic ray tracing (e.g., Marquering et al., 1999;Dahlen et al., 2000;Tian et al., 2007;Tong et al., 2011) and those based on the normal mode theory (e.g., Zhao et al., 2000;Zhao and Jordan, 2006;To and Romanowicz, 2009).Tomographic models with improved resolutions were reported by recent finite-frequency tomographic studies (e.g., Montelli et al., 2004;Hung et al., 2004Hung et al., , 2011;;Gautier et al., 2008), although comparison to ray-based tomography remains controversial (de Hoop and van der Hilst, 2005a; Dahlen and Nolet, 2005;de Hoop and van der Hilst, 2005b).The underlying problem of the finitefrequency tomography based on paraxial approximation and dynamic ray tracing is that its kernel computation still relies on the ray theory, although it was devised to account for non-geometrical finite-frequency phenomena.In the last decade or so, rapid advances in high-performance computing and forward modeling techniques have made it feasible to solve the seismic wave equations in realistic Earth models by fully numerical methods (e.g., Komatitsch and Tromp, 2002a, b;Komatitsch et al., 2004;Operto et al., 2007).This opens the way to compute sensitivity kernels based on numerical simulation of the full seismic wavefield, avoiding the use of approximate theories (e.g., Liu andTromp, 2006, 2008;Fichtner et al., 2009).It also made the conceptual wave-equation-based seismic inversion methods such as the one presented by Tarantola (1984) feasible in realistic applications (Tape et al., 2009;Fichtner and Trampert, 2011;Zhu et al., 2012).To our best knowledge, adjoint tomography (Tromp et al., 2005;Fichtner et al., 2006), scattering integral methods (L.Zhao et al., 2005;Chen et al., 2007b), and full waveform inversion (FWI) in the frequency domain (Pratt and Shipp, 1999;Operto et al., 2006) are among the most popular tomographic techniques based upon solving full wave equations.FWI in the frequency domain has been mainly used in exploration problems (e.g., Virieux and Operto, 2009;Lee et al., 2010).Adjoint tomography and scattering integral tomography are closely related to each other, and a detailed comparison between adjoint tomography and scattering integral tomography can be found in Chen et al. (2007a).For brevity, we restrict our following discussions to adjoint tomography (Liu and Gu, 2012).
Adjoint tomography is currently one of the most popular and promising tomographic methods for resolving strongly varying structures.It takes advantage of full 3-D numerical simulations in forward modeling and sensitivity kernel calculation, often iteratively improving models through optimization techniques (Tromp et al., 2005;Tape et al., 2007).The use of full numerical simulations allows for the freedom of choosing either 1-D or 3-D reference models and accurate calculations of seismograms (Tong et al., 2014a, c) and sensitivity kernels for complex models (Liu andTromp, 2006, 2008).Using this approach, Tape et al. (2009Tape et al. ( , 2010) ) obtained a 3-D velocity model of the southern California crust that captures strong local heterogeneity up to ±30 %.Similarly, Zhu et al. (2012) generated a tomographic model of the European upper mantle based on adjoint tomography that reveals nice correlations between structural features and regional tectonics and dynamics.Similarly, Rickers et al. (2013) presented a 3-D S wave velocity model of the North Atlantic region, revealing structural features in unprecedented detail down to a depth of 1300 km.These successful applications reveal the promising future of next-generation seismic tomographic models based on full numerical simulations.However, the expensive computation cost associated with adjointtype of wave-equation-based tomographic methods, especially for 3-D problems, is still a major stumbling block to its wider application.For example, for a moderate number of three-component seismograms, 0.8 million and 2.3 million central processing unit hours were used to generate the tomographic models of the southern California crust and the European upper mantle, respectively (Tape et al., 2009;Zhu et al., 2012).The severity of the cost issue may be remedied when simulations are ported to the graphic processing unit (GPU) hardware (e.g., Komatitsch et al., 2010;Michéa and Komatitsch, 2010).However, ray-based tomographic methods remains the most popular and accessible techniques for mapping the heterogeneous structures of the Earth's interior (e.g., Li et al., 2008;Hung et al., 2011;Tong et al., 2012;Zhao et al., 2012).
As mentioned above, full 3-D numerical simulations in forward modeling and sensitivity kernel calculations guarantee the accuracy of synthetic seismograms and sensitivity kernels for 3-D complex models.However, they also make adjoint tomography computationally demanding and even unaffordable.To strike a balance between the computational efficiency and accuracy of full wave-equationbased tomographic methods, we propose conducting the forward modeling and sensitivity kernel calculation in the 2-D source-receiver vertical plane by a high-order finitedifference scheme.As we will show, if only travel-time measurements are considered, this 2-D approximation offers acceptable accuracy.Meanwhile, by numerically solving 2-D wave equations, finite-frequency effects such as wavefront healing are naturally taken into account, and the accuracy of sensitivity kernels in complex heterogeneous models is also improved.Although forward modelings are restricted to 2-D planes, we still plan to invert for 3-D tomographic models on a 3-D inversion grid.The 2-D forwarding modeling and the 3-D tomographic inversion are linked by expressing the model parameters (such as velocity perturbation) at each 2-D forward modeling grid node as a linear interpolation of the model parameters at its surrounding 3-D inversion grid nodes.We name the resultant 2-D-3-D tomographic method the wave-equation-based travel-time seismic tomography (WETST).Compared with the 3-D-3-D adjoint tomography based on the spectral element method (Tromp et al., 2005;Fichtner et al., 2006), this 2-D-3-D WETST based upon a 2-D finite-difference scheme is generally more computationally affordable.This also entails that WETST can be applied to tomographic inversions involving significant amounts of data based on even moderate computational resources.
Arrival time picking is another important issue for traveltime seismic tomography.Since the early era of ray-based seismic tomography, researchers have mainly relied on manually picked arrival times to map subsurface structures (e.g., Aki and Lee, 1976;Zhao et al., 1992).Arrival times are usually picked within time windows centered at the predicted travel times (Kennett and Engdah, 1991;Maggi et al., 2009).In recent years, increasing numbers of deployed broadband seismic arrays have resulted in the proliferation of seismic data.To increase efficiency and reduce the amount of manual labor and human errors in seismic data processing, fast and automatic travel-time picking algorithms with high accuracy are highly demanded in order to process vast amount of seismic recordings.Indeed, various techniques have been presented for the automatic/semi-automatic detecting and picking of the arrivals of different seismic phases, and the most widely used of which is the short-term-average (STA) to long-term-average (LTA) ratio method and its variations (e.g., Coppens, 1985;Baer and Kradolfer, 1987;Saari, 1991;Earle and Shearer, 1994;Han et al., 2010).Zhang et al. (2003) developed an automatic P wave arrival detection and picking algorithm based on the wavelet transform and Akaike information criteria.The cross-correlation method is another routinely used technique to obtain the travel-time anomalies of broadband pulses, which is especially favored by finitefrequency tomographic applications (e.g., Luo and Schuster, 1991;Dahlen et al., 2000;Tape et al., 2007).However, the quality of picked arrivals by these methods may vary in accuracy for data sets of different signal-to-noise ratio (SNR), and often only arrivals on low-noise seismograms can be effectively picked (Akram, 2011).Specifically, the validity of the correlation-based methods requires that the synthetic seismograms be reasonably similar to the observed seismograms.Less restrictive automatic arrival picking algorithms need to be further developed.In this study, we propose two different automatic arrival-time determination methods (Sect.3) that form an integral part of our wave-equation-based travel-time seismic tomography method.
When arrival-time data and sensitivity kernels are determined or computed, wave-equation-based travel-time seismic tomography is cast as an optimization problem.Model parameterization, regularization, and methods solving the optimization problem are discussed in Sects.4 and 5. Finally, examples of sensitivity kernels for different seismic waves are shown in Sect.6, which provide the basis for future tomographic inversions with various seismic phases.This paper focuses on theoretical derivation of the wave-equationbased travel-time seismic tomography.An application of the WETST method is presented in the second paper (Tong et al., 2014b).

Tomographic equation
In this section, we set up a linear relationship between the perturbation of arrival time and velocity perturbation in a reference model.

Travel-time residual
Travel-time seismic tomography generally inverts travel-time residuals of some seismic phases to map internal Earth structures.A travel-time residual t corresponding to the event occurred at x s , and the seismic station located at x r is written as, where the observed travel-time T obs is automatically or manually picked on recorded seismogram d(t), and the synthetic arrival time T syn is predicted based on a reference model.In geometrical ray theory, T syn is usually computed by integrating the slowness along a travelling path.
If the corresponding synthetic seismogram u(t) in the reference model is available, the travel-time residual t can be approximated by the cross-correlation technique (Dahlen et al., 2000): where and w(t) is a weight function over the time interval [0, T ] that can be used to isolate particular seismic phases (Tromp et al., 2005).The accuracy of this approximation improves as data and synthetic pulse becomes more similar, i.e., waveform perturbation d(t) − u(t) in Eq. (2) becomes tiny.Assuming infinitesimal perturbations, Eq. ( 2) becomes which is used further to set up the relationship between travel-time residual and velocity perturbation.

Relationship between travel-time residual and velocity perturbation
We consider seismic wave propagation in a two-dimensional (2-D) vertical plane which contains the source x s and the receiver x r .Within this plane, the seismic wavefield of a particular phase (without mode conversion) could be assumed where u(t, x) is the displacement field, c(x) is either the P or S wave velocity model, f (t) is the source time function for the point source at x s , and n is the normal direction of the boundary ∂S.For a perturbation δc(x) of the velocity model c(x), a consequent perturbed displacement wavefield δu(t, x) will be generated.In the framework of firstorder or Born approximation (e.g., Aki and Richards, 2002;Tromp et al., 2005;Tong et al., 2011), the perturbed wavefield δu(t, x) is the solution to the following wave equation with subsidiary conditions: (5) Multiplying an arbitrary test function q(t, x) on both sides of the first equation in Eq. ( 5) and then integrating in the surface S and the time interval [0, T ] gives us As travel-time residual δt in Eq. ( 3) is measured at the receiver location x r , Eq. ( 3) can be alternatively expressed as Summing up Eq. ( 7) and Eq. ( 8), using the second and third relationships in Eq. ( 5), and assuming that we can get a relationship Note that q(t, x) is not longer an arbitrary function.Instead, it satisfies Eq. ( 9) and represents a wavefield generated reversely in time by backpropagating the windowed and normalized velocity signal recorded at the receiver for the velocity model c(x), also known as the adjoint wavefield (e.g., Liu et al., 2004;Tromp et al., 2005;Fichtner et al., 2006).By defining the travel-time sensitivity kernel Eq. ( 10) provides a concise mathematical expression of the relationship between travel-time residual δt and relative velocity perturbation δc(x)/c(x) The travel-time kernel K(x; x r , x s ) is a weighted convolution of forward wavefield gradient ∇u(t, x) and the adjoint wavefield gradient ∇q(t, x), which can be obtained by solving Eqs. ( 4) and ( 9).Assuming small perturbations, we can set t in Eq. ( 1) equal to δt, and Eq. ( 12) becomes We call relation ( 13) the tomographic equation of waveequation-based travel-time seismic tomography.Once the observed arrival time T obs and synthetic arrival time T syn are measured or calculated, tomographic Eq. ( 13) can be inverted to infer the relative velocity perturbation δc(x)/c(x).
Before proceeding to the next section, we should keep in mind that T obs − T syn in Eq. ( 13) is theoretically needed to be a finite-frequency travel-time residual.Due to the finitefrequency and dispersive effects, seismic waves of different frequencies may have different sensitivities to the subsurface structures and arrive at different times.To measure physically meaningful travel-time residual T obs − T syn , we should first ensure that the observed data d(t) are filtered through a frequency band, which is generally selected to be consistent with the frequency spectrum of the synthetic seismogram s(t).Hereinafter, d(t) will represent the bandpass-filtered data.

Arrival time picking
We first discuss how to pick the arrival times of a particular seismic phase on observed and synthetic seismograms, i.e., T obs and T syn in Eq. ( 13).Since any errors in arrival times will distort the velocity anomalies, this step is crucial for travel-time seismic tomography.Although manual arrival-picking is time-consuming and labor intensive, it is still one of the most reliable and stable techniques to determine the arrival times of specific seismic phases on observed seismograms.For example, the first-arrivals picked by analysts of the combined seismic network in Japan (known as the JMA Unified Catalogue) have accuracies of about 0.1 s for P arrival and 0.1-0.2s for S arrival (Tong et al., 2012).Since there is not yet an automatic, accurate, and robust arrival we prefer to use manually picked arrival times T obs on observed seismograms for tomographic inversion purpose.
Regarding the arrival time T syn of a particular phase on synthetic seismograms, we could also use manual picking.However, extra subjective errors will be introduced into the travel-time residual t and further affect final tomographic results.Since synthetic seismograms are generated by numerical methods, the errors come mainly from numerical dispersion and can be controlled (but cannot be avoided) by employing accurate forward solver or fine meshes in forward numerical modeling.For low-noise seismograms, automatic time-picking schemes such as the STA/LTA method have been proven to be accurate and efficient for detecting the arrivals of different seismic phases (e.g., Saari, 1991;Han et al., 2010).In this study, we present a new envelope energy ratio method to pick up the arrival times on synthetic seismograms, which has a better performance than the STA/LTA method.On the other hand, if the starting model m 0 for a tomographic inversion has (or is near) a simple geometry where traveling paths can be easily and accurately determined, the combined ray and cross-correlation method developed later can be used to obtain arrival times of particular phases on synthetic seismograms.

Envelop energy ratio method
We give a brief introduction to the STA/LTA method and then discuss the envelop energy ratio (EER) method, which is an improved version of the STA/LTA algorithm.Let u(t) represent a seismogram with a dominant period of T 0 in the time window [0, T ], then the average energies in the short-and long-term windows preceding the time t are defined as where t ∈ [0, T ] and 0 < α < β are coefficients determining the lengths of the short and long-term time windows and should be determined by the user.Usually, α and β are chosen to be 2 ≤ α ≤ 3 and 5 ≤ β ≤ 10, respectively (Earle and Shearer, 1994).u(τ ) is assumed to be zero for τ < 0. Let us define the ratio and if the ratio R(t) first exceeds a user-defined threshold at t 0 , t 0 is considered to be the approximate onset time of the first arrival on the seismogram u(t) (Munro, 2004).It is also claimed that the maximum value of the derivative dR(t)/dt may be closer to the break time of the first arrival (Wong et al., 2009).
Into the envelop function of seismogram e(t) = |u(t) + iH [u(t)] |, where H [u(t)] denotes the Hilbert transform of u(t), can be also used in seismic data analysis (e.g., Baer and Kradolfer, 1987;Maggi et al., 2009).Since the envelope function remains positive at zero crossings among different phase arrivals, average energy taken from an envelope Panel (d) displays synthetic seismograms recorded by 51 stations with an equal spacing of 2 km at the surface, which are generated by an earthquake at the depth 12.0 km directly below the 26th station.The computational domain is a crust-over-mantle model.The crust has a thickness of 30.0 km and is homogeneous with the S wave velocity 3.2 km s −1 in the crust and 4.5 km s −1 in the mantle.In (c) and (d), the arrival times of S and SmS phases determined based on the STA/LTA and EER methods are labeled with brown and blue lines, respectively.The theoretical arrivals are marked by red lines.
function may be a better measure of the signal strength (Baer and Kradolfer, 1987;Earle and Shearer, 1994).Meanwhile, Wong et al. (2009) proposed the modified energy method which has excellent performance in determining the break time of first arrival.By incorporating the envelope function and the modified energy method (Wong et al., 2009), we define the following envelop energy ratio function to determine the arrival time of the considered seismic phase filtered by the window function w(t) where α ≥ 0 and β ≥ γ ≥ 1.The peak of the ratio function r(t) is very close to the onset time of the considered seismic phase.
To show the performance of the EER method, we apply it to shear-wave synthetic seismograms generated by an earthquake at 12.0 km depth in a homogeneous crust with a thickness of 30.0 km based on a high-order finite-difference method (in Appendix).Fifty-one surface stations with an equal spacing of 2.0 km are used to record seismograms.α = β = γ = 1.0 are chosen in Eq. ( 16). Figure 1a-c shows the S wave arrival-time picking using the STA/LTA and EER methods on the seismogram for trace number 26 (Fig. 1d).We can see that S arrival time determined by the EER method is very close to the theoretical arrival time with an error smaller than 0.05 s (Fig. 1b and c).For the STA/LTA method, the threshold value is set to be 1.0 × 10 −8 , and the obtained S arrival time is 0.24 s later than the theoretical arrival time (Fig. 1a and c).Note that an error of 0.24 s is unacceptable in travel-time inversion for local structures.We further show S and SmS arrival times on all 51 seismograms in Fig. 1d.For the direct S wave, results of both STA/LTA and EER methods are relatively close to the theoretical arrival times, with errors around 0.3 s and less than 0.1 s, respectively.However, for SmS phase, the STA/LTA algorithm is not able to give accurate estimates on the breaking times.In comparison, the EER method gives picked arrivals with accuracy similar to the direct S wave case, and 70 % of the errors are still less than 0.1 s.We have fixed all parameters for the STA/LTA and EER methods in picking the S and SmS arrival times.Actually, the accuracy of time picking on any single seismogram can be improved by slightly tuning some parameters, such as the threshold value for the STA/LTA method and the lengths of the time windows for both methods.We also find that the accuracy of the STA/LTA method is very sensitive to the threshold value and it is not an easy task to determine an appropriate threshold in practice.For the EER method, however, it is simple to locate the peak of the ratio function r(t).This implies that the EER method could be a better choice for arrival-time picking on synthetic seismograms.

Combined ray and cross-correlation method
Because of the nonlinearity of seismic inverse problems, seismic tomography usually relies on an iterative method to find the optimal model.If the starting model m 0 for travel-time seismic tomography is simple (e.g., a 1-D layered model) and traveling paths of particular phases can be easily traced, the arrival times T syn 0 of synthetics in m 0 can be accurately determined based on ray theory.Meanwhile, we may expect that synthetic seismograms in the (i + 1)th model m i+1 are reasonably similar to those in the ith model m i (i ≥ 0), and the arrival-time shift δt i+1,i of a particular phase in models m i+1 and m i can be calculated with high accuracy by maximizing the cross-correlation equation, where w(t) is the time window function used to isolate the considered phase (Liu et al., 2004).Consequently, the arrival time T syn i+1 of the synthetic seismogram in model m i+1 satisfies the following relationship, Since T syn 0 and δt j +1,j are calculated with ray theory and cross-correlation method, respectively, Eq. ( 18) is called the combined ray and cross-correlation method.
Continuing the numerical example shown in the section of the EER method, we intend to verify the validity of the combined ray and cross-correlation method.Let m 0 be the crust model with an S wave velocity of 3.2 km s −1 and a thickness of 30.0 km.S wave velocity in m 1 is assumed to be 3.456 km s −1 which has a perturbation of 8.0 % with respect to m 0 .Synthetic seismograms generated by an earthquake at 12.0 km depth are calculated and recorded by 51 stations at the surface in both m 0 and m 1 .For models m 0 and m 1 , theoretical arrival times of S and SmS phases at each station can be calculated based on the ray theory (see solid red circles and black squares in Fig. 2a).Based on Eq. ( 17), we also measure arrival shifts of S and SmS in m 1 from those in m 0 .Adding S and SmS arrival shifts to their corresponding arrival times for m 0 (solid black squares in Fig. 2a), we get the approximated arrival times of S and SmS in model m 1 (blue stars in Fig. 2a). Figure 2b shows the errors of the combined ray and cross-correlation method in determining the arrival times of S and SmS in model m 1 .It can be observed that the errors of direct S arrivals are less than 0.005 s, and 70 % errors of the SmS phases are smaller than 0.005 s with maximum error around about 0.175 s occurring at the 4th and 48th stations.Considering that the travel-time differences of the SmS phase in the two models are about 1.7 s at the two stations, these picking errors are relatively small.This numerical example suggests that the combined ray and cross-correlation method could serve as an efficient tool for high-accuracy arrival-time picking on synthetic seismograms in the iterative wave-equation-based tomographic inversions.

Model parameterization
Tomographic Eq. ( 13) needs invariably to be discretized for actual inversions (Nolet et al., 2005).This gives rise to model parameterization, which is an approximation to the true Earth structure.Model parameterization determines the accuracy of forward modeling and hence affects the final form of tomographic inversion results.Most commonly, functional approaches with a set of basis functions or an a prior functional form, such as cells and grid nodes, have been adopted to represent the Earth's structure (e.g., Dziewonski, 1984;Aki and Lee, 1976;Thurber, 1983).Each approach has its own advantages and drawbacks (e.g., Zhao, 2009;Rawlinson et al., 2010).To guarantee accurate computation of synthetic seismograms and travel-time kernels and to adapt to local variations in data coverage, we use two sets of grid nodes (i.e., forward modeling grid and inversion grid) to parameterize the Earth's structure for forward modeling and inversion algorithms in this study.

Forward modeling grid
As discussed in Sect.2, we need to solve wave Eqs. ( 4) and ( 9) to obtain synthetic seismogram u(t) and traveltime kernel K(x).Many numerical methods such as the staggered-grid finite-difference (FD) method (e.g., Virieux, 1984;Graves, 1996) and spectral-element method (Komatitsch and Tromp, 1999) are well suited for this kind of forward modeling.In this study, we choose a FD scheme called high-order central difference method (see Appendix) to conduct forward modeling.The prominent feature of this high-order central difference method is that it simultaneously computes the displacement u(t, x) and the spatial gradient field ∇u(t, x), making the computation of the travel-time kernel K(x) very straightforward.It is also easier to implement the high-order central difference method than the staggered-grid finite-difference (FD) method and spectralelement method.When sensitivity kernels are calculated by solving the full wave equation, there are spurious amplitudes in the immediate vicinity of the sources and receivers (Tape et al., 2007;Tong et al., 2014a).An efficient way of removing these spurious amplitudes is to smooth the travel-time kernel where σ x and σ z are the averaging scale lengths along x and z directions, respectively.Usually, σ x and σ z are chosen to be less than the main wavelength of the seismic waves (Tape et al., 2007).The smoothed travel-time kernel K(x, z) is given by i.e., the smoothed kernel value at a given point is obtained by averaging the unsmoothed kernel values at its neighboring points.
For the 2-D FD numerical simulation, the continuous area S is sampled by a set of n discrete nodes x i (i = 1, 2, • • •, n).By choosing a corresponding set of n basis functions L i (x) (i = 1, 2, • • •, n), the smoothed travel-time kernel K(x) and the relative velocity perturbation δc(x)/c(x) can be expanded into linear combinations of the basis functions as where K i and C i are the corresponding coefficients related to the basis function L i (x).Substituting Eq. ( 22) into Eq.( 13) results in the discrete form of the tomographic equation A general way to define a basis function L i (x) is to construct a local interpolation function on knot node x i and its neighbors.The possibility of different choices for the basis functions L i (x) (i = 1, • • •, n) has led to various inversion algorithms (Nolet et al., 2005).As the high-order central difference method discussed in this study simulates seismic wave propagation on a 2-D regular mesh, we assume the spatial increments along x and z directions are x and z, respectively.Let the knot node x i with a global index i be the grid node (x m , z l ) on the 2-D mesh.In this scenario, the simplest basis function may be the piecewise constant function × z l−1/2 , z l+1/2 ; 0, otherwise.
(23) And the coefficient of the unknown C i in Eq. () is However, the interpolation function with the basis functions (Eq.23) is not even continuous.To make the interpolation function continuous, we can use bilinear interpolation to fit the perturbation field δc(x)/c(x) and the travel-time kernel K(x).Bilinear interpolation performs linear interpolation first in one direction and then in the other direction.The basis function L i (x) for bilinear interpolation takes the following form Correspondingly, the coefficient for the unknown C i in Eq. ( 22) becomes where " • " denotes a two-step operation which first gets the entry-wise product of two matrices and then sums up all the entries of the produced matrix.Kernel values for global and local grids are linked by K i+pM+q = K m+p,l+q (M is the number of grid nodes along x direction, and p, q = −1, 0, 1).To have a smoother fitting function, we could further use bicubic interpolation, which is an extension of cubit interpolation on 2-D regular mesh.Actually, in the framework of piecewise constant interpolation (Eq.23), both bilinear interpolation and bicubic interpolation can be achieved by replacing C i 's coefficient x zK i in Eq. ( 24) with a weighted average value x z K i around the knot node x i and its neighbors such as shown in Eq. ( 26).Since we have previously smoothed the kernel by convolving it with a Gaussian function, using piecewise constant interpolation or bilinear interpolation to construct tomographic Eq. ( 22) is accurate enough for practical applications.Linear interpolation of the material properties on one forward modeling grid node (purple square) with material properties on its eight surrounding inversion grid nodes (purple circles).Forward modeling grid is a regular 2-D mesh with fixed grid intervals (formed by grey lines), and inversion grid is a 3-D regular mesh with variable grid intervals.Black star and black inverse triangle denote the locations of the earthquake and seismic station in the 2-D vertical plane, respectively.

Inversion grid
For the high-order central difference scheme, we assume that seismic waves propagate in 2-D vertical planes, and hence sensitivity kernels are restricted to the same 2-D planes.For a single pair of source x s and receiver x r , the forward grid nodes and equally the velocity model parameters C i are distributed on a 2-D regular mesh in Eq. ( 22).An additional set of grid nodes needs to be introduced to characterize the actual 3-D tomographic region.For simplicity, we use a regular grid with variable grid intervals to represent the final tomographic results, which has the advantage of allowing a fine grid for a target volume with dense data coverage (mostly depending on spatial distribution of source and receivers) to be embedded in coarse grid nodes.
To be consistent with the realistic application in the second paper, we directly set up the inversion grid in a geographical coordinate system (d, φ, λ), where d, φ, and λ are depth, latitude, and longitude, respectively.If the Cartesian coordinate system is adopted for the inversion grid, the following derivation procedure is almost the same.In a 3-D regular inversion grid, each forward modeling grid node x i (i = 1, 2, • • •, n) is located within a cube formed by eight inversion grid nodes (Fig. 3).It is natural and straightforward to use trilinear interpolation between the eight grid nodes (Zhao et al., 1992).Note that the Cartesian coordinate x i should be transformed into geographical coordinate x i prior to locating it in a cube.If assuming that x i is located within the cube formed by (d r+j 1 , φ p+j 2 , λ q+j 3 ) (j 1 , j 2 , j 3 = 0, 1; 1 ≤ r + j 1 ≤ R; 1 ≤ p + j 2 ≤ P ; 1 ≤ q + j 3 ≤ Q; R, P , Q are the numbers of inversion grid nodes along depth, latitude, and longitude, respectively), the unknown velocity model parameter C i corresponding to x i can be expressed as a linear combination of the parameters X r+j 1 ,p+j 2 ,q+j 3 (j 1 , j 2 , j 3 = 0, 1) at the eight inversion grid nodes: and it defines a continuously varying velocity perturbation field δc(x)/c(x).Note that the velocity field c(x) itself can be discontinuous.Substituting Eq. ( 27) into Eq.( 22) gives the tomographic equation on the inversion grid where a r,p,q is the coefficient for the unknown X r,p,q and pre-determined, the accuracy of which relies on not only the accurate calculation of the travel-time kernel K(x) but also on the choice of the inversion grid.For the convenience of this discussion, we convert the 3-D array index (r, p, q) of the inversion grid to 1-D index n = (r−1)P Q+(q−1)P +q (1 ≤ n ≤ N = RP Q).Tomographic Eq. ( 28) can be rewritten as for a single pair of source x s and receiver x r , which relates the travel-time residual T obs − T syn linearly to the unknown relative velocity perturbation X n (1 ≤ n ≤ N) on the inversion grid.

Regularization and inversion method
With a significant increase in both quantity and quality of seismic data from the proliferation of dense seismic arrays, an increasing number of seismic data will be involved in seismic tomography, which may result in higher-resolution tomographic models.Certainly, more data will increase the complexity of the seismic inverse problem.
When M seismic measurements are used to explore the subsurface structure, M tomographic equations take the form of Eq. ( 29 the problem b = AX is always ill-posed (either because of non-uniqueness or non-existence of X), the general way to solve it is to seek a solution that minimizes the following regularized objective function where C d and C m are the a prior data and model covariance matrix which reflect the uncertainties in the data and the initial model (Rawlinson et al., 2010), D is a derivative smoothing operator for model vector X, and and η are the damping parameter and smoothing parameter, respectively (e.g., Tarantola, 2005;Li et al., 2008;Rawlinson et al., 2010).The last two terms on the right-hand side of Eq. ( 30) are regularization terms, which are included to improve the conditioning of the inverse problem b = AX and are designed to give preference to solutions with desirable properties (Aster et al., 2012): damping favors a result that is close to the reference model, while smoothing reduces the differences between adjacent nodes and thus produces smooth model variations (Li et al., 2006).Generally speaking, the objective function (Eq.30) tries to strike a balance between how well the solution satisfies the data, the variations of the solution from the reference model, and the smoothness of the solution model.Calculating the gradient (Fréchet derivative) of the objective function χ (X) is often a key step in finding an optimal solution to the minimization problem (Eq.30) (Rawlinson et al., 2010).Here, the Fréchet derivative of the objective function χ (X) can be expressed as Based on the Fréchet derivative ∂χ (X)/∂X, we describe two different approaches to solve the optimization (minimization) problem (Eq.30).

LSQR solver
The minimizer X of Eq. ( 30) satisfies ∂χ ( X)/∂X = 0 and formally can be expressed as Clearly, to explicitly obtain X we need to invert an N × N matrix.There are various methods available to fulfil this goal, such as LU decomposition, singular value decomposition (SVD), and conjugate-gradient types of methods such as LSQR algorithm.Among these methods, LSQR algorithm may be one of the most efficient and widely used methods to solve a linear system, especially when N is very large (Paige and Saunders, 1982).Additionally, the minimization problem (Eq.30) is equivalent to solving the following linear system in a least square sense and the application of LSQR or SVD to Eq. ( 33) will give the same solution as that of Eq. ( 32) (Rawlinson et al., 2010).
Once we obtain a perturbation velocity field X, the velocity model can be updated from the current velocity model on inversion grid, C, to C+ X.Because of the nonlinearity of the inverse problem, further iteration may be needed to update the velocity model until the objective function χ (X) drops below a tolerance level.

Nonlinear conjugate gradient method
Once we have the Fréchet derivative of the objective function computed in Eq. ( 31), instead of inverting the matrix in Eq. ( 32), we can alternatively use a nonlinear conjugategradient method to iteratively improve the model (e.g., Fletcher and Reeves, 1964;Tromp et al., 2005).Previous studies have shown the feasibility and efficiency of this nonlinear conjugate-gradient method in recovering seismic properties of the Earth's interior (e.g., Tape et al., 2007Tape et al., , 2009;;Zhu et al., 2012).Here, we summarize the step-by-step process of this nonlinear conjugate-gradient method, which starts from k = 0 (Tape et al., 2007;Kim et al., 2011): 1. Calculate the objective function χ (X k ), compute the gradient g k = ∂χ /∂X k .
2. Compute the model update direction p k = −g k + β k p k−1 .For the first iteration k = 0, set β 0 = 0 and p 0 = −g 0 ; otherwise calculate β k based on the equation 3. Determine the step length λ k in the model update direction: and then λ k is given by ≤ , the tolerance level, then X k+1 is the optimal perturbation model; otherwise reiterate from the first step (1) with k + 1.
For the current model m k which has a perturbation X k from the starting model m 0 , we can rewrite the gradient of the objective function as where A k and b k are respectively the Fréchet matrix and travel-time residuals in the kth model.The first term on the right-hand side of Eq. ( 37) is actually the sum of all traveltime kernels (negatively) weighted by their corresponding travel-time residuals.That is to say, if no damping and smoothing operations are applied, the gradient (Eq.37) is simply the sum of all weighted individual travel-time kernels.Since operators C −1 d , C −1 m , and D remain constant throughout the whole process, to update the model from m k to m k+1 we only need to compute the Fréchet matrix and travel-time residuals in model m k .This is different from the approach using the LSQR algorithm as a linear system is solved at each iteration.Generally speaking, the model update with the LSQR algorithm may be larger than the nonlinear conjugategradient method and the LSQR approach probably requires fewer iterations.
Besides the LSQR solver and the nonlinear conjugate-gradient method, the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm (Nocedal , 1980) is another good candidate for solving the optimization problem of Eq. 30 (Luo, 2012).The L-BFGS algorithm is a quasi-Newton method that only involves vector operations and storage and is therefore well suited for optimization problems with a large number of variables.More details on L-BFGS can be found in Nocedal (1980) and Luo (2012).

Numerical examples
As discussed in Sect.5, computing travel-time sensitivity kernel or the Fréchet derivative of the objective function is one of the key components of wave-equation-based traveltime seismic tomography.In this section, we show examples of Fréchet kernel for one earthquake.These examples provide insights into sensitivities of various seismic phases and the future applications of wave-equation-based travel-time seismic tomography involving tens of thousands of seismic record.
A two-layer S wave velocity model with the Moho discontinuity at a depth of 30.0 km is used as a reference model.The size of the model is 100 km × 50 km.S wave velocities in the crust and the mantle are 3.2 and 4.5 km s −1 , respectively.The "true" model is the same two-layer S wave veloc-ity model but with a −5.0 % low-velocity anomaly (red box in Fig. 5) and a +5.0 % high-velocity anomaly (blue box in Fig. 5) included in the mid-crust.An earthquake is placed at the horizontal distance x = 50.0km and the depth of 12.0 km with the dominant frequency of the Gaussian source time function at 1.0 Hz.There are 51 stations equally spaced on the surface with an interval of 2.0 km.The high-order central difference method is used as the forward solver.Seismograms recorded at x = 14.0 km and x = 86.0km on the surface are shown in Fig. 4a and 4b, respectively.Three main phases can be observed in these seismograms, including the direct S wave, the Moho reflected phase SmS and the surface reflected wave sSmS, which provide complementary information on the crustal structures.For example, D. Zhao et al. (2005) used S, SmS, and sSmS arrivals to conduct crustal tomography in the 1992 Landers earthquake area with a raybased tomographic method.Here, we compute Fréchet kernels for the three seismic phases.Because only sensitivity kernels are computed and no inversion is conducted, the two regularization terms at the right-hand side of Eq. ( 37) are not taken into account in this section.
For seismograms recorded at x = 14.0 km (Fig. 4a) the direct S wave and the Moho reflected SmS phase for the true model arrive closely following the corresponding phases in the reference model.As shown in Fig. 5a and b, the geometrical ray paths of both phases are partially within the lowvelocity zone, and therefore it is reasonable to have delayed S and SmS arrivals in the true model.For the sSmS phase, its geometrical ray path does not pass through the low-velocity zone but its first Fresnel zone partially coincides with the low-velocity anomaly (Fig. 4c).Due to the influence of the low-velocity zone, the arrival time of sSmS is delayed by 0.0025 s obtained through the cross-correlation calculation.The Fréchet kernels for S, SmS, and sSmS are shown in Fig. 5a-c, which closely follows their corresponding geometry ray paths (indicated by dashed lines).The positive Fréchet kernel values in the first Fresnel zones indicate that a reduction of velocity within these regions will result in the reduction of objective function χ.Figures 4b and 5d-f are for the case when seismic waves travel through a high-velocity region in the true model and seismograms are recorded at the station x = 86.0km.Negative Fréchet kernel values in the first Fresnel zones suggest that an increase of velocity in this region of the reference model can reduce the objective function χ.
The Fréchet kernels displayed in Fig. 5a-f are associated with a particular seismic phase at one seismic station, i.e., the individual kernels.Of course one seismic record does not well constrain the subsurface heterogeneous structure.With the 51 stations on the surface, we can compute the Fréchet kernel for one seismic phase defined at all seismic stations, shown in Fig. 5g-i The crust has a thickness of 30 km containing one low and one high-velocity zone in the true model respect to the reference model (Fig. 5).The earthquake is located at x = 50 km at the depth of 12 km.
the bulk part of the kernels.The values of these three kernels are positive within the low-velocity zone and negative within the high-velocity area, which indicates that updating the velocity model in the opposite direction −∂χ (X)/∂X would reduce the objective function χ .We can further define the objective function χ as the sum of S, SmS, and sSmS phases at all seismic stations.The corresponding Fréchet kernel is shown in Fig. 6, which is the sum of the kernels in Fig. 5g-i.It can be observed that kernel values at the anomalous regions are not prominent in Fig. 5g-i, but are dominant in Fig. 6.This suggests that we may simultaneously use different seismic phase data to highlight anomalous structures in future studies.For demonstration purpose, we only worked with one event in this part.To increase the illumination, more seismic events should be included.Once the Fréchet kernels for all events and phases are computed, the LSQR solver or the nonlinear conjugate-gradient method can be used to iteratively improve the velocity model.

Discussion and conclusions
Wave-equation-based travel-time seismic tomography (WETST) involves 2-D forward modeling and 3-D tomographic inversion.Considering adjoint tomography based on 3-D spectral-element method or other 3-D forward modeling techniques as an approach for "3-D-3-D" seismic tomography (e.g., Tromp et al., 2005;Tape et al., 2009;Zhu et al., 2012), WETST can be viewed as a "2-D-3-D" adjoint tomography method.From the computation point of view, 2-D forward modeling with a high-order central difference scheme is computationally efficient and can be conducted on most single PCs.This makes it possible to handle large seismic data sets with WETST.Actually, increasing data amount and data coverage is the best way to improve the resolution of tomographic results, and sometimes may compensate for the approximations in the tomography technique itself.For example, it is well known that one main drawback of ray theory is that it does not consider the influence of off-ray structures (Dahlen et al., 2000); however, a good data set with a dense and even distribution of ray paths can greatly improve the resolution of ray tomography (Tong et al., 2011).A similar problem for the 2-D approximation in WETST is its ignorance of the off-plane influence on seismic arrivals.To what extent this approximation is valid and how it affects the final inversion results should be further investigated.However, by taking advantage of the computational efficiency of 2-D forward modeling, we may be able to reduce the effect of the 2-D approximation by increased data coverage in real applications.WETST only uses travel-time information for two main reasons.First, travel time is quasi-linear with respect to variations in the velocity structures, which greatly assists the convergence of gradient-based inversion methods as presented in Sect. 5. Second, compared with fitting waveforms, it is much easier to only predict the arrival times of particular phases on synthetic seismograms computed through 2-D forward modeling.The envelop energy method or the combined ray and cross-correlation method presented in this study can be easily implemented to pick the arrival times on synthetic seismograms.Additionally, the finite-frequency travel-time residuals T obs − T syn in previous finite-frequency tomography studies (e.g., Hung et al., 2001Hung et al., , 2004Hung et al., , 2011) ) were determined with the cross-correlation travel-time measurement (Dahlen et al., 2000).However, in this study, the residuals are obtained after manually picking the onset times T obs on observed seismograms (bandpass filtered) and calculating T syn on synthetic seismograms with the energy envelop method or the combined ray and cross-correlation method.Considering that the derivations of the travel-time sensitivity kernel in Eq. ( 11) and previous finite-frequency sensitivity kernels (e.g., Dahlen et al., 2000;Zhao et al., 2000;Tromp et al., 2005;Fichtner et al., 2006) are mainly based on the Born approximation, which requires that the reference velocity model c(x) for synthetic seismogram s(t) is very If 3-D finite-frequency effects need to be taken into account and full waveform fitting is required, we suggest the use of 3-D-3-D tomographic techniques such as adjoint tomography based on the spectral-element method (Tromp et al., 2005;Fichtner et al., 2006).In this case, WETST may be used to construct the starting models for 3-D-3-D seismic tomography.The hybrid approach could help reduce the total computational costs and speed up the convergence rate of the inverse algorithm as a "closer" initial model is used.How much efficiency can be gained depends on the problem itself, and a comparison of computation time costs between the 2-D-3-D and 3-D-3-D methods is presented in the companion paper (Tong et al., 2014b).Considering that ray-based seismic tomography methods are still the most prevalent tomographic methods and WETST has the advantage of more accurately computed sensitivity kernels, WETST may be a potentially useful compromise for 3-D tomographic inversions before the wider application of 3-D-3-D seismic tomography in the near future.
Forward modeling in WETST discussed in this paper is based on solving a 2-D acoustic wave equation in the Cartesian coordinates.If the source and the receiver are far apart and the curvature of the Earth cannot be neglected, the acoustic wave equation in Cartesian coordinates needs to be transformed into geographical coordinates, which may be necessary for the use of teleseismic data.Currently, WETST cannot use converted seismic phases such as P -S or simultaneously determine the P wave and S wave velocity structures in tomographic inversions.But these two goals can be achieved by replacing the 2-D acoustic wave equation with the 2-D elastic wave equation.Additionally, a regular grid with variable grid intervals is suggested to represent the final tomographic results in this paper.To automatically adapt the inversion grid to the data distribution, adaptive mesh using Delaunay triangles and Voronoi polyhedra can be alternatively adopted (e.g., Sambridge and Rawlinson, 2005;Zhang and Thurber, 2005;Rawlinson et al., 2010).Source inversion and discontinuity (such as the depth of Moho) determination may also be considered in the future (e.g., Liu and Tromp, 2008;Tong et al., 2014a).
In addition, WETST can include not only direct first arrivals (P wave and S wave) but also later reflected (e.g., P mP , SmS, pP mP , sSmS) and refracted (P n, Sn) phases as the ray-based tomographic methods do (e.g., D. Zhao et al., 1992Zhao et al., , 2005;;Xia et al., 2007).Different seismic phases have different traveling paths and are influenced by structural anomalies differently.The combined use of various seismic phases can increase the illumination of the subsurface structures (Figs. 5 and 6).Given that WETST conducts forward modeling in 2-D vertical planes with an efficient high-order central difference scheme, it is possible to include a large set of seismic data in tomographic inversion.Two different inversion algorithms, LSQR solver and the nonlinear conjugate-gradient method, can be used to find the optimal tomographic results efficiently.Since individual kernels are computed, it is also straightforward and efficient to examine resolution in both data and model spaces (Luo, 2012).In a companion paper, we will use WETST to explore the heterogeneous structures beneath the 1992 Landers earthquake (M w 7.3) area.

Figure 1 .
Figure 1. S arrival time-picking using (a) the STA/LTA method and (b) the envelop energy ratio (EER) method for the synthetic seismogram in (c), which is the seismogram for trace number 26 in (d).Panel (d) displays synthetic seismograms recorded by 51 stations with an equal spacing of 2 km at the surface, which are generated by an earthquake at the depth 12.0 km directly below the 26th station.The computational domain is a crust-over-mantle model.The crust has a thickness of 30.0 km and is homogeneous with the S wave velocity 3.2 km s −1 in the crust and 4.5 km s −1 in the mantle.In (c) and (d), the arrival times of S and SmS phases determined based on the STA/LTA and EER methods are labeled with brown and blue lines, respectively.The theoretical arrivals are marked by red lines.

Figure 2 .
Figure2. S and SmS arrival times on seismograms computed in two models, m 0 and m 1 (a).Numerical computation in m 0 is the same as the example shown in Fig.1.S wave velocity in m 1 has a perturbation of 8 % with respect to m 0 .Black squares, red circles, and blue stars correspond to theoretical arrival times in m 0 , theoretical arrival times in m 1 , and arrival times m 1 computed by using the combined ray and cross-correlation method, respectively.Errors of S (red circles) and SmS (blue circles) arrival times determined by using the combined ray and cross-correlation method (b).

Figure 3 .
Figure 3. Linear interpolation of the material properties on one forward modeling grid node (purple square) with material properties on its eight surrounding inversion grid nodes (purple circles).Forward modeling grid is a regular 2-D mesh with fixed grid intervals (formed by grey lines), and inversion grid is a 3-D regular mesh with variable grid intervals.Black star and black inverse triangle denote the locations of the earthquake and seismic station in the 2-D vertical plane, respectively.
) and form a linear system b = AX at each iteration, where b = [b m ] M×1 and b m = T obs m − T syn m is the iterative travel-time residual vector, A = [a m,n ] M×N is the Fréchet or Jacobin matrix calculated in the current iterative model and X = [X n ] N ×1 is the unknown model vector.Since P. Tong et al.: Part 1: Method the perturbation model X k+1

Figure 4 .
Figure 4. Seismograms recorded by the stations located at (a) x = 14 km and (b) x = 86 km on the surface.Seismograms computed in the reference model are shown as black curves, and those computed in the true model are illustrated by red curves.The computational domain is a crust-over-mantle model with a size of 100 km × 50 km.The crust has a thickness of 30 km containing one low and one high-velocity zone in the true model respect to the reference model (Fig.5).The earthquake is located at x = 50 km at the depth of 12 km.

Figure 5 .Figure 6 .
Figure 5. Fréchet kernels corresponding to direct S (a, d, g), SmS (b, e, h), and sSmS (c, f, i) phases.Panels (a)-(c) are computed only using seismograms recorded at the station x = 14 km.These seismograms are influenced by the low-velocity zone in the red box in the true model.Panels (d)-(f) are only related to seismograms recorded at the station x = 86 km, which are influenced by the high-velocity zone in the blue box in the true model.Panels (g)-(i) are computed for all 51 stations on the surface.To use a uniform color bar indicated at the bottom for all panels, each kernel is amplified by multiplying the number at the bottom left of the corresponding panel.