© Author(s) 2013. CC Attribution 3.0 License. Solid Earth Open Access

Abstract. In a linear ill-posed inverse problem, the regularisation parameter (damping) controls the balance between minimising both the residual data misfit and the model norm. Poor knowledge of data uncertainties often makes the selection of damping rather arbitrary. To go beyond that subjectivity, an objective rationale for the choice of damping is presented, which is based on the coherency of delay-time estimates in different frequency bands. Our method is tailored to the problem of global multiple-frequency tomography (MFT), using a data set of 287 078 S-wave delay times measured in five frequency bands (10, 15, 22, 34, and 51 s central periods). Whereas for each ray path the delay-time estimates should vary coherently from one period to the other, the noise most likely is not coherent. Thus, the lack of coherency of the information in different frequency bands is exploited, using an analogy with the cross-validation method, to identify models dominated by noise. In addition, a sharp change of behaviour of the model ℓ∞-norm, as the damping becomes lower than a threshold value, is interpreted as the signature of data noise starting to significantly pollute at least one model component. Models with damping larger than this threshold are diagnosed as being constructed with poor data exploitation. Finally, a preferred model is selected from the remaining range of permitted model solutions. This choice is quasi-objective in terms of model interpretation, as the selected model shows a high degree of similarity with almost all other permitted models (correlation greater than 98 % up to spherical harmonic degree 80). The obtained tomographic model is displayed in the mid lower-mantle (660-1910 km depth), and is shown to be compatible with three other recent global shear-velocity models. A wider application of the presented rationale should permit us to converge towards more objective seismic imaging of Earth's mantle.


Introduction
Until recently, Ray Theory (RT) formed the backbone of global seismic tomography, mainly because of its simplicity and computational efficiency (e.g., Grand et al., 1997; Van der Hilst et al., 1997; Ritsema et al., 1999, 2011; Fukao et al., 2001; Debayle et al., 2005). RT is based on the approximation that seismic waves have an infinite frequency, or a zero wavelength. It assumes that body-wave traveltimes, which represent onset times, depend only upon the Earth structure along geometrical ray paths. In reality, body waves observed on broad-band seismographs have wavelengths ranging from 10 to 1000 km, or even more. Digital instrumentation has led to the use of cross-correlation to determine traveltimes. However, the cross-correlation time window allows low frequencies to influence the measurements. Therefore, RT can break down when used with cross-correlation measurements for imaging small-scale heterogeneities, since diffraction effects make traveltimes (and amplitudes) dependent on a 3-D region around the ray path. A recent focus has been to take into account the Finite-Frequency (FF) behaviour of body waves. This has been supported by continued systematic evidence for body-wave traveltime dispersion due to various forms of scattering (e.g., Hung et al., 2004; Yang et al., 2006; Sigloch and Nolet, 2006; Zaroli et al., 2010). To better constrain the structure of the Earth's interior, new theoretical developments on seismic wave propagation have emerged in recent years and received increasing attention in tomography. Dahlen et al. (2000) developed an FF approach (hereafter referred to as Banana-Doughnut Theory, BDT) that is efficient enough to be applied to large-scale problems and a wide range of frequencies.
It is based on a ray-Born approximation that is much faster than the mode summation approaches proposed earlier by Marquering et al. (1998) and Zhao and Jordan (1998). We refer the reader to Sect. 1.6 of Nolet (2008) for more complete references to work preceding Dahlen et al. (2000). To go beyond tomographic limitations from ray-based FF kernels of BDT (e.g., Nolet, 2008), alternatives are available (e.g., Zhao et al., 2000; Tromp et al., 2005; Nissen-Meyer et al., 2007; Zhao and Chevrot, 2011a,b). The most promising ones are based on numerical techniques (e.g., Komatitsch Spetzler, 2006; Van der Hilst and de Hoop, 2005; Boschi et al., 2006).

It seems pertinent that tomographers ask what the consequences on seismic models are of the subjectivity inherent in the choice of regularisation parameter. Linear tomographic inverse problems, d = Gm, are usually ill-posed and require a subjective degree of regularisation to deal with data errors and stabilise the solution.
One usually uses Tikhonov regularisation (Tikhonov, 1963), which consists in solving the minimisation problem m_λ = arg min(||d − Gm||²_2 + λ²||m||²_2), where λ is a real regularisation parameter (damping) to be chosen with care. A convenient and widely used graphical tool (e.g., Aster et al., 2012) for setting the damping λ is to analyse the trade-off curve (L-curve) between the model norm (||m||²_2, squared Euclidean norm) and the data misfit (χ²_red, reduced chi-square). In a tomographic experiment, data errors are a mix of observational and modelling errors. Ideally, if the statistics of data errors were perfectly known, the optimal solution would be reached for χ²_red ≈ 1 near the bend of the L-curve. In practice this is never met, as data errors are usually just best guesses, and one faces the dilemma of choosing a solution around the L-curve's corner as a best compromise between minimising both the residual data misfit and the model norm. Therefore, the selection of an optimally regularised solution is to a large extent arbitrary (more akin to an art form), which can lead to different model interpretations. To converge towards more reliable tomographic models, one would prefer to lessen the inherent subjectivity of the damping choice. The goal of this study is to present a new approach aimed at objectifying the choice of regularisation parameter in the case of global Multiple-Frequency Tomography (MFT, equivalent to BDT applied in multiple frequency bands, as named by Sigloch et al., 2008). We will illustrate our approach using a global dataset of 287 078 shear-wave delay-times measured at 10, 15, 22, 34 and 51 s central periods. To summarise, our approach will consist of the following three parts: (i) identifying models with poor data exploitation (over-damped); (ii) identifying models dominated by noise (under-damped); (iii) selecting a preferred model among the remaining ones.
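To make the trade-off concrete, the following sketch sets up a small synthetic ill-posed problem (a hypothetical smoothing operator, not the tomographic G of this study) and samples the L-curve as the Tikhonov damping λ varies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-posed problem: a smoothing (near-singular) forward operator G.
n = 50
x = np.linspace(0.0, 1.0, n)
G = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.01)  # Gaussian blur -> ill-posed
m_true = np.sin(3.0 * np.pi * x)
d = G @ m_true + 0.05 * rng.standard_normal(n)      # noisy synthetic data

def tikhonov(G, d, lam):
    """Solve m_lam = arg min ||d - G m||_2^2 + lam^2 ||m||_2^2."""
    A = np.vstack([G, lam * np.eye(G.shape[1])])
    b = np.concatenate([d, np.zeros(G.shape[1])])
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Sample the L-curve: model norm versus residual misfit as damping varies.
lams = np.logspace(-4, 1, 30)
norms, misfits = [], []
for lam in lams:
    m_lam = tikhonov(G, d, lam)
    norms.append(np.sum(m_lam ** 2))
    misfits.append(np.sum((d - G @ m_lam) ** 2))
# Small lam: large model norm, small misfit; large lam: the reverse.
```

Plotting `norms` against `misfits` over this λ range produces the characteristic L shape discussed above.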
We will show that this final choice is of no consequence in terms of model interpretation, and thus is quasi-objective. The preferred model we obtain this way will be displayed in the mid lower-mantle (660-1910 km depth), where our data coverage is at its highest. We will briefly discuss its major structural features, and show its compatibility with three other global shear-velocity models of the latest generation.

2 Global multiple-frequency S-wave tomography

2.1 A global dataset of multiple-frequency S-wave delay-times

We use a globally distributed dataset of 287 078 S and SS delay-times measured at 10, 15, 22, 34 and 51 s periods, as summarised in Table 1. Both single-phase (S, SS) and two-phase interference (S + sS, SS + sSS) delays are considered. Measurements are performed on the transverse component by cross-correlating the observed and synthetic waveforms. Synthetics are computed with the WKBJ method (Chapman, 1978) within the spherical IASP91 1-D shear-wave velocity model (Kennett and Engdahl, 1991) extended with PREM's 1-D attenuation (q_0) model (Dziewonski and Anderson, 1981). The global 2° × 2° crustal model CRUST2.0 (Bassin et al., 2000) is used to incorporate most crustal reverberations, on the receiver side, into the synthetic waveforms. As seismic waves propagate through the Earth, an attenuating medium, a physical dispersion correction should be applied to cross-correlation delay-times. Zaroli et al. (2010) show that "the globally averaged dispersion observed for S and SS delay-times favour a frequency-dependent 1-D attenuation model q(ω) ∝ q_0 × ω^−α, with an α value of 0.2 for S and 0.1 for SS". Thus, S and SS delays are corrected for physical dispersion by injecting those α values in Eq. (16) of Zaroli et al. (2010).

After correction for physical dispersion due to 1-D attenuation q(ω) and crustal reverberations, the data exhibit a residual dispersion of the order of 1-2 s in the 10-51 s period range. Zaroli et al. (2010) suggest that this residual dispersion is partly related to seismic heterogeneities in the mantle.
For instance, they show that the wavefront-healing phenomenon (e.g., Gudmundsson, 1997) is clearly observed for S-waves having passed through negative V_S anomalies. Tian et al. (2011) show that the inclusion of additional dispersion due to 3-D attenuation structure has little influence on S-wave models. Sigloch et al. (2008) show that one can neglect it for P-waves, and, similarly, Savage et al. (2010) observe only a small effect on seismic wave travel times. Thus, we neglect the role of 3-D variations of attenuation in explaining the residual dispersion in our data. Bolton and Masters (2001) extensively discuss the assignment of quantitative errors in the case of a global S-wave dataset.

Following their analysis, we aim at identifying the separate contributions to the total data variance σ²_T in our dataset. We have σ²_T = σ²_3-D + σ²_X + σ²_N, where σ²_3-D is due to 3-D seismic heterogeneities, σ²_X is due to earthquake location errors, and σ²_N is attributed to measurement errors. Bolton and Masters (2001) estimate σ_X for S waves to be 1.6-2.5 s, assuming a typical depth uncertainty of about 10 km, at epicentral distances of about 70°, and for mislocation vectors of length 10-20 km. In this study, we have attributed to each datum d_i a constant value for the source uncertainty, σ_X = 2.5 s.
Earthquake mislocation and origin time effects are not simultaneously estimated for each datum during the inversion. In addition, we assume that the source uncertainty is the same for an S-wave measured at either 10, 15, 22, 34, or 51 s period, so that taking it into account would not change the main results of this study. Moreover, Masters et al. (2000) report that their tomographic results from S-wave inversions vary little when source effects are included. As for the contribution of the measurement error, we have attributed to each datum d_i an individual error {σ_N}_i computed with Eq. (9) of Zaroli et al. (2010). It varies between 0.1 and 3.7 s, though its average value is 0.49, 0.57, 0.67, 0.73, 1.08 s for data subsets at 10, 15, 22, 34, 51 s periods, respectively. Thus, the total data
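As a numerical illustration of this error budget, the sketch below combines the quoted per-band average measurement errors with the adopted source uncertainty; the heterogeneity term σ_3-D is a purely hypothetical value here, since it represents the signal we wish to retrieve rather than a known quantity:

```python
import math

SIGMA_X = 2.5  # s, source uncertainty adopted in this study
# Average measurement errors per frequency band (s), quoted in the text:
sigma_N = {10: 0.49, 15: 0.57, 22: 0.67, 34: 0.73, 51: 1.08}
SIGMA_3D = 3.0  # s, HYPOTHETICAL contribution of 3-D heterogeneity (the signal)

# Independent contributions add in quadrature:
# sigma_T^2 = sigma_3D^2 + sigma_X^2 + sigma_N^2
sigma_T = {T: math.sqrt(SIGMA_3D ** 2 + SIGMA_X ** 2 + s ** 2)
           for T, s in sigma_N.items()}
# Note how sigma_X dominates sigma_N, so the error floor is nearly flat
# across the five bands.
```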

Setting up the inverse problem
At each period, a waveform is influenced by a weighted average of the Earth's mantle through its corresponding 3-D sensitivity kernel. In principle, measuring the delay-time of a seismic phase at several periods should increase the amount of independent information in the inverse problem, and lead to improved tomographic imaging. The general form of the MFT inverse problem is d = Gm, in which the delay-times measured in all frequency bands and their sensitivity kernels are gathered into a single linear system. In Appendices A and B, we expand on the computation of: (i) the data-driven model parameterisation, (ii) the "analytical" ray-based FF kernels. For more details, including the computation of G, we refer the reader to Zaroli (2010).
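Assembling the multi-band system amounts to stacking per-band kernel matrices and delay vectors row-wise; the sketch below illustrates this with random placeholder matrices (the actual kernels are described in Appendices A and B):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 20                           # number of model parameters (toy size)
bands = [10, 15, 22, 34, 51]     # central periods (s)
n_per_band = 30                  # hypothetical number of delays per band

# Placeholder per-band kernel matrices and delay-time vectors.
G_band = {T: rng.standard_normal((n_per_band, M)) for T in bands}
d_band = {T: rng.standard_normal(n_per_band) for T in bands}

# The multi-band system d = G m stacks the per-band blocks row-wise, so a
# single model m must satisfy the delays of all five frequency bands at once.
G = np.vstack([G_band[T] for T in bands])
d = np.concatenate([d_band[T] for T in bands])
```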

In order to solve this linear inverse problem, we assume that the prior covariance matrices of the data, C_d, and of the model parameters, C_m, follow Gaussian probability functions. Thus, we may obtain the maximum-likelihood estimate of the model solution by minimising (e.g., Tarantola and Nercessian, 1984; Tarantola, 1987):

(Gm − d)^T C_d^−1 (Gm − d) + m^T C_m^−1 m .    (2)

This leads to solving the system of normal equations:

(G^T C_d^−1 G + C_m^−1) m = G^T C_d^−1 d .

For each source-receiver pair, uncertainties associated with delays should be partially correlated at different periods, for instance because of common errors on source location and origin time, so that the matrix C_d should contain off-diagonal terms. However, for simplicity, we use a diagonal data covariance matrix of the form C_d = diag(σ²_i)_{1≤i≤N}. Tomographic inverse problems are usually ill-posed and require a degree of regularisation to deal with data errors and stabilise the solution. The model covariance, C_m, represents our prior expectation of how model parameters are correlated, e.g.
high correlation for nearby parameters. We use a simple model covariance of the form C_m = σ²_m I_M, where I_M is the identity matrix of size M × M. The parameter σ_m influences the solution by damping the model norm and allows us to regularise the inverse problem. Because FF kernels integrate over a volume as wide as several Fresnel zones, a simple regularisation parameter (damping) λ = 1/σ_m is sufficient, in our experience, to obtain smoothed solutions. Setting G′ = C_d^−1/2 G and d′ = C_d^−1/2 d, one can write (Eq. 2) as:

m_λ = arg min(||d′ − G′m||²_2 + λ²||m||²_2) .    (3)

To solve the weighted damped least-squares problem (Eq. 3), we use LSQR (Paige and Saunders, 1982), an iterative row-action method that converges to the solution m_λ. Care has been taken to ensure LSQR convergence with sufficient iterations. After a first inversion with zero damping, outliers with data misfit deviations larger than ±3σ were rejected for subsequent inversions. The total number of surviving data is N = 287 078 (cf. Table 1). Let us define:

χ² = Σ_{i=1}^{N} ((d_i − (Gm)_i)/σ_i)² .

It is a direct measure of the data misfit, in which we weight the misfits inversely with their standard errors. In the perfect case that data are on average satisfied with a misfit of one standard deviation, we find χ² ≈ N, so that it is convenient to define the reduced chi-square χ²_red = χ²/N. As always, the crucial choice of damping λ controls the balance between minimising both the model norm ||m||²_2 and the data misfit χ²_red. In the following, we present an objective rationale for the choice of λ that is tailored to the MFT problem.

A convenient graphical tool for setting the damping is to analyse the trade-off between the model size (e.g. measured by ||m||²_2) and the data misfit (e.g. measured by χ²_red). One usually gets a characteristic L-shaped curve, with a (not often distinct) corner separating the "vertical" and "horizontal" parts of the curve. Figure 1 shows the curve of trade-off between ||MB_λ||²_2 and χ²_red(MB_λ, d_10,15,22,34,51).
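The weighted damped least-squares solve and the reduced chi-square can be sketched as follows; the kernel matrix and data errors are synthetic placeholders, and SciPy's LSQR `damp` argument plays the role of λ:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2)
N, M = 200, 60
G = rng.standard_normal((N, M))              # toy kernel matrix
m_true = rng.standard_normal(M)
sigma = 0.5 + rng.random(N)                  # per-datum standard errors sigma_i
d = G @ m_true + sigma * rng.standard_normal(N)

# Whiten by the data errors: G' = C_d^(-1/2) G, d' = C_d^(-1/2) d.
Gw = G / sigma[:, None]
dw = d / sigma

# LSQR minimises ||d' - G' m||^2 + damp^2 ||m||^2 via its `damp` argument.
lam = 0.1
m_lam = lsqr(Gw, dw, damp=lam)[0]

# Reduced chi-square: misfits weighted inversely with their standard errors.
chi2 = np.sum(((d - G @ m_lam) / sigma) ** 2)
chi2_red = chi2 / N
```

For a well-determined toy system like this one, `chi2_red` lands near (N − M)/N rather than exactly 1, a reminder that the reduced chi-square is only a guide when error statistics are imperfectly known.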
One sees that the model with data misfit χ²_red(MB_λ, d_10,15,22,34,51) = 1 is probably under-damped, as it lies on the quasi-vertical leg of the L-curve. This indicates that this model is dominated by noise, as will be confirmed in Sect. 3.2, and is a reminder of our imperfect knowledge of the statistics of data errors (σ_i). Our estimates of σ_i appear to be slightly under-estimated, which may be caused by non-Gaussian distributions of source and measurement uncertainties. It is possible to estimate observational errors of teleseismic waves using the concept of summary rays (e.g., Morelli and Dziewonski, 1987; Gudmundsson et al., 1990). However, the problem is larger for MFT, since very little experience so far is at hand on the observational error sources (e.g., crustal reverberations) that affect narrow-band cross-correlation estimates. Hansen and O'Leary (1993) suggest that the optimal damping could be the one maximising the L-curve's curvature (maximum curvature criterion). Because of some intrinsic arbitrariness in this criterion, it should be applied with caution (e.g., Boschi et al., 2006). For instance, the L-curve has to be a plot of dimensionless quantities, obtained by scaling both data and model to their prior uncertainty. We compute the curvature as κ = (ρ′η″ − ρ″η′)/((ρ′)² + (η′)²)^3/2, where η = ||MB_λ||²_2 and ρ = χ²_red(MB_λ, d_10,15,22,34,51). We normalise η and ρ by their extremal values. The prime and double prime denote the first and second derivative with respect to λ, respectively. Figure 1 shows the plot of curvature κ as a function of data misfit, superimposed on the L-curve plot. The multi-band model maximising κ is shown with a black diamond marker, and is named MB_κmax. In Sect. 3.2, we will identify MB_κmax as belonging to the models dominated by noise. Hansen and O'Leary (1993) also suggest that the maximum curvature criterion may yield an under-damped solution if the L-curve is characterised by a "smooth" corner, as it is in our case.
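The maximum curvature criterion can be sketched numerically; here the curvature is evaluated on a synthetic L-shaped curve, working in log-log coordinates (a common dimensionless variant of normalising ρ and η by their extrema):

```python
import numpy as np

def lcurve_curvature(rho, eta, lam):
    """Signed curvature of the trade-off curve (log rho, log eta) vs damping.

    rho: data misfit, eta: model norm, both sampled over the dampings lam.
    Working in log-log coordinates makes the quantities dimensionless.
    """
    x, y = np.log(rho), np.log(eta)
    dx, dy = np.gradient(x, lam), np.gradient(y, lam)
    d2x, d2y = np.gradient(dx, lam), np.gradient(dy, lam)
    return (dx * d2y - d2x * dy) / (dx ** 2 + dy ** 2) ** 1.5

# Synthetic L-curve with its corner near lam = 1: misfit grows with damping,
# model norm falls with damping.
lam = np.logspace(-2, 2, 201)
rho = 1.0 + lam ** 2
eta = 1.0 + 1.0 / lam ** 2
kappa = lcurve_curvature(rho, eta, lam)
lam_kmax = lam[int(np.argmax(np.abs(kappa)))]   # maximum curvature criterion
```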
Finally, there is a wide range of models with acceptable data misfit around the L-curve bend. Picking one of them is to a large extent arbitrary. This motivates us to develop, for the MFT problem, a more objective approach to determine an "optimal range" of damped solutions. It will primarily consist in identifying models with "poor data exploitation" or "dominated by noise", as summarised in Fig. 1 and explained below.

Identifying multi-band models with poor data exploitation
We aim at identifying over-damped models for which the structural information of the data is poorly exploited. As noted above, the corner of the L-curve in Fig. 1 is smooth. To render this smooth corner sharper, one may analyse a new L-curve based on the model ℓ∞-norm: ||m_λ||_∞ = max(|{m_λ}_j|)_{1≤j≤M}. We further refer to this new trade-off curve as the L∞-curve. It simply is an alternative evaluation of the size of regularised models, plotted as a function of data misfit. Figure 2 shows this curve of trade-off between ||MB_λ||_∞ and χ²_red(MB_λ, d_10,15,22,34,51). As expected, the corner of the L∞-curve has been sharpened by the ℓ∞-norm, but it is interesting to see that two regimes are now apparent on both sides of a breaking point. Let λ_poor be the damping value for which this change of regime occurs, and MB_poor be the corresponding multi-band model. Figure 2 shows that as the damping λ decreases down to the critical value λ_poor, the ℓ∞-norm (i.e. the largest model component) increases very gently. One sees that as soon as the damping gets smaller than this critical value, at least one model component starts to behave differently: its amplitude suddenly increases much faster while the data misfit continues to decrease. Since the ℓ∞-norm equals the model parameter with the largest amplitude, it easily detects the presence of at least one noisy model component with unrealistically high amplitude. In contrast, Fig. 1 shows a smooth increase of the ℓ2-norm of models slightly less damped than the critical model MB_poor. This is due to the fact that the ℓ2-norm reflects information integrated over all model parameters, and is less sensitive to a single noisy parameter. Therefore, we interpret λ_poor as the smallest damping value for which no model component is significantly contaminated by noise. The damping range corresponding to poor data exploitation is then defined by λ ≥ λ_poor (where poor stands for "poor data exploitation").
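Once the ℓ∞-norm has been sampled over a range of dampings, the break point λ_poor can be located automatically; a simple sketch, using a synthetic two-regime curve:

```python
import numpy as np

def breakpoint_damping(lams, inf_norms):
    """Damping below which the model infinity-norm starts growing sharply.

    lams must be sorted in increasing order; we pick the largest change of
    slope of log(inf_norm) versus log(lam) -- a crude break-point detector.
    """
    x, y = np.log(lams), np.log(inf_norms)
    slope = np.gradient(y, x)
    return lams[int(np.argmax(np.abs(np.gradient(slope, x))))]

# Synthetic two-regime behaviour: the largest model component grows gently
# for lam >= 1 and shoots up once lam drops below that critical value.
lams = np.logspace(-2, 2, 81)
inf_norms = np.where(lams >= 1.0, lams ** -0.1, lams ** -2.0)
lam_poor = breakpoint_damping(lams, inf_norms)
```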
We have visually checked that tomographic models within this range of damping do not seem to be deteriorated by noise. Finally, as one aims at exploiting most of the information in our dataset (e.g. structural dispersion due to wavefront healing), it may be rewarding to consider models with a better fit to data than that of MB_poor, which motivates us to identify models dominated by noise.

Identifying multi-band models dominated by noise
We aim at identifying a class of multi-band models referred to as dominated by noise. The key idea is to take advantage of the expected mutual coherency of delay-times measured in different frequency bands (10, 15, 22, 34, 51 s central periods). That is, for each ray path, the structural information should be coherent from one period to the other, but the noise most likely is not. In the following, using a form of cross-validation method, we show that the lack of coherency of the information in different frequency bands can be exploited to identify models dominated by noise.

Identifying single-band models dominated by noise
The first step is to identify a range of single-band models SB_34^λ (inversion of the 34 s data subset, d_34, with damping λ) which are "dominated by noise". Our approach consists in testing how well single-band models SB_34^λ fit the 10 s data subset (d_10), as damping varies. The philosophy of this approach is somewhat analogous to the cross-validation (CV) method, in that we aim at estimating the fit of a model (SB_34^λ) to a data set (d_10) that is similar but not identical to the data (d_34) that were used to derive the model. However, our data partitioning markedly differs from the random subdivision of data inherent in CV. Partition into frequency bands, based on similar sensitivities to mantle structure, seems to make more sense in the context of our multi-frequency data set. Figure 3 shows the corresponding curve of trade-off between the model norm, ||SB_34^λ||²_2, and its 10 s data misfit, χ²_red(SB_34^λ, d_10). This trade-off curve is not L-shaped anymore, but is characterised by a remarkable "reversal". Let λ_noise be the damping value corresponding to this reversal, and SB_34^noise be the associated model. One sees that the more the damping λ is decreased beyond the critical value λ_noise, the more the single-band model gets complex (increase of ||SB_34^λ||²_2) while its 10 s data misfit gets poorer (increase of χ²_red(SB_34^λ, d_10)). We interpret as dominated by noise the range of single-band models SB_34^λ with damping λ ≤ λ_noise (where noise stands for "dominated by noise"). Indeed, for large values of damping λ one expects smooth single-band models SB_34^λ giving similar data predictions at all periods. As one decreases the damping λ into the domain where we are fitting the noise more than the structure-imposed trend in the 34 s data, one expects that single-band models SB_34^λ will fail to satisfy the 10 s data, because the 10 and 34 s data noise should not be coherent.
Therefore, we consider SB_34^noise as the most damped single-band model among those obviously dominated by noise. As a consequence, the range of single-band models SB_34^λ with damping "slightly larger" than λ_noise must somehow be significantly polluted by noise. This may intuitively be understood when looking at the inset of Fig. 3. The more the damping decreases towards λ_noise, the more the trade-off curve becomes vertical, i.e. decreasing the 10 s data misfit costs very much in terms of model norm increase. A priori, one does not know what is the largest cost that is affordable, so that one cannot objectively identify this range of models with significant noise contamination, i.e. before the noise-predominance regime. However, its existence will be taken into account for the choice of preferred model (Sect. 3.3).
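This cross-band validation can be sketched on a toy problem: two synthetic data subsets share the same underlying model but carry incoherent noise, and the misfit of the "34 s" inversion to the "10 s" data reverses at intermediate damping:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 40
m_true = rng.standard_normal(M)              # structure shared across bands
G34 = rng.standard_normal((60, M))           # hypothetical "34 s" kernels
G10 = rng.standard_normal((120, M))          # hypothetical "10 s" kernels
d34 = G34 @ m_true + rng.standard_normal(60)    # band-incoherent noise
d10 = G10 @ m_true + rng.standard_normal(120)

def solve(G, d, lam):
    """Damped least squares, as in the single-band inversions."""
    A = np.vstack([G, lam * np.eye(G.shape[1])])
    b = np.concatenate([d, np.zeros(G.shape[1])])
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Invert the "34 s" data over a range of dampings, then evaluate each model
# against the "10 s" data it has never seen (cross-band validation).
lams = np.logspace(-3, 2, 40)
cross = [np.mean((d10 - G10 @ solve(G34, d34, lam)) ** 2) for lam in lams]

# The reversal: below lam_noise the model fits the 34 s noise, and its
# prediction of the 10 s data degrades again.
lam_noise = lams[int(np.argmin(cross))]
```

The cross-band misfit is U-shaped in λ: over-damped models predict neither subset well, under-damped models fit the incoherent 34 s noise at the expense of the 10 s data.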

Inferring multi-band models dominated by noise from single-band models
We aim at identifying multi-band models dominated by noise using our knowledge of the previously identified range of single-band models dominated by noise. The situation is illustrated in Fig. 4.
We know that the single-band model SB_34^noise, which results from an inversion of the 34 s data set only, is dominated by noise. Therefore, one should not try to fit the 34 s data set (d_34) better than does the model SB_34^noise. This implies that an acceptable multi-band model, which results from an inversion of data at all periods including at 34 s, should not give a better fit to d_34 than SB_34^noise. Let MB_noise be the multi-band model (with damping λ_noise) defined as having the "same 34 s data misfit" as the single-band model SB_34^noise, that is: χ²_red(MB_noise, d_34) = χ²_red(SB_34^noise, d_34). Thus, multi-band models MB_λ with 34 s data misfit such that χ²_red(MB_λ, d_34) ≤ χ²_red(MB_noise, d_34) are identified as dominated by noise.
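Locating MB_noise then reduces to interpolating along the sampled misfit curve for the damping at which the multi-band 34 s misfit equals that of SB_34^noise; a sketch with a toy monotonic misfit curve and a hypothetical target value:

```python
import numpy as np

# Hypothetical sampled 34 s misfit of multi-band models MB_lam: it increases
# monotonically with damping (better fit to d_34 at smaller damping).
lams = np.logspace(-2, 1, 50)
chi2_red_34 = 0.8 + 0.4 * lams / (1.0 + lams)   # toy misfit curve

# Hypothetical target: chi2_red(SB_noise_34, d_34), the single-band threshold.
target = 0.95

# MB_noise sits where the multi-band 34 s misfit equals the target; any model
# with a smaller damping fits d_34 better and is flagged as noise-dominated.
lam_noise = np.interp(target, chi2_red_34, lams)
```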
For completeness, the coherency analysis should be applied to all combinations of single-band models and fits of data subsets. The relatively low number of 10, 15, and 51 s data (cf. Table 1) restricted us to analysing single-band models derived from either the 22 or 34 s data. To do this, we imposed the same source-receiver geometry for multi- and single-band (22 or 34 s) models, so that differences between models cannot be due to this extraneous factor. We analysed single-band models SB_34^λ (or SB_22^λ) with their fit to the 10, 15, 22 (or 34), and 51 s data subsets, respectively. The corresponding L-curve reversals were reached for different values of damping, which mainly reflects, at each period, the data noise level and the sensitivity kernel. We observed that the analysis of models derived from the 34 s data, and their fit to the 10 s data subset, leads to the narrowest range of acceptable multi-band models. Thus, for clarity, Sect. 3.2 has been focussed on this case.

Quasi-objective choice of preferred multi-band model
We have identified an optimal range (cf. Fig. 1) of multi-band models MB_λ with damping λ_noise < λ < λ_poor. The next step consists in choosing a "preferred" model MB_pref among those candidates, i.e. in between the two extreme models MB_noise and MB_poor, as illustrated in Fig. 5a.

Picking a particular solution within this range is a matter of compromise between exploiting most of the structural information (small damping) and minimising the influence of noise (large damping). One cannot avoid some subjectivity concerning this ultimate choice. Thus, one needs to estimate whether this degree of subjectivity can still lead to significant model differences in terms of structural interpretation. We do this by looking at the level of correlation between the candidate solutions. Figure 5b, d shows that every model within the optimal range has a correlation coefficient larger than ∼ 91 %, up to degree l = 80, with respect to the two extreme models. This indicates that this optimal range contains highly similar models, in terms of shear-velocity anomaly patterns. Thus, selecting any of them would lead to a similar structural interpretation. However, it seems wiser to us to choose the preferred model MB_pref as better correlated with MB_poor (not significantly influenced by noise) than with MB_noise (dominated by noise). To support this choice, note that models slightly more damped than MB_noise, though not quantitatively identified, are certainly polluted by noise, as suggested at the end of Sect. 3.2.1. Figure 5c shows that our preferred model MB_pref has a correlation coefficient larger than ∼ 98 %, up to degree l = 80, with respect to almost every model within the optimal range (except those too close to MB_noise). Obviously, such small model differences are not relevant, so that our choice of a preferred solution can be qualified as quasi-objective.
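The per-degree correlation used here can be sketched as follows, with hypothetical spherical-harmonic coefficient sets for two models (stored per degree l as arrays of their 2l+1 coefficients):

```python
import numpy as np

rng = np.random.default_rng(4)
L = 80  # maximum spherical harmonic degree considered

def degree_correlation(a, b, l):
    """Correlation at degree l from the two models' 2l+1 coefficients."""
    ca, cb = a[l], b[l]
    return np.sum(ca * cb) / np.sqrt(np.sum(ca ** 2) * np.sum(cb ** 2))

# Hypothetical coefficient sets: model_b is model_a plus small perturbations,
# mimicking two nearby members of the optimal damping range.
model_a = {l: rng.standard_normal(2 * l + 1) for l in range(1, L + 1)}
model_b = {l: c + 0.05 * rng.standard_normal(c.size)
           for l, c in model_a.items()}

corr = np.array([degree_correlation(model_a, model_b, l)
                 for l in range(1, L + 1)])
# For nearly identical models, corr stays close to 1 at every degree.
```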

Looking at mid lower-mantle through multiple-frequency S-wave delay-times
We present here our preferred multi-band model, MB_pref, at selected depths in the mid lower-mantle.
It consists in 3-D shear-wave velocity anomalies (δ ln V_S) resulting from the simultaneous inversion of a global dataset of 287 078 S and SS delay-times measured by cross-correlation in five frequency bands (10, 15, 22, 34, 51 s central periods). Technical tomographic details are given in Sect. 2 and in Appendices A and B. The construction of such a model was motivated by the recent discovery of significant structural dispersion in this dataset, after correction for crustal and 1-D attenuation effects. Taking this new observable into account in global tomography holds the promise to better constrain the structure of the Earth's interior. Our current data coverage is too sparse below ∼ 2000 km depth, owing to the drastic criteria applied by Zaroli et al. (2010) for cross-correlating fully isolated S and SS waveforms only. For instance, S waves with epicentral distance ∆ ≥ 75° were rejected to avoid interference between S and ScS. This was the price to pay for single-phase cross-correlation delay-time measurements; although BDT can handle multiple phases, to model closely spaced phases such as upper-mantle triplications or S-ScS beyond 80°, good amplitudes are needed, and full-waveform techniques are easier to use than ray-based FF kernels. Besides, as the paraxial approximation is not valid near antipodal epicentral distances, we did not use SS delays with ∆ ≥ 140° (Tian et al., 2007). Thus, we focus on the mid lower-mantle, where our current data coverage is the most adequate for high-resolution imaging, in the three depth ranges: 660-960, 960-1510, and 1510-1910 km. In the following, we discuss major structural features in our model, and show its compatibility with three other recent global shear-wave velocity models. A detailed structural interpretation of our tomographic results is beyond the scope of this paper.
Broad-scale high-velocity anomalies in the lower mantle have been extensively documented in the past (e.g., Li and Romanowicz, 1996; Masters et al., 1996; Grand et al., 1997; Van der Hilst et al., 1997; Masters et al., 2000; Ritsema and Heijst, 2000; Megnin and Romanowicz, 2000; Romanowicz, 2003; Montelli et al., 2004a,b, 2006; Houser et al., 2008; Ritsema et al., 2011). Geodynamicists now widely agree upon a link between these broad-scale anomalies and cold downwellings from ancient subductions, driving an important part of the mantle circulation (e.g., Schuberth et al., 2009). For instance, one sees the seismic signature of the ancient Farallon slab beneath North America (e.g., Grand et al., 1997), and of the remnants of Tethys beneath the Mediterranean/Southern Eurasia (e.g., Van der Hilst and Karason, 1999; Houser et al., 2008). They are usually associated with cold material sinking into the lower mantle. Concerning the Farallon slab, it is worth noting that our model features, at 960-1510 km depth, a detached slab beneath the western quarter of North America (Fig. 6d), as recently discussed by Sigloch and Mihalynuk (2013). The presence of two large low-velocity regions at the core-mantle boundary, beneath Africa and the Pacific, is a common feature of the previously cited whole-mantle S-wave models. These two regions are often referred to as superplumes, and could be feeding smaller low-velocity anomalies in the mid lower-mantle (e.g., Davaille, 1999; Davaille et al., 2005; Courtillot et al., 2003; Lay, 2005; Montelli et al., 2006; Schuberth et al., 2009). Figure 6 shows that some of them are apparent in our model, most often beneath Africa and the Pacific and generally located near a known hotspot (Tahiti, Samoa, Horn of Africa, Afar, South Africa, etc.). Figures 6 and 7 show that long-wavelength structures, which are very similar across recent tomographic models (e.g., Romanowicz, 2003), are well retrieved in this study. More precisely, Fig. 7 shows for visual comparison our model MB_pref with three other global shear-velocity models of the latest generation: S40RTS (Ritsema et al., 2011), PRI-S05 (Montelli et al., 2006), and TX2007 (Simmons et al., 2007). S40RTS results from Rayleigh-wave phase velocity, teleseismic shear-wave traveltime and normal-mode splitting function measurements, and is parameterised by spherical harmonics up to degree 40 and by 21 vertical spline functions. PRI-S05 and MB_pref result from cross-correlation shear-wave delay-times measured in one or several frequency bands, respectively. They are irregularly parameterised according to data coverage. TX2007 results from a joint inversion of seismic data (teleseismic shear-wave traveltimes) and geodynamic constraints (global free-air gravity, tectonic plate divergence, excess ellipticity of the crust-mantle boundary, dynamic surface topography), and is parameterised with an irregular grid. S40RTS and TX2007 are based on RT, while PRI-S05 and MB_pref rely on BDT. Model differences are expected to come from: (i) datasets (type of data, source-receiver geometry, reference model, correction of attenuation and crust, etc.), (ii) model parameterisation, (iii) model regularisation, (iv) modelling of wave propagation. The four models exhibit an overall good agreement, in the mid lower-mantle, at long wavelengths. As expected, model differences are more significant at short wavelengths. Figure 8 is an attempt to better characterise those differences. We show the spectra of the four models using spherical harmonics decomposition (Fig. 8a-c), and the correlation of our preferred model with respect to the three others (Fig. 8d-f). We find that the correlation is statistically significant up to degree l ∼ 30 in the mid lower-mantle, which corresponds to anomalies of ∼ 1000 km horizontal length. Note that all spectra have very little energy for degrees l larger than 30.
Our model is thus compatible with other recent global tomographic studies. However, we also report some important differences at short and intermediate wavelengths. PRI-S05 and MB pref tend to show slightly larger amplitudes in the mid lower-mantle than S40RTS and TX2007. These model differences may partly be explained by the theoretical frameworks (e.g., BDT can compensate for the wavefront-healing effect but RT cannot), or by the different nature of the data sets. However, a significant part of these differences likely comes from the tomographic recipe used for mapping the data into the model. We think an important ingredient is the subjective amount of regularisation used in the inversion. In the case of global MFT, we have shown how to strongly lessen this subjectivity, and hence counteract our poor knowledge of data uncertainties. A broader use of this philosophy could help to converge towards more objective seismic imaging of the mantle, which would strengthen comparisons between different tomographic models.
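The per-degree model correlation used in this kind of comparison (Fig. 8d-f) is a simple normalised inner product of spherical-harmonic coefficients at each degree l. A minimal Python sketch, assuming a real-coefficient layout keyed by (l, m); the coefficient values are purely illustrative, not taken from any of the cited models:

```python
import math

def degree_correlation(a, b, l):
    """Correlation at spherical-harmonic degree l between two models,
    given as dicts mapping (l, m) -> real coefficient, m = -l..l:
    r_l = sum_m a_lm * b_lm / sqrt(sum_m a_lm^2 * sum_m b_lm^2)."""
    ms = range(-l, l + 1)
    num = sum(a[(l, m)] * b[(l, m)] for m in ms)
    den = math.sqrt(sum(a[(l, m)] ** 2 for m in ms) *
                    sum(b[(l, m)] ** 2 for m in ms))
    return num / den

# Illustrative degree-2 coefficients for two hypothetical models:
model_A = {(2, m): c for m, c in zip(range(-2, 3), [0.3, -1.2, 2.0, 0.1, -0.5])}
model_B = {k: 0.8 * v for k, v in model_A.items()}  # rescaled copy of model_A

print(degree_correlation(model_A, model_B, 2))  # ~1.0: correlation is blind to amplitude
```

Because r_l is normalised per degree, two models can correlate perfectly at a given degree while differing in amplitude, which is why the spectra (Fig. 8a-c) and the correlations (Fig. 8d-f) carry complementary information.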

Conclusions
A recent focus in global tomography has been to exploit multiple-frequency body-wave delay times (an approach referred to as Multiple-Frequency Tomography, MFT) to better constrain the 3-D velocity structure of the Earth's mantle. Solving such a linearised ill-posed inverse problem requires the determination of an optimal value of the regularisation parameter (damping). Poor knowledge of data uncertainties often makes the selection of damping rather arbitrary, which leaves the choice of the optimum model to the investigator. This issue is even more acute for MFT, since to date there exists little understanding of the observational error sources that affect narrow-band cross-correlation estimates.
To go beyond that subjectivity, an objective rationale for the choice of damping in the case of MFT has been presented and validated using a large data set of 287 078 S-wave delay times measured in five frequency bands (10, 15, 22, 34, and 51 s central periods). First, we have shown how to take advantage of the expected coherency of delay-time estimates in different frequency bands: whereas for each ray path the delay-time estimates should vary coherently from one period to the other, the noise most likely does not. Thus, we have shown that the lack of coherency of the information in different frequency bands can be exploited, using an analogy with the cross-validation method, to identify models dominated by noise (under-damped). In addition, we have shown that analysing the behaviour of the model ℓ∞-norm, as the damping varies, gives us access to the signature of data noise starting to significantly pollute at least one model component, and thus helps us to identify models constructed with poor data exploitation (over-damped). We have shown that the selection of a preferred model from the remaining range of permitted solutions, i.e. models neither dominated by noise nor with poor data exploitation, can be qualified as quasi-objective. Indeed, the selected model shows a high degree of similarity with almost all other permitted models (correlation greater than 98 % up to spherical harmonic degree 80). For completeness, the resulting tomographic model has been presented at selected depths in the mid lower-mantle (660-1910 km depth), and shown to be compatible with three other recent global shear-velocity models. In conclusion, we feel that the presented rationale, for objectifying the choice of damping, could benefit various styles of inverse problems.
For instance, the coherency analysis could easily be applied to other data types, provided that the data can be subdivided into subsets with similar sensitivity to the model parameters, as we did when subdividing our multi-band data set into single-band subsets. A wider application of this rationale should help to converge towards more objective seismic imaging of the Earth's mantle, and thus to better understand its dynamics.
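The cross-validation analogy behind the coherency analysis can be caricatured with a toy linear problem: two "frequency bands" observe the same model vector with independent noise, models are damped-inverted from one band, and their misfit to the other band is scanned as the damping varies. A minimal Python sketch, assuming an identity forward operator and illustrative noise levels (nothing here reproduces the actual MFT inversion):

```python
import random

random.seed(0)

# Toy problem: two "bands" observe the same signal (+/-1) with
# independent Gaussian noise of unit standard deviation.
N = 500
m_true = [1.0 if j % 2 == 0 else -1.0 for j in range(N)]
d_A = [m + random.gauss(0.0, 1.0) for m in m_true]  # band used for inversion
d_B = [m + random.gauss(0.0, 1.0) for m in m_true]  # band used for validation

def invert(d, lam):
    """Damped least-squares solution for the identity forward operator:
    minimises ||d - m||^2 + lam^2 ||m||^2, i.e. m = d / (1 + lam^2)."""
    return [x / (1.0 + lam ** 2) for x in d]

def misfit(m, d):
    return sum((x - y) ** 2 for x, y in zip(m, d))

# Scan damping values: the cross-band misfit is minimised at an
# intermediate lambda, because small lambdas fit band-A noise that is
# absent from band B (under-damped), while large lambdas also discard
# the signal that both bands share (poor data exploitation).
lambdas = [0.1 * k for k in range(1, 31)]
best = min(lambdas, key=lambda lam: misfit(invert(d_A, lam), d_B))
print(best)  # an intermediate value, close to 1 for this noise level
```

In the toy case the validation misfit has a single interior minimum; in the paper's setting the analogous cross-band misfit curves are used to bracket, rather than pinpoint, the acceptable damping range.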

Appendix A

Multi-resolution model parameterisation based on spherical triangular prisms
We assume that the whole-mantle shear-wave model is described by a finite number of parameters (m_j)_{1≤j≤M}. Thus, the continuous model is given by m(r) = Σ_{j=1}^{M} m_j b_j(r), where the basis functions b_j represent the model parameterisation. A major constraint in global seismic tomography is the irregular illumination of the mantle, due to the non-uniform geographical distribution of earthquakes and receivers. In most tomographic studies (e.g., Houser et al., 2008; Ritsema et al., 1999, 2011), the spatial variability in the data's resolving power is largely ignored by using uniform basis functions (e.g., cubic cells or spherical harmonics). An alternative is to use a multi-resolution model grid that is adapted to the spatially varying constraints of the data (e.g., Michelini, 1995; Spakman and Bijwaard, 2001; Montelli et al., 2006; Simmons et al., 2009). A brief review of irregular parameterisations can be found in Sambridge and Rawlinson (2005) and Rawlinson et al. (2010). In this study, we have built a model parameterisation that attempts to be adapted to the scale of the smallest structure one may expect to resolve with the data. The whole mantle is divided into 18 spherical layers, whose thickness varies between 100 and 200 km (Fig. 9a). Each layer is divided into a set of spherical triangular prisms, whose spatial distribution is laterally irregular. The vertices (nodes) of the spherical triangle at the top of each prism represent the model parameters (red dots in Fig. 9c).
We follow the strategy of Nolet and Montelli (2005) to find an optimal node distribution in each layer, based on the ray density, that attempts to maximise the extraction of structural information from the data. The major difference of our approach is that we do not use tetrahedra to fill the mantle. We prefer spherical triangular prisms (Fig. 9c), spanning constant-depth spherical layers, because this makes the refinement of the model grid simpler. The total number of nodes we use to parameterise the mantle is M = 38 125. Node spacing ranges from about 200 to 800 km. The current coverage of seismic stations generally allows us to have closely spaced nodes in the Northern Hemisphere; node spacing is coarser in the Southern Hemisphere. Figure 9b shows an example of the node distribution obtained in one layer (530-660 km depth). Note how the model grid has been refined beneath seismically active regions, where the ray density is relatively high. The entire set of nodes (latitude, longitude, depth) of our global parameterisation is available upon request from the first author.
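Within one spherical triangular prism, a continuous model value at a query point can be obtained by linear (barycentric) interpolation between the three vertex nodes of the triangle. A minimal Python sketch, assuming radial (gnomonic) projection of the query point onto the planar triangle of the three unit-vector vertices; the node values and geometry are purely illustrative of the idea, not of the paper's actual grid:

```python
import math

def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def barycentric_on_sphere(p, v1, v2, v3):
    """Weights (w1, w2, w3) with w1+w2+w3 = 1 solving
    w1*v1 + w2*v2 + w3*v3 = s*p for some s > 0, i.e. the barycentric
    coordinates of the radial projection of unit vector p onto the
    planar triangle (v1, v2, v3).  Solved by Cramer's rule."""
    det = _dot(v1, _cross(v2, v3))
    u = (_dot(p, _cross(v2, v3)) / det,
         _dot(v1, _cross(p, v3)) / det,
         _dot(v1, _cross(v2, p)) / det)
    s = sum(u)
    return tuple(x / s for x in u)

def interpolate(p, nodes, values):
    """Linear interpolation of nodal values at query point p."""
    w = barycentric_on_sphere(p, *nodes)
    return sum(wi * mi for wi, mi in zip(w, values))

# Illustrative: one octant triangle, query at its normalised centroid.
nodes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
c = 1.0 / math.sqrt(3.0)
print(interpolate((c, c, c), nodes, [3.0, 6.0, 9.0]))  # ~6.0, the mean of the vertex values
```

Interpolating between the top and bottom faces of a prism along depth would then extend this to the full 3-D parameterisation.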

Appendix B
Analytical expressions for finite-frequency ray-based sensitivity kernels

We aim at deriving analytical expressions for single-phase (K_S, K_SS) and two-phase interference (K_S+sS, K_SS+sSS) kernels, in order to decrease the computational cost of banana-doughnut kernels (Dahlen et al., 2000). This is valuable as we aim at computing hundreds of thousands of kernels on a very fine grid (regular cells with edges of 20 km), and it may be useful for double-checking the validity of numerical calculations. Examples of such analytical kernels are displayed in Fig. 10. The finite-frequency delay-time sensitivity kernel of a single phase, with respect to a velocity perturbation (δc/c), is (Dahlen et al., 2000):

K(r_x) = − (1 / (2π c(r_x))) (R_rs / (c_r R_xr R_xs)) × [∫_0^∞ ω³ |ṁ(ω)|² sin(ω∆T − ∆Φ) dω] / [∫_0^∞ ω² |ṁ(ω)|² dω]   (B1)

where r_x is the spatial position of scatterer x, ∆Φ is the phase shift due to passage through caustics or supercritical reflection, R_rs, R_xr, R_xs are the geometrical spreading factors, |ṁ(ω)|² is the source power spectrum, ∆T is the detour time of the scattered wave, and c_r and c(r_x) are the velocities at the receiver and scatterer positions, respectively. Earthquake catalogues contain a large majority of shallow earthquakes, which makes it necessary to take into account the interference of a direct phase (e.g. S, SS) with its associated depth phase (e.g. sS, sSS) within the cross-correlation time window. The kernel associated with two-phase interference, Eq. (B2) (Dahlen et al., 2000), involves interference terms of the form:

[ · · · ]_11 = |det(H_1)| sin(ω∆T_1 − ∆Φ)
[ · · · ]_12 = |det(H_1)| sin(ω(∆T_1 + t_1 − t_2) − ∆Φ)
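Before resorting to analytical expressions, the frequency-integral ratio entering Eq. (B1) can be checked by direct quadrature. A minimal Python sketch, assuming the Gaussian source power spectrum |ṁ(ω)|² = (ω²T²/2π) e^(−(ωT/2π)²) of Hung et al. (2001) and ∆Φ = 0 (the S-wave case); the quadrature grid is illustrative:

```python
import math

def ratio_B1(dT, T, phi=0.0, n=4000, wmax_factor=8.0):
    """Numerically evaluate the ratio in Eq. (B1),
        int_0^inf w^3 |mdot(w)|^2 sin(w*dT - phi) dw
      / int_0^inf w^2 |mdot(w)|^2 dw,
    for the Gaussian power spectrum
        |mdot(w)|^2 = (w^2 T^2 / 2pi) * exp(-(w T / 2pi)^2),
    by trapezoidal quadrature (the spectrum decays fast, so a finite
    upper limit wmax suffices)."""
    wmax = wmax_factor * 2.0 * math.pi / T
    dw = wmax / n
    num = den = 0.0
    for i in range(n + 1):
        w = i * dw
        spec = (w * T) ** 2 / (2.0 * math.pi) * math.exp(-((w * T / (2.0 * math.pi)) ** 2))
        trap = 0.5 if i in (0, n) else 1.0   # trapezoidal end-point weights
        num += trap * w ** 3 * spec * math.sin(w * dT - phi) * dw
        den += trap * w ** 2 * spec * dw
    return num / den

T = 20.0                     # dominant wave period (s), illustrative
print(ratio_B1(0.0, T))      # zero detour time: the sine term vanishes -> 0
print(ratio_B1(T / 8.0, T))  # positive for small detour times
```

Such a quadrature is far too slow for hundreds of thousands of kernels on a fine grid, which is precisely the motivation for closed-form expressions; it is, however, a convenient cross-check of them.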

where "1" and "2" stand for the direct and depth phases, and where t_i, ∆T_i, and H_i are the predicted arrival times, detour times, and Hessian matrices of the direct (i = 1) and depth (i = 2) phases, respectively. Provided the filtered source time function m(t) has a known analytical expression, one sees that it is possible to calculate analytical expressions for both single-phase and two-phase interference kernels. We only need to calculate the ratios of integrals over ω in Eqs. (B1) and (B2); the remaining terms can easily be computed using the software by Tian et al. (2007), available at https://www.geoazur.net/GLOBALSEIS/Soft.html. Thus, following Hung et al. (2001), we assume a Gaussian filtered source time function whose power spectrum is given by |ṁ(ω)|² = (ω²T²/2π) e^(−(ωT/2π)²), with T the dominant wave period. This assumption is appropriate for S-waves within the 10-51 s period range. Unless the wave is super-critically reflected, with an angle-dependent phase shift, ∆Φ takes three possible values: 0, −π/2, and −π. For S waves, ∆Φ is always equal to 0, while for SS waves ∆Φ = −π/2 in between the two caustics and ∆Φ = 0 elsewhere. Equations (B1) and (B2) neglect the effects of transmission coefficients as well as directional effects in the scattering, which are small as long as the sensitivity kernel is narrow. Accordingly, the maximum detour time ∆T up to which we compute the kernel is equal to T (the dominant wave period), which amounts to neglecting the third and higher-order Fresnel zones. The resulting analytical expressions for the single-phase kernels (K_S, K_SS) and for the two-phase interference kernels (K_S+sS, K_SS+sSS) involve the imaginary error function, which we evaluate using its Maclaurin series expansion:

erfi(z) = (2/√π) Σ_{n=0}^{∞} z^(2n+1) / (n! (2n+1)).

The numerical approach of Simpson et al. (2003) is used to estimate the number n of terms needed for the series to converge. We find that n = 10 is appropriate in our case.
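The truncated Maclaurin evaluation of erfi can be sketched as follows; the truncation order n = 10 follows the text, and the reference value used in the comment is the standard value of erfi(1):

```python
import math

def erfi_series(z, n=10):
    """Imaginary error function via its Maclaurin series,
    erfi(z) = (2/sqrt(pi)) * sum_{k>=0} z^(2k+1) / (k! (2k+1)),
    truncated at k = n.  n = 10 is ample for arguments of order 1."""
    s = sum(z ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
            for k in range(n + 1))
    return 2.0 / math.sqrt(math.pi) * s

print(erfi_series(1.0))  # ~1.6504258, the reference value of erfi(1)
```

Since the series has strictly positive terms for z > 0, convergence can be monitored simply by the size of the last retained term, in the spirit of the truncation-order estimate cited above.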
For completeness, note that for compound rays (sS, SS, sSS), one scatterer can sometimes be associated with more than one perpendicular projection point on the geometrical ray path (e.g., Hung et al., 2000; Tian et al., 2007; Nolet, 2008). Thus, for scatterers near a discontinuous interface, such as the surface, we made sure to take into account incoming rays that hit the scatterer directly as well as those that first visit a boundary.

Fig. 1. Summary of our approach for objectifying the choice of regularisation parameter (damping). We plot the curve of trade-off (L-curve) between the model norm, ||MB λ || 2 2 , and its residual data misfit, χ 2 red (MB λ , d10,15,22,34,51), as the damping λ varies. The L-curve is displayed with solid-line parts in blue, red and light orange, corresponding to different model ranges. Its curvature κ is shown as a grey dashed line (with arbitrary amplitude-scaling). The model MB κmax (black diamond) corresponds to the maximum curvature of the L-curve. The two extreme models MB noise and MB poor delimit the ranges of models "dominated by noise" and "with poor data exploitation", respectively. In Sect. 3.3, we show that the selection of our preferred model MB pref

Fig. 2. Plot of the curve of trade-off between the model ℓ∞-norm, ||MB λ ||∞, and its residual data misfit, χ 2 red (MB λ , d10,15,22,34,51). A sharp change of behaviour of the model ℓ∞-norm occurs as the damping λ gets lower than a threshold value λpoor, which is interpreted as the signature of data noise starting to significantly pollute at least one model component. Models with damping λ ≥ λpoor are identified as having "poor data exploitation".

Fig. 3. Illustration of the coherency analysis. It consists in testing how well single-band models SB λ 34 (inversion of the 34 s data) fit the 10 s data (d10) as the damping λ varies. The curve of trade-off between the single-band model norm, ||SB λ 34 || 2 2 , and its 10 s data misfit, χ 2 red (SB λ 34 , d10), is characterised by a "reversal", which occurs at damping λ noise .
Single-band models with damping λ ≤ λ noise are identified as "dominated by noise".

Fig. 4.
Illustration showing how to infer multi-band models dominated by noise from single-band models. The curve of trade-off between model norm and 34 s data misfit is plotted for single-band (SB λ 34 , dashed line) and multi-band (MB λ , solid line) models. An acceptable multi-band model should not give a better fit to the 34 s data (d34) than the single-band model SB noise 34 . Thus, multi-band models with χ 2 red (MB λ , d34) ≤ χ 2 red (MB noise , d34) are identified as "dominated by noise". The multi-band model MB noise is defined as having the same 34 s data misfit as SB noise 34 .

Fig. 5.
Illustration showing that the selection of our preferred multi-band model MB pref (black star) is quasi-objective in terms of model interpretation. (a) Trade-off curve corresponding to the "optimal range" of models, after identification of those "dominated by noise" (MB noise ) or with "poor data exploitation" (MB poor ). (b-d) Plots of the correlation of three specific models (MB noise , MB pref , MB poor , as indicated by white arrows) with several models MB λ spanning the "optimal range" of models (respectively). The correlation is shown for spherical harmonic degrees l = 1 to 80, and is displayed using the colour scale on the right (black is 100 %, and white is 98 %). One sees that MB pref has a correlation coefficient greater than 98 %, up to degree l = 80, with respect to almost all models MB λ within the optimal range. The characteristic horizontal length associated with l = 80 is ∼ 400 km. Here, the correlation is shown for models averaged over 960-1510 km depth, but we checked that it holds true at other depths.