Introduction
Visual summary of the fully probabilistic source inversion algorithm
PRISM presented in the companion paper, on the example
of a magnitude-5.7 earthquake in the US state of Virginia on 23 August 2011.
(a) Candidate source solutions are evaluated according to the
cross-correlation fit they produce between observed broadband, teleseismic
P waveforms (black) or SH waveforms (blue), and their modelled
counterparts (red). The present study is concerned with quantifying the noise
distribution on these cross-correlation measurements CC – one scalar per
source–receiver pair, 48 in total for this earthquake. (b) To
reduce the dimensionality of the model space to a number accessible to
Bayesian sampling, the source time function (STF) is parameterised as a
linear combination of 15 empirical orthogonal functions found to best span
the space of a large set of 900 reference STFs. (c) The “Bayesian beach ball”, a visual average of
the posterior ensemble of well-fitting solutions, conveys not only the nature
of the moment tensor but also the magnitude and nature of its uncertainties.
(d) The marginal probability of the hypocentre depth. (e)
Weighted average of STFs from the posterior ensemble of good solutions
permits assessment of the uncertainties in STF shape. This STF is clearly
unimodal and of less than 5 s duration. (f) As a secondary benefit,
this procedure yields the uncertainties (standard deviations) of
cross-correlation travel time measurements at all stations, and their
inter-station correlations. Travel times are the primary input data for
seismic tomography, and these insights into their uncertainties are not
readily available from other methods.
The quantitative estimation of seismic source characteristics is one of the
most important inverse problems in geophysics, from both scientific and
societal points of view. Source parameters not only can be used to locate
earthquakes and to understand earthquake mechanisms and their implications for
tectonic settings and seismic hazard, but they are also important in seismic
tomography, where accurate source information is a prerequisite for achieving
optimal fits between observed and modelled (waveform) data.
Estimation of seismic source parameters includes an earthquake's location,
depth, fault plane and temporal rupture evolution. The inverse problem is
non-linear, and parameter correlations result in trade-offs and
non-uniqueness, e.g. the well-known correlation between dip and scalar
moment. Source depth is a particularly
challenging parameter; for example, inversions often find multiple
local minima in waveform data misfits as a function of depth, even when source time functions (STFs)
are explicitly estimated. This makes global search methods and ensemble
sampling particularly attractive if the associated computational hurdles can
be surmounted. For finite-fault inversion of large earthquakes, Bayesian
methods have been developed in recent years, as they also have been for
non-kinematic inversions of regional events, but we focus on the inversion of source time functions of
intermediate-sized events (mb 5.5 to 7.5) from broadband, teleseismic
waveforms.
In a companion paper, we developed the PRobabilistic Inference of Source Mechanisms (PRISM) algorithm,
a fully probabilistic inversion for source depth, moment tensor and
STF, via sampling by both stages of the neighbourhood
algorithm (NA). Figure
sums up the procedure and its results.
The need for PRISM arose from our work in global-scale waveform tomography,
which fits broadband body-wave seismograms of moderate to large earthquakes
to modelled synthetics, up to the highest occurring frequencies
(≈1 Hz). This can only be achieved with good a priori
estimates of source depth, which strongly shapes the synthetic Green's
functions, and of source time functions, which convolve the Green's
functions. At the time, no data centre delivered routine estimates of
broadband STFs (by now, efforts other than ours are
underway). Hence we developed a
linearised, iterative approach that semi-automatically deconvolved broadband
source time functions, source depths and moment tensors for more than 2000
earthquakes, which were subsequently used in several waveform tomographies
.
The required human supervision time called for full automation,
preferably in a Bayesian setting that would circumvent the occasional
divergence of the non-linear optimisation and would automatically diagnose
parameter trade-offs of the kind described above. PRISM solved
this problem, but we left the justification of its misfit criterion and the
derivation of its noise model and likelihood function to the present study.
To render ensemble sampling with the NA computationally feasible, the
dimensionality of the model parameter space has to be as small as possible,
preferably fewer than 20. Depth is one parameter, and a normalised description
of the moment tensor requires five more (a more rigorous and uniform
parameterisation of the moment tensor has since been
derived elsewhere). Although latitude and longitude could easily be
added to this list, we do not consider them here, because the lateral
location problem is adequately addressed by existing data centres, e.g. the National Earthquake Information Center
(NEIC), and in any case we would re-estimate all hypocentres at
the time of tomographic inversion. The STF is a high-dimensional parameter
vector, which earlier studies parameterised simply as a
time series of 256 unknowns (10 Hz sampling rate, 25.6 s length). To reduce
its dimensionality for Bayesian sampling, we made use of a
dataset of >2000 deterministic earthquake source solutions (depth, moment
tensor and STF) obtained previously. We selected the 900
best-constrained STFs and decomposed this set into empirical orthogonal
functions (EOFs), denoted s_l(t). Any broadband STF s(t) of
events up to magnitudes of about 7.5 is well described by a linear
combination of the first L EOFs, where L ≈ 15 delivers sufficient
accuracy for our purpose: s(t) = ∑_{l=1}^{L} a_l s_l(t). These EOFs s_l(t), shown in
Fig. b, are the primary means by which we feed a
priori expert knowledge into the Bayesian sampling problem. PRISM's STF
parameterisation consists of the first L EOF weights al,
bringing the total dimensionality of the parameter space to ≈20.
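The EOF construction and projection described above can be sketched numerically. The following uses random placeholder data in place of the actual 900 reference STFs, so all shapes and values are assumptions for illustration only:

```python
import numpy as np

# Illustrative sketch of the EOF parameterisation of STFs.
# The reference STF matrix here is random placeholder data, NOT the
# actual dataset of 900 deterministic solutions used in the paper.
rng = np.random.default_rng(0)
n_stf, n_t = 900, 256                    # 900 STFs, 256 samples (10 Hz, 25.6 s)
S = rng.standard_normal((n_stf, n_t))    # stand-in reference STFs, one per row

# SVD of the STF matrix; rows of Vt are the empirical orthogonal functions
_, _, Vt = np.linalg.svd(S, full_matrices=False)
L = 15
eofs = Vt[:L]                            # first L EOFs s_l(t), orthonormal rows

# any STF s(t) is approximated as s(t) = sum_{l=1}^{L} a_l s_l(t)
s = S[0]
a = eofs @ s                             # EOF weights a_l
s_approx = a @ eofs                      # low-dimensional reconstruction
```

In this parameterisation, the 15 weights a_l are the only STF unknowns that enter the sampling.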
This space is sampled by both stages of the neighbourhood algorithm,
resulting in an ensemble of source solutions m (cf.
Table ). From this ensemble, marginal probabilities for
any model parameter can be estimated, e.g. for the depth
(Fig. d) or the STF
(Fig. e). As a visual means of conveying
uncertainties in the moment tensor, we invented “Bayesian beach ball” plots
(Fig. c), a superposition of many beach ball
representations in the a posteriori ensemble. A valuable side benefit is
full uncertainties on travel time measurements ΔTj at stations j.
These travel time delays are incidental in the context of source inversion
(as the time shifts between observed and synthetic seismograms that maximise
the cross-correlation coefficients CCj, Fig. f),
but they represent the primary input data for our seismic waveform
tomographies.
The primary measure of fit (or “input data”) for PRISM's source inversions
is the CCj. When parameter estimation is performed as a deterministic
optimisation problem, (only) a relative measure of fit or misfit is required:
the optimal solution is the one that yields the smallest misfit between
observations and model predictions, in our case the largest possible values
of cross-correlation coefficients CCj. By contrast, Bayesian parameter
estimation requires not just a measure of misfit but also a likelihood
function for it, which is derived from the probability distribution on the
data (the “noise model”). In the absence of a noise model, the likelihood of a
randomly drawn candidate solution cannot be evaluated. Obtaining a noise
model for a misfit requires much more information about the measurement
process and its statistics than the mere adoption of a misfit measure. This
is the big challenge of Bayesian “inversion”, which will be covered in this
paper.
Section argues for the adoption of the signal
decorrelation D=1-CC as a robust measure of misfit, where CC is
the normalised cross-correlation coefficient (Table ).
To our knowledge, the decorrelation D of seismological waveforms has not
previously been used as a misfit criterion in Bayesian inference, because its noise model and likelihood function were
unknown – a shortcoming D shared with other deterministic misfit choices,
such as the instantaneous phase coherence, time–frequency phase
misfits or multi-tapers.
Section shows that the popular ℓ2 and
ℓ1 norms would be sub-optimal misfit criteria
because noise in seismic signals is not simply additive Gaussian or
Laplacian but rather partly signal-generated, i.e. highly correlated across time
samples and stations, and better described by a transfer function.
Figure shows an example of this systematic noise
“coda”. Section defines the general
requirements of a good misfit criterion, and Sect.
demonstrates that the signal decorrelation D performs more robustly than
sample-by-sample (ℓp) norms on realistic seismological waveform data.
To identify a likelihood function L(m|d) of misfit
D in Sect. , we draw once more on
the prior knowledge contained in our set of deterministic source solutions
for 900 earthquakes and on the 200 000 measurements of CC=1-D
made to obtain them. From this large, representative and highly
quality-controlled dataset of confident source solutions, we obtain the
statistics of the residual misfits D, which we use to construct an
empirical likelihood L∗(m|d). Thus we can
instruct the probabilistic inversion to explore subspaces of solutions
m that yield similarly low levels of misfit D as these
best-fitting deterministic solutions.
Section presents a worked example for the construction
of a likelihood function L(m|d) from data of a
typical earthquake, the 2011 Virginia event used throughout this paper and
its companion . We conclude with a discussion in
Sect. .
Noise and misfit criteria
Three noise cases for compressional (P) waves in source
inversion; the waveforms were produced by the M 5.7 earthquake in Virginia
(23 August 2011). Station BFO has a high signal-to-noise ratio (no wiggles
preceding the P pulse), and the waveform is fit well by a WKBJ
synthetic using our best source solution for this earthquake. Station LPAZ
has a high signal-to-noise ratio, but 3-D structure produces a strong coda
following the P pulse, i.e. signal-generated, systematic “noise”
not fit by the synthetic waveform. Station LCO has a low signal-to-noise
ratio and a coda. Since the coda cannot be modelled, it must be considered
noise, albeit of a systematic nature and correlated across time samples and
across stations. By contrast, ambient noise is random and not correlated
across stations, only across time samples (since the signal is band-limited).
Bayesian inference
Bayesian inference estimates the posterior distribution π(m) of
the parameters m given d, using the prior distribution
p(m) of the model parameters m and the likelihood
L(m|d) of the data d, given the model
m, by applying Bayes' rule:
π(m|d) = (1/p(d)) · L(m|d) · p(m).
p(d) is the prior distribution of the data d and does not
depend on the experiment. A likelihood function
L(m|d) is equivalent to the probability distribution
p(d|m) of data d given the model parameters
m. It depends on the difference
between measured data d and predicted data g(m). This
difference or misfit is defined, following convention, as
Φ(d, g(m)) = −ln(L(m|d)),
so that a model with a high likelihood has a small misfit. Since the
likelihood of a model can vary by orders of magnitude, the logarithm brings
the misfit back to a natural scaling.
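A toy numerical illustration of Bayes' rule and the misfit–likelihood relation Φ = −ln L; the model grid, prior and misfit values below are hypothetical:

```python
import numpy as np

# Toy illustration of Bayes' rule and Phi = -ln(L) on a 1-D model grid.
# Grid, prior and misfit values are hypothetical.
models = np.linspace(0.0, 1.0, 5)                # candidate model parameter values
prior = np.full(models.size, 1.0 / models.size)  # flat prior p(m)
phi = (models - 0.6) ** 2 / 0.02                 # assumed misfit Phi(m|d)
likelihood = np.exp(-phi)                        # L(m|d) = exp(-Phi)
posterior = likelihood * prior
posterior /= posterior.sum()                     # normalisation acts as 1/p(d)
```

The grid point closest to the assumed misfit minimum receives the highest posterior probability, as expected.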
The exact formula for L(m|d) depends on the assumed
noise model and potential error sources in the forward model.
Equation () requires that the misfit criterion
take those into account as well. Next, we will show that this is
straightforward only for specific assumptions about the noise, which are
usually not realistic.
Metric-based misfit criteria
“Good” solutions m are associated with small misfits Φ,
where the exact definition of Φ depends on the nature of the data
d, which may be hand-picked arrival times; dispersion curves; or, in
our case, seismic displacement time series (“waveforms”). A waveform misfit
is generally a functional Φ^W: ℝ^N × ℝ^N → [0, ∞) on d, g(m) ∈ ℝ^N.
The misfit functional has similar properties to a metric on RN,
but it should be noted that there is no natural choice; rather, its
choice implies a strong assumption of prior knowledge about the statistical
properties of the noise on d. In the case of seismic waveform data,
the data vector d is the measured time-sampled seismogram
u_i, and the separate data are the samples u_i, i = 1, …, n,
of this time series. The vector g(m) is the synthetic
seismogram u_i^c, i = 1, …, n, predicted by the
forward operator g for the model m.
When the method of least squares is used to calculate the ℓ2 misfit,
Φ^W_{ℓ2}(m|d) = k′ · ½ (d − g(m))^T S_D^{−1} (d − g(m)),
the assumption is that the noise ϵ is additive and Gaussian-distributed:
d = g(m) + ϵ,  ϵ ∼ N(0, S_D).
The [N × N] data covariance matrix S_D ∈ Sym_N describes the correlation between the errors of individual
measurements d_i; k′ is a normalisation constant.
In the case of a seismic waveform u_i, Φ^W is
Φ^W_{ℓ2}(m|d) = k′ ∑_{i=1}^{n} ∑_{i′=1}^{n} (u_i − u_i^c) (S_D^{−1})_{i,i′} (u_{i′} − u_{i′}^c),
and S_D describes mainly the band-limited spectrum of
environmental noise. Since a simple time shifting of u_i or
u_i^c will violate the assumption of
Eq. (), the u_i or u_i^c need to be
aligned first. Because we assume this noise to be time-invariant, we can
build S_D from the autocorrelation function
R_ϵϵ of the (discrete) noise time series ϵ_i.
S_D is a Toeplitz matrix, whose rows are shifted
instances of the autocorrelation function R_ϵϵ:
S_{D,k,k+l} = R_ϵϵ(l) = ∑_{i=1}^{n} ϵ_i ϵ_{i−l}.
Examples of how to construct
S_D under the assumption of an autoregressive (AR)
noise model can be found in the literature.
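A minimal sketch of this construction, with a synthetic noise window standing in for a real pre-event recording (lengths and units are assumptions):

```python
import numpy as np
from scipy.linalg import toeplitz

# Sketch: estimate R_ee from a noise window, build the Toeplitz S_D,
# and evaluate the correlated l2 misfit for a residual u - u^c.
# All series here are synthetic placeholders.
rng = np.random.default_rng(1)
eps = rng.standard_normal(2048)       # stand-in pre-event noise recording
n = 64                                # length of the misfit window

# biased autocorrelation estimate R_ee(l) for lags 0..n-1
acf = np.array([eps[l:] @ eps[:eps.size - l] for l in range(n)]) / eps.size
S_D = toeplitz(acf)                   # rows are shifted copies of R_ee

residual = rng.standard_normal(n)     # (u - u^c) after alignment
phi_l2 = 0.5 * residual @ np.linalg.solve(S_D, residual)
```

The biased autocorrelation estimate keeps the Toeplitz matrix positive semidefinite, so the quadratic form is well defined.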
For the estimation of the parameters m of one earthquake source, we
would normally use seismograms measured at different stations, cut into a
total of n_S time windows u_i, indexed by j. The
overall misfit Φ(m) for a source solution is composed of
the misfits of the single waveforms Φ^W_{ℓ2,j}(m). If the
noise on each waveform j is assumed to be uncorrelated with the noise on
all others, then it is legitimate to define the overall misfit as being
simply additive:
Φ(m) = ∑_{j=1}^{n_S} Φ^W_{ℓ2,j}(m).
If the noise on the waveforms is correlated, then
Eq. () has to be extended, such that d,
m and SD contain all time samples of all
waveforms recorded at different stations. This effort has – to our best
knowledge – not been made in seismic inverse problems.
If each measurement i is considered to be uncorrelated with the others and
has a variance σ_i², then S_D is a diagonal
matrix with diagonal elements σ_i², and Eq. ()
reduces to
Φ^W_{ℓ2}(m|d) = (k′/2) ∑_{i=1}^{N} (d_i − g_i(m))² / σ_i²
or, in the case of waveforms,
Φ^W_{ℓ2}(m|d) = (k′/2) ∑_{i=1}^{N} (u_i − u_i^c)² / σ_i².
With a set of n_S waveforms u_{i,j}, the total misfit defined
in Eq. () becomes
Φ = (k′/2) ∑_{j=1}^{n_S} ∑_{i=1}^{n_j} (u_{i,j} − u_{i,j}^c)² / σ_i²,
the weighted least-squares criterion.
If the noise can be described well by the normal distribution, the
ℓ2 norm can be applied successfully. It is, however, very sensitive to
data d_i deviating strongly from the prediction g_i(m). Outlier
samples can dominate the whole inversion process, while the residual misfit
of almost-fitting parts of the waveform has no influence. Experience shows
that realistic noise on seismic waveforms usually has more outliers than
predicted by Eq. ().
Hence, it has been proposed to use the more outlier-resistant
ℓ1 norm as a misfit criterion between observed and modelled seismograms.
The underlying assumption is that noise on the time samples u_i is independently
Laplace-distributed with width b_i, i.e. has no temporal correlation:
d = g(m) + ϵ,  ϵ_i ∼ Laplace(0, b_i),
Φ^W_{ℓ1}(m|d) = ∑_i ( |d_i − g_i(m)| / b_i + ln 2b_i ).
Time samples of realistic, band-limited seismograms are strongly correlated,
which calls for the use of multivariate Laplace distributions. This is the
subject of ongoing research, but the
resulting probability density functions (PDFs) are still too complex to be used in
ensemble inference. To make things worse, seismograms recorded at different
stations j will generally also be correlated. Hence the simplicity of the
univariate Laplace distribution is not applicable, and the robustness of the
ℓ1 norm currently cannot be harnessed.
Other authors have proposed to use misfits based on general ℓp norms
(e.g. p = 1.5), which allow the robustness
of the misfit to be tuned to the noise on the data:
Φ^W_{ℓp}(m|d) = ( ∑_{i=1}^{n} |d_i − g_i(m)|^p / σ^p )^{1/p}.
The underlying noise model is an exponential power distribution. However, all
problems described for the ℓ1 norm apply here as well, and no
multivariate forms exist in general.
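The ℓp misfit above can be illustrated with a short sketch; the data vectors and σ are stand-ins, not real seismograms:

```python
import numpy as np

# Minimal sketch of the l_p misfit for independent samples; d, g(m)
# and sigma are illustrative stand-ins, not real seismograms.
def phi_lp(d, g, sigma, p):
    return (np.sum(np.abs(d - g) ** p) / sigma ** p) ** (1.0 / p)

d = np.array([0.0, 1.0, 2.0, 10.0])   # last sample is an outlier
g = np.array([0.0, 0.5, 2.0, 2.0])    # prediction g(m)
for p in (1.0, 1.5, 2.0):
    # the outlier's relative contribution to the misfit grows with p
    print(p, phi_lp(d, g, 1.0, p))
```

Varying p between 1 and 2 interpolates between the Laplace-like and Gaussian-like weighting of residuals.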
In summary, it is tempting to choose ℓp misfits based on the
time-sample-wise distance between observed and modelled waveforms, because the
underlying noise models are straightforward to state (uncorrelated or
correlated Gaussian, uncorrelated Laplacian) and to translate into
corresponding likelihood functions. Unfortunately, these noise models are
very crude approximations of the pervasive noise characteristics and
correlation found in real time series.
These serious shortcomings motivate our proposal of alternate misfit
criteria.
Noise-model-based misfit
In a Bayesian context, the likelihood L(m|d) is
defined by the noise model on the data. An equivalent function
L∗(m|d) can be constructed from the distribution
p(F) of any functional F: ℝ^n × ℝ^n → [0, ∞) of the observed and predicted waveforms
u_i, u_i^c. In our attempt to move beyond F being a
sample-wise distance between u_i and u_i^c, we
generally want a candidate F to meet the following conditions:
For u_i = u_i^c, F should take a fixed value, say 0.
With decreasing similarity of ui and uic, F should increase, irrespective of the exact definition of similarity
(Sect. will consider this further).
F should be robust against time shifts Δt = k·dt or amplitude errors a affecting the waveform u_i, i.e.
F(a·u_{i+k}, u_i^c) ≈ F(u_i, u_i^c) for any a ∈ ℝ, k ∈ ℕ,
because such unknown time shifts will affect real-world seismograms.
F should have discriminative power with respect to the model parameters m, combined with robustness against realistic noise and theoretical errors.
Concerning the noise, we need to be able to calculate the distribution of F
for a waveform afflicted by the typical three error sources: background
noise, waveform modelling error and instrument error.
Ambient noise ϵ_noise: this is noise from man-made or natural sources around the receiver.
It can be described very well by an additive term ϵ_noise ∼ N(0, S) (see Eq. ).
Waveform modelling error Tmodel,i:
the synthetic waveform uic can never be identical to the observed ui, even in the absence of ambient noise. In the context of source modelling, the earth's impulse response
(Green's function) can be considered a linear, time-invariant operator that acts on the source time function. The calculation of this Green's function is not perfect (e.g. due to errors in the earth model or
imperfect computational methods).
This systematic error has been called the theoretical density function, with the proposal to model it by an additive term on u_i^c, but we think that it should rather
take the form of a transfer function T_{model,i} between u_i and u_i^c, which will hopefully be Dirac-like in character. However, T_{model,i}
will include the site response (receiver-side reverberations), which can create a strong waveform coda; see Fig. . Hence, T_{model,i} could in practice be rather oscillatory.
Instrument error T_{inst,i}: a displacement seismogram u_i is assumed to have been corrected for the instrument response of its seismic sensor. In practice, this correction
may be imperfect, e.g. due to erroneous sensor metadata. We model this systematic error by another (hopefully Dirac-like) transfer function T_{inst,i} convolving u_i.
In summary, the difference between a modelled uic and
observed waveform ui is
u_i = u_i^c ∗ T_{model,i} ∗ T_{inst,i} + ϵ_{noise,i}.
It is this complex mixture of noises that the misfit criterion F should be
robust against while retaining discriminatory power towards the source model
parameters m.
Next, we will test the signal decorrelation D as an alternative to
ℓp norms against these four criteria.
Signal decorrelation coefficient as a misfit
Comparison of the ℓ1, ℓ2 norms and the signal
decorrelation D = 1 − CC as misfit criteria in noisy signals. A
perturbed synthetic waveform upertc for a 10 km deep explosion
source, measured at a station at 40∘ epicentral distance, was
compared to synthetic seismograms uc for other depths, using the three
misfit criteria. The shaded colours mark the 95 % quantiles of the misfit
values, calculated by perturbing the reference waveform with different random
seeds. The figure shows the relatively high robustness of the
cross-correlation coefficient in recognising reference signals in perturbed
measurements. For better visualisation, all misfit values have been
normalised separately to have an average value of 1 between 20 and 30 km.
Distance, in standard deviations, between the misfit value for the true
source depth and the plateau for depths of 20–30 km. See
Fig. for waveforms and misfit curves. The “weak-perturbation”
curve is calculated with perturbation factor α=0.1, and
the “strong-perturbation” curve with α=0.9 (see
Eq. ). For all SNR values, the decorrelation has a
higher discriminative power than ℓ1 or ℓ2.
We choose the signal decorrelation D as a misfit criterion, defined as
D(u_i, u_i^c) = 1 − max_k { CC_k(u_i, u_i^c) },
where
CC_k(u_i, u_i^c) = ∑_{i=1}^{n} (w_i u_{i−k}^c · u_i) / √( ∑_{i=1}^{n} (w_i u_{i−k}^c)² · ∑_{i=1}^{n} (w_i u_i)² )
is the normalised cross-correlation coefficient and k is the time delay
between u_i^c and u_i for which the normalised
cross-correlation function CC_k(u_i, u_i^c)
takes its maximum value. w_i is a window function that allows one to
select a time window for the cross-correlation measurement. D satisfies
three of the four criteria that we desired of a misfit in the last section:
three of the four criteria that we desired of a misfit in the last section:
D(u_i, u_i^c) takes the value 0 for identical signals u_i^c ≡ u_i,
since CC_{k=0}(u_i, u_i^c) = 1.
For u_i ≠ u_i^c, 0 < D(u_i, u_i^c) < 2, i.e. D values larger than for the case
u_i^c ≡ u_i, and D(u_i, u_i^c) increases with decreasing similarity of u_i and u_i^c.
If a time shift k′ is small compared to the window length, we have
CC_k(u_i, u_i^c) ≈ CC_{k+k′}(u_i, u_{i+k′}^c) and thus D(u_i, u_i^c) ≈ D(u_i, u_{i+k′}^c).
Due to the normalisation in Eq. (), D is amplitude-independent:
CC(u_i, u_i^c) = CC(u_i, a·u_i^c) and thus D(u_i, u_i^c) = D(u_i, a·u_i^c).
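These properties can be checked numerically. The sketch below assumes a Gaussian test pulse, a window w_i ≡ 1, and a simplified normalisation using full-trace energies (a good approximation when the window covers the whole signal):

```python
import numpy as np

# Sketch of D = 1 - max_k CC_k with w_i = 1; the normalisation uses
# full-trace energies for simplicity, a good approximation when the
# window covers the whole signal.
def decorrelation(u, uc):
    xcorr = np.correlate(u, uc, mode="full")          # all lags k
    norm = np.sqrt(np.sum(u ** 2) * np.sum(uc ** 2))
    return 1.0 - xcorr.max() / norm

t = np.linspace(0.0, 1.0, 200)
u = np.exp(-((t - 0.5) / 0.05) ** 2)                  # Gaussian test pulse

print(decorrelation(u, u))                     # identical signals: D = 0
print(decorrelation(u, 3.0 * np.roll(u, 10)))  # shifted + scaled: D stays ~0
```

The second call illustrates the shift- and amplitude-invariance properties listed above.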
The fourth criterion, discriminative power and robustness against noise, is
less straightforward to demonstrate. We proceed empirically by showing its
superior performance over the ℓ2 and ℓ1 misfits on an example of
the kind of waveforms we typically use for source inversion.
Figure shows in black a simulated, broadband, noise-free
P wave train, recorded at 40∘ epicentral distance. The
seismograms were modelled using the WKBJ method in the
IASP91 velocity model, assuming an explosion source with
M_0 = 10^20 N m. Since the chosen source depth is shallow (10 km), the
P pulse is followed within seconds by depth phases like pP,
which effectively permits inversion for source depth. However, once this
waveform gets perturbed by realistic modelling error (convolutive) and
additive noise, resulting in the red waveform, the fit to the unperturbed
original becomes tedious. A meaningful robustness test is as follows: if the perturbed
(red) waveform is modelled for different candidate source depths, will the
smallest misfit be achieved for the perturbed wave simulated at the correct
depth of 10 km? This is a meaningful test of robustness, because source
depth tends to be the most challenging parameter to retrieve in source
inversions. Algorithmically, the perturbation is done in two steps:
Perturbation by convolution with a “modelling error function” T_{error,i}, which encompasses the effects of T_{model,i} and T_{inst,i}.
It is defined as having a unit amplitude spectrum and a random phase spectrum between 0 and α·π/2:
u_{m.e.} = u_i^c ∗ T_{error,i}.
This method adds a realistic coda to the waveform, which simulates the effects of structure that was not included in the forward simulation. The parameter α regulates the perturbing
effect of the modelling error function.
By adding a band-limited noise term,
u_pert = u_{m.e.} + βϵ, where ϵ ∼ N(0, S_D);
the covariance matrix S_D is set to model band-limited noise with corner frequencies of (1/15, 1/6) Hz, similar to microseismic background noise at the seismic station.
The peak amplitude is normalised to that of u_i^c, so that the parameter β controls the relative amplitude of this noise term.
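The two perturbation steps can be sketched as follows; the pulse shape, filter implementation and parameter values are illustrative assumptions, not the exact procedure used for the figures:

```python
import numpy as np

# Sketch of the two-step waveform perturbation: (1) convolution with a
# unit-amplitude, random-phase "modelling error" transfer function,
# (2) addition of band-limited noise. All details are illustrative.
rng = np.random.default_rng(42)

def perturb(uc, alpha, beta, dt=0.1):
    n = len(uc)
    # step 1: random-phase transfer function with |T(f)| = 1
    phase = rng.uniform(0.0, alpha * np.pi / 2, n // 2 + 1)
    u_me = np.fft.irfft(np.fft.rfft(uc) * np.exp(1j * phase), n)
    # step 2: Gaussian noise crudely band-passed to (1/15, 1/6) Hz
    freqs = np.fft.rfftfreq(n, dt)
    noise_spec = np.fft.rfft(rng.standard_normal(n))
    noise_spec[(freqs < 1 / 15) | (freqs > 1 / 6)] = 0.0
    noise = np.fft.irfft(noise_spec, n)
    noise *= beta * np.abs(uc).max() / np.abs(noise).max()
    return u_me + noise

uc = np.exp(-((np.arange(256) * 0.1 - 12.8) / 0.5) ** 2)  # stand-in P pulse
u_pert = perturb(uc, alpha=0.4, beta=0.8)
```

Here α scales the random phase perturbation (the convolutive modelling error) and β the relative noise amplitude, mirroring the roles described above.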
Figure shows the resulting reference waveform (left) and
perturbed waveforms for α=0.4 and β=0.8, i.e. moderate
perturbation of the signal and strong background noise. The unperturbed
waveform ui is plotted in solid, thin black, the waveform
perturbed with modelling error um.e. in dotted blue and the
resulting reference trace in solid red. It bears little resemblance to the
unperturbed waveform.
The right plot shows the value of the three waveform misfits ℓ1,
ℓ2 and D between uic and upert
over varying source depths. It simulates an inversion for the depth of an
earthquake using seismic waveforms. The waveform contains the P and
pP arrivals. The depth is mainly constrained by the relative arrival
times of these phases and the resulting waveform of the whole P–pP wave
train. The perturbation of Eq. () adds
artificial coda with additional arrivals to the waveform, which a good
waveform misfit should be robust against. The misfit should have a
distinctively lower value for the “true” depth of 10 km than for any of
the others. To take into account the stochastic nature of these
perturbations, 500 realisations of upert were calculated
for the same parameters, α and β, but with different random numbers.
The coloured shades mark the 95 % (2σ) quantiles of the misfit
values; the solid line marks the median.
The ℓ2 misfit can no longer recognise u_i^c in
u_pert and assigns the lowest misfit to a depth of
3 km. An analysis of different noise and perturbation levels shows that the
ℓ2 norm is relatively robust against background noise, but not against
perturbations from a modelling error; see Fig. S1 in the Supplement. This
seems reasonable given the underlying noise model of this misfit.
The ℓ1 norm does better, in that it has a minimum at 9 km depth, close
to the true value. The zigzag shape, however, suggests that the value of 9 km
is a stochastic artefact. The median value at 9 to 10 km reaches only slightly below
the lower quartile for other depths, meaning that in practice the resolution
power of the ℓ1 norm for this kind of problem will be very limited. The
studies for different noise and perturbation levels show that it is generally
more robust against background noise and modelling error than the
ℓ2 norm but less so than the cross-correlation coefficient.
The cross-correlation misfit shows the largest difference between the plateau
of wrong depth solutions and the true one. For low noise levels, its minimum
is slightly wider than that of the ℓ1 norm. More values of α
and β are shown in Fig. S1. The analysis of the
confidence intervals shows that the values for CC scatter slightly more
than the ones for ℓ2 and much more than for ℓ1. To employ it in
Bayesian inference, a detailed analysis of the statistical properties will be
necessary. The analysis also shows that the actual values of D are
influenced more strongly by the background noise level than by the modelling
error. We will use that observation in Sect. .
Figure compares the resolution power of the three
misfits for different perturbation levels and signal-to-noise ratios (SNRs). It
shows the difference between the misfit value for the true depth 10 km and
the average misfit value for the depths between 20 and 30 km. The difference
is expressed in numbers of standard deviations (sigmas) from the 500 separate
noise realisations. The dashed line shows the result for weak perturbation
(α = 0.1), and the solid line for strong perturbation (α = 0.9). It
can be seen that, for strongly perturbed waveforms, the ℓ1 and ℓ2
norms cannot recognise the true depth at more than 2σ, even for high
signal-to-noise ratios, while the decorrelation D stays well above 3σ, even for SNRs of 6.
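The separation statistic used here can be sketched as follows, with synthetic placeholder misfit distributions rather than the paper's actual results:

```python
import numpy as np

# Sketch of the "distance in standard deviations" statistic: how far the
# misfit at the true depth lies below the plateau of wrong depths, in
# sigmas of the noise realisations. All numbers are synthetic placeholders.
rng = np.random.default_rng(7)
n_real = 500
misfit_true = rng.normal(0.2, 0.05, n_real)     # misfit at true depth
misfit_plateau = rng.normal(0.8, 0.05, n_real)  # average misfit, 20-30 km

# separation measured in sigmas of the true-depth misfit scatter
separation = (misfit_plateau.mean() - misfit_true.mean()) / misfit_true.std()
print(round(separation, 1))
```

A separation above roughly 2–3σ means the true depth is reliably distinguishable from the plateau across noise realisations.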
Empirical likelihood function for the signal decorrelation
Empirical likelihood function obtained from high-quality, deterministic source estimates
Probability distribution of D, the decorrelation of measured
and synthetic P waveforms used for deterministic source inversions.
(a) Empirical histogram of D is shown as grey bars. From 200 000
broadband, teleseismic P waveforms for 900 earthquakes, only
waveforms with signal-to-noise ratios between 20.0 and 21.0 were considered
for this figure (because the scaling parameters of analytic fitting functions
depend mainly on SNR). Coloured lines show best-fitting realisations of three
analytic probability density distributions: beta (red), exponential (green)
and log-normal (blue). The log-normal distribution yields the best fit to data.
(b) Quantile–quantile plot for the three candidate distributions of
(a) confirms that the log-normal distribution best fits the
empirical histogram of D. The values on the x axis are percentiles of the
cumulative histogram of D in our dataset. The y axis shows the
percentiles of the best-fitting distribution of each class. The closer the
percentiles are to the line y=x, the better the fit of the distribution to
the underlying data over the entire range of values. Both subfigures indicate
that a log-normal distribution best fits the values of D=1-CC.
In seismology, the cross-correlation coefficient CC=1-D has been
used as a measure of goodness of fit to detect predicted waveforms in noisy
signals, to filter bad recordings, to detect
temporal changes in repeating signals and to
estimate the spatial extents of earthquake clusters. It has only very rarely been used as a misfit criterion in
source inversion, and CC and D = 1 − CC have not been used
in probabilistic inversion; the main obstacle would have been their
unknown statistics.
We present an empirical solution to this problem by drawing on a large,
pre-existing database of cross-correlation measurements that we assembled in
the context of deterministic source inversions, as described in Section 1.
Essentially we assert that our human expert knowledge and extensive
experience have generated a large, representative and highly
quality-controlled set of 900 teleseismic source parameter estimates that are
sufficiently close to the true source parameters to reveal the statistics of
the noise in the measurements d these estimates m are
based upon. The measurements d consisted of 200 000
cross-correlation coefficients CC obtained from 200 000 broadband fits of
observed seismograms to WKBJ synthetics. The synthetic waveforms were
calculated using the WKBJ method in the velocity model IASP91,
with attenuation and density taken from PREM. To
the extent that our source solutions m_j approach the true source
parameters m_{0,j}, the histogram of the CC (or D = 1 − CC)
values approximates the probability density function of CC (or D) in the
presence of noise and modelling errors. Thus we can obtain an “empirical
likelihood function” L∗(m|d) even in the absence of
an analytically describable noise model. We preface the term “likelihood”
by “empirical” because strictly speaking the likelihood would be associated
with the noise model on the raw samples i, rather than with the noise on
the composite measure D. A similar approach has been adopted independently
and recently in the context of receiver-function
inversion. Note that the term “empirical likelihood” has been used
differently in statistics.
Our reasoning and procedure can be summed up as follows:
We can consider the measurements of the misfit functional Φ_j(m_0|d) for one earthquake at j = 1, …, n_S
recording receivers as realisations of a random process that follows a yet unknown probability density function p(x). m_0 are the true source parameters, and any misfit
Φ_j is therefore due to ambient noise and modelling errors in the seismograms, as described in Sect. .
In practice we never get to know m0 but only a (hopefully close) estimate mest, the result of a deterministic source inversion procedure.
Hence all we can actually observe is
Φ(mest|d), some of which is due to the
estimation error mest-m0. However, by
estimating mest carefully and repeatedly (for 900
different earthquakes), and by considering the resulting 900 sets of misfits
Φ (at 200 000 source–receiver pairs) jointly, the histogram of their
200 000 D values should approximate a histogram of the true
Φ(m0|d) as closely as we can hope to get.
Figure a shows this empirically obtained histogram
Φcumulative of D in grey (for the subset of P seismograms
with an SNR of 20; the reason for this choice is discussed below).
To evaluate the likelihood of a misfit value Φ′ encountered in a future (Bayesian) inversion, we could in principle compare it to this empirical
histogram Φcumulative. It would however be more convenient and computationally efficient to identify an analytic expression for the p(x) that
produced this histogram Φcumulative and to evaluate any Φ′ against this p(x).
The best we can do is to identify a suitable type of distribution and fit its parameters to the empirical histogram Φcumulative of Fig. a,
thus obtaining a PDF pfit(x) as our best estimate for the true
p(x).
The likelihood of a data vector d given model m is then considered to be
L^{*}(\mathbf{m}|\mathbf{d}) = p_{\mathrm{fit}}\left(\Phi(\mathbf{d}|\mathbf{m})\right).
Approximate log-normal distribution of decorrelation D
We will consider three candidate distributions for fitting an analytic
pfit(x): beta, exponential and log-normal. They are all
positive one-sided (defined only for D>0) and can take negligible values
for D>2, where strictly they should be 0.
Figure a shows their fits to the empirical histogram
after determining the best-fitting scale parameters for each.
The beta and the exponential distributions are seen to overestimate the
number of very small D values (i.e. values of CC ≈ 1). Hence these
distributions would predict more excellent waveform fits than observed. The
likelihood of actually well-fitting waveforms would be estimated too low;
i.e. we would be too pessimistic about the achievability of good waveform
fits.
The log-normal distribution clearly yields the best approximation of the
D histogram. This is confirmed by the quantile–quantile plot of
Fig. b. Hence we choose the log-normal distribution
to express our likelihood function.
The (univariate) log-normal distribution function is defined by two scale
parameters μ and σ:
f(x) = \frac{1}{x\sqrt{2\pi\sigma^{2}}}\,\exp\left(-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right).
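As an illustrative sketch (not the authors' implementation), fitting this log-normal to a set of decorrelation values amounts to taking the mean and standard deviation of ln(D); below we simulate samples in place of the 200 000 real measurements, so all numerical values are hypothetical:

```python
import math
import random
import statistics

def lognormal_pdf(x, mu, sigma):
    """Univariate log-normal density, defined for x > 0."""
    if x <= 0:
        return 0.0
    return (1.0 / (x * math.sqrt(2 * math.pi * sigma**2))
            * math.exp(-(math.log(x) - mu)**2 / (2 * sigma**2)))

# Stand-in for the empirical decorrelation values D = 1 - CC:
# here we simply simulate log-normally distributed samples.
random.seed(0)
true_mu, true_sigma = -1.5, 0.8   # hypothetical values
D_samples = [math.exp(random.gauss(true_mu, true_sigma)) for _ in range(20000)]

# Fitting a log-normal amounts to mean and std of ln(D).
log_D = [math.log(d) for d in D_samples]
mu_fit = statistics.fmean(log_D)
sigma_fit = statistics.stdev(log_D)

# Empirical likelihood of a misfit value D' from a future inversion:
D_prime = 0.3
likelihood = lognormal_pdf(D_prime, mu_fit, sigma_fit)
```

In practice the fitted (μ, σ) would be tabulated per SNR bin rather than over the whole data set, as described below.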
The log-normal distribution also yields the best fit to our synthetic data
from Sect. , as calculated with the perturbations in
Eqs. () and (). See
Fig. S4 for a corresponding quantile–quantile plot.
If random variable x in Eq. () is equated with the
decorrelation Dj of one waveform j, the logarithm ln(Dj) is
normally distributed with mean μ and standard deviation σ. This
fortunate link of our empirical D histogram to the Gaussian distribution
makes it trivial to express the joint, multivariate distribution of all
nS waveform measurements of an earthquake, collecting the Dj
in vector D and the inter-station covariances in nS×nS covariance matrix SD.
The nS-variate likelihood function for D becomes
L_D^{*} = \frac{\exp\left(-\frac{1}{2}\left(\ln\mathbf{D}-\boldsymbol{\mu}\right)^{T}\mathbf{S}_D^{-1}\left(\ln\mathbf{D}-\boldsymbol{\mu}\right)\right)}{\sqrt{(2\pi)^{n}\,|\det(\mathbf{S}_D)|}},
and the misfit becomes
\Phi = \frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n}\left(\ln D_j-\mu_j\right)\left(\mathbf{S}_D^{-1}\right)_{jk}\left(\ln D_k-\mu_k\right) + \frac{1}{2}\ln\left((2\pi)^{n}|\det(\mathbf{S}_D)|\right).
This is the Mahalanobis distance, not between the individual samples of two
waveforms ui and uic as in
Eq. () but between the decorrelation Dj of these
two waveforms and its expected value μj, taking into account correlated
noise between two stations in SD.
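As a minimal sketch of this misfit evaluation for n=2 stations, with invented values for D, μ and S_D (the real ones come from the SNR-dependent fits described below):

```python
import math

def misfit_phi(D, mu, S):
    """Misfit Phi for n=2 stations: Mahalanobis distance of ln(D) from mu
    under covariance S, plus the normalisation term."""
    n = 2
    # invert the 2x2 covariance matrix by hand
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[ S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det,  S[0][0] / det]]
    r = [math.log(D[j]) - mu[j] for j in range(n)]
    quad = sum(r[j] * Sinv[j][k] * r[k] for j in range(n) for k in range(n))
    return 0.5 * quad + 0.5 * math.log((2 * math.pi) ** n * abs(det))

# Hypothetical numbers: two stations with correlated errors
mu = [-1.5, -1.2]
S = [[0.64, 0.20],
     [0.20, 0.49]]
D = [0.25, 0.35]
phi = misfit_phi(D, mu, S)
```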
Thus the use of D as a misfit criterion reduces the number of misfit values
to nS per earthquake (the number of source–receiver paths, or
waveforms) compared to ∑j=1nSnj in the case of the
ℓ1 or ℓ2 norms (nj is the number of samples on waveform j).
In other words, Dj itself accounts for any correlations across time
samples on seismogram j and subsumes them into a single number, leaving
only spatial (inter-station) correlations to be dealt with in
SD and in the empirical likelihood function
L∗.
Distribution coefficients determined by signal-to-noise ratio
Colour shades map out a two-dimensional histogram of waveform
decorrelation D, as a function of waveform SNR along the y axis. All
200 000 waveform measurements from our 900 deterministic source inversions
entered this histogram. Black lines are the best-fitting log-normal
distributions for SNRs of 10, 20 and 30. (The 1-D histogram for
SNR =20 was discussed in Fig. .) Toward smaller
SNRs (high-noise conditions), the D distribution widens (more occurrences
of poorly fitting waveforms).
Here we describe how μ and SD can be
estimated for one earthquake. So far it was implicitly assumed that a single
distribution pfit might fit Φcumulative for
all source–receiver paths.
This may be an oversimplification since ambient noise levels
ϵnoise show significant diurnal and seasonal variations,
and are elevated at stations close to coastlines or cities
. Hence we might expect goodness of fit to
vary across stations, which could be modelled by adjusting the scale
parameters of the log-normal distribution for each station. Goodness of fit
is also influenced by earthquake magnitude, and by station distance and
back azimuth, so we might even require different scale parameters for each
source–receiver pair.
To avoid this level of complexity, recall the investigation of
Sect. that revealed the distribution of D to
be most sensitive to the level of ambient noise ϵnoise.
Hence we bin our 200 000 source–receiver pairs by SNR and estimate only one pair of (μ,σ) distribution
parameters per SNR bin. This hopefully subsumes all individual sources of
random misfit.
SNR is defined as the integrated spectral energy in the signal time window,
divided by that of a 120 s noise window prior to the arrival of the
first body-wave energy. Signal time windows ui,i=1,…,Nsignal are as follows: for P phase, 5 s before to
20.6 s after its theoretical arrival time in IASP91, on the
Z component; for SH phase, 10 s before to 41.2 s after,
on the T component. Noise time windows ni,i=1,…,Nnoise are as follows: for both P and SH phases,
-150 to -30 s before theoretical arrival time. We calculate SNRs for P and SH waves as
\mathrm{SNR} = \frac{N_{\mathrm{noise}}\sum_{i=1}^{N_{\mathrm{signal}}} u_i^{2}}{N_{\mathrm{signal}}\sum_{i=1}^{N_{\mathrm{noise}}} n_i^{2}}.
Note that this way the noise window of the P wave measurement contains only
ambient noise, whereas the SH wave noise window is in addition
afflicted by some signal-generated noise: P coda and phases like PP or PcP,
which get scattered into the transverse component due to lateral
heterogeneities and anisotropy in the real earth.
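The SNR definition of the equation above reduces to a few lines of code; the toy windows below stand in for the real signal and 120 s noise windows:

```python
def snr(signal, noise):
    """SNR as mean squared amplitude in the signal window divided by
    that in the noise window (normalised by window lengths)."""
    n_sig, n_noi = len(signal), len(noise)
    e_sig = sum(u * u for u in signal)
    e_noi = sum(n * n for n in noise)
    return (n_noi * e_sig) / (n_sig * e_noi)

# Hypothetical short windows instead of real seismogram samples
signal = [0.0, 2.0, -3.0, 1.5, -0.5]
noise = [0.1, -0.2, 0.15, -0.05]
ratio = snr(signal, noise)
```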
Figure shows the D histogram and three fitted
probability densities pfit(D), as a function of SNR. Under
low-noise conditions (high SNR), the log-normal distributions are narrower
and centred on smaller D misfit values, which seems plausible.
By fitting functions of the form h(SNR)=a1+a2⋅exp(a3⋅SNR) to the SNR-binned D histograms, we
determined distribution parameters μP(SNR),
μSH(SNR), σP(SNR) and
σSH(SNR) for SNR ranging from 1 to 1000 for
P waveforms and from 1 to 200 for SH waveforms (see Supplement for details).
Hence the log-normal distribution pfit(D) ascribed to a given
source–receiver pair depends only on the ambient signal-to-noise ratio of the
receiver i, and its scale parameters are given by
\mu_i = a_{\mu,1} + a_{\mu,2}\cdot\exp\left(a_{\mu,3}\cdot\mathrm{SNR}_i\right),\qquad \sigma_i = a_{\sigma,1} + a_{\sigma,2}\cdot\exp\left(a_{\sigma,3}\cdot\mathrm{SNR}_i\right).
The exact values for ai depend on the velocity model and the solution
method. Here, we used the WKBJ method, which results in a simplistic crustal
response. Other methods, like the spectral-element method, in combination with
a waveform database as implemented in Instaseis by
may produce more realistic seismograms, resulting in higher average values of
D. What matters is that the actual inversion uses exactly the same solver
and velocity model as was used to determine the distributions of D.
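In code, the mapping from SNR to scale parameters is a one-liner per parameter. The coefficients below are hypothetical placeholders, since, as just noted, the real values depend on the velocity model and forward solver:

```python
import math

def scale_params(snr, a_mu, a_sigma):
    """Log-normal scale parameters as a function of SNR:
    mu = a1 + a2*exp(a3*SNR), and likewise for sigma.
    The coefficients are hypothetical stand-ins for the fitted values."""
    mu = a_mu[0] + a_mu[1] * math.exp(a_mu[2] * snr)
    sigma = a_sigma[0] + a_sigma[1] * math.exp(a_sigma[2] * snr)
    return mu, sigma

a_mu = (-2.0, 1.2, -0.05)    # hypothetical coefficients
a_sigma = (0.5, 0.6, -0.04)

mu10, sig10 = scale_params(10.0, a_mu, a_sigma)     # noisy station
mu100, sig100 = scale_params(100.0, a_mu, a_sigma)  # quiet station
```

With decaying exponentials, a quieter station (higher SNR) gets a distribution that is narrower and centred on smaller D, consistent with the histogram behaviour described above.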
Estimating inter-station covariances
Correlation in misfit between neighbouring stations. The measured
Pearson correlation (see Eq. ) is plotted against the difference
in azimuth between two stations of the same earthquake. A fit function
g_{b_1,b_2,b_3}(\vartheta) = b_1 + b_2\cdot\exp(-b_3\vartheta^{2}) is
plotted as a dashed red line.
Decorrelation values D measured at different stations cannot be expected to
be uncorrelated, because systematic modelling errors (due to differences
between assumed earth model and true earth, and to methodical inadequacies in
the Green's function computations) will affect neighbouring stations in
similar ways. A reasonable guess is that stations at similar azimuths from
the source would show the strongest correlations because their wave paths
have sampled similar parts of the sub-surface, in particular similar parts of
the crust and upper mantle – regions to which the strongest modelling errors
can be ascribed.
To check these systematics, we calculated the Pearson correlation coefficient
r(ϑ) as a function of azimuthal distance ϑ as follows.
For each earthquake, we calculated the azimuthal distances ϑjk
between all station pairs (j,k) and binned those. A set {j,k}ϑ
then contains all stations pairs for one event that have the same azimuthal
distance ϑ (in bins of 5∘ width).
We need to adjust for the fact that stations j and k usually have
different SNR and hence different μj and σj in their
log-normal distributions of D. Hence we calculate the standard score of
each station j as zj=(ln(Dj)-μj)/σj and from
this the Pearson correlation coefficient of a ϑ bin
{j,k}ϑ, using all nϑ station pairs in that bin:
r(\vartheta) = \frac{1}{n_\vartheta-1}\sum_{\{j,k\}_\vartheta} z_j z_k.
The use of standard scores permits comparison of stations of different SNR and
hence log-normal distribution parameters. The values for r(ϑ) are
then fit by a function (see Fig. )
g(\vartheta) = b_1 + b_2\cdot\exp\left(-b_3\vartheta^{2}\right).
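The standard-score and binned-correlation computation can be sketched as follows (all station values are invented for illustration):

```python
import math

def standard_score(D, mu, sigma):
    """z = (ln(D) - mu) / sigma, making stations of different SNR comparable."""
    return (math.log(D) - mu) / sigma

def bin_correlation(pairs):
    """r(theta) = 1/(n-1) * sum of z_j * z_k over all station pairs
    falling in one azimuthal-distance bin."""
    n = len(pairs)
    return sum(zj * zk for zj, zk in pairs) / (n - 1)

# Hypothetical bin: three station pairs, each station with its own
# (SNR-dependent) log-normal parameters mu and sigma.
pairs = [
    (standard_score(0.20, -1.5, 0.8), standard_score(0.25, -1.4, 0.7)),
    (standard_score(0.60, -1.5, 0.8), standard_score(0.55, -1.2, 0.9)),
    (standard_score(0.30, -1.8, 0.6), standard_score(0.35, -1.6, 0.7)),
]
r = bin_correlation(pairs)
```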
This azimuth-dependent correlation coefficient g(ϑ) can be used to
fill the elements of covariance matrix SD in
Eq. ():
S_{D,ij} = \begin{cases} \sigma_i\sigma_j\cdot\left(b_1+b_2\cdot\exp(-b_3\vartheta^{2})\right), & i\neq j,\\ \sigma_i^{2}, & i=j. \end{cases}
An example of such a covariance matrix is shown in Fig. . It
is for the 2011 earthquake in the US state of Virginia that was used as a
detailed working example of Bayesian source inversion in the companion paper
.
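Assembling S_D from per-station σ_i and a fitted g(ϑ) might look like the following sketch (coefficients b and station values are hypothetical):

```python
import math

def build_covariance(sigmas, azimuths, b):
    """Covariance matrix S_D: sigma_i^2 on the diagonal, and
    sigma_i*sigma_j*g(theta_ij) off the diagonal, with
    g(theta) = b1 + b2*exp(-b3*theta**2)."""
    b1, b2, b3 = b
    n = len(sigmas)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                S[i][j] = sigmas[i] ** 2
            else:
                theta = abs(azimuths[i] - azimuths[j])
                theta = min(theta, 360.0 - theta)  # wrap azimuth difference
                S[i][j] = (sigmas[i] * sigmas[j]
                           * (b1 + b2 * math.exp(-b3 * theta**2)))
    return S

# Hypothetical values for three stations
sigmas = [0.8, 0.7, 0.9]
azimuths = [40.0, 45.0, 200.0]  # degrees from the source
b = (0.05, 0.6, 0.01)           # hypothetical fit coefficients b1, b2, b3
S = build_covariance(sigmas, azimuths, b)
```

The two azimuthally close stations end up strongly correlated, while the antipodal third station contributes only the floor b1, which produces the block structure visible in the figure.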
Visualisation of an inter-station covariance matrix
SD for misfit D (centre panel; cf.
Eq. ), on the example of an mb 5.7 earthquake that
occurred in the US state of Virginia in 2011. Two maps for P and
SH data show the recording seismic stations as dots; colour fill
indicates the SNR of each waveform measurement. Inter-station correlation
depends directly on the azimuthal proximity of two stations. This results in
a block-diagonal matrix structure for SD, because we have sorted stations
by azimuth from the source. Blocks correspond to groups of stations with an
expected high correlation of errors: (1) a Northern Hemisphere cluster of
P wave measurements (circled in dark red), (2) a South American cluster of
P waveforms (green) and (3) a Northern Hemisphere cluster of SH waveform
measurements (olive). P and SH measurements are modelled as being
uncorrelated. For the analysis, only stations between 32 and
85∘ epicentral distance have been used, as marked by the dashed lines.
Misfit distribution of waveform amplitude measurements
Waveform amplitudes have not been considered so far, even though they provide
crucial constraints on focal mechanisms. Our amplitude measurement consists
of a comparison of the logarithmic energy content ln(A) in a 1 s time
window around the peak i=i1,…,i2 of the measured seismogram and its
synthetic:
\Delta\ln(A)_j = \ln\sum_{i=i_1}^{i_2} u_{j,i}^{2} - \ln\sum_{i=i_1}^{i_2} \left(u_{j,i}^{c}\right)^{2}.
Again our goal is to approximate the distribution of this misfit in order to
obtain an empirical likelihood function. The distribution of Δln(A)
is almost symmetric around 0; see Fig. S2. The amplitude misfit
|Δln(A)| approximately follows a Laplace distribution, where
parameter k does not vary much with SNR (see Supplement). We construct
the likelihood function
L^{*}_{\mathrm{Amp}} = \sum_{j=1}^{n_S} \frac{1}{2k}\exp\left(-\frac{|\Delta\ln(A)_j|}{k}\right),
which assumes no correlation in amplitude misfit between two stations. This
assumption is not without problems, but motivated by the fact that amplitude
errors are often caused by localised site effects.
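A sketch of the amplitude misfit and its per-station Laplace density, with invented peak windows and a hypothetical scale parameter k:

```python
import math

def delta_ln_A(u, u_syn):
    """Logarithmic amplitude misfit between the observed and synthetic
    peak windows."""
    return (math.log(sum(x * x for x in u))
            - math.log(sum(x * x for x in u_syn)))

def laplace_density(dlnA, k):
    """Laplace density of the amplitude misfit for one station."""
    return math.exp(-abs(dlnA) / k) / (2 * k)

# Hypothetical 1 s peak windows of observed and synthetic seismograms
u_obs = [0.5, 1.8, -2.1, 0.9]
u_syn = [0.4, 1.5, -1.9, 1.0]
dlnA = delta_ln_A(u_obs, u_syn)
L_amp = laplace_density(dlnA, k=0.4)  # k: hypothetical scale parameter
```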
Application in Bayesian source inversion
In practice these concepts are integrated with the Bayesian source inversion
procedure of as follows:
1. For every new earthquake, download and archive a suitable selection of broadband, three-component, teleseismic seismograms (Δ=32 to 85∘).
A pragmatic approach is to use stations from a handful of international, permanent networks (e.g. II, IU, G and GE) to ensure high quality, reliability and relatively even azimuthal coverage,
avoiding station clustering in any particular region. This is easily automated using the freely available data management software ObsPyDMT .
2. Bandpass filter between 0.02 and 1.0 Hz. Rotate horizontal components to the RTZ system. Select signal time windows and noise time windows, and calculate SNR as defined in Eq. ().
3. For each station, and for P and SH separately, use SNR to calculate distribution parameters μi and σi from Eq. ().
4. Populate the diagonal of covariance matrix SD,ii with the σi2.
5. Estimate correlation coefficient r(ϑj,k) between two stations (j,k) using Eq. (). Fill off-diagonal elements:
S_{D,jk} = r(\vartheta_{j,k})\,\sigma_j\sigma_k.
6. Insert μi and SD in the likelihood equation (Eq. ), and combine with LAmp∗ (Fig. )
to create the total likelihood function
L^{*} = L_D^{*} + L^{*}_{\mathrm{Amp}}.
7. For each source model m proposed by the sampling algorithm, calculate synthetic seismograms and pass them through the filters of step 2.
8. Calculate the empirical likelihood L∗(m|d)
(Eq. ), which is multiplied with a suitable prior to
obtain a posterior probability for m. Parameterisation of
m, Bayesian sampling strategy and construction of the posterior
distribution of m are described in the companion paper
.
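Steps 3 to 6 of this recipe can be condensed into a short sketch for two stations (all coefficients are hypothetical, standing in for the fitted values from the Supplement):

```python
import math

# Hypothetical coefficients for mu(SNR), sigma(SNR) and g(theta)
a_mu = (-2.0, 1.2, -0.05)
a_sigma = (0.5, 0.6, -0.04)
b = (0.05, 0.6, 0.01)

snrs = [15.0, 40.0]
azimuths = [30.0, 38.0]

# Step 3: distribution parameters from SNR
mus = [a_mu[0] + a_mu[1] * math.exp(a_mu[2] * s) for s in snrs]
sigmas = [a_sigma[0] + a_sigma[1] * math.exp(a_sigma[2] * s) for s in snrs]

# Steps 4-5: covariance matrix from sigmas and azimuthal distance
theta = abs(azimuths[0] - azimuths[1])
r = b[0] + b[1] * math.exp(-b[2] * theta**2)
S = [[sigmas[0]**2, r * sigmas[0] * sigmas[1]],
     [r * sigmas[0] * sigmas[1], sigmas[1]**2]]

# Step 6: evaluate L_D* for a proposed model's decorrelation values D
D = [0.2, 0.3]
det = S[0][0] * S[1][1] - S[0][1] ** 2
Sinv = [[ S[1][1] / det, -S[0][1] / det],
        [-S[0][1] / det,  S[0][0] / det]]
res = [math.log(D[i]) - mus[i] for i in range(2)]
quad = sum(res[i] * Sinv[i][j] * res[j] for i in range(2) for j in range(2))
L_D = math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** 2 * det)
```

In a real run, steps 7 and 8 would repeat the last block for every proposed source model m, using synthetics from the same solver and velocity model that produced the fitted coefficients.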
Discussion
The most common approach to Bayesian inversion is to assert a simple noise
model for which an analytic likelihood function is known: this determines the
measure of misfit. We have gone the opposite route in designing a misfit D
based on considerations of robustness and dimensionality reduction. Since no
noise model was known, we had to investigate the actual noise statistics and
thus derive an empirical noise model and likelihood function from the data
D. We were fortunate to find that the (multivariate) log-normal
distribution provides the best fit to our decorrelation data because it can
be evaluated almost as easily and cheaply as the most favourable of all
distributions, the Gaussian (normal) distribution.
In fact, analytic probability densities are known for only a few misfit
functionals. By far the most commonly used are the Gaussian (normal)
distribution, associated with the ℓ2 norm misfit, and the Laplace
distribution, associated with the ℓ1 norm. Evaluating residuals of data
fits against these analytic distributions is straightforward and fast, which
is important in the computationally expensive Bayesian realm.
In practice, however, the adoption of ℓ1 or ℓ2 misfits may be
inappropriate or even impossible. Gauss and Laplace functions may be poor
approximations of the actual distributions of data residuals. Even if
they can be deemed adequate for some measurements (e.g. for the sample-wise
distance of two time series), they may generate huge and non-sparse
covariance matrices (because time samples are numerous and correlated), which
are difficult to estimate from the data. Even worse in such multivariate
scenarios, analytic expressions of the joint distribution functions may not
exist – as is the case for the Laplace distribution (ℓ1 norm).
Effectively this often leaves as the only “choice” for a noise model the
(multivariate) normal distribution – whether or not it fits the data at
hand.
More often than not, real data contain many more outliers than expected by
the normal distribution, certainly in the case of seismic data. Under the
ℓ2 norm, outliers disproportionately bias the solution (deterministic
case) or posterior distribution (Bayesian case) and also affect convergence
in the Bayesian case. The problem may be mitigated by manual removal of very
poorly fitting waveforms, but this is usually time-intensive guesswork and
likely to result in other biases.
The ℓ1 norm is more robust against outliers, and with the same
motivation distance norms with non-integer exponents ℓp have been
proposed and successfully applied, including for source inversion
. But all norms with p≠2 share the serious
limitation that no analytic expressions are known for the multivariate case.
Samples of real-world, band-limited time series are correlated. If a measured
seismogram of length N samples is considered,
u_i = u_i^{c} + \epsilon_{\mathrm{noise},i},
then an (N×N) covariance matrix for ϵnoise needs
to be estimated under the ℓ2 norm. Hierarchical Bayesian methods can be
applied to estimate the noise level and covariance from the data itself
(see ), but in many cases it may be
more guessed than estimated.
The situation is further complicated if the noise model can no longer be
purely additive (“+ϵnoise”). We have argued that our
noise model needs to be
u_i = \left(u^{c} \ast T^{\mathrm{model}} \ast T^{\mathrm{inst}}\right)_i + \epsilon_{\mathrm{noise},i},
where the convolving terms represent systematic modelling errors. In theory this
type of error might be eliminated with computationally powerful waveform
forward modelling and more research into detailed earth structure. But since
those efforts would be tangential to the problem at hand (source inversion),
the cost would seem prohibitive. Hence we do want the option of treating the
modelling error as “just another source of noise”, to be accommodated by a
more sophisticated noise model, the analytic expression of which will be
unknown.
Another reason for leaving the Gaussian or ℓ2 realm might be a change
of measurement. In our case, the cross-correlation or decorrelation
measurements collapse N×2 samples of two time series into a single
scalar CC or D. Even if inter-sample correlations of the time series
actually were multivariate Gaussian, the statistics of CC or D would be
something more complicated. On the upside, the dimensionality of the
multivariate problem is reduced by a factor of N, which helps substantially
when forced to take the empirical path toward obtaining a likelihood
function. Thus inter-station covariances are the only correlations to
estimate, and the fact that they are simple covariances (second moments) is,
again, owed to the fortunate fact that the log-normal distribution yielded the
best fit to the misfit histogram.
We are not sure whether there is a theoretical reason that the log-normal
distribution should be associated with the decorrelation misfit D, and thus
effectively with CC. Whatever the case, this finding is highly relevant in
that it also opens up the path to Bayesian sampling of other optimisation
problems that have previously adopted the cross-correlation coefficient CC
of seismograms as their misfit criterion, e.g. other flavours of seismic
source inversion , seismic tomography
or the estimation of earthquake cluster sizes
.
As noted, the proposed empirical likelihood function
L∗(m|d) is no likelihood function in a strict
sense because it is not derived from the noise on the raw data samples but
rather from the noise (i.e. residual) of misfit functional D. For other
inverse problems, it has to be evaluated case by case whether a noise
model exists that can describe the difference between modelled and measured
seismograms completely as an additive term. If that is the case, a classical
likelihood can be used, but many inverse problems in seismology are similar
to the one presented here, and the proposed empirical likelihood offers a path
to a more thorough Bayesian treatment. It is just important to remember that
the distribution of D has to be determined from synthetic seismograms
calculated with the same velocity model and forward solver as are used for
the actual inversion.
Other misfit criteria have been used in optimisation contexts in seismology.
For the purpose of source parameter inversion, their noise properties could
be investigated along the lines laid out by this work, and their empirical
likelihood functions studied. But unless their noise distributions turn out
to be as simple as for the D misfit (they would essentially have to follow
the normal or log-normal distribution), these other misfit choices will be
computationally more costly to sample. It is pleasing that the
cross-correlation, long appreciated for its robust performance in
deterministic optimisation, is now also vindicated in a Bayesian context by
the results of our study.