We present an efficient probabilistic workflow for the estimation of source parameters of induced seismic events in three-dimensional heterogeneous media. Our workflow exploits a linearized variant of the Hamiltonian Monte Carlo (HMC) algorithm. Compared to traditional Markov chain Monte Carlo (MCMC) algorithms, HMC is highly efficient in sampling high-dimensional model spaces. Through a linearization of the forward problem around the prior mean (i.e., the “best” initial model), this efficiency can be further improved. We show, however, that this linearization makes the output of an HMC chain strongly dependent on the quality of the prior, in particular because not all (induced) earthquake model parameters have a linear relationship with the recordings observed at the surface. To mitigate the importance of an accurate prior, we integrate the linearized HMC scheme into a workflow that (i) allows for a weak prior through linearization around various (initial) centroid locations, (ii) is able to converge to the mode containing the model with the (global) minimum misfit by means of an iterative HMC approach, and (iii) uses variance reduction as a criterion to include the output of individual Markov chains in the estimation of the posterior probability. Using a three-dimensional heterogeneous subsurface model of the Groningen gas field, we simulate an induced earthquake to test our workflow. We then demonstrate the virtue of our workflow by estimating the event's centroid (three parameters), moment tensor (six parameters), and origin time. Using the synthetic case, we find that our proposed workflow is able to recover the posterior probability of these source parameters rather well, even when the prior model information is inaccurate, imprecise, or both.

The need to understand earthquake source mechanisms is an essential aspect in fields as diverse as global seismology

For the purpose of monitoring induced seismicity, arrays of seismometers can be installed over the exploration area. The waveforms recorded by these
seismometers can subsequently be exploited to characterize the induced events. For example, the time of the first arrival (typically the direct
P wave) is sensitive to the earthquake hypocenter and origin time. There are many inversion algorithms that exploit first arrivals to obtain estimates
of earthquake hypocenters and origin times, such as the double-difference

In terms of computational efficiency, each combination of a specific inversion algorithm and a specific subsurface model has both advantages and
disadvantages. In general, the main advantage of using a probabilistic approach is that the output does not consist of a single set of (source) model
parameters that minimizes an objective function, but the posterior distribution

The algorithm used in our workflow is the Hamiltonian Monte Carlo (HMC) algorithm, which, for sampling high-dimensional posterior distributions, has
been shown to be significantly more efficient than the conventional probabilistic Metropolis–Hastings family of algorithms

Due to the higher frequencies present in recordings of induced events, the wavelengths are significantly shorter. Layers of sediment–basin infill
close to the Earth's surface may exacerbate this, since velocities usually decrease rapidly in this case. Shorter wavelengths matter because, other
things being equal, they increase nonlinearity. In essence, however, the degree to which the relation between the source parameters and the recorded
waveforms is nonlinear depends on the ratio between the nominal event–receiver separation and the wavelength. For example, consider (i) an induced
seismic event at 3

In this study, the absence of a well-constrained prior and an increase in nonlinearity receive significant attention. First, the challenge of a weaker prior is met by means of a workflow in which the initial prior is updated before running the HMC algorithm. In addition, multiple chains of the HMC variant are run sequentially, with the results of the current chain serving as priors for the next chain. This iterative HMC is meant to provide improved prior information resulting in an adequate linear approximation. We demonstrate the validity of our workflow using data from a synthetically generated induced earthquake, which was simulated using the velocity model of the Groningen subsurface. It should be understood that the proposed workflow is of interest for the characterization of induced seismic events in general. The Groningen case is merely chosen because of the quality and density at which the induced wavefields are sampled and the relatively high resolution of the available velocity model.

The Groningen gas field is one of the largest gas reservoirs in Europe. Since production began in 1963, more than 2115 billion

In what follows, we first introduce the forward problem of obtaining surface displacements (recorded wavefields) due to induced seismic source activity, including the description of a seismic source in terms of elementary moment tensors. Subsequently, we introduce the Bayesian formulation and detail the linearized HMC algorithm. Afterward, we proceed with the description and implementation of our workflow, which involves several steps that are specific to the characterization of induced seismic sources. We then test the proposed workflow using synthetic recordings of an induced earthquake source. We end with a discussion of our results, including an outlook on applying our workflow to actual field recordings of induced earthquakes from the Groningen gas field.

As with all Markov chain Monte Carlo algorithms, HMC involves an evaluation of forward-modeled data against observed data. In our case, this
evaluation is between (forward) modeled surface displacement and observed displacement. Specifically, we compute synthetic displacement seismograms

Instead of repeatedly computing

To facilitate the computation of seismograms for a specific

Under the assumption that each of these elementary moment tensors has the same time dependence (e.g., in the case of pure shear, this would imply
that faulting occurs along a straight “trajectory”), a specific

Consequently, we obtain

In practice, all

The HMC algorithm originated from the field of classical mechanics and its application to statistical mechanics

Similar to other probabilistic algorithms, HMC is deployed in the context of Bayesian inference. The objective of Bayesian inference is to obtain an
estimate of the posterior probability distribution

The HMC algorithm relies on the sequential calculation of two quantities. These are the potential energy

A model

We parenthetically coined

By sequentially evaluating Eqs. (

Comparison between the sampling strategy of the

In Fig.

Assuming Gaussian-distributed, uncorrelated, and coinciding data variance

In our context,

In our workflow, most of the computational burden in running HMC involves the evaluation of Eq. (

Substituting this linearized expression in Eq. (

Differentiating Eq. (

Because the displacement depends linearly on the moment tensor components (see Eqs.

In practice, the elementary seismograms discussed in Sect.

Scenario of an induced earthquake in the Groningen area.

Comparison between elementary seismograms due to a source at the actual location (red star) and the receiver at G094 (green) as well as the elementary seismograms resulting from the implementation of source–receiver reciprocity (yellow). The equality of the traces confirms successful implementation of source–receiver reciprocity. Along the vertical axis, all six (independent) elementary seismograms are depicted.

To confirm the successful implementation of source–receiver reciprocity, we simulate a scenario of an induced event in the Groningen gas reservoir
(Fig.

We integrate the above HMC variant into our workflow by implementing a leapfrog algorithm for evaluating Eq. (
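As a minimal sketch of the leapfrog integration mentioned above, the Python example below samples a toy two-parameter Gaussian target, which is the form a posterior takes once the forward problem is linearized and all distributions are Gaussian. The target, the diagonal mass matrix (set here to the inverse target variances), the step size, the trajectory length, and all function names are illustrative assumptions, not the settings or code used in this study.

```python
import math
import random

def potential(m, mean, std):
    """Potential energy U(m): negative log of an uncorrelated Gaussian (up to a constant)."""
    return sum(0.5 * ((mi - mu) / s) ** 2 for mi, mu, s in zip(m, mean, std))

def grad_potential(m, mean, std):
    """Gradient of U with respect to each model parameter."""
    return [(mi - mu) / s ** 2 for mi, mu, s in zip(m, mean, std)]

def kinetic(p, mass):
    """Kinetic energy K(p) for a diagonal mass matrix."""
    return sum(0.5 * pi ** 2 / ma for pi, ma in zip(p, mass))

def leapfrog(m, p, mean, std, mass, eps, n_steps):
    """Integrate Hamilton's equations with n_steps leapfrog steps of size eps."""
    m, p = list(m), list(p)
    g = grad_potential(m, mean, std)
    for _ in range(n_steps):
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]           # half step in momentum
        m = [mi + eps * pi / ma for mi, pi, ma in zip(m, p, mass)]  # full step in position
        g = grad_potential(m, mean, std)
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]           # half step in momentum
    return m, p

def hmc_chain(m0, mean, std, mass, n_samples, eps=0.2, n_steps=8, seed=0):
    """Run one HMC chain; returns all samples and the acceptance rate."""
    rng = random.Random(seed)
    m, samples, accepted = list(m0), [], 0
    for _ in range(n_samples):
        p = [rng.gauss(0.0, math.sqrt(ma)) for ma in mass]          # draw fresh momenta
        h_old = potential(m, mean, std) + kinetic(p, mass)
        m_new, p_new = leapfrog(m, p, mean, std, mass, eps, n_steps)
        h_new = potential(m_new, mean, std) + kinetic(p_new, mass)
        # Metropolis test on the change in total energy
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            m, accepted = m_new, accepted + 1
        samples.append(list(m))
    return samples, accepted / n_samples

# Toy target (assumed): two uncorrelated Gaussian parameters.
target_mean, target_std = [1.0, -2.0], [0.5, 2.0]
mass = [1.0 / s ** 2 for s in target_std]
samples, acceptance = hmc_chain([0.0, 0.0], target_mean, target_std, mass, 3000)
```

With the mass matrix matched to the target variances, every parameter oscillates at the same rate along a trajectory, so a single step size suits both the narrow and the wide direction of this toy target.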

The performance of the linearized HMC variant strongly depends on the prior means (see Eq.

We test our workflow for an induced event shown in Fig.

Before running the first HMC chain, we need to estimate the initial prior means and variances. In short, we propose an approach in which a
first-arrival-based algorithm is used to estimate the centroid. Subsequently, the origin time can be estimated, after which Eq. (

Numerous algorithms exist that allow one to estimate an earthquake's hypocenter and/or centroid. Here we propose using first-arrival-based algorithms
for this purpose since these are computationally more efficient than waveform-based algorithms. First-arrival-based algorithms only require the
computation of the P- and S-wave arrival times, and by adopting a high-frequency approximation

As an alternative to using a first-arrival-based algorithm, the prior means of the centroid can instead be retrieved from the literature if such
estimates exist. For example, in the case of the induced seismicity in the Groningen field,

Given a centroid prior mean that was either calculated or acquired from the literature, the prior mean of the origin time can be estimated by computing
the P-wave travel times from the centroid prior mean to each of the receivers. These travel times can be computed using the same eikonal solver that was used
to obtain the centroid prior
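One simple way to turn such travel times into an origin-time estimate, offered here as an assumption because the exact estimator is not restated at this point, is to subtract each predicted P-wave travel time from the corresponding observed arrival time and average over the receivers. In the sketch below, `p_arrivals` and `travel_times` are hypothetical inputs (picked P-wave arrival times and computed travel times, both in seconds), and the spread of the per-receiver estimates serves as a crude standard deviation.

```python
from statistics import mean, stdev

def origin_time_prior(p_arrivals, travel_times):
    """Estimate origin time as the average of (observed arrival - predicted travel time)."""
    if len(p_arrivals) != len(travel_times) or len(p_arrivals) < 2:
        raise ValueError("need matching lists with at least two receivers")
    estimates = [t_obs - tt for t_obs, tt in zip(p_arrivals, travel_times)]
    return mean(estimates), stdev(estimates)

# Hypothetical values for four receivers; the arrivals imply an origin time near 10 s.
travel_times = [1.20, 1.85, 2.40, 3.10]
p_arrivals = [11.22, 11.83, 12.41, 13.09]
t0_mean, t0_std = origin_time_prior(p_arrivals, travel_times)
```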

To refine the initial origin-time prior estimate, we cross-correlate the envelope of the observed seismograms

We test Eq. (

The results of estimating the prior mean of origin time using Eq. (

Given arbitrary MT components, we show in Fig.

Having sufficiently accurate prior means for the centroid and origin time, we then estimate the prior mean of the MT. For this purpose, we keep the
centroid and origin time constant but solve for the remaining six parameters (the independent MT elements). In Sect.

It should be understood that Eq. (

In practice, the prior means resulting from Eq. (

Full workflow of our iterative HMC scheme.

In Fig.

The determination of the initial prior consists of the following four steps.

Estimate the initial prior mean for the centroid, either by running a first-arrival-based probabilistic inversion algorithm or by extracting it from existing literature.

Estimate the initial prior mean of the origin time using (P-wave) travel times from the centroid obtained in step 1 to the receiver
locations. This estimate is refined by evaluating Eq. (

Estimate the initial prior mean of the MT by fixing centroid and origin time to their prior means (steps 1 and 2) and solving
Eq. (

Determine the standard deviation for each of the 10 model parameters: centroid (3), origin time, and moment tensor (6). These standard
deviations are needed to construct our first mass matrix
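Step 3 above relies on the fact that, for a fixed centroid and origin time, the displacement is linear in the six independent moment tensor elements. A minimal sketch, assuming a plain unweighted least-squares solve (the exact equation and any weighting used in the workflow are not reproduced here), stacks the six elementary seismograms as columns of a matrix `G` and solves for the MT vector. The array shapes, the random stand-in seismograms, and the function name are purely illustrative.

```python
import numpy as np

def estimate_moment_tensor(elementary, observed):
    """Least-squares MT estimate.

    elementary: array of shape (n_samples, 6), one column per elementary seismogram
    observed:   array of shape (n_samples,), the recorded displacement
    """
    m, *_ = np.linalg.lstsq(elementary, observed, rcond=None)
    return m

rng = np.random.default_rng(0)
G = rng.standard_normal((500, 6))                     # stand-in for elementary seismograms
m_true = np.array([1.0, -0.5, -0.5, 0.3, 0.0, 0.8])   # hypothetical MT elements
d_obs = G @ m_true + 0.01 * rng.standard_normal(500)  # synthetic data with small noise
m_est = estimate_moment_tensor(G, d_obs)
```

In a multi-receiver, multi-component setting, the traces would be concatenated along the first axis so that all recordings constrain the same six unknowns.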

Now that the (initial) prior means and standard deviations are determined, the HMC variant is run iteratively up to

Collect the prior means and associated standard deviations, and construct the mass matrix

Run a new HMC chain with a preset number of iterations and burn-in period. Note that for each chain, the results are stored for later use.

Collect the results. The means and standard deviations will serve as input for the next iteration (see step 5).
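The iterative steps above can be illustrated with a deliberately small toy problem. The sketch below is not the workflow's implementation: it uses a single unknown (the horizontal source position), travel times as data, and, in place of an actual HMC chain, the closed-form Gaussian posterior of the problem linearized around the current prior mean. The geometry, velocity, noise level, and function names are all assumptions chosen for illustration.

```python
import math

RECEIVERS = [float(r) for r in range(-5, 6)]   # receiver x-positions (km), assumed
DEPTH, VELOCITY, SIGMA_D = 3.0, 2.0, 0.01      # source depth (km), velocity (km/s), data std (s)

def forward(x):
    """Nonlinear forward problem: travel times from a source at position x."""
    return [math.hypot(x - r, DEPTH) / VELOCITY for r in RECEIVERS]

def jacobian(x):
    """Derivative of each travel time with respect to x."""
    return [(x - r) / (VELOCITY * math.hypot(x - r, DEPTH)) for r in RECEIVERS]

def linearized_posterior(d_obs, prior_mean, prior_std):
    """Mean and std of the Gaussian posterior after linearizing around prior_mean.

    Stand-in for the summary statistics one would collect from an HMC chain.
    """
    g0, jac = forward(prior_mean), jacobian(prior_mean)
    precision = 1.0 / prior_std ** 2 + sum(j * j for j in jac) / SIGMA_D ** 2
    shift = sum(j * (d - g) for j, d, g in zip(jac, d_obs, g0)) / SIGMA_D ** 2
    return prior_mean + shift / precision, 1.0 / math.sqrt(precision)

def iterate_chains(d_obs, prior_mean, prior_std, n_chains):
    """Feed each chain's mean and std into the next chain as its prior."""
    history = [(prior_mean, prior_std)]
    for _ in range(n_chains):
        prior_mean, prior_std = linearized_posterior(d_obs, prior_mean, prior_std)
        history.append((prior_mean, prior_std))
    return history

x_true = 1.0
history = iterate_chains(forward(x_true), prior_mean=3.0, prior_std=2.0, n_chains=20)
```

Because each pass reuses the previous standard deviation as the new prior width, the prior tightens from chain to chain and later updates move the mean less; in this toy the estimate approaches the true position gradually over the iterations.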

Results of our iterative HMC scheme for a total of 20 chains, each involving 2500 steps, the first 500 of which are discarded as burn-in samples (not shown). The red lines are the true values.

After a total of

For each of the

Define a VR threshold. Posteriors associated with
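The variance-reduction criterion can be sketched as follows. The definition used here, VR = 1 - sum((d_obs - d_syn)^2) / sum(d_obs^2), is a common convention and is assumed for illustration, as are the threshold value, the rule that chains at or above the threshold are retained, and the function names; the exact expression and threshold adopted in the workflow may differ.

```python
def variance_reduction(observed, synthetic):
    """VR = 1 - sum of squared residuals / sum of squared observations."""
    num = sum((o - s) ** 2 for o, s in zip(observed, synthetic))
    den = sum(o ** 2 for o in observed)
    return 1.0 - num / den

def select_chains(observed, chain_synthetics, threshold):
    """Return the indices of chains whose VR is at or above the threshold."""
    return [i for i, syn in enumerate(chain_synthetics)
            if variance_reduction(observed, syn) >= threshold]

# Hypothetical five-sample traces: one well-fitting chain, one with flipped polarity.
observed = [0.0, 1.0, 0.0, -1.0, 0.0]
chain_synthetics = [
    [0.0, 0.9, 0.1, -1.0, 0.0],
    [0.0, -1.0, 0.0, 1.0, 0.0],
]
selected = select_chains(observed, chain_synthetics, threshold=0.8)
```

With this definition, a perfect fit gives VR = 1, while a synthetic trace that fits worse than a zero trace gives a negative value, so a polarity-reversed solution is rejected by any positive threshold.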

A total of 10 two-dimensional marginal probability densities of the inverted model parameters. Black dots are all the samples given the results from all chains (Fig.

The final marginal posterior distributions (green samples in Fig.

Comparison between the true seismograms (red), observed (true

We use the above workflow to estimate the parameters of the synthetic event shown in Fig.

The above workflow might not be optimal if the initial prior information is “weak” in the sense that the initial centroid prior mean deviates
significantly from the true value. This is due to the fact that our forward problem is in essence a nonlinear problem, whereas the adopted
linearization (Sect.

To showcase the effect of weak prior information in the context of induced seismicity in the Groningen gas field, we re-use the synthetic earthquake
in Fig.

Summary of running the workflow using the 25 initial prior means. The distance between each of the initial centroid prior means and the true centroid is indicated in

The same 10 two-dimensional marginal probability densities as in Fig.

Marginal posterior distributions for each model parameter given the selected chains (depicted as green dots in Fig.

Overall, given our 5

Using synthetic events, we demonstrate that the proposed probabilistic workflow is able to efficiently estimate the posterior probability of the
various parameters describing induced seismic events. A number of caveats need to be made though. First, the synthetic recordings used to test our
probabilistic workflow are the result of propagating a wavefield through the very same velocity model as the one used to estimate the posterior (i.e.,
the velocity model in our probabilistic workflow). In application to field data, this would obviously not be the case. Part of the misfit between
modeled recordings and observed recordings would then be the result of discrepancies between the true velocity model and the employed numerical
velocity model. Second, and in the same vein, we employed the same code (SPECFEM3D-Cartesian) for generating the synthetic recordings as for modeling
the wavefield in the probabilistic workflow. And although this code is known to be rather accurate

The aforementioned deviation of the available numerical velocity model from the true subsurface velocities will pose a number of challenges. First,
the estimated posterior probability would give a lower bound in terms of the variability of the source parameters: inaccuracies in the velocity model
necessarily imply broader posterior probabilities. Second, in the presence of strong anisotropy, the posterior could be adversely affected. In
particular, in the case of non-pure shear mechanisms this effect could be significant

Our workflow includes a systematic approach to obtain meaningful initial priors, which is particularly important for the employed HMC variant: the
linearization of the forward problem around the prior mean requires the initial priors to be sufficiently close to the true event
location. Furthermore, we show that by using an iterative scheme, we can update the prior mean such that convergence is obtained to a centroid
location that allows the estimation of a meaningful posterior. The iterative scheme involves sequentially updating the prior mean of each new HMC
chain using the posterior estimate obtained from the previous HMC chain. This approach is based on the suggestion of

Prior to executing the workflow, one needs to compile a database of the elementary seismograms, which often requires significant computing power. In
our case, it took about one day to generate the database using one node of our computer cluster that consists of 24 CPU cores (Intel(R) Xeon(R) CPU
E5-2680 v3 at 2.50

We would like to emphasize that our workflow is, in principle, not limited to inversions of the parameters we use here. We could extend our
probabilistic inversion to parameters such as stress drop, velocity, or finite-fault source parameters. Furthermore, it is important to
mention that our workflow aims to invert seismic source parameters using seismic surface recordings in a specific frequency range. That is, it is
specifically geared towards inverting for induced seismic events. We found that the workflow works well when applied to data with frequencies between
1 and 3

The seismogram database was generated using the spectral element solver SPECFEM3D-Cartesian available at

The Groningen field subsurface models used in this study are available at

LOMM conceptualized the methodology, performed the inversion, prepared the figures, and wrote the initial draft. TC helped in generating the elementary seismogram database and edited the paper. CW helped in developing the methodology and substantially improved the draft. All authors edited the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Andreas Fichtner and Tom Kettlety for their insightful reviews.

This research has been supported by the project DeepNL, funded by the Netherlands Organization for Scientific Research (NWO) (grant no. DEEP.NL.2018.048).

This paper was edited by Tarje Nissen-Meyer and reviewed by Andreas Fichtner and Tom Kettlety.