We present a series of new open-source deep-learning algorithms to accelerate Bayesian full-waveform point source inversion of microseismic events. Inferring the joint posterior probability distribution of moment tensor components and source location is key for rigorous uncertainty quantification. However, the inference process requires forward modelling of microseismic traces for each set of parameters explored by the sampling algorithm, which makes the inference very computationally intensive. In this paper we focus on accelerating this process by training deep-learning models to learn the mapping between source location and seismic traces for a given 3D heterogeneous velocity model and a fixed isotropic moment tensor for the sources. These trained emulators replace the expensive solution of the elastic wave equation in the inference process.

We compare our results with a previous study that used emulators based on Gaussian processes to invert microseismic events. For fairness of
comparison, we train our emulators on the same microseismic traces and using the same geophysical setting. We show that all of our models provide
more accurate predictions,

The monitoring of microseismic events is crucial to understand induced seismicity and to help quantify seismic hazards caused by human activity

Seismic inversion for earthquake location has traditionally been based on the minimization of a misfit function between theoretical and observed
travel times

Bayesian inference has been successfully used to locate earthquakes and to estimate moment tensors

Ideally, the inversion could be carried out jointly for the moment tensor components and the location of the microseismic event

To overcome this issue,

In this paper, we build on the method developed in D18 by training multiple generative models, based on deep-learning algorithms, to learn to predict
the seismic traces corresponding to a given source location for fixed moment tensor components. Similar to D18, we consider an isotropic moment
tensor for our sources; a follow-up paper

In Sect.

In this section we describe the deep generative models that we train as emulators of the seismic traces given their source location. Our final goal is to develop fast algorithms that can learn the mapping between source location and seismic traces recorded by receivers in a geophysical domain.

We start in Sect.

Training, validation, and testing procedures for our generative models are described in Sect.

In order to train fast emulators to replace the simulation of microseismic traces for a given source location we need to generate representative
examples of the seismograms to be learnt, given a fixed velocity model for the geophysical scenario considered. The complexity of the forward
modelling of seismic traces by means of, e.g. pseudo-spectral methods

Example of preprocessing applied to the seismograms. We consider one random reference seismogram, shown in black in

To preprocess our seismograms we first identify the maximum positive amplitude

A consequence of this type of preprocessing is that, in order to recover the original seismograms, one also needs to learn the coefficients
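In code, this style of amplitude preprocessing can be sketched as follows. This is a one-coefficient toy version (the paper's full preprocessing may involve additional coefficients); the function names are ours:

```python
import numpy as np

def preprocess(trace):
    """Scale a trace by its maximum positive amplitude.

    Returns the normalized trace together with the scaling coefficient,
    which must itself be learnt (e.g. by a GP) to undo the preprocessing.
    """
    coeff = trace.max()              # maximum positive amplitude
    return trace / coeff, coeff

def postprocess(norm_trace, coeff):
    """Recover the original trace from its normalized version."""
    return norm_trace * coeff

rng = np.random.default_rng(42)
trace = rng.normal(size=512)         # toy stand-in for a seismogram
norm, coeff = preprocess(trace)
```

After preprocessing, every trace has a maximum positive amplitude of 1, so the emulator only needs to learn the waveform shape, while the scaling coefficient is learnt separately.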

Schematic of the generic framework for seismogram emulation developed in this work. Two Gaussian processes (GP) are trained to learn the preprocessing parameters

Schematic of the seven proposed algorithms to learn the mapping between coordinates

When performing GP regression given a generic function
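As a minimal illustration of GP regression with a squared-exponential kernel, the following sketch computes the posterior predictive mean on a 1D toy function (kernel choice, hyperparameters, and variable names are illustrative, not those used in the paper):

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential (RBF) kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(X_train, y_train, X_test, length=1.0, noise=1e-6):
    """GP posterior mean at X_test, given (noisy) observations y_train."""
    K = rbf(X_train, X_train, length) + noise * np.eye(len(X_train))
    K_s = rbf(X_test, X_train, length)
    alpha = np.linalg.solve(K, y_train)   # K^{-1} y
    return K_s @ alpha

# 1D toy example: interpolate a smooth function from a few samples.
X = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(X).ravel()
pred = gp_predict(X, y, np.array([[2.5]]))
```

In practice the kernel hyperparameters (length scale, noise level) are optimized by maximizing the marginal likelihood rather than fixed by hand.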

Here we present in detail the algorithms we developed for emulation of the seismic traces. Given a set of coordinates

The first method we propose is a simple direct mapping between source location and preprocessed seismograms, without any intermediate data
compression. The mapping is learnt by a fully connected neural network, which consists of a stack of layers, each made of a certain number of
neurons. Each layer maps the input of the previous layer
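The direct mapping can be sketched as a forward pass through a stack of fully connected layers (layer sizes and the ReLU activation here are illustrative, not the architecture used in the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(coords, weights, biases):
    """Forward pass of a fully connected network.

    Each hidden layer computes activation(W @ h + b); the final layer is
    linear so the output can take arbitrary real amplitude values.
    """
    h = coords
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]

rng = np.random.default_rng(0)
sizes = [3, 32, 32, 600]              # (x, y, z) -> 600-sample trace
weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
trace = mlp_forward(np.array([0.5, 0.5, 1.2]), weights, biases)
```

The weights and biases are learnt by minimizing a reconstruction loss between predicted and simulated traces over the training set.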

For the specific application considered in Sect.

The second method proposed makes use of a signal compression stage prior to the emulation step. We first perform principal component analysis (PCA) of
the preprocessed seismograms in the training set. PCA is a technique for dimensionality reduction performed by eigenvalue decomposition of the data
covariance matrix. This identifies
the principal vectors, maximizing the variance of the data when projected onto those vectors. The projections of each data point onto the principal
axes are the “principal components” of the signal. By retaining only a limited number of these components, discarding the ones that carry less
variance, one achieves dimensionality reduction. For example, in our application to the test case described in Sect.
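The PCA compression and reconstruction steps described above can be sketched with an SVD of the mean-centred data matrix (the data here are synthetic; the number of components is illustrative):

```python
import numpy as np

def pca_fit(X, n_components):
    """PCA via SVD of the mean-centred data matrix."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]        # mean and principal axes

def pca_transform(X, mean, axes):
    return (X - mean) @ axes.T            # principal components

def pca_inverse(Z, mean, axes):
    return Z @ axes + mean                # reconstruction from components

rng = np.random.default_rng(1)
# 100 toy traces of 600 samples lying near a 5-dimensional subspace
Z_true = rng.normal(size=(100, 5))
X = Z_true @ rng.normal(size=(5, 600)) + 0.01 * rng.normal(size=(100, 600))
mean, axes = pca_fit(X, 5)
Z = pca_transform(X, mean, axes)
X_rec = pca_inverse(Z, mean, axes)
```

Because the retained components carry most of the variance, the reconstruction error is dominated only by the discarded low-variance directions.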

Once PCA has been performed on the training set, as an alternative to a neural network one can train multiple GPs to learn the mapping between the
source coordinates and the PCA coefficients. We train one GP for each PCA component. Figure

Typical architecture of an autoencoder. A bottleneck architecture allows for the compression of the input signal into a central layer through the “encoder” part of the network (in red). The central layer is characterized by fewer nodes than the input one, thus leading to dimensionality reduction on condition that the “decoder” part (in blue) can efficiently reconstruct the input signal (to a good degree of accuracy) starting from the central encoded features. In this schematic we highlight that training of the autoencoder is performed by feeding a seismogram to the encoder, and then comparing the output of the decoder with the same input seismogram. Once the autoencoder has been trained, the encoder can be removed, and the decoder can be used as a generative model for the seismograms, inputting some encoded features.

An autoencoder (AE) is a neural network with an equal number of neurons in the input and output layers, trained to reproduce the input in the output

Once the AE has been trained, new input signals can be compressed into its central features. Our aim is to learn the mapping between the source coordinates and these features, which can be achieved, for example, with an additional neural network. Once this NN is trained, it can generate encoded features from new coordinates, which the decoder then converts into preprocessed seismograms. This procedure is summarized in
Fig.
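The bottleneck principle can be illustrated with a toy linear autoencoder trained by gradient descent (the paper's AE is a deep non-linear network; the data, sizes, and learning rate here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "seismograms": 200 traces of 64 samples that secretly live on a
# two-dimensional manifold (a phase-shifted sine wave).
t = np.linspace(0.0, 1.0, 64)
shift = rng.uniform(0.2, 0.8, size=(200, 1))
X = np.sin(2.0 * np.pi * 4.0 * (t[None, :] - shift))

d, k = X.shape[1], 2                   # trace length, bottleneck size
W_enc = rng.normal(0.0, 0.1, (d, k))   # encoder weights
W_dec = rng.normal(0.0, 0.1, (k, d))   # decoder weights

lr = 0.02
for _ in range(3000):
    Z = X @ W_enc                      # encode into the bottleneck
    X_hat = Z @ W_dec                  # decode back to trace space
    err = X_hat - X                    # reconstruction error
    # Gradient descent on the mean squared reconstruction error
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_enc -= lr * g_enc
    W_dec -= lr * g_dec

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

Despite the 32-fold compression (64 samples down to 2 features), the reconstruction error becomes small because the data truly live on a low-dimensional manifold, which is exactly the property the AE exploits for seismograms.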

In our test case of Sect.

Similar to what we did with the PCA+GP method described in Sect.

In general, the encoded features in the latent space of an autoencoder have no specific structure, as the only requirement is for the reconstructed
data points to be similar to the input points. However, it is possible to enforce a desired distribution over the latent space, which is driven by our
preliminary knowledge of the problem and is therefore called a prior distribution. This is one of the advantages of variational autoencoders

In simple terms, when maximizing the objective in Eq. (

VAEs can be used both as a compression algorithm and as a generative method. Since we want to map source coordinates to seismograms, we choose to employ
a supervised version of VAEs, called conditional variational autoencoders

Schematic of the architecture of the conditional variational autoencoder used in this work. A “funnel-like” structure analogous to the simple autoencoder described in Fig.

In our analysis, we set a latent space size of

One of the main lines of research in generative models is based on generative adversarial networks

More formally, we can define a value function as follows:
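For reference, the standard minimax value function of a vanilla GAN (Goodfellow et al., 2014), with discriminator $D$ and generator $G$, reads:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z(z)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

Here $p_{\mathrm{data}}$ is the distribution of the training data and $p_z$ is the prior over the generator's latent input.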

In practice, despite generating sharp images, GANs have proved to be quite unstable at training time. Moreover, it has been shown that vanilla GANs are
prone to mode collapse, where the generator focuses on only a few modes of the data distribution and yields new samples with low diversity

Schematic of the Wasserstein generative adversarial network – gradient penalty described in Sect.

Many alternatives to vanilla GANs have been proposed to address these issues. We focus here on Wasserstein GANs – gradient penalty
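For reference, the WGAN-GP critic loss of Gulrajani et al. (2017), with gradient penalty coefficient $\lambda$ and $\hat{x}$ sampled along straight lines between real and generated samples, is:

```latex
L = \mathbb{E}_{\tilde{x} \sim p_g}\!\left[D(\tilde{x})\right]
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[D(x)\right]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\!\left[
      \bigl(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\bigr)^2
    \right]
```

The gradient penalty term softly enforces the 1-Lipschitz constraint required by the Wasserstein formulation, replacing the weight clipping of the original WGAN.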

In our experiments, we chose

Finally, note that, in this case only, we standardized the data

We describe here the methodology followed to train our models and test their accuracy. We remark that the training and testing of any machine learning
algorithm should be performed on a case-by-case basis, in order to match the accuracy requirements dictated by the specific problem considered (in
this case, the emulation of seismic traces given a certain velocity model). For concreteness, we present here the details of training and testing our
models for application to the test case in Sect.

All our models are trained on the same 2000 simulated events used in D18. For optimization and testing purposes, we divide the remaining 2000 samples (from the pool of 4000 events generated in total by D18) into a validation set and a testing set of 1000 events each. Unlike D18, in this paper we use a validation set to tune the hyperparameters of our deep-learning models. To provide an unbiased estimate of the performance of the final tuned models, we quote our definitive results by evaluating the accuracy of each model on the testing set, which is never “seen” by the model at any point in the training or optimization procedures.

As in D18, we quantify the accuracy of our emulators in terms of the

The 2D correlation coefficient

(CPU timings reported in this work refer to an Intel® Core™ i7-8750H CPU @ 2.20 GHz.)
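A 2D correlation coefficient between a predicted and a reference set of traces can be computed as in the following sketch (this is the standard Pearson-style definition; the paper's exact formula may differ in detail):

```python
import numpy as np

def corr2d(A, B):
    """2D correlation coefficient between two equal-shape arrays."""
    A = A - A.mean()
    B = B - B.mean()
    return (A * B).sum() / np.sqrt((A ** 2).sum() * (B ** 2).sum())

rng = np.random.default_rng(3)
true = rng.normal(size=(4, 600))       # e.g. 4 receivers, 600 time samples
pred = true + 0.1 * rng.normal(size=true.shape)
r = corr2d(true, pred)
```

A value of 1 indicates a perfect (up to scale and offset) match between emulated and simulated seismograms, so higher values indicate a more accurate emulator.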

When training our NNs, all implemented in

Since one of our goals is to compare our new emulation methods with the one previously developed in D18, we train and test them on the same geophysical scenario considered there. To train and test our algorithms we use the same microseismic traces that were forward modelled in D18 for training and testing purposes.

We briefly recap here the characteristics of the simulated geophysical domain and microseismic traces, referring to D18 for further details. We
consider a geophysical framework where we record seismic traces in a marine environment. Sensors are placed at the seabed to record both pressure and
three-component particle velocity of the propagating medium. As was the case in D18, we assume that our recorded seismic traces are generated by
explosive isotropic sources. For isotropic sources, considering only the pressure wave and ignoring the particle velocity is sufficient to determine
the location of the event in the studied domain, as shown in D18. We consider the seismic traces to be noiseless when building the emulator, while
some noise is added to the simulated recorded seismogram when inferring the coordinates' posterior distribution, as we will show in
Sect.

Density, P-wave velocity, and S-wave velocity models of the simulated domain used in this work. The models are specified as 3D grids of voxels. They show a layered structure with strong variability along the vertical axis. However, a smaller degree of variability is also present across the horizontal plane; hence, the models are effectively 3D models that cannot be accurately approximated by a 1D layered model.

Forward simulations of seismic traces are obtained by solving the elastic wave equation given a 3D heterogeneous velocity and density model for the
propagating medium, shown in Fig.

In this section we summarize our main findings. We start in Sect.

In Table

Considerations of accuracy in terms of reconstructed seismograms are important for applications to posterior inference analysis, to avoid
biases and/or misestimates of the uncertainty associated with the inferred parameters. In our case, achieving higher accuracy is crucial to guarantee
unbiased and accurate estimation of the microseismic source location. For this reason, in Table

All of our new methods provide a

The NN direct model, described in Sect.

Comparison of the reconstruction accuracy of different emulation methods on three random seismograms from the testing set (dashed black lines), whose coordinates are reported on top of each panel. The seismograms record the vertical component of motion at the receiver placed on the point with coordinates (0.5

Figure

Speed considerations are also important when evaluating the performance of the models. In general, applications of deep learning to Bayesian
analysis may often be feasible only with high-performance computing (HPC) infrastructure. If this is not available, applications to real
parameter estimation frameworks may be fatally compromised. It is therefore important to note that all our proposed models can be run efficiently
on a simple laptop, without the need for any HPC platform. If HPC infrastructure is available, our models can be sped up even further. In
particular, running all generative models on GPUs would lead to a speed-up of at least an order of magnitude

Importantly, however, even without this HPC acceleration we find that all our models are

Among our proposed methods, the fastest to evaluate is the PCA+NN method described in Sect.

Related to the difference between GP and NN regression are the storage requirements of the different methods. Models employing NNs are less demanding than GPs in terms of memory, mainly because they do not need to retain the training data. Within NN architectures, the simpler ones are, intuitively, the lightest to store. PCA+NN is again the best-performing method in this regard, in particular outperforming AE+NN, since the latter requires storing the weights and biases of two NNs.

Now that we have quantified the performance of our generative models, we want to apply them to the Bayesian inference of a microseismic event
location. For this purpose, we simulate the detection of a microseismic event and wish to infer the posterior distribution of its coordinates. The
posterior distribution of a set of parameters
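Explicitly, writing the source coordinates as $\boldsymbol{\theta}$ and the observed traces as $\boldsymbol{d}$ (our notation, which may differ from that used elsewhere in the paper), Bayes' theorem gives:

```latex
p(\boldsymbol{\theta} \mid \boldsymbol{d}) =
  \frac{p(\boldsymbol{d} \mid \boldsymbol{\theta})\,
        p(\boldsymbol{\theta})}{p(\boldsymbol{d})}
```

The likelihood $p(\boldsymbol{d} \mid \boldsymbol{\theta})$ is evaluated with the trained emulator in place of the full elastic wave equation solver, the prior $p(\boldsymbol{\theta})$ encodes the allowed coordinate range, and the evidence $p(\boldsymbol{d})$ acts as a normalization constant for parameter estimation.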

The posterior distribution

We simulate the observation of a microseismic isotropic event by generating a noiseless trace given specified coordinates
(

Instead of repeating the analysis for each proposed generative model, we decide to use the one that has been shown in Table

Comparison of the marginalized 68 % and 95 % credibility contours obtained with the D18 method (in blue) and our proposed NN direct generative model (in red) described in Sect.

Prior range and mean and marginalized 68 % credibility intervals on the coordinates (

Figure

In order to check the accuracy of our emulators, we perform a comparison of the inference results that we obtain with our surrogate model with the
results obtained from a non-linear probabilistic location method. The widely used algorithm

The theoretical computations of the arrival times given specified coordinates (

Our goal is to verify that our methodology provides estimates of the posterior probability distribution for the event location that are in good
agreement with those obtained from the

Comparison of the marginalized 68 % and 95 % credibility contours obtained with the EDT non-linear location method (in blue) and our proposed NN direct generative model (in red) described in Sect.

The inference results for the posterior distribution of the coordinates are reported in Table

In this paper we developed new generative models to accelerate Bayesian inference of microseismic event locations. Our geophysical setup was similar
to the one used in

All models developed in this paper were trained on the same 2000 forward simulated seismograms used by D18 when training their emulator. However, our
models are based on deep-learning architectures and make minimal use of Gaussian process (GP) regression, which is instead performed multiple times in
the method proposed by D18. This makes all of our models faster to train and evaluate compared to the previous emulator, achieving a speed-up factor
of up to

We showed this first by calculating the 2D correlation coefficient for the seismograms of the test set. The values obtained with all our models were
higher than those obtained by D18, indicating the higher accuracy achieved. Secondly, we repeated the simulated experiment devised by D18, with
sensors placed at the seabed of a 3D marine environment where our simulated sources were randomly located. We showed that using information coming
from only four receivers situated on the detection plane we were able to provide accurate and tight constraints on the source coordinates, whereas the
D18 method struggled to provide any significant constraint given the same setup and would likely need additional information from more sensors to
achieve comparable constraints. As a result of the speed-up obtained at evaluation time, we were able to perform the inference process on a single CPU
in

We also compared our inference results with those obtained from arrival time inversion, following the methodology implemented in the software

A complete Bayesian hierarchical model for source location has been developed in the software

In conclusion, we provided the community with a collection of deep generative models that can very efficiently accelerate Bayesian inference of
microseismic sources. The ultimate goal here would be to integrate our emulators within existing methodologies and software for joint location and
moment tensor components inversion, as for example implemented in

The accuracies of our emulators are all comparable and improved with respect to the D18 method. Speed
considerations may therefore drive the choice of a particular method. However, we note that our framework is valid only for
microseismic events characterized by an isotropic moment tensor. Considering more general forms of the moment tensor will likely require additional
effort, first of all the consideration of seismic traces recorded for longer times, since the signal structure will in general be more
complex. Extensions of this work to non-isotropic sources, possibly in combination with other source inversion techniques

For a different velocity model, our emulators would need to be retrained on a new set of seismic traces. We note that this is a limitation
shared by other forward modelling approaches

We expect our methodology for waveform emulation to be improved by including physics-based information in the emulation framework, following recent
work in physics-informed neural networks

Here we briefly summarize, for comparison, the surrogate model developed in D18 for fast emulation of isotropic microseismic traces, given their
source locations on a 3D grid. We report here the main steps of the procedure, referring to D18 for all details.

We first compress the training seismograms, isolating in each of them the 100 dominant components in absolute value and storing their amplitudes and time indices.

We then train a GP for each dominant component and for each index. Thus, in total there will be 100

Once the GPs are trained, for each set of coordinates the 100 predictions for the dominant signal components and the 100 predictions for their indices will produce a compressed version of the seismogram, where the (predicted) subdominant components are set to zero.
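The compression and reconstruction steps of the D18 surrogate can be sketched for a single trace as follows (the toy trace and function names are ours):

```python
import numpy as np

def compress(trace, n_keep=100):
    """Keep the n_keep largest-|amplitude| samples of a trace."""
    idx = np.argsort(np.abs(trace))[-n_keep:]   # indices of dominant samples
    return trace[idx], idx

def decompress(amps, idx, length):
    """Rebuild the trace with the subdominant samples set to zero."""
    out = np.zeros(length)
    out[idx] = amps
    return out

rng = np.random.default_rng(7)
# Toy decaying trace standing in for a seismogram
trace = rng.normal(size=600) * np.exp(-np.linspace(0.0, 5.0, 600))
amps, idx = compress(trace)
approx = decompress(amps, idx, len(trace))
```

In the full D18 method, one GP per retained amplitude and one per retained index is trained on the source coordinates, so that new coordinates map directly to a compressed seismogram of this form.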

Given two probability distributions

In Sect.

It is easy to show

In our case, since

In Sect.

Our deep-learning models will be available at

ASM developed the theoretical framework and methodology, as well as the software implementation of the generative models described in the paper. He also generated the synthetic data used for training, validated the models, and wrote the article. DP was involved in the implementation, validation of the models, and writing. AMGF, MPH, and BJ were involved in the review and conceptualization of this work and contributed to the writing.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has been partially enabled by funding from Royal Dutch Shell plc and the UCL Cosmoparticle Initiative. Some of the computations have been performed on the Wilkes High Performance GPU computer cluster at the University of Cambridge; we are grateful to Stuart Rankin and Greg Willatt for their technical support. We wish to express our deepest gratitude to Saptarshi Das for useful discussions and support during the initial phase of this project. We also thank Stephen Bourne, Xi Chen, Detlef Hohl, Jonathan Smith, and Teh-Ru Alex Song for useful discussions. We acknowledge the use of the software

This work has been partially enabled by funding from Royal Dutch Shell plc and the UCL Cosmoparticle Initiative. Davide Piras is supported by the STFC UCL Centre for Doctoral Training in Data Intensive Science. Ana Margarida Godinho Ferreira received funding from NERC grant NE/N011791/1.

This paper was edited by Michal Malinowski and reviewed by two anonymous referees.