<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0"><?xmltex \makeatother\@nolinetrue\makeatletter?>
  <front>
    <journal-meta><journal-id journal-id-type="publisher">SE</journal-id><journal-title-group>
    <journal-title>Solid Earth</journal-title>
    <abbrev-journal-title abbrev-type="publisher">SE</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Solid Earth</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1869-9529</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/se-11-1527-2020</article-id><title-group><article-title>Deep learning for fast simulation of seismic waves in complex media</article-title><alt-title>Deep learning for fast simulation of seismic waves in complex media</alt-title>
      </title-group><?xmltex \runningtitle{Deep learning for fast simulation of seismic waves in complex media}?><?xmltex \runningauthor{B.~Moseley et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Moseley</surname><given-names>Ben</given-names></name>
          <email>bmoseley@robots.ox.ac.uk</email>
        <ext-link>https://orcid.org/0000-0003-2238-1783</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Nissen-Meyer</surname><given-names>Tarje</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Markham</surname><given-names>Andrew</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Department of Computer Science, University of Oxford, Oxford, UK</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Department of Earth Sciences, University of Oxford, Oxford, UK</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Ben Moseley (bmoseley@robots.ox.ac.uk)</corresp></author-notes><pub-date><day>24</day><month>August</month><year>2020</year></pub-date>
      
      <volume>11</volume>
      <issue>4</issue>
      <fpage>1527</fpage><lpage>1549</lpage>
      <history>
        <date date-type="received"><day>14</day><month>October</month><year>2019</year></date>
           <date date-type="rev-request"><day>13</day><month>November</month><year>2019</year></date>
           <date date-type="rev-recd"><day>21</day><month>June</month><year>2020</year></date>
           <date date-type="accepted"><day>30</day><month>June</month><year>2020</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2020 </copyright-statement>
        <copyright-year>2020</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://se.copernicus.org/articles/.html">This article is available from https://se.copernicus.org/articles/.html</self-uri><self-uri xlink:href="https://se.copernicus.org/articles/.pdf">The full text article is available as a PDF file from https://se.copernicus.org/articles/.pdf</self-uri>
      <abstract><title>Abstract</title>
    <p id="d1e104">The simulation of seismic waves is a core task in many geophysical applications. Numerical methods such as finite difference (FD) modelling and spectral element methods (SEMs) are the most popular techniques for simulating seismic waves, but disadvantages such as their computational cost prohibit their use for many tasks. In this work, we investigate the potential of deep learning for aiding seismic simulation in the solid Earth sciences. We present two deep neural networks which are able to simulate the seismic response at multiple locations in horizontally layered and faulted 2-D acoustic media an order of magnitude faster than traditional finite difference modelling. The first network is able to simulate the seismic response in horizontally layered media and uses a WaveNet network architecture. The second network is significantly more general than the first and is able to simulate the seismic response in faulted media with arbitrary layers, fault properties and an arbitrary location of the seismic source on the surface of the media, using a conditional autoencoder design. We test the sensitivity of the accuracy of both networks to different network hyperparameters and show that the WaveNet network can be retrained to carry out fast seismic inversion in the same media. We find that there are challenges when extending our methods to more complex, elastic and 3-D Earth models; for example, the accuracy of both networks is reduced when they are tested on models outside of their training distribution. We discuss further research directions which could address these challenges and potentially yield useful tools for practical simulation tasks.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

      <?xmltex \hack{\newpage}?>
<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e118">Seismic simulations are essential for addressing many outstanding questions in geophysics.  In seismic hazard analysis, they are a key tool for quantifying the ground motion of potential earthquakes <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx10" id="paren.1"/>. In oil and gas prospecting, they allow the seismic response of hydrocarbon reservoirs to be modelled <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx33" id="paren.2"/>. In geophysical surveying, they show how the subsurface is illuminated by different survey designs <xref ref-type="bibr" rid="bib1.bibx67" id="paren.3"/>. In global geophysics, they are used to obtain snapshots of the Earth's interior dynamics by tomography <xref ref-type="bibr" rid="bib1.bibx20 bib1.bibx8" id="paren.4"/>, to decipher source and path effects from individual seismograms <xref ref-type="bibr" rid="bib1.bibx28" id="paren.5"/> and to model wave effects of complex structures <xref ref-type="bibr" rid="bib1.bibx56 bib1.bibx43" id="paren.6"/>. In seismic inversion, they are used to estimate the elastic properties of a medium given its seismic response <xref ref-type="bibr" rid="bib1.bibx55 bib1.bibx53" id="paren.7"/> and in full-waveform inversion <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx65" id="paren.8"/>, a technique used to image the 3-D  structure of the subsurface, they are used up to tens of thousands of times to improve on estimates of a medium's elastic properties. In planetary science, seismic simulations play a central role in understanding novel recordings on Mars <xref ref-type="bibr" rid="bib1.bibx62" id="paren.9"/>.</p>
      <p id="d1e149">Numerous methods exist for simulating seismic waves, the most popular in fully heterogeneous media being finite difference (FD) and spectral element methods (SEMs) <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx36 bib1.bibx25" id="paren.10"/>. They are able to capture a large range of physics, including the effects of undulating solid–fluid interfaces <xref ref-type="bibr" rid="bib1.bibx30" id="paren.11"/>, intrinsic attenuation <xref ref-type="bibr" rid="bib1.bibx60" id="paren.12"/> and anisotropy <xref ref-type="bibr" rid="bib1.bibx61" id="paren.13"/>. These methods solve for the propagation of the full seismic<?pagebreak page1528?> wavefield by discretising the elastodynamic equations of motion. For an acoustic heterogeneous medium, these are given by the scalar linear equation of motion:
          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M1" display="block"><mml:mrow><mml:mi mathvariant="italic">ρ</mml:mi><mml:mi mathvariant="normal">∇</mml:mi><mml:mo>⋅</mml:mo><mml:mfenced close=")" open="("><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">ρ</mml:mi></mml:mfrac></mml:mstyle><mml:mi mathvariant="normal">∇</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:mfenced><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msup><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mo>∂</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ρ</mml:mi><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mo>∂</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M2" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> is the acoustic pressure, <inline-formula><mml:math id="M3" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula> is a point source of volume injection (the seismic source), and <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mi>v</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mi mathvariant="italic">κ</mml:mi><mml:mo>/</mml:mo><mml:mi mathvariant="italic">ρ</mml:mi></mml:mrow></mml:msqrt></mml:mrow></mml:math></inline-formula> is the velocity of the medium, with <inline-formula><mml:math id="M5" display="inline"><mml:mi mathvariant="italic">ρ</mml:mi></mml:math></inline-formula> the density of the medium and <inline-formula><mml:math id="M6" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> the adiabatic compression modulus <xref ref-type="bibr" rid="bib1.bibx32" id="paren.14"/>.</p>
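      <p>As a minimal illustration of how such methods discretise Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), the sketch below treats the constant-density case in one spatial dimension. In that case Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) reduces to ∂²p/∂t² = v²∇²p + ρv²∂²f/∂t². The sketch uses second-order central differences in space and time (a leapfrog scheme), injects the point source at a single grid node and holds the pressure at zero at both ends of the grid. It rests on these simplifying assumptions and is not the SEISMIC_CPML code used later in this work. It also has no absorbing boundaries. The function names, the 20 Hz Ricker source time function and the grid values in the usage line are illustrative choices.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), centred on t = 0."""
    a = (math.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def simulate_1d(v, dx, dt, nt, src_idx, rcv_idx, f0=20.0, rho=2200.0):
    """Leapfrog FD solution of p_tt = v^2 p_xx + rho v^2 f_tt on a 1-D grid.

    v is the velocity at each grid node (m/s), dx the grid spacing (m) and dt the
    time step (s). The pressure is held at zero at both ends of the grid.
    Returns the pressure recorded at rcv_idx; sample n is at time (n + 1) * dt.
    """
    nx = len(v)
    # stability (CFL) condition for this explicit scheme in 1-D
    if max(v) * dt / dx > 1.0:
        raise ValueError("time step violates the CFL condition")
    t0 = 1.5 / f0  # delay so that the wavelet starts at negligible amplitude
    p_prev = [0.0] * nx
    p_curr = [0.0] * nx
    trace = []
    for n in range(nt):
        t = n * dt
        # second time derivative of the source time function (central difference)
        f_tt = (ricker(t + dt - t0, f0) - 2.0 * ricker(t - t0, f0)
                + ricker(t - dt - t0, f0)) / dt ** 2
        p_next = [0.0] * nx
        for i in range(1, nx - 1):
            lap = (p_curr[i + 1] - 2.0 * p_curr[i] + p_curr[i - 1]) / dx ** 2
            p_next[i] = 2.0 * p_curr[i] - p_prev[i] + (dt * v[i]) ** 2 * lap
        # point source: a discrete delta function (1 / dx) at the source node
        p_next[src_idx] += (dt * v[src_idx]) ** 2 * rho * f_tt / dx
        p_prev, p_curr = p_curr, p_next
        trace.append(p_curr[rcv_idx])
    return trace


# 2 km homogeneous line on a 5 m grid; source and receiver 1 km apart
trace = simulate_1d([2000.0] * 401, dx=5.0, dt=0.001, nt=700, src_idx=100, rcv_idx=300)
```
]]></preformat>
      <p>Every grid node is updated at every time step, even though only the trace at the receiver node is returned. The same per-step update, applied to every node of a 3-D grid, is what makes full numerical simulation expensive.</p>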
      <p id="d1e296">Whilst FD and spectral element methods are the primary means of simulation in complex media, a major disadvantage of these methods is their computational cost <xref ref-type="bibr" rid="bib1.bibx6 bib1.bibx29" id="paren.15"/>. Typical FD or SEM simulations can involve billions of degrees of freedom, and at each time step the wavefield must be updated at every 3-D grid point. For many practical geophysical applications, this is often prohibitively expensive. For example, in global seismology, one may be interested in modelling waves up to 1 Hz in frequency to resolve small-scale heterogeneities in the mantle, and a single simulation of this type with conventional techniques can cost around 40 million CPU hours <xref ref-type="bibr" rid="bib1.bibx30" id="paren.16"/>. At crustal scales, industrial seismic imaging requires wave modelling up to tens of hertz in frequency, carried out once for each of the hundreds of thousands of explosions (shots) in a seismic survey, and such requirements can easily fill the largest supercomputers on Earth. Any improvement in efficiency is welcome, not least because of the high financial and environmental costs of high-performance computing.</p>
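      <p>The steep growth of cost with frequency follows from a standard rule of thumb for explicit FD schemes. If the number of grid points per minimum wavelength is held fixed, the grid spacing shrinks in proportion to 1/f in every spatial dimension. The CFL stability condition then forces the time step to shrink with it. The sketch below encodes this back-of-the-envelope estimate. It ignores constant factors, memory and communication costs, and it is not a figure taken from this work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def relative_fd_cost(freq_ratio, ndim=3):
    """Relative cost of an explicit FD simulation when the maximum frequency is
    multiplied by freq_ratio, assuming a fixed number of grid points per minimum
    wavelength and a CFL-limited time step proportional to the grid spacing."""
    n_grid_points = freq_ratio ** ndim  # grid spacing scales as 1 / frequency in each dimension
    n_time_steps = freq_ratio           # time step scales with the grid spacing
    return n_grid_points * n_time_steps


print(relative_fd_cost(2.0))          # 16.0: doubling the frequency in 3-D
print(relative_fd_cost(2.0, ndim=2))  # 8.0: the same change in 2-D
```
]]></preformat>
      <p>Under this estimate, a tenfold increase in the maximum frequency multiplies the cost of a 3-D simulation by about 10<sup>4</sup>.</p>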
      <p id="d1e305">In some applications, large parts of the Earth model may be relatively smooth or simple. This simplicity can be exploited, for example by the complexity-adapted SEM introduced by <xref ref-type="bibr" rid="bib1.bibx29" id="text.17"/>, which can deliver a large speedup compared to standard numerical modelling. Pseudo-analytical methods such as ray tracing and amplitude-versus-offset modelling <xref ref-type="bibr" rid="bib1.bibx3 bib1.bibx64" id="paren.18"/> are another approach which can provide significant speedups, although they are approximate. We note that many applications are constrained and driven by a sparse set of observations on the surface of an Earth model. For these applications, we are typically only interested in modelling the seismic response at these points to decipher the seismic origin or the 3-D structure beneath the surface, yet fully numerical methods still need to iterate the entire wavefield through all points in the model at all points in time. Any shortcut to avoid computing these massive 4-D wavefields might lead to drastic efficiency improvements. In short, the points above suggest that alternative and advantageous methods to capture accurate wave physics may be possible for these challenging problems.</p>
      <p id="d1e315">The field of machine learning has seen an explosion in growth over the last decade. This has been primarily driven by advancements in deep learning, which has provided more powerful algorithms allowing much more difficult problems to be learned <xref ref-type="bibr" rid="bib1.bibx16" id="paren.19"/>. This progress has led to a surge in the use of deep learning techniques across many areas of science. In particular, deep neural networks have recently shown promise in their ability to make fast yet sufficiently accurate predictions of physical phenomena <xref ref-type="bibr" rid="bib1.bibx18 bib1.bibx31 bib1.bibx44" id="paren.20"/>. These approaches are able to learn about highly non-linear physics and often offer much faster inference times than traditional simulation.</p>
      <p id="d1e324">In this work, we ask whether the latest deep learning techniques can aid seismic simulation tasks relevant to the solid Earth sciences. We investigate the use of deep neural networks and discuss the challenges and opportunities when using them for practical seismic simulation tasks. Our contribution is as follows:
<list list-type="bullet"><list-item>
      <p id="d1e329">We present two deep neural networks which are able to simulate seismic waves in 2-D acoustic media an order of magnitude faster than FD simulation. The first network uses a WaveNet network architecture <xref ref-type="bibr" rid="bib1.bibx58" id="paren.21"/> and is able to accurately simulate the pressure response from a fixed point source at multiple locations in a horizontally layered velocity model. The second is significantly more general; it uses a conditional autoencoder network design and is able to simulate the seismic response at multiple locations in faulted media with arbitrary layers, fault properties and an arbitrary location of the source on the surface of the media. In contrast to the classical methods, both networks simulate the seismic response in a single inference step, without needing to iteratively model the seismic wavefield through time, resulting in a significant speedup compared to FD simulation.</p></list-item><list-item>
      <p id="d1e336">We test the sensitivity of the accuracy of both networks to different network designs, present a loss function with a time-varying gain which improves training convergence and show that fast seismic inversion in horizontally layered media can also be carried out by retraining the WaveNet network.</p></list-item><list-item>
      <p id="d1e340">We find challenges when extending our methods to more complex, elastic and 3-D Earth models and discuss further research directions which could address these challenges and yield useful tools for practical simulation tasks.</p></list-item></list></p>
      <p id="d1e343">In Sect. <xref ref-type="sec" rid="Ch1.S2"/>, we consider the simple case of simulating seismic waves in horizontally layered 2-D acoustic Earth models using a WaveNet deep neural network. In Sect. <xref ref-type="sec" rid="Ch1.S3"/>, we move on to the task of simulating more complex faulted Earth models using a conditional autoencoder network. In Sect. <xref ref-type="sec" rid="Ch1.S4"/>, we discuss the challenges of extending our approaches to practical simulation tasks and future research directions.</p><?xmltex \hack{\newpage}?>
<?pagebreak page1529?><sec id="Ch1.S1.SS1">
  <label>1.1</label><title>Related work</title>
      <p id="d1e360">The use of machine learning and neural networks in geophysics is not new <xref ref-type="bibr" rid="bib1.bibx59" id="paren.22"/>. For example, <xref ref-type="bibr" rid="bib1.bibx39" id="text.23"/> used neural networks to carry out automated first break picking, <xref ref-type="bibr" rid="bib1.bibx12" id="text.24"/> used a neural network to discriminate between earthquakes and nuclear explosions and <xref ref-type="bibr" rid="bib1.bibx46" id="text.25"/> used them for electromagnetic inversion of a conductive target. In seismic inversion, <xref ref-type="bibr" rid="bib1.bibx51" id="text.26"/> used a neural network to estimate the velocity of 1-D, layered, constant thickness velocity profiles from seismic amplitudes and <xref ref-type="bibr" rid="bib1.bibx41" id="text.27"/> used neural networks for cross-well travel-time tomography. However, these early approaches only used shallow network designs with small numbers of free parameters, which limited the expressivity of the networks and the complexity of the problems they could learn <xref ref-type="bibr" rid="bib1.bibx16" id="paren.28"/>.</p>
      <p id="d1e385">The field of machine learning has grown rapidly over the last decade, primarily because of advances in deep learning. The availability of larger datasets, the discovery of methods which allow deeper networks to be trained and the availability of more powerful computing architectures (mostly GPUs) have allowed much more complex problems to be learned <xref ref-type="bibr" rid="bib1.bibx16" id="paren.29"/>, leading to a surge in the use of deep learning in many different research areas. For example, in physics, <xref ref-type="bibr" rid="bib1.bibx31" id="text.30"/> presented a deep convolutional network which could accurately predict whether randomly stacked wooden towers would fall or remain stable, given 2-D images of the tower. <xref ref-type="bibr" rid="bib1.bibx18" id="text.31"/> demonstrated that convolutional neural networks could estimate flow fields in complex computational fluid dynamics (CFD) calculations 2 orders of magnitude faster than a traditional GPU-accelerated CFD solver, and <xref ref-type="bibr" rid="bib1.bibx44" id="text.32"/> used a conditional generative adversarial network to simulate particle showers in particle colliders.</p>
      <p id="d1e400">A resurgence is occurring in geophysics too <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx26" id="paren.33"/>. Early examples of deep learning include <xref ref-type="bibr" rid="bib1.bibx11" id="text.34"/>, who used deep probabilistic neural networks to estimate crustal thicknesses from surface wave velocities and <xref ref-type="bibr" rid="bib1.bibx57" id="text.35"/>, who used a deep autoencoder to compress seismic waveforms. More recently, <xref ref-type="bibr" rid="bib1.bibx45" id="text.36"/> presented an earthquake identification method using convolutional networks which is orders of magnitude faster than traditional techniques. In seismic inversion, <xref ref-type="bibr" rid="bib1.bibx4" id="text.37"/> proposed an efficient deep learning concept for carrying out seismic tomography using the semblance of common midpoint receiver gathers. <xref ref-type="bibr" rid="bib1.bibx66" id="text.38"/> proposed a convolutional autoencoder network to carry out seismic inversion, whilst <xref ref-type="bibr" rid="bib1.bibx68" id="text.39"/> adapted a U-net network design for the same purpose. <xref ref-type="bibr" rid="bib1.bibx49" id="text.40"/> demonstrated that a recurrent neural network framework can be used to carry out full-waveform inversion (FWI). <xref ref-type="bibr" rid="bib1.bibx54" id="text.41"/> showed a method for using deep learning to extrapolate low-frequency seismic energy to improve the convergence of FWI algorithms. In seismic simulation, <xref ref-type="bibr" rid="bib1.bibx70" id="text.42"/> presented a multi-scale convolutional network for predicting the evolution of the full seismic wavefield in heterogeneous media. Their method was able to approximate the wavefield kinematics over multiple time steps, although it suffered from the accumulation of error over time and did not offer a reduction in computational time. 
<xref ref-type="bibr" rid="bib1.bibx38" id="text.43"/> showed that a convolutional network with a recursive loss function can simulate the full wavefield in horizontally layered acoustic media. <xref ref-type="bibr" rid="bib1.bibx27" id="text.44"/> used a generative adversarial network to simulate seismograms from radially symmetric and smooth Earth models.</p>
      <p id="d1e441">In this work, we present fast methods for simulating seismic waves in horizontally layered and faulted 2-D acoustic media, which offer a significant reduction in computation time compared to <xref ref-type="bibr" rid="bib1.bibx70" id="text.45"/>. We also present a fast method for seismic inversion of horizontally layered acoustic media, which is more general than the original approach proposed by <xref ref-type="bibr" rid="bib1.bibx51" id="text.46"/> because it is able to invert velocity models with varying numbers of layers and varying layer thicknesses. We restrict ourselves to 2-D acoustic media and discuss implications for 3-D elastic media below.</p>
</sec>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Fast seismic simulation in 2-D horizontally layered acoustic media using WaveNet</title>
      <p id="d1e459">First, we consider the simple case of simulating seismic waves in horizontally layered 2-D acoustic Earth models. We train a deep neural network with a WaveNet architecture to simulate the seismic response recorded at multiple receiver locations in the Earth model, horizontally offset from a point source located at the surface of the model. As mentioned above, many seismic applications are concerned with sparse observations similar to this setup. A key difference between this approach and FD and SEM simulations is that the network computes the seismic response at the surface in a single inference step, without needing to iteratively model the seismic wavefield through time, potentially offering a significant speedup. Whilst we concentrate on simple velocity models here, more complex faulted Earth models are considered in Sect. <xref ref-type="sec" rid="Ch1.S3"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><label>Figure 1</label><caption><p id="d1e466">Ground truth FD simulation example. <bold>(a)</bold> A 20 Hz Ricker seismic source is emitted close to the surface and propagates through a 2-D horizontally layered acoustic Earth model. The black circle shows the source location. A total of 11 receivers are placed at the same depth as the source with a horizontal spacing of 50 m (red triangles). The full wavefield is overlain for a single snapshot in time. Note that seismic reflections occur at each velocity interface. <bold>(b)</bold> The Earth velocity model. The Earth model has a constant density of 2200 kg m<inline-formula><mml:math id="M7" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. <bold>(c)</bold> The resulting ground truth pressure response recorded by each of the receivers, using FD modelling. A <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied to the receiver responses for display.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f01.png"/>

      </fig>

      <p id="d1e507">An example simulation we wish to learn is shown in Fig. <xref ref-type="fig" rid="Ch1.F1"/> and our simulation workflow is shown in Fig. <xref ref-type="fig" rid="Ch1.F2"/>. The input to the network is a horizontally layered velocity profile and the output of the network is a simulation of the pressure response recorded at each receiver location. We now discuss deep neural networks, our WaveNet architecture, our simulation workflow and our training methodology in more detail.</p><?xmltex \hack{\newpage}?>
<?pagebreak page1530?><sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Deep neural networks and the WaveNet network</title>
      <p id="d1e523">A neural network is a network of simple computational elements, known as neurons, which perform mathematical operations on multidimensional arrays or tensors <xref ref-type="bibr" rid="bib1.bibx16" id="paren.47"/>. The composition of these neurons together defines a mathematical function of the network's input. Each neuron has a set of free parameters, or weights, which are tuned using optimisation, allowing the network's function to be learned, given a set of training data. In deep learning, the neurons are typically arranged in multiple layers, which allows the network to learn highly non-linear functions.</p>
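      <p>The toy example below, in plain Python with hypothetical function names, illustrates the two ideas in this paragraph. First, composing simple neurons defines a function of the input, which here is non-linear. Second, a neuron's weights can be tuned by optimisation against training data, shown here with plain gradient descent on a mean squared error for a single linear neuron. Practical deep networks instead rely on automatic differentiation libraries, stochastic optimisers and far more parameters.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)


def neuron(x, w, b, activation=relu):
    """One neuron: weighted sum of its inputs plus a bias, passed through an activation."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


def two_layer_net(x, hidden, output):
    """Composition of neurons: a hidden layer of ReLU neurons feeding one linear neuron."""
    h = [neuron(x, w, b) for w, b in hidden]
    w_out, b_out = output
    return neuron(h, w_out, b_out, activation=lambda z: z)


def fit_linear_neuron(data, lr=0.05, epochs=500):
    """Tune the weight and bias of a single linear neuron by gradient descent on
    the mean squared error over training pairs (x, y)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        residuals = [(w * x + b - y, x) for x, y in data]
        grad_w = sum(2.0 * r * x for r, x in residuals) / n
        grad_b = sum(2.0 * r for r, _ in residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# two hidden ReLU neurons with weights +1 and -1, summed by the output neuron, compute |x|
hidden = [([1.0], 0.0), ([-1.0], 0.0)]
output = ([1.0, 1.0], 0.0)
print(two_layer_net([-3.0], hidden, output))  # 3.0

# recover y = 2x + 1 from noise-free training samples
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = fit_linear_neuron(data)
```
]]></preformat>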

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><label>Figure 2</label><caption><p id="d1e531">Our WaveNet simulation workflow. Given a 1-D Earth velocity profile as input <bold>(a)</bold>, our WaveNet deep neural network <bold>(b)</bold> outputs a simulation of the pressure responses at the 11 receiver locations in Fig. <xref ref-type="fig" rid="Ch1.F1"/>. The raw input 1-D velocity profile sampled in depth is converted into its normal incidence reflectivity series sampled in time before being input into the network. The network is composed of nine time-dilated causally connected convolutional layers with a filter width of two samples and dilation rates which increase exponentially with layer depth. Each hidden layer of the network has the same length as the input reflectivity series, 256 channels and a rectified linear unit (ReLU) activation function. A final causally connected convolutional layer with a filter width of 101 samples, 11 output channels and an identity activation is used to generate the output simulation.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f02.png"/>

        </fig>

      <p id="d1e548">A standard building block in deep learning is the convolutional layer, where all neurons in the layer share the same weight tensor and each neuron has a limited field of view of its input tensor. The output of the layer is computed by cross-correlating the weight tensor with the input tensor. Multiple weight tensors, or filters, can be used to increase the depth (number of channels) of the output tensor. Such designs have achieved state-of-the-art performance across a wide range of machine learning tasks <xref ref-type="bibr" rid="bib1.bibx17" id="paren.48"/>.</p>
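      <p>A plain-Python sketch of such a layer in one dimension is given below. The function name and the list-of-lists tensor layout are our own choices. It makes the two defining properties explicit: every output sample in a channel reuses the same filter, and each output sample sees only a window of the input as wide as the filter. As in common deep learning libraries, the filter is applied by cross-correlation, that is, without flipping it. Only "valid" positions, where the filter fits entirely inside the input, are computed.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def conv1d_layer(x, filters, biases):
    """A 1-D convolutional layer as used in deep learning.

    x       : input tensor as a list of channels, each a list of samples
    filters : one weight tensor per output channel, shaped [in_channels][width]
    biases  : one bias per output channel
    Every output sample of a channel reuses the same filter (weight sharing) and
    only sees `width` neighbouring input samples (limited field of view).
    """
    n_in = len(x)
    length = len(x[0])
    out = []
    for w, b in zip(filters, biases):
        width = len(w[0])
        channel = []
        for t in range(length - width + 1):  # 'valid' positions only
            acc = b
            for c in range(n_in):            # sum over all input channels
                for k in range(width):       # cross-correlation: filter not flipped
                    acc += w[c][k] * x[c][t + k]
            channel.append(acc)
        out.append(channel)
    return out


print(conv1d_layer([[1.0, 2.0, 3.0, 4.0]], [[[1.0, 2.0]]], [0.0]))  # [[5.0, 8.0, 11.0]]
```
]]></preformat>
      <p>A true convolution would flip the filter [1, 2] and give [4, 7, 10] instead. Because the weights are learned, the distinction does not matter in practice. Passing a second filter adds a second output channel.</p>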
      <p id="d1e555">The WaveNet network proposed by <xref ref-type="bibr" rid="bib1.bibx58" id="text.49"/> makes multiple alterations to the standard convolutional layer for its use with time series. Each convolutional layer is made causal; that is, the receptive field of each neuron only contains samples from the input layer whose sample times are before or the same as the current neuron's sample time. Furthermore, the WaveNet exponentially dilates the width of its causal connections with layer depth. This allows the field of view of its neurons to increase exponentially with layer depth, without needing a large number of layers. These modifications are made to honour time series prediction tasks which are causal and to better model input data which vary over multiple timescales. The WaveNet network recently achieved state-of-the-art performance in text-to-speech synthesis.</p>
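      <p>The two modifications can be sketched for a single channel as follows. The function names are hypothetical, and the sketch omits the many channels, biases and activations of a full network. Each output sample combines only the current input sample and earlier samples spaced by the dilation. Stacking layers whose dilations double, each with a filter width of two, therefore makes the receptive field grow exponentially with depth.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def causal_dilated_conv(x, w, dilation):
    """Single-channel causal dilated convolution (in cross-correlation form).

    y[t] = sum_k w[k] * x[t - (K - 1 - k) * dilation] with K = len(w), so the last
    filter tap sees the current sample and earlier taps reach back in steps of
    `dilation`. Samples before the start of the series are treated as zero.
    """
    K = len(w)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            idx = t - (K - 1 - k) * dilation
            if idx >= 0:  # never reads future samples
                acc += w[k] * x[idx]
        y.append(acc)
    return y


def receptive_field(filter_width, dilations):
    """Number of input samples that can influence one output sample of a layer stack."""
    return 1 + sum((filter_width - 1) * d for d in dilations)


# an impulse pushed through layers with dilations 1, 2 and 4 (filter [1, 1])
signal = [1.0] + [0.0] * 9
for d in (1, 2, 4):
    signal = causal_dilated_conv(signal, [1.0, 1.0], d)
print(signal)                          # eight ones, then two zeros
print(receptive_field(2, [1, 2, 4]))   # 8
```
]]></preformat>
      <p>Three layers already span eight samples. Nine layers with dilations 1 to 256 would span 512 samples, whereas undilated layers of width two would need 511 layers for the same reach.</p>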
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Simulation workflow</title>
      <p id="d1e569">Our workflow consists of a preprocessing step, where we convert each input velocity model into its corresponding normal incidence reflectivity series sampled in time (Fig. <xref ref-type="fig" rid="Ch1.F2"/>a), followed by a simulation step, where it is passed to a WaveNet network to simulate the pressure response recorded by each receiver (Fig. <xref ref-type="fig" rid="Ch1.F2"/>b).</p>
      <?pagebreak page1531?><p id="d1e576">The reflectivity series is typically used in exploration seismology <xref ref-type="bibr" rid="bib1.bibx52" id="paren.50"/> and contains values of the ratio of the amplitude of the reflected wave to the incident wave for each interface in a velocity model. For acoustic waves at normal incidence, these values are given by
<?xmltex \hack{\newpage}?>
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M9" display="block"><mml:mrow><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ρ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">ρ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">ρ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">ρ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ρ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ρ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> are the densities and P-wave velocities of the media on the incident (upper) and transmitted (lower) sides of the interface, respectively. The series is usually expressed in time, and each reflectivity value occurs at the time at which the primary reflection of the source from the corresponding velocity interface arrives back at the source location (i.e. at zero offset). The arrival times can be computed by carrying out a depth-to-time conversion of the reflectivity values using the input velocity model.</p>
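      <p>Equation (<xref ref-type="disp-formula" rid="Ch1.E2"/>) is the contrast in acoustic impedance (the product of density and velocity) across an interface. A direct transcription is given below, with a function name of our own. With the constant density used in this work, the densities cancel and R depends on the velocities alone. It is positive when velocity increases downwards and negative when it decreases.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def reflection_coefficient(rho1, v1, rho2, v2):
    """Normal-incidence acoustic reflection coefficient, Eq. (2).

    Medium 1 lies above the interface and medium 2 below it; Z = rho * v is the
    acoustic impedance of each medium.
    """
    z1, z2 = rho1 * v1, rho2 * v2
    return (z2 - z1) / (z2 + z1)


rho = 2200.0  # constant density (kg m^-3)
print(reflection_coefficient(rho, 1500.0, rho, 3000.0))  # about +0.333 (velocity increase)
print(reflection_coefficient(rho, 3000.0, rho, 1500.0))  # about -0.333 (velocity decrease)
```
]]></preformat>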
      <p id="d1e690">We chose to convert the velocity model to its reflectivity series and to use the causal WaveNet architecture in order to constrain our workflow. For horizontally layered velocity models and receivers horizontally offset from the source, the receiver pressure recordings are causally correlated with the normal incidence reflectivity series of the zero-offset receiver. Intuitively, a seismic reflection recorded after a short time has only travelled through a shallow part of the velocity model, so the pressure responses depend at most on past samples of this reflectivity series. By preprocessing the input velocity model into its corresponding reflectivity series and using the causal WaveNet architecture to simulate the receiver response, we can constrain the network so that it honours this causal correlation.</p>
      <p id="d1e693">We input the 1-D profile of a 2-D horizontally layered velocity model, with a depth of 640 m and a sample spacing of 5 m. We use Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) and a standard 1-D depth-to-time conversion to convert the velocity model into its normal incidence reflectivity series. The output reflectivity series has a length of 1 s and a sample rate of 2 ms (500 samples). An example output reflectivity series is shown in Fig. <xref ref-type="fig" rid="Ch1.F2"/>a.</p>
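      <p>The conversion described above can be sketched as follows. This is a minimal sketch rather than the authors' code: it assumes that Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) is the standard normal incidence coefficient formed from the acoustic impedances above and below each interface, that time zero corresponds to a source at the surface, and that each coefficient is snapped to the nearest 2 ms sample. The function name <monospace>reflectivity_series</monospace> is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def reflectivity_series(velocity, density, dz=5.0, dt=0.002, n_t=500):
    """Convert a 1-D depth profile into a time-domain normal incidence reflectivity series.

    velocity, density: one value per depth sample (m/s, kg/m^3), spaced dz metres apart.
    Returns n_t reflection coefficients sampled every dt seconds (hypothetical helper).
    """
    series = [0.0] * n_t
    twt = 0.0  # two-way vertical travel time from the surface to the current interface
    for i in range(1, len(velocity)):
        # Accumulate the two-way time through the depth sample above the interface.
        twt += 2.0 * dz / velocity[i - 1]
        z1 = density[i - 1] * velocity[i - 1]  # acoustic impedance above the interface
        z2 = density[i] * velocity[i]  # acoustic impedance below the interface
        r = (z2 - z1) / (z2 + z1)  # assumed form of the normal incidence coefficient
        it = int(round(twt / dt))  # nearest output time sample
        if r != 0.0 and it < n_t:
            series[it] += r
    return series


# Usage: a 640 m profile (128 samples) with a single interface at 250 m depth.
vel = [2000.0] * 50 + [2500.0] * 78
rho = [2200.0] * 128
refl = reflectivity_series(vel, rho)
```
]]></preformat>
      <p>With a constant density, as used in our simulations, the coefficients depend only on the velocity contrasts; in the usage example the single interface maps to a two-way time of 0.25 s.</p>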
      <p id="d1e701">The reflectivity series is passed to the WaveNet network, which contains nine causally connected convolutional layers (Fig. <xref ref-type="fig" rid="Ch1.F2"/>b). Each convolutional layer has the same length as the input reflectivity series, 256 hidden channels, a filter width of two samples and a rectified linear unit (ReLU) activation function <xref ref-type="bibr" rid="bib1.bibx40" id="paren.51"/>. Similar to the original WaveNet design, we use exponentially increasing dilations at each layer to ensure that the first sample in the input reflectivity series is in the receptive field of the last sample of the output simulation. We add a final causally connected convolutional layer with 11 output channels, a filter width of 101 samples and an identity activation to generate the output simulation, where each output channel corresponds to a receiver prediction. This results in the network having 1 333 515 free parameters in total.</p>
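      <p>The receptive field and parameter count of this design can be checked with a little arithmetic, sketched below. Two assumptions are ours: the dilations are taken to be 1, 2, 4, up to 256 (the text only states that they increase exponentially), and the quoted total of 1 333 515 is reproduced exactly only if the nine hidden layers have no bias terms while the output layer does, which we infer rather than read from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def receptive_field(filter_width, dilations):
    """Receptive field (in samples) of a stack of 1-D convolutions."""
    return 1 + sum((filter_width - 1) * d for d in dilations)


def conv1d_params(c_in, c_out, width, bias=True):
    """Number of weights (plus optional biases) in a 1-D convolutional layer."""
    return c_in * c_out * width + (c_out if bias else 0)


# Nine hidden causal layers with filter width 2 and dilations 1, 2, 4, ..., 256 (assumed).
dilations = [2 ** i for i in range(9)]
rf = receptive_field(2, dilations)  # 512 samples, which spans the 500-sample input

# Assumption: hidden layers carry no bias terms; the output layer does.
hidden = conv1d_params(1, 256, 2, bias=False) + sum(
    conv1d_params(256, 256, 2, bias=False) for _ in range(8)
)
output = conv1d_params(256, 11, 101, bias=True)  # 11 receivers, filter width 101
total = hidden + output
```
]]></preformat>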

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><?xmltex \currentcnt{3}?><label>Figure 3</label><caption><p id="d1e711">Distribution of layer velocity and layer thickness over all examples in the training set.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f03.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Training data generation</title>
      <?pagebreak page1532?><p id="d1e728">To train the network, we generate 50 000 synthetic ground truth example simulations using the SEISMIC_CPML code, which performs second-order acoustic FD modelling <xref ref-type="bibr" rid="bib1.bibx24" id="paren.52"/>. Each example simulation uses a randomly sampled 2-D horizontally layered velocity model with a width and depth of 640 m and a grid spacing of 5 m in both directions (Fig. <xref ref-type="fig" rid="Ch1.F1"/>b). For all simulations, we use a constant density model of 2200 kg m<inline-formula><mml:math id="M14" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
      <p id="d1e748">In each simulation, the layer velocities and layer thicknesses are randomly sampled from log-normal distributions. We also add to each model a small velocity gradient, randomly sampled from a normal distribution, so that velocities tend to increase with depth, as is typical in the Earth. The distributions of layer velocities and layer thicknesses over the entire training set are shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/>.</p>
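      <p>A minimal sketch of this sampling procedure is given below. All distribution parameters in it are hypothetical placeholders chosen only for illustration; the distributions actually used are those summarised in Fig. <xref ref-type="fig" rid="Ch1.F3"/>. The 128-sample profile corresponds to a 640 m model sampled every 5 m, and the function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def sample_layered_profile(n_z=128, dz=5.0, rng=None):
    """Sample a 1-D layered velocity profile (m/s) with a small depth gradient.

    All distribution parameters below are hypothetical placeholders.
    """
    rng = rng or random.Random()
    profile = []
    while len(profile) < n_z:
        v_layer = rng.lognormvariate(math.log(2500.0), 0.3)  # layer velocity (m/s)
        thickness_m = rng.lognormvariate(math.log(60.0), 0.5)  # layer thickness (m)
        n_samples = max(1, int(round(thickness_m / dz)))  # at least one depth sample
        profile.extend([v_layer] * n_samples)
    profile = profile[:n_z]
    # Small linear gradient (1/s), normally distributed, so velocity tends to increase with depth.
    gradient = rng.gauss(0.5, 0.2)
    return [v + gradient * i * dz for i, v in enumerate(profile)]


profile = sample_layered_profile(rng=random.Random(0))
```
]]></preformat>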
      <p id="d1e753">We use a 20 Hz Ricker source emitted close to the surface and record the pressure response at 11 receiver locations placed symmetrically around the source, horizontally offset every 50 m (Fig. <xref ref-type="fig" rid="Ch1.F1"/>a). We use a convolutional perfectly matched layer boundary condition so that waves reaching the edge of the model are absorbed with negligible reflection. We run each simulation for 1 s with a 0.5 ms time step to ensure accurate and stable FD modelling, and downsample the resulting receiver pressure responses to 2 ms before using them for training.</p>
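      <p>The source, time-step and resampling choices can be sketched as below. This sketch assumes the textbook Ricker wavelet formula, a hypothetical 0.05 s source delay, and the standard stability (CFL) bound for a second-order 2-D acoustic scheme; the exact criterion used by SEISMIC_CPML may differ. Plain decimation by a factor of 4 is reasonable here because a 20 Hz Ricker wavelet has negligible energy above the 250 Hz Nyquist frequency of 2 ms sampling, although in general an anti-alias filter should be applied first.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def ricker(t, f0=20.0, t0=0.05):
    """Ricker wavelet with peak frequency f0 (Hz), delayed by t0 seconds (t0 is hypothetical)."""
    a = (math.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def max_stable_velocity(dx, dt):
    """Largest velocity satisfying the assumed 2-D CFL bound v * dt / dx <= 1 / sqrt(2)."""
    return dx / (dt * math.sqrt(2.0))


dt_fd, dt_out = 0.0005, 0.002
n_fd = int(round(1.0 / dt_fd))  # 2000 FD time steps for a 1 s simulation
factor = int(round(dt_out / dt_fd))  # decimation factor of 4
source = [ricker(i * dt_fd) for i in range(n_fd)]
source_2ms = source[::factor]  # 500 samples at 2 ms

# 11 receivers placed symmetrically about the source every 50 m.
offsets = [50.0 * k for k in range(-5, 6)]
v_limit = max_stable_velocity(5.0, dt_fd)  # roughly 7071 m/s for a 5 m grid
```
]]></preformat>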
      <p id="d1e758">We extract one training example from each of the 50 000 simulations, where each training example consists of a 1-D layered velocity profile and the recorded pressure response at each of the 11 receivers. We withhold 10 000 of these examples as a validation set to measure the generalisation performance of the network during training, leaving 40 000 examples for training.</p>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Training process</title>
      <p id="d1e769">The network is trained using the Adam stochastic gradient descent algorithm <xref ref-type="bibr" rid="bib1.bibx23" id="paren.53"/>. This algorithm computes the gradient of a loss function with respect to the free parameters of the network over a randomly selected subset, or batch, of the training examples. This gradient is used to iteratively update the parameter values, with a step size controlled by a learning rate parameter. We propose an L2 loss function with a time-varying gain function for this task, given by
            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M15" display="block"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mo>‖</mml:mo><mml:mi>G</mml:mi><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi>Y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo><mml:msubsup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M16" display="inline"><mml:mover accent="true"><mml:mi>Y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula> is the simulated receiver pressure response from the network, <inline-formula><mml:math id="M17" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula> is the ground truth receiver pressure response from FD modelling, and <inline-formula><mml:math id="M18" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of training examples in each batch. The gain function <inline-formula><mml:math id="M19" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula> has the form <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mi>g</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M21" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> is the sample time and <inline-formula><mml:math id="M22" display="inline"><mml:mi>g</mml:mi></mml:math></inline-formula> is a hyperparameter which determines the strength of the gain. By increasing the weight of samples at later times, this gain empirically compensates for the amplitude decay of the wavefield caused by spherical spreading. In this section, we use a fixed value of <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2.5</mml:mn></mml:mrow></mml:math></inline-formula>.
We use a learning rate of <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, a batch size of 20 training examples and run training over 500 000 gradient descent steps.</p>
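      <p>Equation (<xref ref-type="disp-formula" rid="Ch1.E3"/>) can be written out explicitly as below. This is a plain-Python sketch for clarity rather than the TensorFlow implementation; it assumes that <italic>t</italic> is measured from zero at the first sample, and the norm exponent <monospace>p</monospace> lets the same function express the L1 variant used in Sect. <xref ref-type="sec" rid="Ch1.S3"/>. The function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def gained_loss(pred, true, dt=0.002, g=2.5, p=2):
    """Gain-weighted loss (1/N) * || G (Y_hat - Y) ||_p^p with G = t**g.

    pred, true: batches of examples; each example is a list of receiver traces
    and each trace is a list of samples spaced dt seconds apart, starting at t = 0.
    """
    total = 0.0
    for example_hat, example in zip(pred, true):
        for trace_hat, trace in zip(example_hat, example):
            for i, (a, b) in enumerate(zip(trace_hat, trace)):
                gain = (i * dt) ** g  # later samples receive larger weights
                total += abs(gain * (a - b)) ** p
    return total / len(pred)  # divide by the batch size N
```
]]></preformat>
      <p>Note that with this convention the first sample of each trace receives zero weight whenever <italic>g</italic> is positive.</p>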
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Comparison to 2-D ray tracing</title>
      <p id="d1e917">We compare the WaveNet simulation to an efficient, quasi-analytical 2-D ray-tracing algorithm which assumes horizontally layered media. We modify the 2-D horizontally layered ray-tracing bisection algorithm from the Consortium for Research in Elastic Wave Exploration Seismology (CREWES) seismic modelling library <xref ref-type="bibr" rid="bib1.bibx34" id="paren.54"/> to include Zoeppritz modelling of the reflection and transmission coefficients at each velocity interface <xref ref-type="bibr" rid="bib1.bibx3" id="paren.55"/> and  2-D spherical spreading attenuation <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx42" id="paren.56"/> during ray tracing. The output of the algorithm is a primary reflectivity series for each receiver, which we convolve with the source signature used in FD modelling to obtain an estimate of the receiver responses.</p>
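      <p>The kinematic core of a bisection ray tracer for horizontally layered media can be sketched as below: for a target offset it searches for the ray parameter whose primary reflection from the base of the deepest layer emerges at that offset, then returns the two-way travel time. This is a generic textbook sketch, not the CREWES routine; it omits the Zoeppritz coefficients and spreading corrections, and the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def offset_and_time(p, thicknesses, velocities):
    """Offset (m) and two-way time (s) of a primary reflection with ray parameter p (s/m).

    The ray travels down through every layer and reflects at the base of the last one
    (horizontal layers, constant velocity within each layer).
    """
    x = 0.0
    t = 0.0
    for h, v in zip(thicknesses, velocities):
        cos_theta = math.sqrt(1.0 - (p * v) ** 2)  # Snell's law: sin(theta) = p * v
        x += 2.0 * h * p * v / cos_theta  # horizontal distance of the down and up legs
        t += 2.0 * h / (v * cos_theta)  # travel time of the down and up legs
    return x, t


def primary_traveltime(offset, thicknesses, velocities, n_iter=100):
    """Find the ray parameter reaching `offset` by bisection and return its two-way time."""
    p_lo = 0.0
    p_hi = (1.0 - 1e-12) / max(velocities)  # just below the critical ray parameter
    for _ in range(n_iter):
        p_mid = 0.5 * (p_lo + p_hi)
        x_mid = offset_and_time(p_mid, thicknesses, velocities)[0]
        if x_mid < offset:
            p_lo = p_mid  # ray emerges too close to the source: increase p
        else:
            p_hi = p_mid
    return offset_and_time(0.5 * (p_lo + p_hi), thicknesses, velocities)[1]
```
]]></preformat>
      <p>Bisection works here because the emergence offset increases monotonically with the ray parameter, so the target offset is always bracketed.</p>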

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><label>Figure 4</label><caption><p id="d1e931">WaveNet simulations for four randomly selected examples in the test set. Red shows the input velocity model, its corresponding reflectivity series and the ground truth pressure response from FD simulation at the 11 receiver locations. Green shows the WaveNet simulation given the input reflectivity series for each example. A <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied to the receiver responses for display.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f04.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS6">
  <label>2.6</label><title>Results</title>
      <p id="d1e959">Whilst training the WaveNet, the losses over the training and validation datasets converge to similar values, suggesting that the network generalises well to examples in the validation dataset. To assess the performance of the trained network, we generate a random test set of 1000 unseen examples. The simulations for four randomly selected examples from this test set are compared to the ground truth FD simulation in Fig. <xref ref-type="fig" rid="Ch1.F4"/>. We also compare the WaveNet simulation to 2-D ray tracing in Fig. <xref ref-type="fig" rid="Ch1.F5"/>. The network closely reproduces the receiver pressure responses for nearly all time samples. It predicts the normal moveout (NMO) of the primary layer reflections with receiver offset, the direct arrivals at the start of each receiver recording and the spherical spreading loss of the wavefield over time, although it struggles to accurately simulate the multiple reverberations at the end of the receiver recordings.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><?xmltex \currentcnt{5}?><label>Figure 5</label><caption><p id="d1e968">Comparison of WaveNet simulation to 2-D ray tracing. We compare the WaveNet simulation to 2-D ray tracing for two of the examples in Fig. <xref ref-type="fig" rid="Ch1.F4"/>. Red shows the input velocity model, its corresponding reflectivity series and the ground truth pressure responses from FD simulation. Green shows the WaveNet simulation (left) and 2-D ray tracing simulation (right). A <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied to the receiver responses for display.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f05.png"/>

        </fig>

      <p id="d1e990">We plot the histogram of the average absolute amplitude difference between the ground truth FD simulation and the simulations from the WaveNet and 2-D ray tracing over the test set in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F13"/>d in the Appendix, and observe that the WaveNet simulation has a lower average amplitude difference than 2-D ray tracing. Small differences in phase and amplitude at larger offsets are the main source of discrepancy between the 2-D ray tracing and FD simulation, as can be seen in Fig. <xref ref-type="fig" rid="Ch1.F5"/>; these are likely due to errors both in the ray-tracing approximation and in the discretisation used in the FD simulation. The WaveNet predictions are consistent and stable across the test set, and their closer amplitude match to the FD simulation is perhaps to be expected because the network is trained to directly match the FD simulation rather than the 2-D ray tracing.</p>
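      <p>The comparison metric can be sketched as below. The text does not state whether the display gain is applied before the difference is taken, so this sketch assumes raw amplitudes; the function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def mean_abs_difference(simulation, reference):
    """Average absolute amplitude difference over all receivers and time samples.

    simulation, reference: lists of receiver traces (each a list of samples).
    """
    diffs = [
        abs(a - b)
        for trace_sim, trace_ref in zip(simulation, reference)
        for a, b in zip(trace_sim, trace_ref)
    ]
    return sum(diffs) / len(diffs)
```
]]></preformat>
      <p>Evaluating this value for every example in the test set and binning the results gives one histogram per method, as in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F13"/>d.</p>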
      <p id="d1e998">We compare the accuracy of the WaveNet with that of two alternative convolutional network designs in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F13"/>. The main differences from the WaveNet design are that both networks use standard rather than causal convolutional layers and that the second network uses exponential dilations whilst the first does not. Both networks have nine convolutional layers,<?pagebreak page1533?> each with 256 hidden channels, filter sizes of 3, ReLU activations for all hidden layers and an identity activation function for the output layer, with 1 387 531 free parameters in total. We observe that the convolutional network without dilations does not converge during training, whilst the dilated convolutional network has a higher average absolute amplitude difference from the ground truth FD simulation over the test set than the WaveNet network (Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F13"/>d).</p>
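      <p>The quoted parameter count, and the very different receptive fields of the two alternative designs, can be checked as below. This assumes that every layer, including the output layer, has a bias term and that the dilated network uses dilations 1, 2, 4, up to 256; both are our assumptions, under which the count matches the quoted 1 387 531. Because dilations do not change the number of parameters, both designs have the same count, whereas the undilated stack sees only 19 input samples per output sample, far fewer than the 500-sample input, which may help explain why it does not converge.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def conv1d_params(c_in, c_out, width, bias=True):
    """Number of weights plus biases in a 1-D convolutional layer."""
    return c_in * c_out * width + (c_out if bias else 0)


def receptive_field(filter_width, dilations):
    """Receptive field (in samples) of stacked 1-D convolutions."""
    return 1 + sum((filter_width - 1) * d for d in dilations)


# Nine layers: 1 input channel, eight hidden layers of 256 channels, 11 output channels.
channels = [1] + [256] * 8 + [11]
total = sum(
    conv1d_params(c_in, c_out, 3) for c_in, c_out in zip(channels[:-1], channels[1:])
)

rf_plain = receptive_field(3, [1] * 9)  # undilated network
rf_dilated = receptive_field(3, [2 ** i for i in range(9)])  # assumed dilation schedule
```
]]></preformat>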

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><?xmltex \currentcnt{6}?><label>Figure 6</label><caption><p id="d1e1007">Generalisation ability of the WaveNet. The WaveNet simulations (green) for four velocity models with much smaller average layer thicknesses than the training distribution are compared to ground truth FD simulation. Red shows the input velocity model, its corresponding reflectivity series and the ground truth pressure responses from FD simulation.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f06.png"/>

        </fig>

      <p id="d1e1016">The generalisation ability of the WaveNet outside of its training distribution is tested in Fig. <xref ref-type="fig" rid="Ch1.F6"/>. We generate four velocity models with a much smaller average layer thickness than the training set and compare the WaveNet simulation to the ground truth FD simulation. We find that the WaveNet is able to make an accurate prediction of the seismic response, but it struggles to simulate the multiple reflections and sometimes the interference between the direct arrival and primary reflections.</p>
      <p id="d1e1021">We compare the average time taken by the WaveNet to generate 100 simulations with that of FD simulation and 2-D ray tracing in Table <xref ref-type="table" rid="Ch1.T1"/>. We find that on a single CPU core, the WaveNet is 19 times faster than FD simulation, and using a GPU and the TensorFlow library <xref ref-type="bibr" rid="bib1.bibx1" id="paren.57"/> it is 549 times faster. This speedup is likely to be larger than that obtained by using the GPU to accelerate existing numerical methods <xref ref-type="bibr" rid="bib1.bibx50" id="paren.58"/>. In<?pagebreak page1534?> this case, the specialised 2-D ray tracing algorithm offers a similar speedup to the WaveNet network. The network takes approximately 12 h to train on one Nvidia Tesla K80 GPU, although this training step is only required once and subsequent simulation steps are fast.</p>
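      <p>The bracketed speedup factors in Table <xref ref-type="table" rid="Ch1.T1"/> follow from dividing the FD CPU time by each method's time, as in the short check below. The GPU factors compare GPU inference against single-core FD simulation, so they combine a hardware speedup with an algorithmic one.</p>
      <preformat preformat-type="code"><![CDATA[
```python
fd_cpu_time = 73.0  # s for 100 simulations, 2-D FD on one CPU core (Table 1)
method_times = {
    "2-D ray tracing (CPU)": 2.2,
    "WaveNet (CPU)": 3.79,
    "WaveNet (GPU)": 0.133,
    "Conditional autoencoder (CPU)": 3.3,
    "Conditional autoencoder (GPU)": 0.180,
}
# Speedup factor = FD CPU time divided by the method's time, rounded to an integer.
speedups = {name: round(fd_cpu_time / t) for name, t in method_times.items()}
```
]]></preformat>
      <p>The one-off training cost of the networks is not included in these factors.</p>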
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Fast seismic simulation in 2-D faulted acoustic media using a conditional autoencoder</title>
      <p id="d1e1041">The WaveNet architecture we implemented above is limited in that it is only able to simulate horizontally layered Earth models. In this section, we present a second network which is significantly more general; it simulates seismic waves in 2-D faulted acoustic media with arbitrary layers, fault properties<?pagebreak page1535?> and an arbitrary location of the seismic source on the surface of the media.</p>
      <p id="d1e1044">This is a much more challenging task to learn for multiple reasons. Firstly, the medium varies along both dimensions and the resulting seismic wavefield has more complex kinematics than the wavefields in horizontally layered media. Secondly, we allow the output of the network to be conditioned on the input source location, which requires the network to learn the effect of the source location. Thirdly, we input the velocity model directly into the network without first converting it to a reflectivity series; the network must learn to carry out its own depth-to-time conversion to simulate the receiver responses. We chose this approach over our WaveNet workflow because, for media that are not horizontally layered, the pressure responses are in general not causally correlated with the normal incidence reflectivity series, so our previous causality assumption no longer holds.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F7" specific-use="star"><?xmltex \currentcnt{7}?><label>Figure 7</label><caption><p id="d1e1049">Ground truth FD simulation example with a 2-D faulted medium. <bold>(a)</bold> The black circle shows the source location. In total, 32 receivers are placed at the same depth as the source with a horizontal spacing of 15 m (red triangles). The full wavefield pressure is overlain for a single snapshot in time. <bold>(b)</bold> The Earth velocity model. <bold>(c)</bold> The resulting ground truth pressure response recorded by each receiver, using FD modelling. A <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied to the receiver responses for display.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f07.png"/>

      </fig>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e1082">Speed comparison of simulation and inversion methods. The time shown is the average time taken to generate 100 simulations (or 100 velocity predictions for the inverse WaveNet) on either a single core of a 2.2 GHz Intel Core i7 processor or an Nvidia Tesla K80 GPU. For simulation methods, the speedup factor compared to FD simulation is shown in brackets. The inverse WaveNet is faster than the forward WaveNet because it has fewer hidden channels in its architecture and therefore requires less computation.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Method</oasis:entry>
         <oasis:entry colname="col2">Average CPU time (s)</oasis:entry>
         <oasis:entry colname="col3">Average GPU time (s)</oasis:entry>
         <oasis:entry colname="col4">Training time (days)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">2-D FD simulation</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mn mathvariant="normal">73</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">2-D ray tracing</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mn mathvariant="normal">2.2</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mn mathvariant="normal">33</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">WaveNet (forward)</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mn mathvariant="normal">3.79</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.03</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mn mathvariant="normal">19</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.133</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:mn mathvariant="normal">549</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col4">0.5</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Conditional autoencoder</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mn mathvariant="normal">3.3</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mn mathvariant="normal">22</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.180</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.003</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mn mathvariant="normal">406</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col4">4</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">WaveNet (inverse)</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.27</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.02</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.051</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">0.5</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e1341">Similar to Sect. <xref ref-type="sec" rid="Ch1.S2"/>, we simulate the seismic response recorded by a set of receivers horizontally offset from a point source emitted within the Earth model. An example simulation we wish to learn is shown in Fig. <xref ref-type="fig" rid="Ch1.F7"/>. We describe the network architecture and training process in more detail below.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8" specific-use="star"><?xmltex \currentcnt{8}?><label>Figure 8</label><caption><p id="d1e1350">Our conditional autoencoder simulation workflow. Given a 2-D velocity model and source location as input, a conditional autoencoder network outputs a simulation of the pressure responses at the receiver locations in Fig. <xref ref-type="fig" rid="Ch1.F7"/>. The network is composed of 24 convolutional layers and concatenates the input source location with its latent vector.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f08.png"/>

      </fig>

<?pagebreak page1536?><sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Conditional autoencoder architecture</title>
      <p id="d1e1368">Our simulation workflow is shown in Fig. <xref ref-type="fig" rid="Ch1.F8"/>. Instead of preprocessing the input velocity model into its associated reflectivity series, we input the velocity model directly into the network. The network is conditioned on the source position, which is allowed to vary along the surface of the Earth model. The output of the network is a simulation of the pressure responses recorded at 32 fixed receiver locations in the model shown in Fig. <xref ref-type="fig" rid="Ch1.F7"/>.</p>
      <p id="d1e1375">We use a conditional autoencoder network design, shown in Fig. <xref ref-type="fig" rid="Ch1.F8"/>. The network is composed of 10 convolutional layers which reduce the spatial dimensions of the input velocity model until it has a <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> shape with 1024 hidden channels. We term this tensor the latent vector. The input source position is  concatenated onto the latent vector and 14 convolutional layers are used to expand the size of the latent vector until its output shape is the same as the target receiver gather. We choose this encoder–decoder architecture to force the network to compress the velocity model into a set of salient features before expanding them to infer the receiver responses. All hidden layers use ReLU activation functions and the final output layer uses an identity activation function.<?pagebreak page1537?> The resulting network has 18 382 296 free parameters. The full parameterisation of the network is shown in Table <xref ref-type="table" rid="App1.Ch1.S1.T2"/>.</p>
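      <p>The encoder bottleneck and the conditioning step can be illustrated by tracing tensor shapes, as below. The stride schedule, the 128 by 128 input grid and the encoding of the source position as a single scalar are our assumptions for illustration only; the actual layer parameters are listed in Table <xref ref-type="table" rid="App1.Ch1.S1.T2"/>.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def conv_out_size(size, stride):
    """Output length of a 'same'-padded convolution with the given stride."""
    return math.ceil(size / stride)


# Hypothetical stride schedule for the 10 encoder layers (7 of them downsample by 2).
encoder_strides = [1, 2, 1, 2, 1, 2, 2, 2, 2, 2]
height = width = 128  # assumed 640 m x 640 m model sampled every 5 m
for s in encoder_strides:
    height, width = conv_out_size(height, s), conv_out_size(width, s)

latent_channels = 1024  # 1 x 1 latent vector with 1024 channels
source_channels = 1  # hypothetical: horizontal source position as a single scalar
decoder_input_channels = latent_channels + source_channels  # after concatenation
```
]]></preformat>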

      <?xmltex \floatpos{t}?><fig id="Ch1.F9" specific-use="star"><?xmltex \currentcnt{9}?><label>Figure 9</label><caption><p id="d1e1396">Conditional autoencoder simulations for eight randomly selected examples in the test set. White circles show the input source location. The left simulation plots show the network predictions, the middle simulation plots show the ground truth FD simulations and the right simulation plots show the difference. A <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied for display.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f09.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Training process</title>
      <p id="d1e1424">We use the same training data generation process as described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3"/>, except that we also add a fault to each velocity model. We randomly sample the length, normal or reverse direction, slip distance and orientation of the fault. Example velocity models drawn from this process are shown in Fig. <xref ref-type="fig" rid="Ch1.F9"/>. We generate 100 000 example velocity models and for each model choose three random source locations along the top of the model. This generates a total of 300 000 synthetic ground truth example simulations to use for training the network. We withhold 60 000 of these examples to use as a validation set during training.</p>
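      <p>A simplified version of the fault step is sketched below. It is not the authors' generator: it assumes an infinitely long planar fault that dips towards increasing <italic>x</italic>, and it applies only the vertical component of slip, which is sufficient for a horizontally layered profile because a horizontal shift leaves such a profile unchanged. The function name and its arguments are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def add_fault(profile, nx, dip_deg, x0, throw, normal=True):
    """Build a 2-D model (list of rows, indexed model[z][x]) with a planar fault.

    profile: 1-D layered velocity profile, one value per depth sample
    dip_deg: fault dip from horizontal; the plane passes through (x0, 0) and dips towards +x
    throw:   vertical offset in samples (hanging wall down if normal, up if reverse)
    """
    nz = len(profile)
    shift = throw if normal else -throw
    tan_dip = math.tan(math.radians(dip_deg))
    model = []
    for iz in range(nz):
        row = []
        for ix in range(nx):
            # Points above the fault plane belong to the hanging wall.
            hanging_wall = iz < (ix - x0) * tan_dip
            src = iz - shift if hanging_wall else iz
            src = min(max(src, 0), nz - 1)  # extend the top and bottom layers if needed
            row.append(profile[src])
        model.append(row)
    return model


profile = [1.0, 2.0, 3.0, 4.0]
normal_model = add_fault(profile, nx=4, dip_deg=45.0, x0=0.0, throw=1)
reverse_model = add_fault(profile, nx=4, dip_deg=45.0, x0=0.0, throw=1, normal=False)
```
]]></preformat>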
      <p id="d1e1431">We train using the same training process and loss function described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS4"/>, except that we employ an L1 norm instead of an L2 norm in the loss function (Eq. <xref ref-type="disp-formula" rid="Ch1.E3"/>). We use a learning rate of <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, a batch size of 100 examples and run training over 3 000 000 gradient descent steps. We use batch normalisation <xref ref-type="bibr" rid="bib1.bibx22" id="paren.59"/> after each convolutional layer to help regularise the network during training.</p>
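      <p>For context, the step counts and batch sizes of both networks translate into the number of passes over the training data computed below, assuming that every example not withheld for validation is used for training.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def n_epochs(steps, batch_size, n_train):
    """Number of passes over the training set implied by a step count and batch size."""
    return steps * batch_size / n_train


wavenet_epochs = n_epochs(500_000, 20, 50_000 - 10_000)  # WaveNet (Sect. 2)
autoencoder_epochs = n_epochs(3_000_000, 100, 300_000 - 60_000)  # conditional autoencoder
```
]]></preformat>
      <p>This gives 250 epochs for the WaveNet and 1250 epochs for the conditional autoencoder.</p>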

      <?xmltex \floatpos{t}?><fig id="Ch1.F10" specific-use="star"><?xmltex \currentcnt{10}?><label>Figure 10</label><caption><p id="d1e1461">Conditional autoencoder simulation accuracy when varying the source location. The network simulation is shown for six different source locations whilst keeping the velocity model fixed. The source positions are regularly spaced across the surface of the velocity model (white circles). Example simulations for two different velocity models in the test set are shown, where each row corresponds to a different velocity model.  The pairs of simulation plots in each row from left to right correspond to the network prediction (left in the pair) and the ground truth FD simulation (right in the pair), when varying the source location from left to right in the velocity model. A <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied for display.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f10.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Results</title>
      <p id="d1e1490">During training, the losses over the training and validation datasets converge to similar values. We test the performance of the trained network using a test set of 1000 unseen examples. The output simulations for eight randomly selected velocity models and source positions from this set are shown in Fig. <xref ref-type="fig" rid="Ch1.F9"/>. We observe that the network is able to simulate the kinematics of the primary reflections and in most cases is able to capture their relative amplitudes. We also plot the network simulation when varying the source location over two velocity models from the test set in Fig. <xref ref-type="fig" rid="Ch1.F10"/> and find that the network generalises well over different source locations.</p>
      <p id="d1e1497">We test the accuracy of the simulation when using different network designs and training hyperparameters, shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F14"/>. We compare example simulations from the test set when using our baseline conditional autoencoder network, when halving the number of hidden channels for all layers, when using an L2 loss function during training, when using gain exponents of <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> in the loss function and when removing two layers from the encoder and eight layers from the decoder. We plot the histogram of the average absolute amplitude difference between the ground truth FD simulation and the network simulation over the test set for all of the cases above, and observe that in all cases the simulations are less accurate than our baseline approach. Without the gain in the loss function, the network only learns to simulate the direct arrival and the first few reflections in the receiver responses. With a gain exponent of <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>, the network simulation is unstable and it fails to simulate the first 0.2 s of the receiver responses. When using the network with fewer layers, the simulations have edge artefacts, whilst the network with half the number of hidden channels is closest to the baseline accuracy. 
We also find that when training a network with the same number of layers but without the bottleneck design that reduces the velocity model to a <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1024</mml:mn></mml:mrow></mml:math></inline-formula> latent vector, training does not converge.</p>
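      <p>To make the role of the gain exponent concrete, the following Python sketch computes an L1 loss after multiplying the network output and the ground truth by a time-varying gain of the form t^g. The loss function itself is defined earlier in the paper and is not restated here; the t**g gain, its application to both traces and the function name gained_l1_loss are assumptions for illustration. Setting g = 0 recovers an ungained L1 loss, while a large exponent such as g = 5 weights late samples far more heavily than early ones, which is one plausible reading of why the early part of the response is lost in that case.</p>

```python
import numpy as np


def gained_l1_loss(pred, target, dt, g):
    """Mean absolute difference between the gained prediction and gained target.

    pred, target: arrays of shape (n_receivers, n_t), sampled every dt seconds.
    g: gain exponent; g = 0 gives a constant gain of 1 (an ordinary L1 loss).
    The t**g gain is an illustrative assumption, not the paper's code.
    """
    t = np.arange(pred.shape[-1]) * dt   # time of each sample in seconds
    gain = t ** g                        # numpy evaluates 0.0 ** 0 as 1.0
    # Gaining both traces is equivalent to gaining their difference
    return float(np.mean(np.abs(gain * (pred - target))))


# Hypothetical example: the same small error placed at t = 2 s
target = np.zeros((1, 201))
pred = target.copy()
pred[0, 200] = 0.01
for g in (0, 2.5, 5):
    print(g, gained_l1_loss(pred, target, dt=0.01, g=g))
```

      <p>Because the gain grows with time, the relative weighting of early and late arrivals is controlled entirely by g, so the exponent acts as a balancing hyperparameter rather than a fixed normalisation.</p>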

      <?xmltex \floatpos{t}?><fig id="Ch1.F11" specific-use="star"><?xmltex \currentcnt{11}?><label>Figure 11</label><caption><p id="d1e1556">Generalisation ability of the conditional autoencoder. The conditional autoencoder simulations for five velocity models taken from different regions of the Marmousi P-wave velocity model are shown in <bold>(d–h)</bold>. For each example, the left plot shows the input velocity model and source location, the middle simulation plots show the network prediction (left) and the ground truth FD simulation (right), and the right plot shows the nearest neighbour in the training set to the input velocity model. Simulations from three of the test velocity models in Fig. <xref ref-type="fig" rid="Ch1.F9"/> are also shown with their nearest neighbours in <bold>(a–c)</bold>. A <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied for display.</p></caption>
          <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f11.png"/>

        </fig>

      <p id="d1e1585">We compare the accuracy of the conditional autoencoder with that of the WaveNet network in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F15"/>. We plot the simulations from both networks for an example model in the horizontally layered velocity model test set, together with the histogram of the average absolute amplitude difference between the ground truth FD simulation and the WaveNet and conditional autoencoder simulations over this test set. Both networks are able to accurately simulate the receiver responses; the WaveNet simulation is slightly more accurate than that of the conditional autoencoder, although the latter is more general.</p>
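      <p>The histograms in Figs. A2 and A3 summarise, for each test example, the average absolute amplitude difference between a network simulation and its FD ground truth. A minimal Python sketch of this metric follows, assuming the difference is averaged over all receivers and time samples of one example with no display gain applied; the averaging convention and the function names are our assumptions.</p>

```python
import numpy as np


def mean_abs_amplitude_difference(sim, ref):
    """Average absolute amplitude difference for one example (all receivers and time samples)."""
    return float(np.mean(np.abs(np.asarray(sim) - np.asarray(ref))))


def error_histogram(sims, refs, bins=20):
    """Per-example errors over a test set, plus their histogram counts and bin edges."""
    errors = np.array([mean_abs_amplitude_difference(s, r) for s, r in zip(sims, refs)])
    counts, edges = np.histogram(errors, bins=bins)
    return errors, counts, edges


# Hypothetical usage: random arrays stand in for (n_receivers, n_t) receiver gathers
rng = np.random.default_rng(0)
refs = [rng.normal(size=(32, 512)) for _ in range(10)]
sims = [r + 0.05 * rng.normal(size=r.shape) for r in refs]
errors, counts, edges = error_histogram(sims, refs, bins=5)
print(errors.mean(), counts)
```

      <p>Comparing two networks then amounts to comparing their error distributions, for example their medians, computed over the same test set.</p>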
      <p id="d1e1590">We test the generalisation ability of the conditional autoencoder outside of its training distribution by inputting randomly selected <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:mn mathvariant="normal">640</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">640</mml:mn></mml:mrow></mml:math></inline-formula> m boxes from the publicly available 2-D Marmousi P-wave velocity model <xref ref-type="bibr" rid="bib1.bibx35" id="paren.60"/> into the network. This velocity model contains much more<?pagebreak page1538?> complex faulting at multiple scales, higher dips and more layer variability than our training dataset. The resulting network simulations are shown in Fig. <xref ref-type="fig" rid="Ch1.F11"/>. Alongside each example, we show its nearest neighbour in the set of training velocity models, defined as the training model with the lowest L1 difference from the input velocity model, summed over all velocity values.</p>
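      <p>The nearest-neighbour search defined above can be sketched in Python as a brute-force scan over the training velocity models. The chunking, which bounds memory use when the training set is large, and the function name nearest_neighbour_l1 are our additions for illustration.</p>

```python
import numpy as np


def nearest_neighbour_l1(query, train_models, chunk=1024):
    """Index and L1 distance of the training velocity model closest to `query`.

    query: (nx, nz) velocity model; train_models: (n_train, nx, nz) array or memmap.
    The distance is the sum of absolute velocity differences over all grid points.
    Training models are processed in chunks to bound memory use.
    """
    query = np.asarray(query, dtype=float)
    best_idx, best_dist = -1, np.inf
    for start in range(0, len(train_models), chunk):
        block = np.asarray(train_models[start:start + chunk], dtype=float)
        # One L1 distance per training model in this chunk
        dists = np.abs(block - query[None]).sum(axis=(1, 2))
        j = int(np.argmin(dists))
        if best_dist > dists[j]:
            best_idx, best_dist = start + j, float(dists[j])
    return best_idx, best_dist
```

      <p>The cost grows linearly with the number of training models, which is acceptable for a one-off diagnostic of whether an input lies near the training distribution.</p>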
      <p id="d1e1610">We find that the network is not able to accurately simulate the full seismic response from velocity models which have large dips and/or complex faulting (Fig. <xref ref-type="fig" rid="Ch1.F11"/>e, f, h) that are absent in the training set. This observation is consistent with most studies that analyse the generalisability of deep neural networks outside their training set (e.g. <xref ref-type="bibr" rid="bib1.bibx69" id="altparen.61"/> and <xref ref-type="bibr" rid="bib1.bibx13" id="altparen.62"/>). Encouragingly, however, the network is able to mimic the response from velocity models with small dips (Fig. <xref ref-type="fig" rid="Ch1.F11"/>d, g), even though the nearest training-set neighbour contains a fault, whereas the Marmousi layers are continuous.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F12" specific-use="star"><?xmltex \currentcnt{12}?><label>Figure 12</label><caption><p id="d1e1625">Inverse WaveNet predictions for four examples in the test set. Red shows the input pressure response at the zero-offset receiver location, the ground truth reflectivity series and its corresponding velocity model. Green shows the inverse WaveNet reflectivity series prediction and the resulting velocity prediction.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f12.png"/>

        </fig>

      <?pagebreak page1539?><p id="d1e1634">We compare the average time taken to generate 100 simulations using the conditional autoencoder network with that of FD simulation in Table <xref ref-type="table" rid="Ch1.T1"/>. We find that on a single CPU core the network is 22 times faster than FD simulation, and when using a GPU and the PyTorch library <xref ref-type="bibr" rid="bib1.bibx47" id="paren.63"/>, it is 406 times faster. This is comparable to the speedup obtained with the WaveNet. We expect that 2-D ray tracing would not offer the same speedup as observed in Sect. <xref ref-type="sec" rid="Ch1.S2.SS6"/>, because computing ray paths through these faulted models is likely to be more demanding. The network takes approximately 4 d to train on one Nvidia Titan V GPU. This is 8 times longer than training the WaveNet network, although we made little effort to optimise its training time. When using only 50 000 training examples, we find that the validation loss increases and the network overfits to the training dataset.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Discussion</title>
      <p id="d1e1654">Both our deep neural networks accurately model the seismic response in horizontally layered and faulted 2-D acoustic media. The WaveNet simulates horizontally layered velocity models, whereas the conditional autoencoder generalises to faulted media with arbitrary layers, fault properties and an arbitrary location of the seismic source on the surface of the media. This is a significantly harder task than simulating horizontally layered media with the WaveNet network. Furthermore, both networks are 1–2 orders of magnitude faster than FD modelling.</p>
      <p id="d1e1657">Whilst these results are encouraging and suggest that deep learning is valuable for simulation, there are further challenges when extending our methods to the more complex, elastic and 3-D Earth models required for practical simulation tasks. Further research is needed to understand whether deep learning can aid in these more general settings; we discuss these aspects in more detail below.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Extension to elastic simulation</title>
      <p id="d1e1667">An important capability for practical geophysical applications is simulating seismic waves in (visco)elastic rather than acoustic media. The architectures of our networks are readily extendable in this regard: S-wave velocity and density models could be added as additional input channels, and the number of output channels could be increased so that the networks output multi-component particle velocity vectors. The same training scheme could be used, with training data generated using elastic FD simulation instead of acoustic simulation and a loss function which compares vector fields instead of scalar fields. Thus, with some simple changes to our design, this challenge is at least conceptually simple to address, though further research is required to understand whether it is feasible in practice. The cost of traditional elastic simulation exceeds that of acoustic simulation by orders of magnitude, which has prevented the seismic industry from fully embracing this crucial step. We postulate that the difference in simulation time between future elastic and acoustic simulation networks might be smaller than for fully discretised methods such as FD, because the networks do not need to compute the entire discretised wavefield. This is speculative at present, but worth investigating.</p>
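      <p>As a sketch of how the proposed elastic extension could organise its inputs and loss, the Python below stacks the three material models as input channels and compares multi-component particle velocity records. It assumes a 2-D elastic setting with two components (v_x, v_z); the channel order, the choice of taking the Euclidean norm of the error vector at each receiver and time sample, and the function names are all hypothetical. Summing absolute component differences instead would be an equally valid design choice.</p>

```python
import numpy as np


def elastic_input(vp, vs, rho):
    """Stack P-wave velocity, S-wave velocity and density into a (3, nx, nz) input tensor."""
    return np.stack([vp, vs, rho], axis=0)


def vector_field_loss(pred, target):
    """Mean Euclidean length of the particle-velocity error vector.

    pred, target: (n_components, n_receivers, n_t), e.g. n_components = 2 for (v_x, v_z).
    The norm is taken over the component axis, so every receiver and time sample
    contributes the magnitude of its vector error.
    """
    diff = np.asarray(pred) - np.asarray(target)
    return float(np.mean(np.linalg.norm(diff, axis=0)))
```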
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Extension to 3-D simulation</title>
      <p id="d1e1678">Another important extension is to move from 2-D to 3-D simulation. In terms of network design, our autoencoder could be extended to 3-D simulation by increasing the dimensionality of its input, hidden and output tensors. In this case, we would expect an acceleration of simulation time of a similar order of magnitude to that in 2-D, because the network would still directly estimate the seismic response without needing to iteratively model the seismic wavefield through time. However,<?pagebreak page1540?> multiple challenges arise in this setting. Firstly, increasing the dimensionality would increase the size of the network and therefore likely increase its training time. Finding an alternative representation to reduce the dimensionality of the problem, such as meshes or octrees <xref ref-type="bibr" rid="bib1.bibx2" id="paren.64"/>, or a way to exploit symmetry in the wave equation to reduce complexity, may be critical in this respect. Secondly, a major challenge is likely to be the increased computational cost of generating training data with conventional methods; for instance, FD modelling is significantly more expensive in 3-D. Whilst we only used the subset of the wavefield at each receiver location to train our networks, finding a way to use the entire wavefield from FD simulation to train the network may help reduce the number of training simulations required. We note that generating training data is an amortised cost, because the network only needs to be trained once; although this cost is large, it could become negligible in seismic inversion, where millions of production runs are required.
Another intriguing aspect is to investigate whether deep neural network simulation costs scale more favourably with increasing frequency <inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula> than those of fully discrete methods, which scale with <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">ω</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> in 3-D; in this study, we only consider simulation over a fixed frequency range.</p>
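      <p>The two cost arguments above can be made explicit with a short Python sketch. The first function reproduces the standard scaling argument behind the fourth-power frequency dependence of 3-D FD: with a fixed number of grid points per wavelength, each spatial axis needs proportionally more points, and the stability (CFL) condition needs proportionally more time steps. The second computes how many production runs are needed before an up-front cost (training data generation plus training) is recovered. Both functions and any numbers passed to them are hypothetical illustrations, not measurements from this study.</p>

```python
def fd_cost_ratio(freq_ratio, n_space_dims=3):
    """Relative FD cost when the maximum frequency is multiplied by freq_ratio.

    With a fixed number of grid points per wavelength, each spatial axis needs
    freq_ratio times more points; the CFL stability condition then needs
    freq_ratio times more time steps. Total: freq_ratio ** (n_space_dims + 1).
    """
    return freq_ratio ** (n_space_dims + 1)


def break_even_runs(upfront_cost, fd_cost_per_run, net_cost_per_run):
    """Production runs after which a one-off training cost is recovered.

    upfront_cost: cost of generating the training data plus training the network.
    All costs must be expressed in the same unit (e.g. GPU hours).
    """
    saving = fd_cost_per_run - net_cost_per_run
    if not saving > 0:
        raise ValueError("the network must be cheaper per run than FD")
    return upfront_cost / saving


print(fd_cost_ratio(2))        # doubling the frequency in 3-D
print(fd_cost_ratio(2, 2))     # the same change in 2-D
```

      <p>For example, doubling the maximum frequency multiplies the 3-D FD cost by 2^4 = 16, whereas in 2-D the factor is 2^3 = 8.</p>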
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Generalisation to more complex Earth models</title>
      <p id="d1e1711">Perhaps the largest challenge in designing appropriate networks is to improve their generality so that they can simulate more complex Earth models. We have shown that deep neural networks can move beyond simulating simple horizontally layered velocity models to more complex faulted models where, to the best of our knowledge, no analytical<?pagebreak page1541?> solutions exist, which we believe is a positive step. However, both our networks performed worse on velocity models outside of their training distributions. Furthermore, to generalise to more complex velocity models, the conditional autoencoder required more free parameters, more training time and more training examples than the WaveNet network. Generalisation outside of the training distribution is a well-known challenge for deep neural networks in general <xref ref-type="bibr" rid="bib1.bibx16" id="paren.65"/>.</p>
      <p id="d1e1717">A naive approach would be to increase the range of the training data to improve the generality of the network; however, this would quickly become computationally intractable when trying to simulate all possible Earth models. We note that for many practical applications it may be acceptable to use a training distribution with a limited range; for example, in many seismic applications such as tomography, FWI and seismic hazard assessment, a huge number of forward simulations are carried out over comparatively few Earth models.</p>
      <p id="d1e1720">A promising research direction may be to better regularise the networks by adding more physics-based constraints into the workflow. We found that using causality in the WaveNet generated more accurate simulations than a standard convolutional network, suggesting that this constraint helped the network simulate the seismic response, although it is an open question how best to represent causality when simulating more arbitrary Earth models. We also found that a bottleneck design helped the conditional<?pagebreak page1542?> autoencoder to converge; our hypothesis is that this encouraged a depth-to-time conversion by gradually reducing the spatial dimensions of the velocity model before expanding them into time. More advanced network designs could also be tested, for example, attention-like mechanisms <xref ref-type="bibr" rid="bib1.bibx63" id="paren.66"/> that help the network focus on relevant parts of the velocity model instead of convolutional layers with full fields of view, or long short-term memory (LSTM) cells that help the network model multiple reverberations. Another interesting direction would be to use the wave equation (Eq. <xref ref-type="disp-formula" rid="Ch1.E1"/>) to directly regularise the loss function, similar to the physics-based machine learning approach proposed by <xref ref-type="bibr" rid="bib1.bibx48" id="text.67"/>.</p>
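      <p>The gradual spatial reduction performed by the bottleneck can be traced with the standard output-size formulas for strided convolutions (as documented for PyTorch's Conv2d and ConvTranspose2d, with dilation 1 and no output padding). The Python sketch below applies them to the first nine encoder rows of Table <xref ref-type="table" rid="App1.Ch1.S1.T2"/>; the 128 x 128 input size used in the example and the helper names are hypothetical.</p>

```python
def conv2d_out(size, kernel, stride, padding):
    """Output length of a Conv2d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride, padding):
    """Output length of a ConvTranspose2d along one axis (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (out_channels, kernel, stride, padding) of encoder layers 1-9 in Table A1;
# all of these kernels, strides and paddings are square.
ENCODER_1_TO_9 = [
    (8, 3, 1, 1), (16, 2, 2, 0), (16, 3, 1, 1), (32, 2, 2, 0), (32, 3, 1, 1),
    (64, 2, 2, 0), (128, 2, 2, 0), (256, 2, 2, 0), (512, 2, 2, 0),
]


def encoder_shapes(n, layers=ENCODER_1_TO_9):
    """(channels, height, width) after each layer for a hypothetical n x n input."""
    shapes = []
    for channels, kernel, stride, padding in layers:
        n = conv2d_out(n, kernel, stride, padding)
        shapes.append((channels, n, n))
    return shapes


for shape in encoder_shapes(128):
    print(shape)   # the last line printed is (512, 2, 2)
```

      <p>For this hypothetical input, every stride-2 layer halves the spatial size, reaching 2 x 2 with 512 channels after layer 9; the remaining encoder rows of the table complete the reduction to the latent vector. In the decoder, a ConvT2d layer whose kernel size equals its stride and which has no padding multiplies each axis by its stride, so the (2,4) layers expand the two axes by factors of 2 and 4.</p>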
      <p id="d1e1731">We found that the nearest-neighbour test was a useful way to judge whether an input velocity model was close to the training distribution, and therefore whether the network's output simulation was likely to be accurate. Probabilistic approaches, such as Bayesian deep learning <xref ref-type="bibr" rid="bib1.bibx15" id="paren.68"/>, could be investigated for their ability to provide quantitative uncertainty estimates on the network's output simulation.</p>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Inversion with the WaveNet</title>
      <p id="d1e1745">As an additional test, we retrained the WaveNet network to carry out fast seismic inversion of the horizontally layered media, which offered a fast alternative to existing inversion algorithms. To do so, we reversed its inputs and output: its input was a set of 11 recorded receiver responses and its output was a prediction of the corresponding normal incidence reflectivity series. We used the same WaveNet architecture described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/>, except that we inverted its structure to maintain the causal correlation between the receiver responses and reflectivity series, and we used 128 instead of 256 hidden channels for each hidden layer. We used exactly the same training data and training strategy described in Sects. <xref ref-type="sec" rid="Ch1.S2.SS3"/> and <xref ref-type="sec" rid="Ch1.S2.SS4"/>, except that we used a loss function given by
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M54" display="block"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mo>‖</mml:mo><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:mi>R</mml:mi><mml:msubsup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M55" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> is the true reflectivity series and <inline-formula><mml:math id="M56" display="inline"><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> is the predicted reflectivity series. To recover a prediction of the velocity model, we carried out a standard 1-D time-to-depth conversion of the output reflectivity values followed by integration.</p>
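      <p>One way to carry out the conversion from a predicted reflectivity series to a velocity model is sketched below in Python. It assumes constant density, so that the normal incidence reflection coefficient at an interface is R = (v_below - v_above)/(v_below + v_above), and that the velocity of the first time sample is known. Inverting this relation at every interface and accumulating the velocity ratios is one discrete form of the integration step, and each two-way time sample of duration dt is mapped to a depth interval of v dt/2. The function names and this exact discretisation are assumptions, not the implementation used here. The squared-error loss of Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) is included, taking N as the number of samples in the series.</p>

```python
import numpy as np


def reflectivity_to_velocity(refl, v0, dt, dz, nz):
    """Recover a 1-D velocity-depth profile from a normal-incidence reflectivity series.

    refl: reflection coefficients sampled in two-way time; refl[i] is the coefficient
          at the top of time sample i (refl[0] is ignored).
    v0:   velocity of the first time sample (assumed known).
    Assumes constant density, so R = (v_below - v_above) / (v_below + v_above).
    """
    r = np.asarray(refl, dtype=float).copy()
    r[0] = 0.0
    # Invert R for the velocity ratio at each interface and accumulate (the "integration")
    v_time = v0 * np.cumprod((1.0 + r) / (1.0 - r))
    # Time-to-depth: each two-way time sample spans v * dt / 2 in depth
    z_top = np.concatenate(([0.0], np.cumsum(v_time * dt / 2.0)[:-1]))
    depth = np.arange(nz) * dz
    # Index of the last time sample whose top lies at or above each depth
    idx = np.clip(np.searchsorted(z_top, depth, side="right") - 1, 0, len(v_time) - 1)
    return v_time[idx]


def inverse_loss(r_pred, r_true):
    """Eq. (4): mean squared error between predicted and true reflectivity series."""
    return float(np.mean((np.asarray(r_pred) - np.asarray(r_true)) ** 2))


# Hypothetical example: one interface with R = 0.2 below five samples of 2000 m/s
refl = np.zeros(50)
refl[5] = 0.2
v = reflectivity_to_velocity(refl, v0=2000.0, dt=0.002, dz=1.0, nz=20)
print(v[:12])   # 2000 m/s above 10 m depth, 2000 * 1.2 / 0.8 = 3000 m/s from 10 m down
```

      <p>Because every deeper velocity is a product of all shallower ratios, an error in a single predicted reflection coefficient rescales the whole profile beneath it, which is consistent with the propagation of velocity errors with depth described below.</p>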
      <p id="d1e1808">Predictions of the reflectivity series and velocity models for four randomly selected examples from a test set of unseen examples are shown in Fig. <xref ref-type="fig" rid="Ch1.F12"/>. The inverse WaveNet network was able to predict the underlying velocity model for each example, although in some cases small velocity errors propagated with depth, which was likely a result of the integration of the reflectivity series. The network produced velocity predictions in a time of the same order of magnitude as the forward network (shown in Table <xref ref-type="table" rid="Ch1.T1"/>), which is likely to be a fraction of the time needed by existing seismic inversion algorithms that rely on forward simulation.</p>
      <p id="d1e1815">We note that seismic inversion is typically an ill-posed problem, and it is likely that the predictions of this network are biased towards the velocity models it was trained on. We expect the accuracy of the network to decrease when tested on inputs outside of its training distribution and on real, noisy seismic data. Further research could try to quantify this uncertainty, for example, by using Bayesian deep learning. We have not yet compared our inverse WaveNet network to existing seismic inversion techniques, such as posterior sampling or FWI.</p>
      <p id="d1e1818">An alternative method for inversion is to use our forward networks in existing seismic inversion algorithms based on optimisation, such as FWI. Both the WaveNet and conditional autoencoder networks are fully differentiable and could therefore be used to generate fast approximate gradient estimates in these methods. However, similar limitations on their generality are likely to exist and one would need to be careful to keep the inversion routine within the training distribution of the networks. Furthermore, whilst fast, these approaches would still suffer from the curse of dimensionality when moving to higher dimensions and require exponentially more samples to fully explore the parameter space.</p>
</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>Summary</title>
      <p id="d1e1829">Given the potentially large training costs and the challenge of generality, current deep learning techniques may be most advantageous for practical simulation tasks where many similar simulations are required, such as inversion or statistical seismic hazard analysis, and least useful for problems with a very small number of simulations per model family. In seismology, however, we suspect that most current and future challenges fall into the former category, which makes these initial results promising. Deep learning approaches differ from traditional approaches in their computational costs and benefits, and their accuracy is less clearly understood; these factors should be considered for each application. Further research is required to understand how best to design the training set for a particular simulation application, as well as how to help deep neural networks generalise to unseen velocity models outside of their training distribution. Finally, we note that we only tested two types of deep neural network (a WaveNet and a conditional autoencoder), and many other types exist which could prove more effective.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d1e1841">We have investigated the potential of deep learning for aiding seismic simulation in geophysics. We presented two deep neural networks which are able to carry out fast and largely accurate simulation of seismic waves. Both networks are 20–500 times faster than FD modelling and simulate seismic waves in horizontally layered and faulted 2-D acoustic<?pagebreak page1543?> media. The first network uses a WaveNet architecture and simulates seismic waves in horizontally layered media. We showed that this network can also be used to carry out fast seismic inversion of the same media. The second network is significantly more general than the first; it simulates seismic waves in faulted media with arbitrary layers, fault properties and an arbitrary location of the seismic source on the surface of the media. Our main contribution is to show that deep neural networks can move beyond simulating simple horizontally layered velocity models to more complex faulted models where, to the best of our knowledge, no analytical solutions exist, which we believe is a positive step towards understanding their practical potential. We discussed the challenges of extending our approaches to practical geophysical applications and future research directions which could address them, noting where these network architectures may be most advantageous.</p><?xmltex \hack{\clearpage}?>
</sec>

      
      </body>
    <back><app-group>

<?pagebreak page1544?><app id="App1.Ch1.S1">
  <?xmltex \currentcnt{A}?><label>Appendix A</label><title/>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F13"><?xmltex \currentcnt{A1}?><label>Figure A1</label><caption><p id="d1e1857">Comparison of different network architectures on simulation accuracy. <bold>(a)</bold> The WaveNet simulated pressure response for a randomly selected example in the test set (green) compared to ground truth FD simulation (red). <bold>(b, c)</bold> The simulated response when using two convolutional network designs with and without exponential dilations. <bold>(d)</bold> The histogram of the average absolute amplitude difference between the ground truth FD simulation and the simulations from the WaveNet, the dilated convolutional network and 2-D ray tracing over the test set of 1000 examples. A <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied to the receiver responses for display.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f13.png"/>

      </fig>

<?xmltex \hack{\clearpage}?><?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F14"><?xmltex \currentcnt{A2}?><label>Figure A2</label><caption><p id="d1e1891">Comparison of different conditional autoencoder network designs and training hyperparameters on simulation accuracy. <bold>(a)</bold> A randomly selected velocity model and source location from the test set and its corresponding ground truth FD simulation. <bold>(b)</bold> The histogram of the average absolute amplitude difference between the ground truth FD simulation and the simulation from the different cases over the test set. The histogram of the baseline network over the Marmousi test dataset is also shown. <bold>(c)</bold> A comparison of simulations and their difference to the ground truth when using our proposed conditional autoencoder (baseline), when halving the number of hidden channels for all layers (thin), when using an L2 loss function during training (L2 loss), when using gain exponents of <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> in the loss function and when removing two layers from the encoder and eight layers from the decoder (shallow). A <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied for display.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f14.png"/>

      </fig>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F15"><?xmltex \currentcnt{A3}?><label>Figure A3</label><caption><p id="d1e1950">Comparison of WaveNet and conditional autoencoder simulation accuracy. Panel <bold>(a)</bold> shows a velocity model, reflectivity series and ground truth FD simulation for a randomly selected example in the horizontally layered velocity model test set in red. Green shows the WaveNet simulation. Panel <bold>(b)</bold> shows the conditional autoencoder simulation for the same velocity model. Panel <bold>(c)</bold> shows the histogram of the average absolute amplitude difference between the ground truth FD simulation and WaveNet and conditional autoencoder simulations over this test set. A <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2.5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> gain is applied for display.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://se.copernicus.org/articles/11/1527/2020/se-11-1527-2020-f15.png"/>

      </fig>

<?xmltex \hack{\clearpage}?><?xmltex \floatpos{h!}?><table-wrap id="App1.Ch1.S1.T2"><?xmltex \hack{\hsize\textwidth}?><?xmltex \currentcnt{A1}?><label>Table A1</label><caption><p id="d1e1986">Conditional autoencoder layer parameters. Each entry shows the parameterisation of each convolutional layer. The padding column shows the padding on each side of the input tensor for each spatial dimension.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="13">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="left"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Layer</oasis:entry>
         <oasis:entry colname="col2">Type</oasis:entry>
         <oasis:entry colname="col3">In, out channels</oasis:entry>
         <oasis:entry colname="col4">Kernel size</oasis:entry>
         <oasis:entry colname="col5">Stride</oasis:entry>
         <oasis:entry colname="col6">Padding</oasis:entry>
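         <oasis:entry colname="col7">Layer</oasis:entry>
         <oasis:entry colname="col8">Type</oasis:entry>
         <oasis:entry colname="col9">In, out channels</oasis:entry>
         <oasis:entry colname="col10">Kernel size</oasis:entry>
         <oasis:entry colname="col11">Stride</oasis:entry>
         <oasis:entry colname="col12">Padding</oasis:entry>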
         <oasis:entry colname="col13"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(1,8)</oasis:entry>
         <oasis:entry colname="col4">(3,3)</oasis:entry>
         <oasis:entry colname="col5">(1,1)</oasis:entry>
         <oasis:entry colname="col6">(1,1)</oasis:entry>
         <oasis:entry colname="col7">14</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(512,512)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">2</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(8,16)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">15</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(512,512)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">3</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(16,16)</oasis:entry>
         <oasis:entry colname="col4">(3,3)</oasis:entry>
         <oasis:entry colname="col5">(1,1)</oasis:entry>
         <oasis:entry colname="col6">(1,1)</oasis:entry>
         <oasis:entry colname="col7">16</oasis:entry>
         <oasis:entry colname="col8">ConvT2d</oasis:entry>
         <oasis:entry colname="col9">(512,256)</oasis:entry>
         <oasis:entry colname="col10">(2,4)</oasis:entry>
         <oasis:entry colname="col11">(2,4)</oasis:entry>
         <oasis:entry colname="col12">0</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">4</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(16,32)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">17</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(256,256)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">5</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(32,32)</oasis:entry>
         <oasis:entry colname="col4">(3,3)</oasis:entry>
         <oasis:entry colname="col5">(1,1)</oasis:entry>
         <oasis:entry colname="col6">(1,1)</oasis:entry>
         <oasis:entry colname="col7">18</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(256,256)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">6</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(32,64)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">19</oasis:entry>
         <oasis:entry colname="col8">ConvT2d</oasis:entry>
         <oasis:entry colname="col9">(256,64)</oasis:entry>
         <oasis:entry colname="col10">(2,4)</oasis:entry>
         <oasis:entry colname="col11">(2,4)</oasis:entry>
         <oasis:entry colname="col12">0</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">7</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(64,128)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">20</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(64,64)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">8</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(128,256)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">21</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(64,64)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">9</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(256,512)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">22</oasis:entry>
         <oasis:entry colname="col8">ConvT2d</oasis:entry>
         <oasis:entry colname="col9">(64,8)</oasis:entry>
         <oasis:entry colname="col10">(2,4)</oasis:entry>
         <oasis:entry colname="col11">(2,4)</oasis:entry>
         <oasis:entry colname="col12">0</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">10</oasis:entry>
         <oasis:entry colname="col2">Conv2d</oasis:entry>
         <oasis:entry colname="col3">(512,1024)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">23</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(8,8)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">11</oasis:entry>
         <oasis:entry colname="col2">Concat</oasis:entry>
         <oasis:entry colname="col3">(1024,1025)</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7">24</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(8,8)</oasis:entry>
         <oasis:entry colname="col10">(3,3)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">(1,1)</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">12</oasis:entry>
         <oasis:entry colname="col2">ConvT2d</oasis:entry>
         <oasis:entry colname="col3">(1025,1025)</oasis:entry>
         <oasis:entry colname="col4">(2,2)</oasis:entry>
         <oasis:entry colname="col5">(2,2)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">25</oasis:entry>
         <oasis:entry colname="col8">Conv2d</oasis:entry>
         <oasis:entry colname="col9">(8,1)</oasis:entry>
         <oasis:entry colname="col10">(1,1)</oasis:entry>
         <oasis:entry colname="col11">(1,1)</oasis:entry>
         <oasis:entry colname="col12">0</oasis:entry>
         <oasis:entry colname="col13"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">13</oasis:entry>
         <oasis:entry colname="col2">ConvT2d</oasis:entry>
         <oasis:entry colname="col3">(1025,512)</oasis:entry>
         <oasis:entry colname="col4">(2,4)</oasis:entry>
         <oasis:entry colname="col5">(2,4)</oasis:entry>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7"/>
         <oasis:entry colname="col8"/>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10"/>
         <oasis:entry colname="col11"/>
         <oasis:entry colname="col12"/>
         <oasis:entry colname="col13"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
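<p>The shape bookkeeping implied by the table can be checked with a short script. The following is a minimal sketch, not part of the original code release. It assumes that the columns give (input, output) channels, kernel size, stride and padding, and it uses the standard PyTorch output-size formulas for Conv2d and ConvTranspose2d (dilation 1, no output padding). It traces layers 5 to 13 from a hypothetical 32-channel, 64 by 64 input; the true spatial size entering these layers is not stated here.</p>

```python
"""Trace feature-map shapes through layers 5 to 13 of the table.

Assumptions: the table columns give (in_channels, out_channels), kernel
size, stride and padding; the 64 x 64 input size is hypothetical.
"""


def conv2d_out(size, kernel, stride, padding):
    # Conv2d output size with dilation 1: floor((n + 2p - k) / s) + 1
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(size, kernel, stride, padding))


def convt2d_out(size, kernel, stride, padding):
    # ConvTranspose2d output size with dilation 1 and no output padding:
    # (n - 1) * s - 2p + k
    return tuple((n - 1) * s - 2 * p + k
                 for n, k, s, p in zip(size, kernel, stride, padding))


# (layer, type, (in_ch, out_ch), kernel, stride, padding), read from the table
LAYERS = [
    (5, "Conv2d", (32, 32), (3, 3), (1, 1), (1, 1)),
    (6, "Conv2d", (32, 64), (2, 2), (2, 2), (0, 0)),
    (7, "Conv2d", (64, 128), (2, 2), (2, 2), (0, 0)),
    (8, "Conv2d", (128, 256), (2, 2), (2, 2), (0, 0)),
    (9, "Conv2d", (256, 512), (2, 2), (2, 2), (0, 0)),
    (10, "Conv2d", (512, 1024), (2, 2), (2, 2), (0, 0)),
    (11, "Concat", (1024, 1025), None, None, None),
    (12, "ConvT2d", (1025, 1025), (2, 2), (2, 2), (0, 0)),
    (13, "ConvT2d", (1025, 512), (2, 4), (2, 4), (0, 0)),
]


def trace(channels, size):
    """Return (layer, channels, (height, width)) after each listed layer."""
    shapes = []
    for layer, kind, (c_in, c_out), k, s, p in LAYERS:
        # Each layer's input channel count must match the previous output
        assert c_in == channels, f"channel mismatch at layer {layer}"
        if kind == "Conv2d":
            size = conv2d_out(size, k, s, p)
        elif kind == "ConvT2d":
            size = convt2d_out(size, k, s, p)
        # "Concat" is assumed to join along the channel axis only,
        # so the spatial size is unchanged
        channels = c_out
        shapes.append((layer, channels, size))
    return shapes


if __name__ == "__main__":
    for layer, channels, size in trace(32, (64, 64)):
        print(layer, channels, size)
```

<p>With this hypothetical input, each stride-2 convolution halves both spatial dimensions, so layer 10 outputs 1024 channels at 2 by 2. The (2,4)-strided transposed convolution of layer 13 then expands the 4 by 4 output of layer 12 to 512 channels at 8 by 16.</p>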

<?xmltex \hack{\clearpage}?>
</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d1e2612">All our training data were generated synthetically using the SEISMIC_CPML FD modelling library. The code to reproduce all of our data and results is available at <uri>https://github.com/benmoseley/seismic-simulation-complex-media</uri> <xref ref-type="bibr" rid="bib1.bibx37" id="paren.69"/>.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e2624">TNM and AM were involved in the conceptualisation, supervision and review of the work. BM was involved in the conceptualisation, data creation, methodology, investigation, software, data analysis, validation and writing.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e2630">Tarje Nissen-Meyer is a topical editor for the <italic>Solid Earth</italic> editorial board.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e2639">The authors would like to thank the Computational Infrastructure for Geodynamics (<uri>https://www.geodynamics.org/</uri>, last access: 9 August 2020) for releasing the open-source SEISMIC_CPML FD modelling libraries.
We would also like to thank Tom Le Paine for his fast WaveNet implementation on GitHub, on which our code was based
(<uri>https://github.com/tomlepaine/fast-wavenet/</uri>, last access: 9 August 2020), as well as our reviewers Andrew Curtis and Andrew Valentine for their valuable and in-depth feedback.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e2650">This research has been supported by the Centre for Doctoral Training in Autonomous Intelligent Machines and Systems at the University of Oxford, Oxford, UK, and the UK Engineering and Physical Sciences Research Council.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e2656">This paper was edited by Caroline Beghein and reviewed by Andrew Curtis and Andrew Valentine.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Abadi et al.(2015)</label><?label tensorflow?><mixed-citation>Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K.,
Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O.,
Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.:
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, <uri>https://www.tensorflow.org</uri>, last access: 9 August 2020, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Ahmed et al.(2018)Ahmed, Saint, Shabayek, Cherenkova, Das, Gusev,
Aouada, and Ottersten</label><?label Ahmed2018?><mixed-citation>Ahmed, E., Saint, A., Shabayek, A. E. R., Cherenkova, K., Das, R., Gusev, G.,
Aouada, D., and Ottersten, B.: A survey on Deep Learning Advances on
Different 3D Data Representations, arXiv [preprint], <uri>https://arxiv.org/abs/1808.01462</uri>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Aki and Richards(1980)</label><?label Aki1980?><mixed-citation>
Aki, K. and Richards, P. G.: Quantitative seismology, W. H. Freeman and Co., New York, New York, 1980.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Araya-Polo et al.(2018)Araya-Polo, Jennings, Adler, and
Dahlke</label><?label Araya-Polo2018?><mixed-citation>
Araya-Polo, M., Jennings, J., Adler, A., and Dahlke, T.: Deep-learning
tomography, The Leading Edge, 37, 58–66, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bergen et al.(2019)Bergen, Johnson, De Hoop, and
Beroza</label><?label Bergen2019a?><mixed-citation>Bergen, K. J., Johnson, P. A., De Hoop, M. V., and Beroza, G. C.: Machine
learning for data-driven discovery in solid Earth geoscience, Science, 363, eaau0323, <ext-link xlink:href="https://doi.org/10.1126/science.aau0323" ext-link-type="DOI">10.1126/science.aau0323</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Bohlen(2002)</label><?label Bohlen2002?><mixed-citation>
Bohlen, T.: Parallel 3-D viscoelastic finite difference seismic modelling,
Comput. Geosci., 28, 887–899, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Boore(2003)</label><?label Boore2003?><mixed-citation>
Boore, D. M.: Simulation of ground motion using the stochastic method, Pure
Appl. Geophys., 160, 635–676, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Bozdağ et al.(2016)Bozdağ, Peter, Lefebvre, Komatitsch, Tromp,
Hill, Podhorszki, and Pugmire</label><?label Bozdag2016?><mixed-citation>
Bozdağ, E., Peter, D., Lefebvre, M., Komatitsch, D., Tromp, J., Hill, J.,
Podhorszki, N., and Pugmire, D.: Global adjoint tomography: first-generation
model, Geophys. J. Int., 207, 1739–1766, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Chopra and Marfurt(2007)</label><?label Chopra2007?><mixed-citation>
Chopra, S. and Marfurt, K. J.: Seismic Attributes for Prospect Identification
and Reservoir Characterization, Society of Exploration Geophysicists and
European Association of Geoscientists and Engineers, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Cui et al.(2010)Cui, Olsen, Jordan, Lee, Zhou, Small, Roten, Ely,
Panda, Chourasia, Levesque, Day, and Maechling</label><?label Cui2010?><mixed-citation>
Cui, Y., Olsen, K. B., Jordan, T. H., Lee, K., Zhou, J., Small, P., Roten, D.,
Ely, G., Panda, D. K., Chourasia, A., Levesque, J., Day, S. M., and
Maechling, P.: Scalable Earthquake Simulation on Petascale Supercomputers,
in: 2010 ACM/IEEE International Conference for High Performance Computing,
Networking, Storage and Analysis, New Orleans, LA, USA, 13–19 November 2010, 1–20, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Devilee et al.(1999)Devilee, Curtis, and Roy-Chowdhury</label><?label Devilee1999?><mixed-citation>
Devilee, R. J. R., Curtis, A., and Roy-Chowdhury, K.: An efficient,
probabilistic neural network approach to solving inverse problems: Inverting
surface wave velocities for Eurasian crustal thickness, J.
Geophys. Res.-Sol. Ea., 104, 28841–28857, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Dowla et al.(1990)Dowla, Taylor, and Anderson</label><?label Dowla1990?><mixed-citation>
Dowla, F. U., Taylor, S. R., and Anderson, R. W.: Seismic discrimination with
artificial neural networks: Preliminary results with regional spectral data,
B. Seismol. Soc. Am., 80, 1346–1373, 1990.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Earp and Curtis(2020)</label><?label Earp2020?><mixed-citation>
Earp, S. and Curtis, A.: Probabilistic neural network-based 2D travel-time
tomography, Neural Comput. Appl., 1–19, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Fichtner(2010)</label><?label Fichtner2010?><mixed-citation>
Fichtner, A.: Full Seismic Waveform Modelling and Inversion, Springer, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Gal(2016)</label><?label Gal2016?><mixed-citation>
Gal, Y.: Uncertainty in Deep Learning, PhD thesis, University of Cambridge, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Goodfellow et al.(2016)Goodfellow, Bengio, and
Courville</label><?label Goodfellow2016?><mixed-citation>
Goodfellow, I., Bengio, Y., and Courville, A.: Deep Learning, MIT Press,
2016.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Gu et al.(2018)Gu, Wang, Kuen, Ma, Shahroudy, Shuai, Liu, Wang, Wang,
Cai, and Chen</label><?label Gu2018?><mixed-citation>
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang,
X., Wang, G., Cai, J., and Chen, T.: Recent advances in convolutional neural
networks, Pattern Recogn., 77, 354–377, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Guo et al.(2016)Guo, Li, and Iorio</label><?label Guo2016?><mixed-citation>
Guo, X., Li, W., and Iorio, F.: Convolutional Neural Networks for Steady Flow
Approximation, in: Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining – KDD '16, San Francisco, CA, USA, August 2016,
481–490, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Gutenberg(1936)</label><?label Gutenberg1936?><mixed-citation>
Gutenberg, B.: The amplitudes of waves to be expected in seismic prospecting,
Geophysics, 1, 252–256, 1936.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Hosseini et al.(2019)Hosseini, Sigloch, Tsekhmistrenko, Zaheri,
Nissen-Meyer, and Igel</label><?label Hosseini2019?><mixed-citation>
Hosseini, K., Sigloch, K., Tsekhmistrenko, M., Zaheri, A., Nissen-Meyer, T.,
and Igel, H.: Global mantle structure from multifrequency tomography using
P, PP and P-diffracted waves, Geophys. J. Int., 220,
96–141, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Igel(2017)</label><?label Igel2017?><mixed-citation>
Igel, H.: Computational seismology: a practical introduction, Oxford
University Press, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Ioffe and Szegedy(2015)</label><?label Ioffe2015?><mixed-citation>
Ioffe, S. and Szegedy, C.: Batch normalization: Accelerating deep network
training by reducing internal covariate shift,<?pagebreak page1548?> in: 32nd International
Conference on Machine Learning, ICML 2015, 7–9 July 2015, Lille, France, 1, 448–456, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Kingma and Ba(2014)</label><?label Kingma2014?><mixed-citation>Kingma, D. P. and Ba, J.: Adam: A Method for Stochastic Optimization, arXiv [preprint], <uri>https://arxiv.org/abs/1412.6980</uri>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Komatitsch and Martin(2007)</label><?label Komatitsch2007?><mixed-citation>
Komatitsch, D. and Martin, R.: An unsplit convolutional perfectly matched
layer improved at grazing incidence for the seismic wave equation,
Geophysics, 72, SM155–SM167, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Komatitsch and Tromp(1999)</label><?label Komatitsch1999?><mixed-citation>
Komatitsch, D. and Tromp, J.: Introduction to the spectral element method for
three-dimensional seismic wave propagation, Geophys. J.
Int., 139, 806–822, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Kong et al.(2019)Kong, Trugman, Ross, Bianco, Meade, and
Gerstoft</label><?label Kong2019?><mixed-citation>
Kong, Q., Trugman, D. T., Ross, Z. E., Bianco, M. J., Meade, B. J., and
Gerstoft, P.: Machine learning in seismology: Turning data into insights,
Seismol. Res. Lett., 90, 3–14, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Krischer and Fichtner(2017)</label><?label Krischer2017?><mixed-citation>
Krischer, L. and Fichtner, A.: Generating Seismograms with Deep Neural
Networks, AGU Fall Meeting Abstracts, 11–15 December 2017, New Orleans, Louisiana, USA, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx28"><?xmltex \def\ref@label{{Krischer et~al.(2017)Krischer, Hutko, van Driel, St{\"{a}}hler,
Bahavar, Trabant, and Nissen‐Meyer}}?><label>Krischer et al.(2017)Krischer, Hutko, van Driel, Stähler,
Bahavar, Trabant, and Nissen‐Meyer</label><?label Krischer2017a?><mixed-citation>
Krischer, L., Hutko, A. R., van Driel, M., Stähler, S., Bahavar, M.,
Trabant, C., and Nissen‐Meyer, T.: On-Demand Custom Broadband Synthetic
Seismograms, Seismol. Res. Lett., 88, 1127–1140, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Leng et al.(2016)Leng, Nissen-Meyer, and van Driel</label><?label Leng2016?><mixed-citation>
Leng, K., Nissen-Meyer, T., and van Driel, M.: Efficient global wave
propagation adapted to 3-D structural complexity: a
pseudospectral/spectral-element approach, Geophys. J. Int.,
207, 1700–1721, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Leng et al.(2019)Leng, Nissen-Meyer, van Driel, Hosseini, and
Al-Attar</label><?label Leng2019?><mixed-citation>
Leng, K., Nissen-Meyer, T., van Driel, M., Hosseini, K., and Al-Attar, D.:
AxiSEM3D: broad-band seismic wavefields in 3-D global earth models with
undulating discontinuities, Geophys. J. Int., 217,
2125–2146, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Lerer et al.(2016)Lerer, Gross, and Fergus</label><?label Lerer2016?><mixed-citation>
Lerer, A., Gross, S., and Fergus, R.: Learning Physical Intuition of Block
Towers by Example, Proceedings of the 33rd International Conference on
International Conference on Machine Learning, 20–22 June 2016, New York, NY, USA, 48, 430–438, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Long et al.(2013)Long, Zhao, and Zou</label><?label Long2013?><mixed-citation>
Long, G., Zhao, Y., and Zou, J.: A temporal fourth-order scheme for the
first-order acoustic wave equations, Geophys. J. Int., 194,
1473–1485, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Lumley(2001)</label><?label Lumley2001?><mixed-citation>
Lumley, D. E.: Time-lapse seismic reservoir monitoring, Geophysics, 66,
50–53, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Margrave and Lamoureux(2018)</label><?label Margrave2018?><mixed-citation>
Margrave, G. F. and Lamoureux, M. P.: Numerical Methods of Exploration
Seismology, Cambridge University Press, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Martin et al.(2006)Martin, Wiley, and Marfurt</label><?label Martin2006?><mixed-citation>
Martin, G. S., Wiley, R., and Marfurt, K. J.: Marmousi2: An elastic upgrade
for Marmousi, Leading Edge, 25, 156–166, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Moczo et al.(2007)Moczo, Robertsson, and Eisner</label><?label Moczo2007?><mixed-citation>
Moczo, P., Robertsson, J. O., and Eisner, L.: The Finite-Difference
Time-Domain Method for Modeling of Seismic Wave Propagation, Adv.
Geophys., 48, 421–516, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Moseley(2020)</label><?label mos2019?><mixed-citation>Moseley, B.: Code repository for deep learning for fast simulation of seismic waves in complex media, available at: <uri>https://github.com/benmoseley/seismic-simulation-complex-media</uri>, last access: 9 August 2020.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Moseley et al.(2018)Moseley, Markham, and Nissen-Meyer</label><?label Moseley2018?><mixed-citation>Moseley, B., Markham, A., and Nissen-Meyer, T.: Fast approximate simulation of
seismic waves with deep learning, arXiv [preprint], <uri>https://arxiv.org/abs/1807.06873</uri>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Murat and Rudman(1992)</label><?label Murat1992?><mixed-citation>
Murat, M. E. and Rudman, A. J.: Automated first arrival picking: a neural
network approach, Geophys. Prospect., 40, 587–604, 1992.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Nair and Hinton(2010)</label><?label Nair2010?><mixed-citation>
Nair, V. and Hinton, G.: Rectified Linear Units Improve Restricted Boltzmann
Machines, in: Proceedings of ICML, 21–24 June 2010, Haifa, Israel, 27, 807–814, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Nath et al.(1999)Nath, Chakraborty, Singh, and Ganguly</label><?label Nath1999?><mixed-citation>
Nath, S. K., Chakraborty, S., Singh, S. K., and Ganguly, N.: Velocity
inversion in cross-hole seismic tomography by counter-propagation neural
network, genetic algorithm and evolutionary programming techniques,
Geophys. J. Int., 138, 108–124, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Newman(1973)</label><?label Newman1973?><mixed-citation>
Newman, P.: Divergence effects in a layered earth, Geophysics, 38, 481–488, 1973.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Ni et al.(2002)Ni, Tan, Gurnis, and Helmberger</label><?label Ni2002?><mixed-citation>
Ni, S., Tan, E., Gurnis, M., and Helmberger, D.: Sharp sides to the African
superplume, Science, 296, 1850–1852, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Paganini et al.(2018)Paganini, De Oliveira, and
Nachman</label><?label Paganini2018?><mixed-citation>
Paganini, M., De Oliveira, L., and Nachman, B.: Accelerating Science with
Generative Adversarial Networks: An Application to 3D Particle Showers in
Multilayer Calorimeters, Phys. Rev. Lett., 120, 1–6, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Perol et al.(2018)Perol, Gharbi, and Denolle</label><?label Perol2018?><mixed-citation>
Perol, T., Gharbi, M., and Denolle, M.: Convolutional neural network for
earthquake detection and location, Science Advances, 4, e1700578, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Poulton et al.(1992)Poulton, Sternberg, and Glass</label><?label Poulton1992?><mixed-citation>
Poulton, M. M., Sternberg, B. K., and Glass, C. E.: Location of subsurface
targets in geophysical data using neural networks, Geophysics, 57,
1534–1544, 1992.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Pytorch(2016)</label><?label pytorch?><mixed-citation>PyTorch: available at: <uri>https://www.pytorch.org</uri> (last access: 9 August 2020), 2016.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Raissi et al.(2019)Raissi, Perdikaris, and Karniadakis</label><?label Raissi2019?><mixed-citation>
Raissi, M., Perdikaris, P., and Karniadakis, G. E.: Physics-informed neural
networks: A deep learning framework for solving forward and inverse problems
involving nonlinear partial differential equations, J. Comput.
Phys., 378, 686–707, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Richardson(2018)</label><?label Richardson2018?><mixed-citation>Richardson, A.: Seismic Full-Waveform Inversion Using Deep Learning Tools and
Techniques, arXiv [preprint], <uri>https://arxiv.org/abs/1801.07232</uri>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Rietmann et al.(2012)Rietmann, Messmer, Nissen-Meyer, Peter, Basini,
Komatitsch, Schenk, Tromp, Boschi, and Giardini</label><?label Rietmann2012?><mixed-citation>
Rietmann, M., Messmer, P., Nissen-Meyer, T., Peter, D., Basini, P., Komatitsch,
D., Schenk, O., Tromp, J., Boschi, L., and Giardini, D.: Forward and adjoint
simulations of seismic wave propagation on emerging large-scale GPU
architectures, International Conference for High Performance Computing,
Networking, Storage and Analysis, SC, November 2012, Salt Lake City, UT, 1–11, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx51"><?xmltex \def\ref@label{{R{\"{o}}th and Tarantola(1994)}}?><label>Röth and Tarantola(1994)</label><?label Roth1994?><mixed-citation>
Röth, G. and Tarantola, A.: Neural networks and inversion of seismic
data, J. Geophys. Res., 99, 6753, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Russell(1988)</label><?label Russell1988?><mixed-citation>
Russell, B. H.: Introduction to Seismic Inversion Methods, Society of
Exploration Geophysicists, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Schuster(2017)</label><?label Schuster2017?><mixed-citation>
Schuster, G. T.: Seismic Inversion, Society of Exploration Geophysicists,
2017.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Sun and Demanet(2018)</label><?label Sun2018?><mixed-citation>
Sun, H. and Demanet, L.: Low frequency extrapolation with deep learning, 2018
SEG International Exposition and Annual Meeting, 14–19 October 2018, Anaheim, CA, USA, 2011–2015,
2018.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Tarantola(1987)</label><?label Tarantola1987?><mixed-citation>
Tarantola, A.: Inverse problem theory: methods for data fitting and model
parameter estimation, Elsevier, 1987.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Thorne et al.(2020)Thorne, Pachhai, Leng, Wicks, and
Nissen-Meyer</label><?label Thorne2020?><mixed-citation>
Thorne, M. S., Pachhai, S., Leng, K., Wicks, J. K., and Nissen-Meyer, T.: New
Candidate Ultralow-Velocity Zone Locations from Highly Anomalous SPdKS
Waveforms, Minerals, 10, 211, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Valentine and Trampert(2012)</label><?label Valentine2012?><mixed-citation>
Valentine, A. P. and Trampert, J.: Data space reduction, quality assessment
and searching of seismograms: autoencoder networks for waveform data,
Geophys. J. Int., 189, 1183–1202, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>van den Oord et al.(2016)van den Oord, Dieleman, Zen, Simonyan,
Vinyals, Graves, Kalchbrenner, Senior, and Kavukcuoglu</label><?label Oord2016?><mixed-citation>van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A.,
Kalchbrenner, N., Senior, A., and Kavukcuoglu, K.: WaveNet: A Generative
Model for Raw Audio, arXiv [preprint], <uri>https://arxiv.org/abs/1609.03499</uri>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Van Der Baan and Jutten(2000)</label><?label VanDerBaan2000?><mixed-citation>
Van Der Baan, M. and Jutten, C.: Neural networks in geophysical
applications, Geophysics, 65, 1032–1047, 2000.</mixed-citation></ref>
      <?pagebreak page1549?><ref id="bib1.bibx60"><label>van Driel and Nissen-Meyer(2014a)</label><?label VanDriel2014?><mixed-citation>
van Driel, M. and Nissen-Meyer, T.: Optimized viscoelastic wave propagation
for weakly dissipative media, Geophys. J. Int., 199,
1078–1093, 2014a.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>van Driel and Nissen-Meyer(2014b)</label><?label VanDriel2014a?><mixed-citation>
van Driel, M. and Nissen-Meyer, T.: Seismic wave propagation in fully
anisotropic axisymmetric media, Geophys. J. Int., 199,
880–893, 2014b.</mixed-citation></ref>
      <ref id="bib1.bibx62"><?xmltex \def\ref@label{{{Van Driel} et~al.(2019){Van Driel}, Ceylan, Clinton, Giardini,
Alemany, Allam, Ambrois, Balestra, Banerdt, Becker, B{\"{o}}se, Boxberg,
Brinkman, Casademont, Ch{\`{e}}ze, Daubar, Deschamps, Dethof, Ditz, Drilleau,
Essing, Euchner, Fernando, Garcia, Garth, Godwin, Golombek, Grunert,
Hadziioannou, Haindl, Hammer, Hochfeld, Hosseini, Hu, Kedar, Kenda, Khan,
Kilchling, Knapmeyer-Endrun, Lamert, Li, Lognonn{\'{e}}, Mader, Marten,
Mehrkens, Mercerat, Mimoun, M{\"{o}}ller, Murdoch, Neumann, Neurath,
Paffrath, Panning, Peix, Perrin, Rolland, Schimmel, Schr{\"{o}}er, Spiga,
St{\"{a}}hler, Steinmann, Stutzmann, Szenicer, Trumpik, Tsekhmistrenko,
Twardzik, Weber, Werdenbach-Jarklowski, Zhang, and Zheng}}?><label>Van Driel et al.(2019)Van Driel, Ceylan, Clinton, Giardini,
Alemany, Allam, Ambrois, Balestra, Banerdt, Becker, Böse, Boxberg,
Brinkman, Casademont, Chèze, Daubar, Deschamps, Dethof, Ditz, Drilleau,
Essing, Euchner, Fernando, Garcia, Garth, Godwin, Golombek, Grunert,
Hadziioannou, Haindl, Hammer, Hochfeld, Hosseini, Hu, Kedar, Kenda, Khan,
Kilchling, Knapmeyer-Endrun, Lamert, Li, Lognonné, Mader, Marten,
Mehrkens, Mercerat, Mimoun, Möller, Murdoch, Neumann, Neurath,
Paffrath, Panning, Peix, Perrin, Rolland, Schimmel, Schröer, Spiga,
Stähler, Steinmann, Stutzmann, Szenicer, Trumpik, Tsekhmistrenko,
Twardzik, Weber, Werdenbach-Jarklowski, Zhang, and Zheng</label><?label VanDriel2019?><mixed-citation>
van Driel, M., Ceylan, S., Clinton, J. F., Giardini, D., Alemany, H., Allam,
A., Ambrois, D., Balestra, J., Banerdt, B., Becker, D., Böse, M.,
Boxberg, M. S., Brinkman, N., Casademont, T., Chèze, J., Daubar, I.,
Deschamps, A., Dethof, F., Ditz, M., Drilleau, M., Essing, D., Euchner, F.,
Fernando, B., Garcia, R., Garth, T., Godwin, H., Golombek, M. P., Grunert,
K., Hadziioannou, C., Haindl, C., Hammer, C., Hochfeld, I., Hosseini, K., Hu,
H., Kedar, S., Kenda, B., Khan, A., Kilchling, T., Knapmeyer-Endrun, B.,
Lamert, A., Li, J., Lognonné, P., Mader, S., Marten, L., Mehrkens, F.,
Mercerat, D., Mimoun, D., Möller, T., Murdoch, N., Neumann, P.,
Neurath, R., Paffrath, M., Panning, M. P., Peix, F., Perrin, L., Rolland, L.,
Schimmel, M., Schröer, C., Spiga, A., Stähler, S. C., Steinmann,
R., Stutzmann, E., Szenicer, A., Trumpik, N., Tsekhmistrenko, M., Twardzik,
C., Weber, R., Werdenbach-Jarklowski, P., Zhang, S., and Zheng, Y.:
Preparing for InSight: Evaluation of the blind test for martian seismicity,
Seismol. Res. Lett., 90, 1518–1534, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>Vaswani et al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones,
Gomez, Kaiser, and Polosukhin</label><?label Vaswani2017?><mixed-citation>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N.,
Kaiser, L., and Polosukhin, I.: Attention Is All You Need, arXiv [preprint], <uri>https://arxiv.org/abs/1706.03762</uri>,
2017.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx64"><label>Vinje et al.(1993)Vinje, Iversen, and Gjoystdal</label><?label Vinje1993?><mixed-citation>
Vinje, V., Iversen, E., and Gjoystdal, H.: Traveltime and amplitude estimation
using wavefront construction, Geophysics, 58, 1157–1166, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx65"><label>Virieux and Operto(2009)</label><?label Virieux2009?><mixed-citation>
Virieux, J. and Operto, S.: An overview of full-waveform inversion in
exploration geophysics, Geophysics, 74, 6, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx66"><label>Wu and Lin(2018)</label><?label Wu2018?><mixed-citation>Wu, Y. and Lin, Y.: InversionNet: A Real-Time and Accurate Full Waveform
Inversion with CNNs and continuous CRFs, arXiv [preprint], <uri>https://arxiv.org/abs/1811.07875</uri>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx67"><label>Xie et al.(2006)Xie, Jin, and Wu</label><?label Xie2006?><mixed-citation>
Xie, X.-B., Jin, S., and Wu, R.-S.: Wave-equation-based seismic illumination
analysis, Geophysics, 71, S169–S177, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx68"><label>Yang and Ma(2019)</label><?label Yang2019?><mixed-citation>
Yang, F. and Ma, J.: Deep-learning inversion: A next-generation seismic
velocity model building method, Geophysics, 84, R583–R599, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx69"><label>Zhang and Lin(2018)</label><?label Zhang2018?><mixed-citation>Zhang, Z. and Lin, Y.: Data-driven Seismic Waveform Inversion: A Study on the
Robustness and Generalization, arXiv [preprint], <uri>https://arxiv.org/abs/1809.10262</uri>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx70"><label>Zhu et al.(2017)Zhu, Sheng, and Sun</label><?label Zhu2017?><mixed-citation>
Zhu, W., Sheng, Y., and Sun, Y.: Wave-dynamics simulation using deep neural
networks, Stanford Report, Stanford Vision and Learning Lab, Stanford University, CA, USA, 2017.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Deep learning for fast simulation of seismic waves in complex media</article-title-html>
<abstract-html><p>The simulation of seismic waves is a core task in many geophysical applications. Numerical methods such as finite difference (FD) modelling and spectral element methods (SEMs) are the most popular techniques for simulating seismic waves, but disadvantages such as their computational cost prohibit their use for many tasks. In this work, we investigate the potential of deep learning for aiding seismic simulation in the solid Earth sciences. We present two deep neural networks which are able to simulate the seismic response at multiple locations in horizontally layered and faulted 2-D acoustic media an order of magnitude faster than traditional finite difference modelling. The first network is able to simulate the seismic response in horizontally layered media and uses a WaveNet network architecture design. The second network is significantly more general than the first and is able to simulate the seismic response in faulted media with arbitrary layers, fault properties and an arbitrary location of the seismic source on the surface of the media, using a conditional autoencoder design. We test the sensitivity of the accuracy of both networks to different network hyperparameters and show that the WaveNet network can be retrained to carry out fast seismic inversion in the same media. We find that there are challenges when extending our methods to more complex, elastic and 3-D Earth models; for example, the accuracy of both networks is reduced when they are tested on models outside of their training distribution. We discuss further research directions which could address these challenges and potentially yield useful tools for practical simulation tasks.</p></abstract-html>
<ref-html id="bib1.bib1"><label>Abadi et al.(2015)</label><mixed-citation>
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K.,
Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O.,
Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.:
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, <a href="https://www.tensorflow.org" target="_blank"/>, last access: 9 August 2020, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Ahmed et al.(2018)Ahmed, Saint, Shabayek, Cherenkova, Das, Gusev,
Aouada, and Ottersten</label><mixed-citation>
Ahmed, E., Saint, A., Shabayek, A. E. R., Cherenkova, K., Das, R., Gusev, G.,
Aouada, D., and Ottersten, B.: A survey on Deep Learning Advances on
Different 3D Data Representations, arXiv [preprint], <a href="https://arxiv.org/abs/1808.01462" target="_blank"/>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Aki and Richards(1980)</label><mixed-citation>
Aki, K. and Richards, P. G.: Quantitative seismology, W. H. Freeman and Co., New York, New York, 1980.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Araya-Polo et al.(2018)Araya-Polo, Jennings, Adler, and
Dahlke</label><mixed-citation>
Araya-Polo, M., Jennings, J., Adler, A., and Dahlke, T.: Deep-learning
tomography, The Leading Edge, 37, 58–66, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bergen et al.(2019)Bergen, Johnson, De Hoop, and
Beroza</label><mixed-citation>
Bergen, K. J., Johnson, P. A., De Hoop, M. V., and Beroza, G. C.: Machine
learning for data-driven discovery in solid Earth geoscience, Science, 363, eaau0323, <a href="https://doi.org/10.1126/science.aau0323" target="_blank">https://doi.org/10.1126/science.aau0323</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Bohlen(2002)</label><mixed-citation>
Bohlen, T.: Parallel 3-D viscoelastic finite difference seismic modelling,
Comput. Geosci., 28, 887–899, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Boore(2003)</label><mixed-citation>
Boore, D. M.: Simulation of ground motion using the stochastic method, Pure
Appl. Geophys., 160, 635–676, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Bozdağ et al.(2016)Bozdağ, Peter, Lefebvre, Komatitsch, Tromp,
Hill, Podhorszki, and Pugmire</label><mixed-citation>
Bozdağ, E., Peter, D., Lefebvre, M., Komatitsch, D., Tromp, J., Hill, J.,
Podhorszki, N., and Pugmire, D.: Global adjoint tomography: first-generation
model, Geophys. J. Int., 207, 1739–1766, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Chopra and Marfurt(2007)</label><mixed-citation>
Chopra, S. and Marfurt, K. J.: Seismic Attributes for Prospect Identification
and Reservoir Characterization, Society of Exploration Geophysicists and
European Association of Geoscientists and Engineers, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Cui et al.(2010)Cui, Olsen, Jordan, Lee, Zhou, Small, Roten, Ely,
Panda, Chourasia, Levesque, Day, and Maechling</label><mixed-citation>
Cui, Y., Olsen, K. B., Jordan, T. H., Lee, K., Zhou, J., Small, P., Roten, D.,
Ely, G., Panda, D. K., Chourasia, A., Levesque, J., Day, S. M., and
Maechling, P.: Scalable Earthquake Simulation on Petascale Supercomputers,
in: 2010 ACM/IEEE International Conference for High Performance Computing,
Networking, Storage and Analysis, New Orleans, LA, USA, 13–19 November 2010, 1–20, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Devilee et al.(1999)Devilee, Curtis, and Roy-Chowdhury</label><mixed-citation>
Devilee, R. J. R., Curtis, A., and Roy-Chowdhury, K.: An efficient,
probabilistic neural network approach to solving inverse problems: Inverting
surface wave velocities for Eurasian crustal thickness, J.
Geophys. Res.-Sol. Ea., 104, 28841–28857, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Dowla et al.(1990)Dowla, Taylor, and Anderson</label><mixed-citation>
Dowla, F. U., Taylor, S. R., and Anderson, R. W.: Seismic discrimination with
artificial neural networks: Preliminary results with regional spectral data,
B. Seismol. Soc. Am., 80, 1346–1373, 1990.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Earp and Curtis(2020)</label><mixed-citation>
Earp, S. and Curtis, A.: Probabilistic neural network-based 2D travel-time
tomography, Neural Comput. Appl., 1–19, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Fichtner(2010)</label><mixed-citation>
Fichtner, A.: Full Seismic Waveform Modelling and Inversion, Springer, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Gal(2016)</label><mixed-citation>
Gal, Y.: Uncertainty in Deep Learning, PhD thesis, University of Cambridge, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Goodfellow et al.(2016)Goodfellow, Bengio, and
Courville</label><mixed-citation>
Goodfellow, I., Bengio, Y., and Courville, A.: Deep Learning, MIT Press,
2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Gu et al.(2018)Gu, Wang, Kuen, Ma, Shahroudy, Shuai, Liu, Wang, Wang,
Cai, and Chen</label><mixed-citation>
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang,
X., Wang, G., Cai, J., and Chen, T.: Recent advances in convolutional neural
networks, Pattern Recogn., 77, 354–377, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Guo et al.(2016)Guo, Li, and Iorio</label><mixed-citation>
Guo, X., Li, W., and Iorio, F.: Convolutional Neural Networks for Steady Flow
Approximation, in: Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining – KDD '16, San Francisco, CA, USA, August 2016,
481–490, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Gutenberg(1936)</label><mixed-citation>
Gutenberg, B.: The amplitudes of waves to be expected in seismic prospecting,
Geophysics, 1, 252–256, 1936.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Hosseini et al.(2019)Hosseini, Sigloch, Tsekhmistrenko, Zaheri,
Nissen-Meyer, and Igel</label><mixed-citation>
Hosseini, K., Sigloch, K., Tsekhmistrenko, M., Zaheri, A., Nissen-Meyer, T.,
and Igel, H.: Global mantle structure from multifrequency tomography using
P, PP and P-diffracted waves, Geophys. J. Int., 220,
96–141, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Igel(2017)</label><mixed-citation>
Igel, H.: Computational seismology: a practical introduction, Oxford
University Press, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Ioffe and Szegedy(2015)</label><mixed-citation>
Ioffe, S. and Szegedy, C.: Batch normalization: Accelerating deep network
training by reducing internal covariate shift, in: 32nd International
Conference on Machine Learning, ICML 2015, 7–9 July 2015, Lille, France, 1, 448–456, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Kingma and Ba(2014)</label><mixed-citation>
Kingma, D. P. and Ba, J.: Adam: A Method for Stochastic Optimization, arXiv [preprint], <a href="https://arxiv.org/abs/1412.6980" target="_blank"/>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Komatitsch and Martin(2007)</label><mixed-citation>
Komatitsch, D. and Martin, R.: An unsplit convolutional perfectly matched
layer improved at grazing incidence for the seismic wave equation,
Geophysics, 72, SM155–SM167, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Komatitsch and Tromp(1999)</label><mixed-citation>
Komatitsch, D. and Tromp, J.: Introduction to the spectral element method for
three-dimensional seismic wave propagation, Geophys. J.
Int., 139, 806–822, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Kong et al.(2019)Kong, Trugman, Ross, Bianco, Meade, and
Gerstoft</label><mixed-citation>
Kong, Q., Trugman, D. T., Ross, Z. E., Bianco, M. J., Meade, B. J., and
Gerstoft, P.: Machine learning in seismology: Turning data into insights,
Seismol. Res. Lett., 90, 3–14, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Krischer and Fichtner(2017)</label><mixed-citation>
Krischer, L. and Fichtner, A.: Generating Seismograms with Deep Neural
Networks, AGU Fall Meeting Abstracts, 11–15 December 2017, New Orleans, Louisiana, USA, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Krischer et al.(2017)Krischer, Hutko, van Driel, Stähler,
Bahavar, Trabant, and Nissen-Meyer</label><mixed-citation>
Krischer, L., Hutko, A. R., van Driel, M., Stähler, S., Bahavar, M.,
Trabant, C., and Nissen-Meyer, T.: On-Demand Custom Broadband Synthetic
Seismograms, Seismol. Res. Lett., 88, 1127–1140, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Leng et al.(2016)Leng, Nissen-Meyer, and van Driel</label><mixed-citation>
Leng, K., Nissen-Meyer, T., and van Driel, M.: Efficient global wave
propagation adapted to 3-D structural complexity: a
pseudospectral/spectral-element approach, Geophys. J. Int.,
207, 1700–1721, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Leng et al.(2019)Leng, Nissen-Meyer, van Driel, Hosseini, and
Al-Attar</label><mixed-citation>
Leng, K., Nissen-Meyer, T., van Driel, M., Hosseini, K., and Al-Attar, D.:
AxiSEM3D: broad-band seismic wavefields in 3-D global earth models with
undulating discontinuities, Geophys. J. Int., 217,
2125–2146, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Lerer et al.(2016)Lerer, Gross, and Fergus</label><mixed-citation>
Lerer, A., Gross, S., and Fergus, R.: Learning Physical Intuition of Block
Towers by Example, Proceedings of the 33rd International Conference on
International Conference on Machine Learning, 20–22 June 2016, New York, NY, USA, 48, 430–438, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Long et al.(2013)Long, Zhao, and Zou</label><mixed-citation>
Long, G., Zhao, Y., and Zou, J.: A temporal fourth-order scheme for the
first-order acoustic wave equations, Geophys. J. Int., 194,
1473–1485, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Lumley(2001)</label><mixed-citation>
Lumley, D. E.: Time-lapse seismic reservoir monitoring, Geophysics, 66,
50–53, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Margrave and Lamoureux(2018)</label><mixed-citation>
Margrave, G. F. and Lamoureux, M. P.: Numerical Methods of Exploration
Seismology, Cambridge University Press, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Martin et al.(2006)Martin, Wiley, and Marfurt</label><mixed-citation>
Martin, G. S., Wiley, R., and Marfurt, K. J.: Marmousi2: An elastic upgrade
for Marmousi, Leading Edge, 25, 156–166, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Moczo et al.(2007)Moczo, Robertsson, and Eisner</label><mixed-citation>
Moczo, P., Robertsson, J. O., and Eisner, L.: The Finite-Difference
Time-Domain Method for Modeling of Seismic Wave Propagation, Adv.
Geophys., 48, 421–516, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Moseley(2020)</label><mixed-citation>
Moseley, B.: Code repository for deep learning for fast simulation of seismic waves in complex media, available at: <a href="https://github.com/benmoseley/seismic-simulation-complex-media" target="_blank"/>, last access: 9 August 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Moseley et al.(2018)Moseley, Markham, and Nissen-Meyer</label><mixed-citation>
Moseley, B., Markham, A., and Nissen-Meyer, T.: Fast approximate simulation of
seismic waves with deep learning, arXiv [preprint], <a href="https://arxiv.org/abs/1807.06873" target="_blank"/>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Murat and Rudman(1992)</label><mixed-citation>
Murat, M. E. and Rudman, A. J.: Automated first arrival picking: a neural
network approach, Geophys. Prospect., 40, 587–604, 1992.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Nair and Hinton(2010)</label><mixed-citation>
Nair, V. and Hinton, G.: Rectified Linear Units Improve Restricted Boltzmann
Machines, in: Proceedings of ICML, 21–24 June 2010, Haifa, Israel, 27, 807–814, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Nath et al.(1999)Nath, Chakraborty, Singh, and Ganguly</label><mixed-citation>
Nath, S. K., Chakraborty, S., Singh, S. K., and Ganguly, N.: Velocity
inversion in cross-hole seismic tomography by counter-propagation neural
network, genetic algorithm and evolutionary programming techniques,
Geophys. J. Int., 138, 108–124, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Newman(1973)</label><mixed-citation>
Newman, P.: Divergence effects in a layered earth, Geophysics, 38, 481–488, 1973.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Ni et al.(2002)Ni, Tan, Gurnis, and Helmberger</label><mixed-citation>
Ni, S., Tan, E., Gurnis, M., and Helmberger, D.: Sharp sides to the African
superplume, Science, 296, 1850–1852, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Paganini et al.(2018)Paganini, De Oliveira, and
Nachman</label><mixed-citation>
Paganini, M., De Oliveira, L., and Nachman, B.: Accelerating Science with
Generative Adversarial Networks: An Application to 3D Particle Showers in
Multilayer Calorimeters, Phys. Rev. Lett., 120, 1–6, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Perol et al.(2018)Perol, Gharbi, and Denolle</label><mixed-citation>
Perol, T., Gharbi, M., and Denolle, M.: Convolutional neural network for
earthquake detection and location, Science Advances, 4, e1700578, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Poulton et al.(1992)Poulton, Sternberg, and Glass</label><mixed-citation>
Poulton, M. M., Sternberg, B. K., and Glass, C. E.: Location of subsurface
targets in geophysical data using neural networks, Geophysics, 57,
1534–1544, 1992.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Pytorch(2016)</label><mixed-citation>
PyTorch: available at: <a href="https://www.pytorch.org" target="_blank"/> (last access: 9 August 2020), 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Raissi et al.(2019)Raissi, Perdikaris, and Karniadakis</label><mixed-citation>
Raissi, M., Perdikaris, P., and Karniadakis, G. E.: Physics-informed neural
networks: A deep learning framework for solving forward and inverse problems
involving nonlinear partial differential equations, J. Comput.
Phys., 378, 686–707, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Richardson(2018)</label><mixed-citation>
Richardson, A.: Seismic Full-Waveform Inversion Using Deep Learning Tools and
Techniques, arXiv [preprint], <a href="https://arxiv.org/abs/1801.07232" target="_blank"/>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Rietmann et al.(2012)Rietmann, Messmer, Nissen-Meyer, Peter, Basini,
Komatitsch, Schenk, Tromp, Boschi, and Giardini</label><mixed-citation>
Rietmann, M., Messmer, P., Nissen-Meyer, T., Peter, D., Basini, P., Komatitsch,
D., Schenk, O., Tromp, J., Boschi, L., and Giardini, D.: Forward and adjoint
simulations of seismic wave propagation on emerging large-scale GPU
architectures, International Conference for High Performance Computing,
Networking, Storage and Analysis, SC, November 2012, Salt Lake City, UT, 1–11, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Röth and Tarantola(1994)</label><mixed-citation>
Röth, G. and Tarantola, A.: Neural networks and inversion of seismic
data, J. Geophys. Res., 99, 6753, 1994.
</mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Russell(1988)</label><mixed-citation>
Russell, B. H.: Introduction to Seismic Inversion Methods, Society of
Exploration Geophysicists, 1988.
</mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Schuster(2017)</label><mixed-citation>
Schuster, G. T.: Seismic Inversion, Society of Exploration Geophysicists,
2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Sun and Demanet(2018)</label><mixed-citation>
Sun, H. and Demanet, L.: Low frequency extrapolation with deep learning, 2018
SEG International Exposition and Annual Meeting, 14–19 October 2018, Anaheim, CA, USA, 2011–2015,
2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Tarantola(1987)</label><mixed-citation>
Tarantola, A.: Inverse problem theory: methods for data fitting and model
parameter estimation, Elsevier, 1987.
</mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Thorne et al.(2020)Thorne, Pachhai, Leng, Wicks, and
Nissen-Meyer</label><mixed-citation>
Thorne, M. S., Pachhai, S., Leng, K., Wicks, J. K., and Nissen-Meyer, T.: New
Candidate Ultralow-Velocity Zone Locations from Highly Anomalous SPdKS
Waveforms, Minerals, 10, 211, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Valentine and Trampert(2012)</label><mixed-citation>
Valentine, A. P. and Trampert, J.: Data space reduction, quality assessment
and searching of seismograms: autoencoder networks for waveform data,
Geophys. J. Int., 189, 1183–1202, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>van den Oord et al.(2016)van den Oord, Dieleman, Zen, Simonyan,
Vinyals, Graves, Kalchbrenner, Senior, and Kavukcuoglu</label><mixed-citation>
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A.,
Kalchbrenner, N., Senior, A., and Kavukcuoglu, K.: WaveNet: A Generative
Model for Raw Audio, arXiv [preprint], <a href="https://arxiv.org/abs/1609.03499" target="_blank"/>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Van Der Baan and Jutten(2000)</label><mixed-citation>
Van Der Baan, M. and Jutten, C.: Neural networks in geophysical
applications, Geophysics, 65, 1032–1047, 2000.
</mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>van Driel and Nissen-Meyer(2014a)</label><mixed-citation>
van Driel, M. and Nissen-Meyer, T.: Optimized viscoelastic wave propagation
for weakly dissipative media, Geophys. J. Int., 199,
1078–1093, 2014a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>van Driel and Nissen-Meyer(2014b)</label><mixed-citation>
van Driel, M. and Nissen-Meyer, T.: Seismic wave propagation in fully
anisotropic axisymmetric media, Geophys. J. Int., 199,
880–893, 2014b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Van Driel et al.(2019)Van Driel, Ceylan, Clinton, Giardini,
Alemany, Allam, Ambrois, Balestra, Banerdt, Becker, Böse, Boxberg,
Brinkman, Casademont, Chèze, Daubar, Deschamps, Dethof, Ditz, Drilleau,
Essing, Euchner, Fernando, Garcia, Garth, Godwin, Golombek, Grunert,
Hadziioannou, Haindl, Hammer, Hochfeld, Hosseini, Hu, Kedar, Kenda, Khan,
Kilchling, Knapmeyer-Endrun, Lamert, Li, Lognonné, Mader, Marten,
Mehrkens, Mercerat, Mimoun, Möller, Murdoch, Neumann, Neurath,
Paffrath, Panning, Peix, Perrin, Rolland, Schimmel, Schröer, Spiga,
Stähler, Steinmann, Stutzmann, Szenicer, Trumpik, Tsekhmistrenko,
Twardzik, Weber, Werdenbach-Jarklowski, Zhang, and Zheng</label><mixed-citation>
van Driel, M., Ceylan, S., Clinton, J. F., Giardini, D., Alemany, H., Allam,
A., Ambrois, D., Balestra, J., Banerdt, B., Becker, D., Böse, M.,
Boxberg, M. S., Brinkman, N., Casademont, T., Chèze, J., Daubar, I.,
Deschamps, A., Dethof, F., Ditz, M., Drilleau, M., Essing, D., Euchner, F.,
Fernando, B., Garcia, R., Garth, T., Godwin, H., Golombek, M. P., Grunert,
K., Hadziioannou, C., Haindl, C., Hammer, C., Hochfeld, I., Hosseini, K., Hu,
H., Kedar, S., Kenda, B., Khan, A., Kilchling, T., Knapmeyer-Endrun, B.,
Lamert, A., Li, J., Lognonné, P., Mader, S., Marten, L., Mehrkens, F.,
Mercerat, D., Mimoun, D., Möller, T., Murdoch, N., Neumann, P.,
Neurath, R., Paffrath, M., Panning, M. P., Peix, F., Perrin, L., Rolland, L.,
Schimmel, M., Schröer, C., Spiga, A., Stähler, S. C., Steinmann,
R., Stutzmann, E., Szenicer, A., Trumpik, N., Tsekhmistrenko, M., Twardzik,
C., Weber, R., Werdenbach-Jarklowski, P., Zhang, S., and Zheng, Y.:
Preparing for InSight: Evaluation of the blind test for martian seismicity,
Seismol. Res. Lett., 90, 1518–1534, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Vaswani et al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones,
Gomez, Kaiser, and Polosukhin</label><mixed-citation>
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N.,
Kaiser, L., and Polosukhin, I.: Attention Is All You Need, arXiv [preprint], <a href="https://arxiv.org/abs/1706.03762" target="_blank"/>,
2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Vinje et al.(1993)Vinje, Iversen, and Gjoystdal</label><mixed-citation>
Vinje, V., Iversen, E., and Gjoystdal, H.: Traveltime and amplitude estimation
using wavefront construction, Geophysics, 58, 1157–1166, 1993.
</mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Virieux and Operto(2009)</label><mixed-citation>
Virieux, J. and Operto, S.: An overview of full-waveform inversion in
exploration geophysics, Geophysics, 74, 6, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Wu and Lin(2018)</label><mixed-citation>
Wu, Y. and Lin, Y.: InversionNet: A Real-Time and Accurate Full Waveform
Inversion with CNNs and continuous CRFs, arXiv [preprint], <a href="https://arxiv.org/abs/1811.07875" target="_blank"/>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Xie et al.(2006)Xie, Jin, and Wu</label><mixed-citation>
Xie, X.-B., Jin, S., and Wu, R.-S.: Wave-equation-based seismic illumination
analysis, Geophysics, 71, S169–S177, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Yang and Ma(2019)</label><mixed-citation>
Yang, F. and Ma, J.: Deep-learning inversion: A next-generation seismic
velocity model building method, Geophysics, 84, R583–R599, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Zhang and Lin(2018)</label><mixed-citation>
Zhang, Z. and Lin, Y.: Data-driven Seismic Waveform Inversion: A Study on the
Robustness and Generalization, arXiv [preprint], <a href="https://arxiv.org/abs/1809.10262" target="_blank"/>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Zhu et al.(2017)Zhu, Sheng, and Sun</label><mixed-citation>
Zhu, W., Sheng, Y., and Sun, Y.: Wave-dynamics simulation using deep neural
networks, Stanford Report, Stanford Vision and Learning Lab, Stanford University, CA, USA, 2017.
</mixed-citation></ref-html>--></article>
