The rapid characterisation of earthquake parameters such as magnitude is at the heart of earthquake early warning (EEW). In traditional EEW methods, the robustness of the estimated earthquake parameters has been observed to increase with the length of input data. Since time is a crucial factor in EEW applications, in this paper we propose a deep-learning-based magnitude classifier that uses data from a single seismic station, and we further investigate the effect of using five different durations of seismic waveform data after the first P-wave arrival: 1, 3, 10, 20 and 30 s. This is accomplished by testing the performance of the proposed model, which combines convolutional and bidirectional long short-term memory units, in classifying waveforms by magnitude into three classes: “noise”, “low-magnitude events” and “high-magnitude events”. Herein, any earthquake signal with a magnitude equal to or above 5.0 is labelled “high-magnitude”. We show that the variation in the results produced by changing the length of the data is no larger than the inherent randomness in the trained models due to their initialisation. We further demonstrate that the model successfully classifies waveforms over wide ranges of both hypocentral distance and signal-to-noise ratio.

The earthquake magnitude, defined as a logarithmic measure of the relative
strength of an earthquake, is one of the most fundamental parameters in its
characterisation (Mousavi and Beroza, 2020). The complex nature of
the geophysical processes affecting earthquakes makes it very difficult to
define a single reliable measure of their size (Kanamori and Stewart, 1978), and
hence, magnitude values measured in different scales often differ by more
than 1 unit. This is especially true for larger events due to saturation
effects (Howell Jr, 1981; Kanamori, 1983). Owing to the above-mentioned reasons
and the empirical nature of the majority of the magnitude scales, it is one of
the most difficult parameters to estimate (Chung and Bernreuter, 1981;
Ekström and Dziewonski, 1988). Some of the classical approaches to
obtain first estimates of earthquake magnitude have used empirical relations
for parameters such as predominant period

The recent developments in the area of deep learning (LeCun et al., 2015), combined with the availability of affordable high-end computational power through graphics processing units (GPUs), have led to state-of-the-art results in image recognition (Krizhevsky et al., 2017; He et al., 2016), speech recognition (Mikolov et al., 2011; Hinton et al., 2012) and natural language processing (Peters et al., 2018; Collobert et al., 2011). In fields such as seismology, where the volume of available data has increased exponentially over the last decades (Kong et al., 2018), deep learning has achieved great success in tasks such as seismic phase picking (Zhu and Beroza, 2019; Liao et al., 2021; Li et al., 2021), event detection (Wang and Teng, 1995; Mousavi et al., 2020; Meier et al., 2019), magnitude estimation (Mousavi and Beroza, 2020), event location characterisation (Perol et al., 2018; Panakkat and Adeli, 2009; Kuyuk and Susumu, 2018) and first-motion polarity detection (Ross et al., 2018).

Considering that timeliness is of the essence in rapid earthquake characterisation, it becomes important to find an optimum duration for the input data that can provide a reliable and statistically significant estimate of various earthquake parameters while using a minimum amount of P-wave data. In this study, we present a deep learning model that performs time series multiclass classification (Fawaz et al., 2019; Aly, 2005), labelling seismic waveforms as “noise”, “low-magnitude” or “high-magnitude”. Here a local magnitude of 5.0 is taken as the boundary between the low-magnitude and high-magnitude classes. We further investigate the effect of using different lengths of data on the model performance. Note that the boundary of 5.0 is chosen arbitrarily and can be modified depending on the purpose of the model and the local geology (which influences the correlation between earthquake magnitude and intensity). We also experimented with magnitudes of 3 and 4 as decision boundaries and found that the accuracy, precision and recall values in each case were similar to those for magnitude 5. Thus, the choice of decision boundary in itself does not seem to influence the model performance. Unlike Saad et al. (2020), who use data from three seismic stations to characterise different earthquake parameters, the model discussed in this paper uses three-component data from only a single station.

We use data from the STanford EArthquake Dataset (STEAD) (Mousavi et al., 2019) (see “Data availability”) to train and test our model. STEAD is a high-quality bench-marked dataset created for machine learning and deep learning applications and contains seismic event and noise waveforms of 1 min duration recorded by over 2500 seismic stations across the globe. The waveforms have been detrended and filtered with a bandpass filter between 1.0 and 40.0 Hz, followed by a resampling at 100 Hz. Metadata consisting of 35 attributes for earthquake traces and 8 attributes for noise traces are provided by the authors.
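The detrending and resampling steps applied to the STEAD waveforms can be sketched in a few lines. This is an illustrative reconstruction, not the dataset authors' pipeline: the naive linear-interpolation resampler stands in for a proper anti-aliased resampling, and the 1.0–40.0 Hz bandpass step (which would typically use `scipy.signal`) is omitted.

```python
import numpy as np

def detrend_linear(trace):
    """Remove a least-squares linear trend from a 1-D trace."""
    t = np.arange(len(trace))
    slope, intercept = np.polyfit(t, trace, 1)
    return trace - (slope * t + intercept)

def resample(trace, fs_in, fs_out):
    """Naive linear-interpolation resampling from fs_in to fs_out Hz."""
    duration = len(trace) / fs_in
    t_in = np.arange(len(trace)) / fs_in
    t_out = np.arange(int(round(duration * fs_out))) / fs_out
    return np.interp(t_out, t_in, trace)

# 1 min of data recorded at 200 Hz, resampled to the 100 Hz used in STEAD
raw = np.random.randn(12000) + 0.01 * np.arange(12000)  # noisy trace with drift
clean = resample(detrend_linear(raw), fs_in=200, fs_out=100)
```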

To ensure consistency in magnitude we only use traces for which the
magnitude is provided in “ml” (local magnitude) scale (as this is the case for most of the
traces in the dataset). We also discard traces with signal-to-noise ratio
less than 10 dB for quality control. We divide the noise and earthquake
traces into training, validation and test sets in the ratio
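The 10 dB quality cut can be illustrated as follows. STEAD provides SNR values in its metadata, so the `snr_db` function and its window length here are hypothetical stand-ins for that attribute, defining SNR as the power ratio of the windows just after and just before the P arrival.

```python
import numpy as np

def snr_db(trace, p_sample, window=200):
    """SNR in dB: power of the window after the P arrival divided by the
    power of the window before it (window length in samples is illustrative)."""
    noise = trace[p_sample - window:p_sample]
    signal = trace[p_sample:p_sample + window]
    return 10.0 * np.log10(np.mean(signal**2) / np.mean(noise**2))

# keep only traces with SNR >= 10 dB (quality control)
rng = np.random.default_rng(0)
noise_part = 0.1 * rng.standard_normal(500)
quiet = np.concatenate([noise_part, 0.1 * rng.standard_normal(500)])
event = np.concatenate([noise_part, 2.0 * rng.standard_normal(500)])
kept = [t for t in (quiet, event) if snr_db(t, p_sample=500) >= 10.0]
```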

Original distribution (prior to data augmentation) of

As mentioned earlier, we take a local magnitude 5.0 to be the decision
boundary between high-magnitude and low-magnitude events. However, the
training dataset originally has a magnitude distribution as shown in Fig. 1; this would lead to a high imbalance between the low-magnitude and
high-magnitude classes (a ratio of nearly

Events with magnitude equal to or above 5.0 are represented 20 times in the dataset by using a shifting window starting from 300 samples to 280 samples before the first P-arrival sample, with the window shifted by 2 samples for each representation. Each of these traces is also flipped, i.e. its polarity is reversed, since this does not affect the magnitude information in the data. Such data augmentation techniques, originally used for images, have also been found to be useful for time series data (Batista et al., 2004; Wen et al., 2021).
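One reading of this shifting-window scheme that yields the stated 20 representations per trace (10 window positions, each paired with a polarity-flipped copy) can be sketched as follows; the window length `win_len` is illustrative and not taken from the text.

```python
import numpy as np

def augment_high_magnitude(trace, p_sample, win_len=600):
    """Shifting-window augmentation for a high-magnitude trace: window starts
    slide from 300 to 282 samples before the P arrival in steps of 2
    (10 positions), and each window is also polarity-flipped -> 20 versions."""
    versions = []
    for offset in range(300, 280, -2):      # offsets 300, 298, ..., 282
        start = p_sample - offset
        window = trace[start:start + win_len]
        versions.append(window)
        versions.append(-window)            # polarity flip
    return versions

trace = np.random.randn(6000)               # 1 min of one component at 100 Hz
augmented = augment_high_magnitude(trace, p_sample=3000)
```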

For low-magnitude events, the following random undersampling strategy is
adopted.

All events with magnitude between 4.5 and 5.0 are used.

A total of

A total of

A total of

A total of

The distribution of classes in the training dataset obtained by undersampling “noise” (represented by class 0) and “low-magnitude” (represented by class 1) data and applying data augmentation to “high-magnitude” (represented by class 2) events. A similar distribution of classes is seen in the validation and test datasets as well.

The model architecture (Chakraborty et al., 2021) consists of two sets of 1D
convolution (Kiranyaz et al., 2021), dropout (Srivastava et al., 2014) and
max-pooling (Nagi et al., 2011) layers, followed by three bi-directional
long short-term memory (LSTM) layers (Hochreiter and Schmidhuber, 1997).
Convolutional neural networks have often been found to be useful for
seismological data analysis as they are capable of extracting temporally
independent patterns in the data (features). When combined with LSTMs the
temporal relations between these features can be obtained. In applications
such as magnitude-based classification of earthquakes, this aids in the
effective analysis of signal features as compared to the pre-signal
background noise. The dropout layers are used to prevent the model from overfitting, and the max-pooling layers reduce the data dimensionality so that only the most salient features are retained. The final layer is a softmax layer (Goodfellow et al., 2016), which outputs a three-element array of the form [

The architecture of the model used to perform the three-class
classification. The input to the model is three-component seismic waveform data
from a single station. The example shown here corresponds to the case where
3 s of P-wave data is used (the total length of data is thus 6 s). Each 1D convolution layer has eight filters with a kernel size of 4; each dropout layer has a drop rate of 0.2, and each max-pooling layer reduces the size of the data by a factor of 4; the bi-LSTM layers have dimensions of 256, 256 and 128, respectively. The final layer is a softmax
layer that outputs the probability of the trace belonging to classes 0
(noise), 1 (low-magnitude) and 2 (high-magnitude), represented here as

The model is trained using an Adam optimiser (Kingma and Ba, 2015), categorical
cross-entropy loss (Murphy, 2012) and a batch size of 256. Early stopping
(Prechelt, 2012) is used to prevent overfitting, whereby the validation loss
is monitored, and the training stops when there is no reduction in it for 20
consecutive epochs. We start with a learning rate of 10
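The early-stopping rule described here (stop when the validation loss has not improved for 20 consecutive epochs) can be written as a small helper. This is a generic sketch of the mechanism, not the authors' training code.

```python
def train_with_early_stopping(val_losses, patience=20):
    """Return the epoch at which training stops: when the validation loss
    has shown no improvement for `patience` consecutive epochs."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, epochs_since_best = loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return len(val_losses) - 1  # ran out of epochs without triggering

# validation loss improves for 5 epochs, then plateaus
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 40
stop_epoch = train_with_early_stopping(losses, patience=20)
```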

To analyse the effect of different lengths of data on the performance of the
classifier model, we use the metrics listed below to evaluate the model
performance. The metrics are calculated in terms of true positive (TP),
true negative (TN), false positive (FP) and false negative (FN) samples.
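From these counts, the standard metrics can be computed as follows. This is a generic sketch; for the three-class problem here, each metric would be evaluated per class in a one-vs-rest fashion.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall from confusion-matrix counts:
    accuracy  = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)"""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# illustrative counts for one class treated one-vs-rest
acc, prec, rec = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```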

Examples of waveforms that have been correctly classified. In each case the highest probability corresponds to the respective class.

Softmax probabilities for different input lengths of the same waveform, predicted by the models trained on the corresponding lengths of data. The waveform used here corresponds to an event of magnitude 2.8; although the maximum probability corresponds to class 1, the values of these probabilities are different for different data lengths, and there is no clear dependence between the length of the data and this probability.

Figure 4 shows three waveforms (one from each class) that have been correctly classified. The softmax probabilities, as described in Sect. 2.2, are also shown. In each case the highest probability is predicted for the corresponding class. Figure 5 shows the softmax probabilities, predicted by the model for different lengths of the same waveform. Although the waveform is correctly classified in each case, the predicted probabilities are different and show no dependence on the length of input data.

We investigated the possible factors that might influence the model performance. Figure 6a shows the variation in model performance with respect to the duration of P-wave data used as input. As we do not fix a random seed during model training (Bengio, 2012; Madhyastha and Jain, 2019), we also examined the randomness in the performance when the model is trained on the same data five times (Fig. 6b). The variation in the results caused by changing the length of data is thus comparable to the randomness introduced by random initialisation when retraining the model on the same data.

The classification results for a model trained on the 3 s
data.

Figure 7 shows the classification statistics for one of the iterations of the model trained on the 3 s data. The events classified as noise tend to be of low magnitude, while the misclassification of low-magnitude events as high-magnitude, and vice versa, is most pronounced at the decision boundary of 5.0. Another important observation is that low-magnitude events are misclassified far more often than high-magnitude ones: approximately 65 % of events with magnitude between 4.5 and 5.0, and 35 % of events with magnitude between 4.0 and 4.5, are classified as high-magnitude, while fewer than 10 % of events with magnitude between 5.0 and 5.5 are classified as low-magnitude. This asymmetry is intentional, as a missed alarm is considered more dangerous than a false alarm in this context (Allen and Melgar, 2019), and is achieved by giving the high-magnitude class more weight during model training.
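The effect of class weighting can be illustrated with a weighted categorical cross-entropy for a single sample; the weight of 3.0 assigned to the high-magnitude class below is purely illustrative, as the actual weights are not stated in the text.

```python
import numpy as np

def weighted_cross_entropy(probs, true_class, class_weights):
    """Weighted categorical cross-entropy for one sample: the loss for a
    misclassified sample is scaled by the weight of its true class."""
    return -class_weights[true_class] * np.log(probs[true_class])

# illustrative weights: penalise missing a high-magnitude event (class 2)
# more heavily than mislabelling a low-magnitude event (class 1)
weights = {0: 1.0, 1: 1.0, 2: 3.0}
probs = np.array([0.1, 0.6, 0.3])   # model leans towards "low-magnitude"
loss_if_true_low = weighted_cross_entropy(probs, 1, weights)
loss_if_true_high = weighted_cross_entropy(probs, 2, weights)
```

With these weights, the same prediction incurs a much larger loss when the true class is high-magnitude, pushing the trained model towards false alarms rather than missed alarms.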

Classification of events with different

Figure 8 visualises the classification of events across different hypocentral distances (Fig. 8a) and signal-to-noise ratios (Fig. 8b). We observe correct classifications across a wide range of hypocentral distances and signal-to-noise ratios (SNRs), which indicates that the model learns the frequency characteristics of the waveforms to some extent and does not simply correlate amplitude or SNR with magnitude. We do observe some clustering of low-magnitude events classified as noise at SNRs below 20 dB. For the demarcation between low-magnitude and high-magnitude events, however, the misclassifications lie close to the decision boundary and are spread across a wide range of hypocentral distances and signal-to-noise ratios.

Despite maximising the amount of data on either side of the decision
boundary between low and high magnitude, we find some incorrect
classifications, most of which lie within a range of

In this study, we present a deep learning model that classifies seismic waveforms into three classes: noise, low-magnitude events and high-magnitude events, with events of local magnitude equal to or above 5.0 categorised as “high-magnitude”. We investigate the effect of using different durations of P-wave data for this task and demonstrate that changing the length of the waveform (1, 3, 10, 20 or 30 s after the P arrival) has no significant effect on the model performance. We also find that the model classifies most of the data above a magnitude of 4.5 as high-magnitude, even though the decision boundary is chosen at 5.0, owing to the higher class weight assigned to high-magnitude events. We obtain an overall accuracy of up to 93.86 % and expect this approach to be very useful in the fast classification of seismological data.

The seismic waveforms used in our research are a part of the STanford
EArthquake Dataset (STEAD) (Mousavi et al., 2019), and the dataset was downloaded
from

MC, NS, GR and HS contributed to the conception and design of the study. MC did the analysis with the help of WL and JF. MC wrote the first draft of the manuscript. GR and NS wrote sections of the manuscript. All authors contributed to manuscript revision and read and approved the submitted version.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research is supported by the “KI-Nachwuchswissenschaftlerinnen” programme (grant SAI 01IS20059) of the Bundesministerium für Bildung und Forschung (BMBF). Calculations were performed at the Frankfurt Institute for Advanced Studies' new GPU cluster, funded by the BMBF for the project Seismologie und Artifizielle Intelligenz (SAI).

This research has been supported by the Bundesministerium für Bildung und Forschung (grant no. SAI 01IS20059).

This paper was edited by Irene Bianchi and reviewed by Filippo Gatti and one anonymous referee.