Exploration of the data space via trans-dimensional sampling: the case study of seismic double difference data

Abstract. Double difference (DD) seismic data are widely used to define elasticity distribution in the Earth's interior, and its variation in time. DD data are often pre-processed from earthquake recordings through expert opinion, where couples of earthquakes are selected based on some user-defined criteria, and DD data are computed from the selected couples. We develop a novel methodology for preparing DD seismic data based on a trans-dimensional algorithm, without imposing pre-defined criteria on the selection of couples of events. We apply it to a seismic database recorded on the flank of Katla volcano (Iceland), where elasticity variations in time have been indicated. Our approach quantitatively defines the presence of changepoints that separate the seismic events in time-windows. Within each time-window, the DD data are consistent with the hypothesis of time-invariant elasticity in the subsurface, and DD data can be safely used in subsequent analysis. Due to the parsimonious behavior of the trans-dimensional algorithm, only changepoints supported by the data are retrieved. Our results indicate that: (a) retrieved changepoints are consistent with first-order variations in the data (i.e. the most striking changes in the DD data are correctly reproduced in the changepoint distribution in time); (b) changepoint locations in time correlate neither with changes in seismicity rate, nor with changes in waveform similarity (measured through the cross-correlation coefficients); and (c) noteworthy, the changepoint distribution in time seems to be insensitive to variations in the seismic network geometry during the experiment. Our results prove that trans-dimensional algorithms can be positively applied to the pre-processing of geophysical data.

Competing interests. The authors declare that they have no conflict of interest.

Author contributions. N.P.A. planned and conducted the data analysis. G.S. provided the seismic data and the relevant data pre-processing. N.P.A. and G.S. equally contributed to the discussion on the results and the implications for monitoring the subsurface. N.P.A. and G.S. wrote the draft of the manuscript and drew the relevant figures.


1 Introduction
Data preparation is a daily routine in the worklife of geoscientists. Before using data to get insights into the Earth system, geoscientists try to deeply understand their datasets, to avoid introducing, e.g., instrumental issues, redundant data, unwanted structures like data density anomalies, and many others (Yin and Pillet, 2006; Berardino et al., 2002; Lohman and Simons, 2005). All the activities for preliminary data analysis can be considered as exploration of the "data space" (Tarantola, 2005) and are mainly based on expert opinion. Previous experience drives scientists in selecting the most trustable portion of their experiments, cleaning data-sets before using them for getting new knowledge about Earth model parameters. There are two main limitations to this practice. First, the amount of data accumulated in geosciences in the last decade requires more and more data screening and preparation, sometimes involving multidisciplinary expertise. Research activities could greatly benefit from a more automated exploration of the data space able to ease preparatory tasks. Second, expert opinion is a human activity and is mainly based on dual categories, e.g. good/bad data, and cannot easily handle a continuous probability distribution over the data (i.e. expert opinion cannot easily associate a continuous "confidence" measure to each data-point).

In recent years, in the framework of Bayesian inference, exploration of the data space has been introduced in a few cases to "explore" unknown features of the data-sets. For example, the so-called Hierarchical Bayes approach has been introduced to estimate data uncertainties from the data themselves (Malinverno and Briggs, 2004). More complex Hierarchical Bayes approaches have been developed to measure the data correlation as well (e.g. Bodin et al., 2012a; Galetti et al., 2016) or to evaluate an error model (e.g. Dettmer and Dosso, 2012). The exploration of the data space, in all these studies, implies considering some additional unknowns (e.g.
data uncertainties or error correlation length), so-called hyper-parameters or nuisance parameters, and estimating them directly from the data. A step forward in the exploration of the data space is represented by Steininger et al. (2013) and Xiang et al. (2018), where the authors used a data space exploration approach to evaluate the performance of two different error models directly from the data. In such studies, the number of hyper-parameters considered is not fixed, but can assume two different values (1 or 2), depending on the error model considered. Another interesting, recent case of exploration of the data space is represented by the work of Tilmann et al. (2020), where the authors used Bayesian inference to separate the data in two sets: "outliers" and "regular". In this case, the data themselves are probabilistically evaluated to understand their contribution to the final solution as "regular" data or "outliers", i.e. the data are classified in two different families, according to their coherence with the hypothesis of being "regular" data or not.
In this study, we push the exploration of data space in a new direction. We develop an algorithm for computing Bayesian inference specifically for the exploration of the data space. Exploration of the data space is performed through a trans-dimensional algorithm (e.g. Malinverno, 2002; Sambridge et al., 2006) so that the number of hyper-parameters is neither fixed nor limited to 1 or 2. We represent data structure as partitions of the covariance matrix of errors, i.e. changepoints that create sub-matrices of the covariance matrix with homogeneous characteristics, where the number of partitions is not dictated by the user, but is derived from the data themselves, in a Bayesian sense (i.e. we obtain a posterior probability distribution, PPD, of the number of partitions). In this way, similar to Tilmann et al. (2020), portions of data can be classified and used differently in the subsequent steps of the analysis.
We apply our algorithm to prepare a widely exploited seismic data-set, the seismic double-difference (DD) data-set, which has been used as input in seismic tomography for defining subsurface elasticity (e.g. Zhang and Thurber, 2003) and its variation in time (e.g. so-called "time-lapse tomography", Caló et al., 2011; Zhang and Zhang, 2015). DD data need to be re-constructed from specific partitions of the original data (i.e. seismic events). Subjective choices have a great impact on the definition of DD data. In particular, such choices can be used to limit the number of DD data itself and the selection, in turn, could introduce biases in the subsequent definition of the elastic model and its variations in time. We apply our algorithm to statistically define such partitions. The information contained in the DD datum can be used to refine event locations (e.g. small events referred to a master event, Waldhauser and Ellsworth, 2000) or the seismic properties of the rocks in the area where events are clustered (e.g. Zhang and Thurber, 2003).
In recent years, seismic monitoring of subsurface processes has also been realized through seismic tomography (e.g. Chiarabba et al., 2020) and in particular with the analysis of DD data: rock weakening due to mining activities (Qian et al., 2018; Ma et al., 2020; Luxbacher et al., 2008), granite fracturing during geothermal well stimulation (Caló et al., 2011; Caló and Dorbath, 2013) and oil and gas operations (Zhang et al., 2006). For monitoring purposes, an additional assumption is considered during DD data preparation: elastic properties of the media traversed by the seismic waves should not change between the occurrences of the selected pairs of events. This fact implies the computation of the so-called time-lapse analysis, where pre-defined time-windows are considered and static images of the subsurface (Caló et al., 2011), or differential elastic models (Qian et al., 2018), are reconstructed for each time-window. In any case, the most relevant issue in time-lapse tomography remains how to define the time-windows, which artificially separate events and prevent their coupling to obtain DD data: how many time-windows are meaningful to construct DD data? And which should be their time-lengths? This issue is critical due to the dependence of the number of DD data on the number of events coupled and, thus, on the number of time-windows, as schematically shown in Figure 1.
The definition of the set of time-windows, on which the sequence of 3D time-lapse tomographies should be computed, is demanded to expert opinion. There are three main possibilities in time-lapse tomography: (a) imposing time-windows based on known seismic history (before and after a known, relevant seismic event; Young and Maxwell, 1992; Chiarabba et al., 2020); (b) keeping the same length for all time-windows (e.g. one day, Qian et al., 2018); or (c) trying to have the same amount of data in all the time-windows (e.g. Patanè et al., 2006; Kerr, 2011; Zhang and Zhang, 2015). In other cases, the length of the time-windows varies based on research needs (e.g. Caló et al., 2011). A human-defined set of time-windows might mask the real variations of the physical properties, and the time-evolution of the elastic model found may not be associated with the investigated geophysical process.

[Figure 1: (a) Here, only one changepoint is present, so N_D = 19 DD data can be prepared. (b) In case of two changepoints, only N_D = 9 DD data can be prepared.]
Here, we tackle the issue of defining the number and time-length of the time-windows in DD data preparation through a novel approach. To simplify the experiment, we focus on closely associated events recorded on a volcanic edifice in Iceland. Such a cluster of events, which spans no more than 100 meters in diameter, is considered as a punctual source of repeating events, which are recorded at a seismic station 6 km away, continuously for more than two years. In this way, we assume perfectly co-located events and we can focus on time-variations of DD data. More generally, the novel approach can be applied to both temporal and spatial associations (i.e. to define both time-windows and spatial-lengths for associating events, and composing DD data).

Background on Bayesian inference, Markov chain Monte Carlo sampling and trans-dimensional algorithms
Geophysical inverse problems have been solved for a long time following direct search or linearized inversion schemes, due to the limited number of computations needed to obtain a solution. Such solutions have been given in the form of a single "final" model, presented as representative of the Earth's physical properties. Thanks to the computational resources now available, such approaches are outdated by more sophisticated and cpu-time consuming workflows, where multiple models are evaluated and compared, to obtain a wider view of the Earth's physical properties. Algorithms based on Bayesian inference belong to this second category, where the "solution" is no more a single model, but a distribution of probability on the possible values of the investigated parameters, following Bayes theorem (Bayes, 1763):

p(m | d) = p(d | m) p(m) / p(d)     (1)

where p(m | d) represents the information obtained on the model parameters m through the data d, the so-called "posterior probability distribution (PPD)", or simply "posterior". Such information is obtained combining the prior knowledge on the model, p(m), with the likelihood of the model given the data, p(d | m). The denominator of the right term is called "evidence" and represents the probability of the data in the model space:

p(d) = ∫ p(d | m) p(m) dm     (2)

The evidence is a high-dimensional integral that normalizes the PPD. It is generally difficult to compute and, thus, methods which do not require its computation (like Markov chain Monte Carlo, McMC, see below) are widely used in Bayesian inference.

The likelihood of the data for a given model is necessary to evaluate and compare different sets of model parameters. It is generally expressed as:

p(d | m) = (2π)^(−N_D/2) |C_e|^(−1/2) exp[ −(d − g(m))^T C_e^(−1) (d − g(m)) / 2 ]     (3)

where g(m) is the data vector predicted by model m and C_e is the covariance matrix of the data errors. Since the analytical computation of the PPD is generally unfeasible, the PPD is reconstructed through Markov chain Monte Carlo (McMC) sampling, which samples the model space according to probability rules, like the Gibbs sampler or the Metropolis rule (Metropolis et al., 1953; Gelman et al., 1996). Briefly, starting from a given point in the model space, called the current model, a new point of the model space, called the candidate model, is proposed and visited according to some rules based on the PPD. In particular, the Metropolis rule coupled to the approach developed in Mosegaard and Tarantola (1995), which is the workflow adopted in this study, accepts or rejects to move from a current model to a candidate model according to the ratio of their likelihoods, i.e.:

P_accept = min[ 1, p(d | m_candidate) / p(d | m_current) ]     (5)

This is a simplified version of a more general formulation of the acceptance probability in Metropolis-based McMC (Gallagher et al., 2009). It is worth noticing that our workflow does not directly specify the dimensionality of the model space. In fact, following the recent advancements in the solution of geophysical inverse problems, we no longer consider models with a fixed number of parameters, but make use of so-called trans-dimensional (trans-D) algorithms and propose candidate models with a different number of dimensions with respect to the current models. This approach is called trans-dimensional sampling and it has been widely used for the solution of geophysical inverse problems in the last decades (Malinverno, 2002; Sambridge et al., 2006; Bodin et al., 2012a; Dettmer et al., 2014; Mandolesi et al., 2018; Poggiali et al., 2019). Trans-D algorithms have been proven to be intrinsically "parsimonious" (Malinverno, 2002) and, thus, they preferentially sample simpler models with respect to complex ones. This is one of the most important characteristics of trans-D algorithms, enabling a fully data-driven resolution for the model parameters.
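The Metropolis acceptance step described above can be sketched as follows. This is a minimal illustration with a toy Gaussian likelihood and made-up residual values, not the production workflow of the study:

```python
import numpy as np

def log_likelihood(residuals, cov_inv):
    """Gaussian log-likelihood up to a model-independent constant."""
    return -0.5 * residuals @ cov_inv @ residuals

def metropolis_accept(logl_candidate, logl_current, rng):
    """Mosegaard & Tarantola (1995): move to the candidate model with
    probability min(1, L_candidate / L_current)."""
    return np.log(rng.random()) < (logl_candidate - logl_current)

rng = np.random.default_rng(42)
cov_inv = np.eye(3)                    # unit data covariance (toy example)
r_good = np.array([0.1, -0.2, 0.05])   # residuals of a well-fitting model
r_bad = r_good + 1.0                   # residuals of a poorer-fitting model

logl_good = log_likelihood(r_good, cov_inv)
logl_bad = log_likelihood(r_bad, cov_inv)

# a candidate with higher likelihood than the current model is always accepted
assert metropolis_accept(logl_good, logl_bad, rng)
```

A worse-fitting candidate is instead accepted only with probability equal to the likelihood ratio, which lets the chain escape local maxima while still sampling the PPD.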

2 Data
We use data from a cluster of repeating earthquakes located on the southern flank of Katla volcano (Iceland; Figure 2a). This seismic activity originated in July 2011 following an unrest episode of the volcano (Sgattoni et al., 2017) and continued for several years with remarkably similar waveform features over time. The cluster is located at very shallow depth (< 1 km) and consists of small magnitude events (∼ −0.5 – 1.2 ML), characterized by an emergent P wave and unclear S wave, a narrow-banded frequency content around 3 Hz at most stations and correlation coefficients well above 0.9 at the nearest stations during the entire sequence. The temporal behavior is also peculiar, with a regular event rate of about 6 events per day during warm seasons gradually decreasing to one event every 1–2 days during cold seasons (Sgattoni et al., 2016b). Sgattoni et al. (2016a) obtained relative locations of 1141 events recorded between July 2011 and July 2013 by designing a method optimized for very small clusters that includes the effects of 3D heterogeneities and tracks uncertainties throughout the calculation. The number of relocated events depends on a selection of the best events among a total of > 1800, based on thresholds on correlation coefficient and the amount of detected P and S phases. The resulting size of the cluster is on the order of 25 × 50 × 100 m³ (easting, northing, depth), with estimated uncertainties on the order of a few tens of meters (Figure 2b). Changes in the station network configuration around the cluster occurred due to technical problems, with the greatest loss of data in the second part of the sequence, from January 2012. This coincides with a clear increase in relative location uncertainties, which correlates also with a decrease in correlation coefficients, mainly for S phases. Other temporal changes in waveform correlation were identified by Sgattoni et al. (2016a) in August 2012 and January 2013.
In this study we focus on P-wave data recorded at station ALF (part of the Icelandic Meteorological Office seismic network), which is located about 6 km away from the cluster (Figure 2a) and is the only nearby station that has been continuously operating during the entire time. The similarity of the waveforms recorded at ALF is remarkable, with correlation coefficients of the biggest events above 0.99 throughout the entire period of study (Figure 2c). To compute the DD dataset, we use the origin times (OT_i) of N_e = 1119 relocated events from Sgattoni et al. (2016a). We remark that the increased location uncertainties due to the network geometry change in January 2012 may affect the quality of the locations of the events and, consequently, the determination of their OTs, which is relevant for computing uncertainties in the DD data (see Section 2.1).

2.1 Data uncertainties from full-waveform investigation
To apply our novel Bayesian approach, we need to estimate a covariance matrix of the errors in the DD data. Having an origin time for each event (given by the location obtained in Sgattoni et al. (2016a) using the full seismic network), we derive the DD data and their uncertainties by directly comparing the raw waveforms and finding the absolute delays between each couple of events. From the absolute delay of the P-arrivals, the subtraction of the time-difference in the OTs of the two events gives the DD datum for such a couple. We estimate the absolute time-delay between two events following the Bayesian approach described in Piana Agostinetti and Martini (2019). Briefly, we collected a 20-s record of each event, centred on the approximate P-wave arrival time. We compute a stacked version of the events, the so-called "wavelet" (Figure 3a). From the wavelet, we compute the residuals for each event (Figure 3b). Event residuals define a standard deviation function σ(t) for the 20-s record.
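The construction of a single DD datum described above, i.e. the measured absolute waveform delay minus the origin-time difference, can be sketched as follows; all numbers are hypothetical:

```python
def double_difference(delay_ij, ot_i, ot_j):
    """DD datum for a couple of events: the absolute delay measured
    between the two waveforms minus their origin-time difference.
    For co-located events in a time-invariant medium it should scatter
    around zero within the data uncertainties."""
    return delay_ij - (ot_i - ot_j)

# hypothetical numbers: event i occurred 2.00 s after event j, and its
# P wave arrived 2.03 s later at the station
dd = double_difference(delay_ij=2.03, ot_i=102.00, ot_j=100.00)
assert abs(dd - 0.03) < 1e-9   # non-zero DD: path or origin-time effect
```

In the actual workflow the delay is not a single number but a posterior distribution obtained from the waveform comparison, so each DD datum carries an uncertainty as well.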
Event residuals are also auto-correlated to obtain an averaged auto-correlation function r(t). The standard deviation and the auto-correlation function are used to define a covariance matrix C_e,w (the same for all the waveforms, Piana Agostinetti and Malinverno, 2018) using the equation:

C_e,w = S R S

where C_e,w is the covariance matrix of the waveform errors; S is a diagonal matrix containing the standard deviation σ(t) computed from the residuals; and R is a symmetric Toeplitz matrix whose rows and columns contain the auto-correlation function r(t) with t = 0 on the diagonal. In this way, we reconstruct a full covariance matrix, which can be used to obtain realistic error estimates for our DD data. Noteworthy, the use of a diagonal covariance matrix instead of a full covariance matrix would risk underestimating the errors, biasing the subsequent analysis for defining the DD time-windows. Having the error model for the waveforms, for each couple of waveforms we perform a Markov chain Monte Carlo sampling (Mosegaard and Tarantola, 1995) to reconstruct the PPD of the time-shift between the two waveforms. Following Sgattoni et al. (2016a), we use a 1-s-long time-window, centred on the approximate P-wave arrival time, to compute the likelihood of the waveforms.
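The covariance construction from S and R can be sketched as follows, assuming the common form C = S R S; the σ(t) and r(t) values are made up for a 3-sample record:

```python
import numpy as np

def waveform_covariance(sigma, r):
    """Assumed form C_e,w = S R S: S is diagonal with the residual
    standard deviation sigma(t); R is a symmetric Toeplitz matrix whose
    entries are the averaged auto-correlation r(|t_i - t_j|), r(0) = 1."""
    n = len(sigma)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    R = np.asarray(r)[lags]          # Toeplitz matrix built from the lag index
    S = np.diag(sigma)
    return S @ R @ S

# made-up sigma(t) and normalized auto-correlation r(t)
sigma = np.array([1.0, 2.0, 0.5])
r = np.array([1.0, 0.3, 0.1])
C = waveform_covariance(sigma, r)

assert np.allclose(C, C.T)                # full, symmetric covariance
assert np.allclose(np.diag(C), sigma**2)  # variances preserved on the diagonal
```

The off-diagonal terms, e.g. C[0, 1] = σ(t_0) r(1) σ(t_1), are exactly the error correlations that a purely diagonal matrix would discard, leading to the error underestimation mentioned above.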
Starting with N_e = 1119 events, we obtain N_e × (N_e − 1)/2 DD data. The total number of DD data is N_D = 625521. DD data values d_ij and uncertainties σ_ij, associated with events i and j, are reported in Figure 4. A change in the seismic network configuration could influence the quality of seismic network detections and locations, which in turn could introduce a bias in our data, as a shift in the DD value or an increase in DD error. Given the independent processes used to define each single DD datum, as a first approximation we consider our final covariance matrix of the errors in the DD data C*_e as a 625521 × 625521 diagonal matrix, with the square of the uncertainties presented in Figure 4b along the diagonal, omitting the correlation between errors given by, e.g., biases in OT determination.

3 An algorithm for exploration of double-difference data space
What happens to the DD data-set if we create a hard-partition in time, i.e. if we artificially separate some events from the others? As clearly illustrated in Figure 1, the number of data N_D in the DD data-set varies, decreasing for an increasing number of hard-partitions. From a Bayesian point of view, this is not admissible, because Equations 3 and 5 need to consider the same number of data points in two models to allow their comparison (see also Tilmann et al., 2020).
Our novel approach to solve this issue relies on the introduction of a family of "hyper-parameters", which represent the partitions of the events, and such hyper-parameters are used for scaling the different entries in the data covariance matrix C*_e. In our approach, the number of "hyper-parameters" in the family is not fixed, but is directly derived from the data themselves. Following a Bayesian inference approach, we reconstruct the statistical distribution of the hyper-parameters (i.e. their posterior probability distribution). Hyper-parameters have been used to estimate error models (Dettmer and Dosso, 2012; Galetti et al., 2016) or to discriminate between two different families of error models (Steininger et al., 2013; Xiang et al., 2018). In this last case, the number of hyper-parameters belonging to a model vector is not constant, but can be one or two, depending on the family. More recently, a nuisance parameter has been introduced to evaluate the probability for each datum to belong to the "regular data" or to the "outliers" (Tilmann et al., 2020).
For the DD case, we introduce a family of hyper-parameters to estimate which portions of the DD data violate our initial assumptions. In fact, in our assumptions, double difference data are computed from couples of seismic events that occurred in the same rock volume, recorded at the same seismic station. For perfectly co-located events, and in the absence of any change in the rock seismic velocity field between the first and the second event, double difference measurements should have mean 0 and should be distributed following the Gaussian error model defined in Section 2.1, represented by the covariance matrix C*_e. In this case, the value of the fit φ̂, expressed as:

φ̂ = d^T (C*_e)^(−1) d

with d the DD data vector, should be close to N_D.
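This fit check can be verified numerically: for synthetic zero-mean Gaussian data consistent with their covariance matrix, the statistic d^T C^(−1) d is a chi-square variable with expectation N_D. A minimal sketch, with a hypothetical DD uncertainty:

```python
import numpy as np

def fit_phi(d, cov):
    """phi-hat = d^T C^-1 d : for zero-mean Gaussian errors described by
    cov, its expected value equals the number of data N_D."""
    return float(d @ np.linalg.solve(cov, d))

rng = np.random.default_rng(0)
n_d = 2000                              # small toy size (the real N_D is 625521)
sigma = 0.02                            # hypothetical DD uncertainty (s)
d = rng.normal(0.0, sigma, n_d)         # synthetic DD data obeying the hypotheses
cov = (sigma ** 2) * np.eye(n_d)

phi = fit_phi(d, cov)
assert abs(phi / n_d - 1.0) < 0.1       # phi-hat ~ N_D when the hypotheses hold
```

Data violating the hypotheses (e.g. a velocity change between the two events) shift the DD mean away from zero and drive φ̂ well above N_D, which is the trigger for the modified covariance matrix introduced next.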
When the value φ̂ significantly deviates from N_D, a modified covariance matrix C_e(m) should be considered, where the portions of the data inconsistent with the hypotheses are treated differently from the portions of DD data which do not violate the hypotheses. The new modified covariance matrix C_e(m) is obtained as

C_e(m) = W(m) C*_e W(m)     (8)

where the matrix W(m) is a diagonal matrix which contains a weight for each DD datum, based on the hyper-parameters.

Noteworthy, if we use Equation 8 in Equation 3, we see that in our case the dependence of the likelihood function on the model does not reside anymore in the residuals, as generally done in geophysical inverse problems, but only in the covariance matrix.

3.1 Model parameterization
In our algorithm, a model is described by a set of k changepoints that define the partitions of C*_e, and their associated quantities, that is: m = (k, T_k, π_k). The k-vector T_k represents the time-occurrence of the k changepoints, while the k-vector π_k contains the weights associated to each changepoint. We assume that a DD datum d_ij, associated to events i and j, retains its original variance σ²_ij if no changepoint occurs between OT_i and OT_j. Otherwise, its importance is modified with weight W_ij(m):

W_ij(m) = 1 + w_ij

where w_ij is computed as:

w_ij = Σ_k π_k , summed over all changepoints k with OT_i < T_k < OT_j

recalling that W is a N_D × N_D diagonal matrix and W_ij represents the element along the diagonal associated to DD datum d_ij. Following our approach for defining the W_ij(m), specifically the sum of the weights associated to the relevant changepoints, we assume that a couple of events "distant" in time has a higher probability of being "separated" by one or more changepoints. This assumption reflects the standard processing of DD data, where distant (in space and/or time) events are almost never coupled in DD data-sets. However, if no changepoints are present between distant events, our trans-dimensional approach allows them to remain fully coupled. At each iteration, the current model is perturbed through one of four moves: (1) changing the time-occurrence of a randomly selected changepoint; (2) changing the weight of a randomly selected changepoint; (3) adding a new changepoint ("birth"); or (4) removing an existing changepoint ("death"). For move (3), we follow the approach described in Mosegaard and Tarantola (1995) and we propose a completely new changepoint with T_{k+1} and π_{k+1} randomly sampled from their prior distributions. For move (4), we simply randomly select one changepoint and remove it from the model.
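The weighting scheme above (a DD datum keeps weight 1 unless changepoints fall between the two origin times, in which case the changepoint weights are summed) can be sketched as follows; changepoint times and weights are hypothetical:

```python
def dd_weight(ot_i, ot_j, t_cp, pi_cp):
    """Diagonal entry W_ij: 1 if no changepoint falls between the origin
    times of events i and j, otherwise 1 plus the sum of the weights of
    all changepoints crossed by the couple (down-weighting that datum)."""
    lo, hi = min(ot_i, ot_j), max(ot_i, ot_j)
    return 1.0 + sum(p for t, p in zip(t_cp, pi_cp) if lo < t < hi)

# hypothetical changepoint times (decimal years) and weights
t_cp = [2011.9, 2012.4]
pi_cp = [0.6, 0.2]

assert dd_weight(2011.6, 2011.8, t_cp, pi_cp) == 1.0       # no changepoint crossed
assert abs(dd_weight(2011.6, 2012.0, t_cp, pi_cp) - 1.6) < 1e-9   # crosses the first
assert abs(dd_weight(2011.6, 2012.5, t_cp, pi_cp) - 1.8) < 1e-9   # crosses both
```

Note that couples separated by more, or heavier, changepoints are progressively down-weighted rather than discarded outright, which keeps N_D fixed across models as required by Equations 3 and 5.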

3.2 Prior information
Uniform prior probability distributions are selected for our inverse problem. Here, the number of changepoints is comprised between 0 and 100. Changepoints can be distributed anywhere in time between 2011.5 and 2013.7. To make the algorithm more efficient, we set a minimum distance between two changepoints as large as 0.5 day (Malinverno, 2002). Changepoint weights π_k follow a uniform prior probability distribution between 0.0 and 1.0.
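A birth proposal honoring these priors can be sketched as follows; the rejection loop used to enforce the minimum-separation constraint is an implementation detail assumed here, not taken from the study:

```python
import numpy as np

T_MIN, T_MAX = 2011.5, 2013.7        # time span of the catalogue (decimal years)
MIN_SEP = 0.5 / 365.25               # minimum changepoint separation: 0.5 day
K_MAX = 100                          # maximum number of changepoints

def propose_birth(t_current, rng, max_tries=1000):
    """Draw a new changepoint time and weight from their uniform priors,
    rejecting candidate times closer than MIN_SEP to an existing one."""
    for _ in range(max_tries):
        t_new = rng.uniform(T_MIN, T_MAX)
        if all(abs(t_new - t) >= MIN_SEP for t in t_current):
            pi_new = rng.uniform(0.0, 1.0)
            return t_new, pi_new
    raise RuntimeError("could not place a new changepoint")

rng = np.random.default_rng(1)
t_current = [2012.0, 2013.0]         # two existing changepoints (hypothetical)
t_new, pi_new = propose_birth(t_current, rng)

assert T_MIN <= t_new <= T_MAX and 0.0 <= pi_new <= 1.0
assert all(abs(t_new - t) >= MIN_SEP for t in t_current)
```

With uniform priors, sampling the new pair directly from the prior keeps the birth/death acceptance rule in the simple likelihood-ratio form of Mosegaard and Tarantola (1995).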
4 Finding data-driven time-variations of rock elasticity during Katla's seismic swarm

We apply our novel methodology for the definition of the hard-partitions in DD data to the data-set recorded on Katla volcano in Iceland, during a two-year monitoring experiment. Based on the observations of the limited dimension of the cluster with respect to the event-station distance (100 m versus 6.0 km) and the overall high similarity of the waveforms (correlation coefficient always larger than 0.9), our algorithm is able to map out which portions of the data violate our underlying hypotheses: co-located events and a constant elasticity field in time. While separating those two effects with a single station would be challenging, here we want to illustrate in detail how the time-occurrence of the changepoints is defined and compare them to other potential approaches for the definition of hard-boundaries in DD data, namely, variations in seismicity rate and waveform cross-correlation.
As shown in Figure 1, defining hard-partitions for DD data using expert opinion is a dangerous task, due to the limitation in the number of data available for subsequent uses. For example, seismologists could be tempted to test if using less data could give better images in a subsequent seismic tomography, based on pre-defined ideas on the subsurface structures. In fact, some changes in DD data are obviously present in the observations (Figure 4, close to events 550 and 1050 for example), but others are more subtle to define.
We compute the data-space exploration running 100 independent McMC samplings, where each chain sampled 2 million changepoint models. We discarded the first million models and collected one model every 1000 in the subsequent million models. Our final pool of models used for reconstructing the PPD is thus composed of 100 000 models. The most probable changepoint positions coincide with the most striking changes in the DD data, as shown in Figure 6a, confirming the goodness of the approach. The distribution of the weights clearly defines the partition of C_e(m), where initial data (i.e. DD data related to events occurring at the initial stages of the swarm, in 2011) slowly release their "connection" to later events and, thus, indicate that they should not be included in subsequent analysis. From the histogram of the number of changepoints in each sampled model, we can see how the trans-D algorithm works: no less than 10 and no more than 20 changepoints are generally considered, even if we allow such number to increase to one hundred. Combining this information with the distribution of changepoints in time given in Figure 6e, we define eleven relevant changepoints (red arrows). We acknowledge that such a number could be, again, a subjective choice; however, looking at Figure 6d, we see that changepoints can be "ranked" in some sense given their mean PPD weights.
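The burn-in and thinning bookkeeping described above amounts to simple index arithmetic over each chain:

```python
def kept_indices(n_steps, burn_in, thin):
    """Indices of the models retained from one chain: discard the first
    `burn_in` steps, then keep one model every `thin` steps."""
    return list(range(burn_in + thin - 1, n_steps, thin))

N_CHAINS = 100
idx = kept_indices(n_steps=2_000_000, burn_in=1_000_000, thin=1_000)

assert len(idx) == 1_000                  # 1000 models retained per chain
assert len(idx) * N_CHAINS == 100_000     # final pool used for the PPD
assert idx[0] >= 1_000_000                # nothing kept from the burn-in phase
```

Thinning by a factor of 1000 reduces the serial correlation between retained models, so the 100 000 collected models can be treated as an approximately independent sample of the PPD.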
For example, changepoints 2, 5, and 9 have clearly lower associated weights with respect to the others and, thus, should be considered as less relevant. Our methodology does not solve all issues connected to the preparation of DD data, but, at least, it can be used to quantify the occurrence of changepoints and their importance, and such quantification can be exploited at a later stage depending on the subsequent analysis planned.
We compare the time-occurrence of the resulting 11 changepoints with the cross-correlation coefficients between each event in the seismic swarm and the largest one (see Sgattoni et al., 2016a, for details). Both P-wave and S-wave cross-correlation coefficients display some degree of variability and defined patterns in the time-window used in this study (Figure 7), even if we should consider that the smaller values are always larger than 0.9. We observe that there is no clear correlation between changepoint position in time and cross-correlation values. In mid 2011, around event 300, and early 2012, around event 700, we have two changepoints where cross-correlation seems stable for both P- and S-waves. At the beginning of 2012, when the seismic network has been redefined, the cross-correlation for S-waves changes dramatically, while the change in cross-correlation for P-waves is less evident, and no changepoint is found at all. Our results seem to indicate that variations in cross-correlation coefficients (for example, computed for repeating earthquakes) could indicate unrealistic variations in elasticity and could be a problematic choice for a monitoring system of the subsurface.

[Figure 7: Correlation coefficients for P-waves (a) and S-waves (b) for each event with respect to the largest event (see Sgattoni et al., 2016a, for details). In each plot, the most probable time-occurrences for the 11 relevant changepoints are also reported (yellow arrows).]
We also compare the position of the retrieved changepoints with the seismicity rate, another parameter usually associated to time-variations of the subsurface properties (e.g. Dou et al., 2018). In Figure 8, we report the seismicity rate through time, together with the retrieved changepoints. As for the cross-correlation coefficients, changepoint locations do not systematically coincide with variations in seismicity rate. If our changepoints indicate variations of subsurface elasticity, the time-history of seismicity rate should be carefully evaluated before using it for tracking elasticity changes in time.

5 Discussion
The DD data recorded on Katla volcano and the results presented here clearly indicate that time-variations in elastic properties should have occurred between 2011 and 2013 on the south flank of the volcanic edifice. Thus, data-driven time-windows can be found using our approach to define where to apply standard DD analysis for retrieving elasticity variations, with no need of preparing the DD data following subjective choices on the coupled events. Since the algorithm is naturally parsimonious, there is almost no possibility of having "no DD data" (i.e. one partition per event). Data-partitions are always in a limited number, even if, strictly speaking, their number should be given by the user because, as final output, we have the full PPD and not just one set of partitions. Defining the exact number of partitions to use in subsequent analysis depends on the analysis itself.

Our approach quantifies the presence and the relevance of the changepoints. Using such information could be straightforward in some cases (e.g. if we look for one most probable changepoint only) or more complex (e.g. if we also wish to appreciate correlation between changepoints, which can be measured using the PPD). It is worth noticing that, in simple cases, our algorithm generally performs as expert opinion does (e.g. in the case of the search for one most probable changepoint), and this confirms the overall performance of our methodology. In more complex cases, the weights associated to the changepoints should be used to classify the changepoints themselves, and this allows selecting the most relevant changepoints using quantified information.
The network of seismic stations deployed around Katla volcano changed in time. This fact has been previously indicated as a potential "bias" in the analysis of the seismic data themselves, as the location uncertainties increased after major network operations (January 2012). Our results point out that the changepoints found do not correlate with such changes in the seismic network. Being a statistical analysis, our methodology seems to be insensitive to changes in the acquisition system. Alternatively, the changes in location uncertainties could be not large enough to affect our procedure. In both cases, our approach demonstrates to be well-suited for handling long-lived databases, where changes in the spatial distribution of observational points are likely to occur from time to time.
Finally, we investigate how our changepoints relate to more commonly-used indicators of sub-surface variations of elasticity, like time-series of cross-correlation coefficients and seismicity rate. In both cases, we found poor correlation between our results and the time-series of the two quantities. While this observation is not totally unexpected, since the two time-series are based on different seismological observables, it suggests that care should be taken when investigating time-variations of elasticity retrieved from methodologies based on cross-correlation, and that approaches based on variations of the seismicity rate as a proxy of "rock instabilities" should be re-assessed (Dou et al., 2018).

6 Conclusions
We developed an algorithm for defining data-driven partitions in a seismic database, for a more objective definition of double-difference data. The algorithm is based on the trans-dimensional sampling of data-structures, here represented as partitions of the covariance matrix. The algorithm has been tested in the case of a seismic database acquired in a volcanic setting, where subsurface variations of rock elasticity have probably occurred over a period of two years. Our results indicate that: