the Creative Commons Attribution 4.0 License.

the Creative Commons Attribution 4.0 License.

# On a new robust workflow for the statistical and spatial analysis of fracture data collected with scanlines (or the importance of stationarity)

### Andrea Bistacchi

### Silvia Mittempergher

### Mattia Martinelli

### Fabrizio Storti

We present an innovative workflow for the statistical analysis of fracture data collected along scanlines, composed of two major stages, each one with alternative options. A prerequisite in our analysis is the assessment of stationarity of the dataset, which is motivated by statistical and geological considerations. Calculating statistics on non-stationary data can be statistically meaningless, and moreover the normalization and/or sub-setting approach that we discuss here can greatly improve our understanding of geological deformation processes. Our methodology is based on performing non-parametric statistical tests, which allow detecting important features of the spatial distribution of fractures, and on the analysis of the cumulative spacing function (CSF) and cumulative spacing derivative (CSD), which allows defining the boundaries of stationary domains in an objective way. Once stationarity has been analysed, other statistical methods already known in the literature can be applied. Here we discuss in detail methods aimed at understanding the degree of saturation of fracture systems based on the type of spacing distribution, and we evidence their limits in cases in which they are not supported by a proper spatial statistical analysis.

The analysis of fracture systems is a traditional topic in structural geology and rock mechanics (e.g. Pollard and Aydin, 1988; Twiss and Moores, 2006), and it has been recognized for a long time that the quantitative characterization of fracture networks is necessary in both fundamental studies aimed at understanding brittle deformation processes in different geological environments (e.g. Jaeger et al., 2007; Scholz, 2019; Schultz, 2019) and in applications such as engineering rock mechanics (e.g. Hoek, 1980) and the characterization and modelling of subsurface fluid flow (e.g. Gleeson and Ingebritsen, 2012).

For “fractures” in this contribution we use the broad definition given e.g. by Twiss and Moores (2006), Davis et al. (2012), and Schultz (2019), who include in the definition of fractures or “cracks” all kinds of brittle discontinuities, such as joints, veins, shear fractures, (micro-faults), and in some cases even stylolites and other “anti-cracks”. Following the same authors, a fracture set is defined as a cogenetic set of fractures showing the same kinematics and orientation (with some variability), while a fracture system or network includes all fracture sets that are present in a volume of rocks.

For years, the common way of collecting quantitative fracture data has been based on scanlines: drawing a line, or laying a tape measure, on an outcrop and collecting all intersections between this line and fractures (e.g. Terzaghi, 1965; Hobbs, 1967; Priest and Hudson, 1981). Equivalent 1D datasets are also obtained from boreholes from both direct observation of cores and from geophysical imaging methods (e.g. Prioul and Jocker, 2009). The main advantage of scanline surveys is that they can be easily carried out in relatively short times and in a multitude of outcrops; hence, this technique has also become a standard in engineering applications (ISRM, 1978).

The minimum goal of scanline surveys is the measurement of fracture spacing – the distance between two neighbouring fractures – and the analysis of derived parameters such as fracture density (number of fractures per unit length) or intercept (inverse of density; e.g. Priest and Hudson, 1981).

Other fundamental information that can be obtained from scanlines is the 1D spatial distribution (e.g. Swan and Sandilands, 1995; Laubach et al., 2018), i.e. whether fractures are randomly distributed in space or they show some form of spatial organization, defined as any form of departure from complete randomness (Fig. 1a). We can consider two main and opposite types of organization: (i) regular, uniform, periodic arrangements (Fig. 1b) and (ii) arrangements with some form of clustering (Fig. 1c) or repeating pattern (Fig. 1d) (e.g. Swan and Sandilands, 1995; Bonnet et al., 2001; Marrett et al., 2018).

The spatial organization of fractures can be interpreted in the function of the processes responsible for their formation and evolution. Fractures nucleate at randomly distributed weakness points in the host rock (e.g. Schultz, 2019) and then propagate, increasing their length, initially without interacting with each other (e.g. Spyropoulos et al., 2002). When the number of fractures and their dimensions increase (e.g. fracture density and intensity increase), fractures start interacting, with two possibly opposite effects (e.g. Scholz, 2019). If the material is effectively weakened by fractures (e.g. at fracture tips where stress concentrations occur or due to diffuse damage in the material), fractures tend to propagate towards each other or to nucleate close to each other in an “attractive” process (e.g. Crider and Pollard, 1998). When instead stress shadows develop, fracture nucleation or propagation is inhibited close to other fractures in a “repulsive” process (e.g. Spyropoulos et al., 1999). Finally, Hooker and Kat (2015) show how vein cementation partly restores rock continuity, hence reducing repulsive effects related to stress shadows.

Mechanical and statistical simulations have shown that repulsive processes result in regular, uniform, and/or periodic spatial distributions for which the mean spacing might reflect the width of stress shadows (Rives et al., 1992; Bai and Pollard, 2000; Tan et al., 2014). On the other hand, attractive processes result in clustering of fractures or in the development or repeating patterns, which sometimes, but not always, show fractal distributions (e.g. Gillespie et al., 1993; Turcotte, 1997; Marrett et al., 2018). As also shown by Olson (2004), subcritical crack growth can result in clustered fracture distributions, which can be described as fracture corridors (sensu Sanderson and Peacock, 2019).

A different type of spatial organization, depending on external tectonic and/or lithological controls, is represented by large-scale trends in fracture density and spacing (Fig. 1e) that can develop e.g. in fault damage zones, with fractures that are more densely spaced close to the fault core and more dispersed as the distance increases (e.g. Vermilye and Scholz, 1998; Faulkner et al., 2008; Bistacchi et al., 2010; Choi et al., 2016) or when fracturing is controlled by folding (Tavani et al., 2006; Cosgrove, 2015; Tavani et al., 2015).

Following the fundamental paper by Hobbs (1967), many authors investigated the relationships between the spatial distribution of fractures and the statistical distribution of their spacing (e.g. Rives et al., 1992; Gross, 1993; Tan et al., 2014). These relationships are based on sound statistical grounds. For instance, if we consider a Poisson distribution in 1D (a perfectly random distribution of events along a line), the spacing between neighbouring events will be characterized by a negative exponential distribution, with a numerically equal mean and standard deviation (Swan and Sandilands, 1995). If we consider a regular spatial distribution, the mean spacing will be sharply defined, the spacing of the standard deviation will be much smaller, and the statistical distribution of spacing will be close to a symmetrical normal distribution (Rives et al., 1992). Intermediate situations will show spacing distributions with intermediate skewness, such as gamma or log-normal distributions (Rives et al., 1992).

These observations lead many authors using the distribution of spacing to
characterize the spatial distribution as if the above-mentioned
relationships can be taken as *biunivocal* relationships. For instance,
Gillespie et al. (1999) proposed a method to characterize spatial distribution based on the
coefficient of variation Cv = SD ∕ mean that is supposed to be
Cv > 1 for clustered systems, Cv = 1 for perfectly random systems (since the
SD equals the mean in a negative exponential distribution), and Cv < 1 for systems with a regular or uniform spatial distribution.

However, Wang et al. (2019), for instance, warned against using this methodology that is not able to “capture
mixtures of highly clustered and more regularly spaced patterns”. We will
demonstrate in the following that the problem is more general and lies in
the fact that the relationship between spatial distribution and the
distribution of spacing is *not always* biunivocal. A more robust approach to detect spatial
organization was recently undertaken by
Marrett et al. (2018), who proposed the
normalized correlation count (NCC) as a useful tool to detect spatial
organization that departs from complete randomness (NCC = 1) as clustering
(NCC > 1) or regular spacing seen as anti-clustering (NCC < 1). One major
advantage of this approach is that it is possible to detect how spatial
organization changes as a function of wavelength, for instance in cases
in which fractures are distributed in clusters and clusters have a regular
spacing, but the distribution of fractures within a single cluster is
fractal (as in e.g. Wang et al.,
2019).

A simple yet promising approach for studying spatial organization is based on cumulative distributions. Gillespie et al. (1999) plotted the cumulative vein thickness vs. vein position along a scanline to characterize spatial distribution. Choi et al. (2016) plotted the cumulative number of fractures along scanlines and used marked changes in the slope of this curve to reveal fault zone architecture features, such as the boundary of the damage zone. However, neither developed quantitative statistical criteria in their analysis.

In this contribution, we will first focus on a rigorous framework to measure spatial and spacing distributions in scanlines, on the relationships between these distributions, and particularly on the debated possibility to infer spatial distribution from spacing distribution (is it a biunivocal relationship?). An important point, not addressed by previous authors, will be dedicated to defining the stationarity of a statistical sample within a given spatial sampling domain – a prerequisite to detect any meaningful statistics of the sample itself and to extrapolate it to the underlying population. We will then propose non-parametric and parametric statistical tests that can be used to characterize the distributions, and we will propose a rigorous workflow that can be applied to the practical analysis of scanline data.

## 2.1 Fracture position and spacing in the scanline reference frame

To completely characterize the spatial distribution of fractures and their
spacing, we need to define two stochastic variables: position and spacing.
Having collected *N* fracture intersections along a scanline of length *L*,
chosen perpendicular to the fracture set's mean plane
(Priest and Hudson, 1981), we will call position
${X}_{i}^{\text{F}}$ the distance between the origin of the scanline reference frame
and the *i*th fracture intersection (Fig. 2).
Spacing is defined as the distance between one fracture at position
${X}_{i}^{\text{F}}$ and the next fracture at position ${X}_{i+\mathrm{1}}^{\text{F}}$, so it is a
property of the position of both fractures. Under a different point of view,
spacing can also be seen as the length of the block of intact rock between
two fractures. For this reason, we calculate the *i*th spacing *S*_{i} at
the *i*th “block” position ${X}_{i}^{\text{B}}$ as follows
(Fig. 2).

We will see in Sect. 2.4 that associating each spacing measurement with its position is fundamental to study the spatial distribution of fractures in terms of the spatial distribution of their spacing.

If local outcrop or borehole conditions do not allow collecting the scanline perpendicular to the mean plane of the fracture set, we will apply the Terzaghi (1965) correction to obtain the true position from the apparent one:

where *α* is the angle of deviation between the actual scanline and
the optimal one (perpendicular to the fracture set mean plane). Spacings will automatically be
correct if calculated from true positions. Fracture positions
can be schematically represented as “barcode plots”
(Fig. 1), which are useful for a visual inspection of the
spatial distribution of fractures.

## 2.2 Stationarity as a prerequisite for statistical analysis

In statistics, a stationary process is a stochastic process whose characteristic probability distributions do not change when the domain of the analysis (e.g. in space or time) is varied (e.g. shifted, shrunk, enlarged; e.g. Wasserman, 2004). Stationarity is a prerequisite in many kinds of analyses performed on time series, and it is also a fundamental concept for spatially distributed variables. In geostatistical studies on spatially distributed variables, a variable is said to be stationary if there is no significant drift or trend within a specified spatial domain (distance, area, or volume), with drift or trend defined as the component of a regionalized variable resulting from large-scale processes that can be defined with deterministic analytical functions of the spatial variables (Swan and Sandilands, 1995; Borradaile, 2003). An example of a trend in fracture studies is the exponential decay in fracture density that can be found in some fault damage zones (e.g. Caine and Forster, 1999; Mitchell and Faulkner, 2009). Within this framework, residuals represent the completely random component of the regionalized variable that, if a trend is present, can be obtained by normalizing the regional variable with the trend (Swan and Sandilands, 1995).

Even if this analysis is not routinely performed, understanding if our dataset is affected by a trend (Fig. 1e) is fundamental in every kind of statistical analysis on spatial variables since, if a trend is present, all statistics will be affected by the choice of the sampling domain. If, for instance, we calculate the sample mean of fracture spacing in a fault damage zone showing an exponential trend, the mean will change depending on the position and length of the scanline. Fracture spacing is not stationary in this case and its mean is meaningless. The same happens at a smaller scale (e.g. a small segment of the scanline) if fractures are clustered (Fig. 1c) or arranged in a pattern (Fig. 1d).

According to the general definition given above, a stationary process is a process whose statistics do not change when the sample is changed, moved, shifted, or resized in space (or time; Wasserman, 2004). With reference to Fig. 1, this restrictive condition is met only in cases when fractures are randomly distributed (Fig. 1a) or regularly spaced (Fig. 1b).

## 2.3 Non-parametric correlation

Non-parametric statistical methods discussed here are those in which (Davis, 2002) (i) data are represented with ordinal or rank-order scales instead of interval or ratio scales and (ii) data are not required to fit parametric statistical distributions (e.g. the normal or exponential distributions). Since they are independent from any assumption or statistical model, they are quite useful, particularly in the first phases of analysis (Swan and Sandilands, 1995).

The non-parametric Spearman's rank correlation coefficient measures the
correlation between two rank-order variables. These are variables obtained
by sorting numerical data (interval- or ratio-scale variables) from the
smaller to the larger value and then replacing each value with an integer
representing its position in the sequence. If we take a dataset with *N*
data pairs *x*_{i} and *y*_{i}, and the rank-order variables are *R*(*x*_{i})
and *R*(*y*_{i}), the Spearman's correlation coefficient *R*_{S} is given by

For instance, in the case of a perfect correlation between *x*_{i} and *y*_{i}
in terms of rank order, we will have, for every *i*th data pair, *R*(*y*_{i})=*R*(*y*_{i}) and hence *R*_{S}=1.

We can use the Spearman's rank correlation coefficient to test the null
hypothesis of no correlation vs. the alternative hypothesis of correlation,
considering critical values of *R*_{S} or the associated *p* values
(Wasserman, 2004), so, for instance, we will reject the null
hypothesis at 5 % significance if *p* value < 0.05.

The advantage of the Spearman's rank correlation coefficient with respect to its parametric counterpart – the Pearson's correlation coefficient – is graphically explained in Fig. 3, where we see that the non-parametric correlation is more robust in the case of outliers and is also able to detect non-linear correlation in addition to standard linear correlation (Swan and Sandilands, 1995).

## 2.4 Cumulative spacing spatial distribution: CSF and CSD

We have found that, to study the spatial distribution of spacing, it is very useful to plot the cumulative function of spacing normalized by scanline length and its first derivative (Fig. 1). This approach can be seen as a development of those proposed by Gillespie et al. (1999), Choi et al. (2016), and Sanderson and Peacock (2019), with the important difference that we have developed a new quantitative and objective method to detect segments of a scanline that show a stationary behaviour.

The cumulative spacing function (CSF) corresponds to a plot of the relative proportion of scanline length associated with each block limited by a pair of fractures. The reference CSF, associated with a perfectly uniform spatial distribution of fractures, corresponds to a constant-slope line with slope = 1 ∕ scanline length (Fig. 1b). Fracture clusters or other scanline segments with higher-than-average fracture density appear as segments with a slope higher than the reference CSF, and the opposite applies to segments with lower-than-average fracture density (Fig. 1c–e). The CSF for a random distribution shows just some noise symmetrically distributed on both sides of the reference CSF (Fig. 1a).

The first derivative of the CSF (i.e. the CSD, cumulative spacing derivative) follows the same behaviour. The reference CSD for a perfectly uniform distribution plots as a horizontal line with height = 1 ∕ scanline length (Fig. 1b). Clusters or high-fracture-density scanline segments plot above the reference CSD and lower-fracture-density segments plot below the reference CSD (Fig. 1c–e); hence, they can easily be recognized and classified with an objective criterion.

By comparing the CSF and CSD, it is possible to define stationary scanline
segments that show a constant slope and hence a constant fracture density.
If more than one stationary segment occurs in a scanline, the boundary
between different segments is found by comparing the data CSD with the
reference CSD. The only difficulty with this approach is that the CSD of
natural datasets can be very noisy, with many spikes (see the random
distribution in Fig. 1a). This problem is solved
by smoothing the derivative with an *n*th-order moving median filter (where
*n* is the kernel size). To avoid biases due to subjective choices on the
smoothing kernel size, we propose retaining subset boundaries that are
consistently defined when changing the smoothing kernel size over a wide
range.

Our workflow for the analysis of fracture data collected on scanlines can be subdivided into two major stages. First, we use non-parametric statistics to check whether the dataset is stationary, and if it is not stationary, we create subsets and/or we rescale or normalize the data in order to obtain stationary sets or subsets. Then, we analyse the datasets and/or subsets with parametric statistics, following well-established methods to characterize the type of spatial distribution and its parameters.

The MATLAB^{®} App DomStudioFracStat1D
(Bistacchi, 2020), with a user-friendly graphical user
interface that we developed to perform this analysis, is available for
download at https://github.com/bistek/DomStudioFracStat1D (last access: 19 October 2020).

## 3.1 Stationarity: non-parametric tests, rescaling, and sub-setting

We apply the Spearman's rank correlation coefficient to test the correlation between two pairs of rank variables; the first is designed to detect large-scale trends and the other one better suited to detect distributions with clusters or patterns. Our usage was inspired by Swan and Sandilands (1995), with some original variations and developments.

To test the data for large-scale trends (Fig. 1e),
we test the correlation between the position of each block bounded by two
fractures ${X}_{i}^{\text{B}}$ and the corresponding spacing *S*_{i}. When both
variables are expressed as rank order, the position rank order is simply
$R\left({X}_{i}^{\text{B}}\right)\equiv i$ (i.e. the block closer
to the origin has rank 1, the second 2, etc.), so the correlation
coefficient can be written as (for *N* blocks)

and we compare ${R}_{\text{S}}^{\text{TREND}}$ with tabulated critical values or use the
associated *p* value to test the null hypothesis of no trend in spacing. We
will obtain ${R}_{\text{S}}^{\text{TREND}}=\mathrm{1}$ in the case that fracture spacing keeps growing
steadily and *R*(*S*_{i})=*i* for every block. On the other
hand, for a completely random distribution we will obtain ${R}_{\text{S}}^{\text{TREND}}\to \mathrm{0}$. Plots of the large-scale position–spacing correlation are shown in
Fig. 1 for all spatial distributions.

Even if large-scale trends are not detected, local small-scale trends could reveal clustering (Fig. 1c) or repeating patterns (Fig. 1d). We therefore compare spacing between pairs of fractures taken in a sequence along the scanline using this formulation for the correlation coefficient:

and we compare ${R}_{\text{S}}^{\text{LOCAL}}$ with tabulated critical values or use the
associated *p* value to test the null hypothesis of no local correlation in
spacing. In this case ${R}_{\text{S}}^{\text{LOCAL}}\to \mathrm{1}$ if $R\left({S}_{i}\right)\approx R\left({S}_{i+\mathrm{1}}\right)$ for a large number of blocks, and
${R}_{\text{S}}^{\text{LOCAL}}\to \mathrm{0}$ otherwise. Plots of the local spacing correlation are
shown in Fig. 1 for all spatial distributions.

The spacing–spacing correlation test used to detect local clustering and/or patterns must be performed after the position–spacing test used to detect large-scale trends because a large-scale trend is seen by the local test as a single large-scale cluster; hence, the second test alone is not discriminant for local vs. large-scale correlation in spacing.

If the null hypotheses of no large-scale trend and no local clustering and pattern are retained, we can move on to the parametric analysis stage (next section). However, if this is not the case, the dataset can be rescaled, normalized, and/or segmented into subsets to obtain stationary sets.

Rescaling and normalization are best suited when we detect a smooth deterministic trend in spacing; i.e. the fracture density varies continuously along the scanline. In this case we must define the deterministic function of the trend and normalize the data with this function; the residuals can be considered a stationary set that can be analysed with the methods presented in the next section. The methods to find the deterministic trend function are various, and we feel that they must also be guided by geological and tectonic observations, so we think that there is no “general method” to complete this task. We will see an example in the first case study (Sect. 4.1), in which we study the variable spacing of joints in a laterally tapering layer of sandstone.

Sub-setting is very useful when we observe a stair-stepping pattern, e.g. where we observe lower-fracture-density domains intercalated by fracture corridors showing markedly higher fracture intensity, or in some fault damage zones (e.g. case study 2; Martinelli et al., 2020). These segments can be recognized and objectively classified by observing the CSF and CSD, particularly by comparing the data curves to the reference ones. In our workflow, we (i) select subset scanline segments (i.e. sub-scanlines) that are generally bounded by points of intersection of the data CSD with the reference CSD, corresponding to changes in the slope of the data CSF. (ii) We then test each subset once again for stationarity, and if the result is positive, (iii) we pass it to the parametric analysis stage (next section).

We also conducted an alternative test for randomness by comparing the spatial
distribution of *N* fractures in terms of position ${X}_{i}^{\text{F}}$ along a
scanline of length *L* with a theoretical Poisson distribution with the density
parameter equal to the fracture density ${\mathit{\mu}}_{\text{D}}=N/L$. The discrete Poisson
distribution expresses the probability Pr of having *n* discrete
events (fractures) distributed at random in sub-segments of length *l* as
(Swan and Sandilands, 1995)

The comparison of empirical and model distributions is performed with a
*χ*^{2} goodness-of-fit (GOF) test used to compare the observed *O*_{j}
and expected *E*_{j} frequencies in segments of the scanline, as in
Fig. 1a–b. To have reliable results, the expected
frequency in each class must be *E*_{j}≥5; hence, segments are
automatically pooled when they show lower values (Swan and
Sandilands, 1995). Unfortunately, this means that this test is reliable only
for very large datasets, and even in this case, in our experience, it shows
a limited sensibility in detecting random vs. other arrangements.

## 3.2 Evaluation of a parametric distribution of spacing

Once we have segments of scanline that show a stationary behaviour, we can use standard parametric methods to define the type of spacing distribution (e.g. normal vs. exponential) and fit its parameters (e.g. population mean and standard deviation) to the sample fracture spacing data. Then, following a rich literature (e.g. Rives et al., 1992; Tan et al., 2014), various parameters that correlate with the evolution of a fracture system with increasing deformation can be discussed. In this contribution, we test the spacing distribution for normal (Gaussian), log-normal, and negative exponential distributions.

The maximum likelihood estimate (MLE) of the parameters of these distributions
can be obtained directly in closed form from sample statistics. For
instance, the mean spacing *μ*_{S} and standard deviation *σ*_{S}
of a normal distribution are directly given by the sample mean and standard
deviation. However, MLE just provides an optimal estimate of the
distribution parameters, under the assumption that we have chosen the right
distribution, but does not allow comparing which distribution (i.e. which
statistical model) better conforms to the data.

The Kolmogorov–Smirnov (K-S) test is a non-parametric test that was
developed to check if two samples come from the same distribution, either
empirical or theoretical, and it can be used as a goodness-of-fit (GOF) test
for the null hypothesis that the empirical distribution function (EDF) is
not different from the model cumulative distribution function (CFD). This
test is used by many authors, but it is biased if the parameters of the
theoretical distribution are not fixed but estimated from the data
themselves (Wasserman, 2004). For this reason we
apply the Lilliefors test instead of K-S when possible (Lilliefors, 1967, 1969), which is not subject to this bias, but
unfortunately the Lilliefors test is not available for log-normal
distributions. In both cases the results can be expressed as *p* values, and
the null hypothesis of no difference with a given distribution is rejected
at 5 % significance if *p* value < 0.05.

In our workflow, if only one type of distribution passes the test, we retain
this as the best-fit one. If more than one distribution is retained and the
*p* values are similar, we generally discuss the possibility that the data
are equally well fitted by both distributions and that they represent some
sort of transitional regime.

## 4.1 Bed-confined joints in turbiditic sandstones

The scanline studied in this case comes from a single turbiditic sandstone
bed in the Langhian to Tortonian Marnoso-Arenacea Formation (Val Santerno,
northern Apennines of Italy; Ogata et al.,
2017). The scanline was collected on a photogrammetric digital outcrop model
(e.g. Bistacchi et
al., 2015) with a 5 mm pixel resolution. The sandstone bed has a variable
thickness decreasing from ca. 32 cm to ca. 16 cm over a distance of ca. 95 m. Fractures are bed-confined tensional joints and the hypothesis that we
are testing is whether the bed thickness controls joint spacing (as
predicted by e.g. Bai and Pollard, 2000). In
Fig. 4a the barcode plot shows the position
${X}_{i}^{\text{F}}$ of every joint (482 joints in total), with the Terzaghi correction
already applied. Here we already see – qualitatively – that fracture
density increases from left to right. The trend test based on the Spearman's
correlation coefficient yields ${R}_{\text{S}}^{\text{TREND}}=-\mathrm{0.335}$ and a very small
*p* value $\approx {\mathrm{10}}^{-\mathrm{14}}$; hence, the null hypothesis of no trend is
strongly rejected at a significance level nearing 100 %. A further
graphical confirmation of this behaviour comes from the CSF
(Fig. 4a) that shows a continuous curvature
indicating that spacing is larger than mean spacing (lower slope) to the
left and smaller to the right (higher slope). The same is confirmed by the
CSD (Fig. 4a), for which we notice that the data CSD
is lower than the reference CSD in the 0–46 m segment, very close to the
reference between 46 and 56 m and then higher than the reference up to the
end.

Based on these results, we cannot calculate any meaningful statistics on this scanline, since, for instance, the average fracture spacing is not stationary. The option to subset the scanline to obtain stationary subsets is not feasible, since the trend appears very continuous in the CSD. If we go back to the geological model to be tested – fracturing is controlled by bed thickness – it is natural to try and normalize the data with bed height.

Bed height ${H}_{i}^{\text{F}}$ measured at every fracture is shown in
Fig. 4b. This measurement is affected by some
noise (orange line), so we take a smoothed version of this variable,
obtained by fitting a polynomial to ${H}_{i}^{\text{F}}$ (purple line), as
representative of the real bed thickness ${H}_{i}^{\text{F-S}}$. The normalized
fracture position is then calculated as ${X}_{i}^{\text{NORM}}={X}_{i}/{H}_{i}^{\text{F-S}}$. In this way the total length of the scanline is altered, but
this is not a real issue since we can always switch back to the original
reference by using the sequence of fractures along the scanline. Different
polynomials of degree between 1st and 10th were tested, and the
2nd degree was chosen as the one that is more successful (i.e. yielding
the smaller residuals) in removing the trend. After normalization, the trend
test based on the Spearman's correlation coefficient yields
${R}_{\text{S}}^{\text{TREND}}=\mathrm{0.011}$ and *p* value = 0.806; hence, the null hypothesis of no
trend for the normalized dataset is retained at a significance level higher
than 80 %. We see the normalized barcode plot, CSF, and CSF in
Fig. 4c.

The second stage of analysis shows that normalized spacing is log-normal
distributed (Fig. 4c), with a *p* value = 0.185
obtained from the K-S test (the Lilliefors test cannot be performed for this
distribution). Exponential and normal distributions are rejected by both the
K-S and Lilliefors tests with a very low *p* value < 10^{−3}
(Fig. 4c). To confirm the rejection of a random
spatial distribution, the *χ*^{2} test performed on the Poisson
distribution resulted in a rejection with *p* value $\approx {\mathrm{10}}^{-\mathrm{6}}$
(Fig. 4c).

From these statistics we conclude that fracture spacing in this turbiditic sandstone layer is controlled by bed thickness and that the joint system is tending towards joint saturation (see discussion in Bai and Pollard, 2000.

## 4.2 Fault-related fracturing in platform carbonates

The scanline discussed in Fig. 5 was collected in the south-eastern part of the island of Pag (Croatia). In this region, Cretaceous carbonate platform successions are folded and imbricated by NW–SE-trending thrusts of the External Dinarides (Korbar, 2009; Mittempergher et al., 2019). The scanline was collected along a mudstone–wackestone bed in a high-angle anticline forelimb between two minor dextral E–W-trending strike-slip faults with horizontal offsets of 10.3 and 0.5 m, respectively. Closely spaced subvertical extensional fractures and veins, striking nearly parallel to the faults, cross-cut folded bedding planes and are therefore associated with the last stages of fold tightening or postdate folding. Extensional fractures, which are generally longer than veins, bear patches of blocky calcite having the same appearance as that cementing the veins. This evidence suggests the hypothesis that fractures and veins were cogenetic and both associated with the activity of strike-slip faults. To test the relations between faults, veins, and fractures, here we considered extensional fractures and veins to be a unique fracture set.

The barcode plot (Fig. 5a) highlights that, as
expected, fracture spacing increases with distance from the fault with the
highest offset, i.e. left to right. The Spearman's correlation coefficient
test for large-scale trends yields ${R}_{\text{S}}^{\text{TREND}}=\mathrm{0.438}$ and a very small
*p* value $\approx {\mathrm{10}}^{-\mathrm{16}}$ (Fig. 5a); thus, the
null hypothesis of no trend is strongly rejected at a significance level
nearing 100 %. The Spearman's correlation coefficient test for local
spacing correlation yields ${R}_{\text{S}}^{\text{LOCAL}}=\mathrm{0.328}$ and a *p* value $\approx {\mathrm{10}}^{-\mathrm{9}}$ (Fig. 5a); hence, the null
hypothesis of no pattern or clustering is also strongly rejected at a significance
level nearing 100 %. The sample CSF and CSD for the whole scanline
(Fig. 5a) depart significantly from the reference
CSF and CSD, with the first part of the function (1.0–5.0 m) having a
slope higher than average, the central part (5.0–12.8 m) having a slope
close to average, and the final part having a slope lower than average
(12.8–19.7 m).

As in the previous example, fracturing is not stationary along this scanline and calculating parameters such as average spacing or fracture density over the entire scanline would have been meaningless. We therefore test the hypothesis that the fault damage zone can be subdivided into internally homogeneous sectors. These should have relatively constant slopes in the CSF, and we selected their boundaries by recognizing plateaus in the CSD, separated by sharp transitions at 5.0 and 12.8 m (Fig. 5a).

When segmented, the three sectors do not show trends or patterns (non-parametric tests for large-scale and local spacing correlation; Fig. 5b) and instead have a CSF and CSD compatible with a random or uniform distribution (Fig. 5b). We conclude that fractures behave as stationary in the three sectors; hence, we can calculate sample statistics such as mean spacing and its inverse – fracture density (Fig. 5d). This suggests that, moving away from the largest fault, fracturing does not decrease continuously but stepwise.

Within each stationary sector, a parametric distribution can be fitted to the data. According to the Lilliefors test, no sector has an empiric EDF compatible with an exponential CDF (with significance nearing 100 %). Sector 1 is consistent with a log-normal distribution, sector 2 is compatible with a normal distribution, and sector 3 is compatible with both log-normal and normal distributions. Log-normal and normal distributions indicate a certain degree of organization and fracture saturation, i.e. an evolution towards regularly spaced fractures and the effect of some repulsive process as discussed in the Introduction. In this case, in which fractures cross-cut several mechanical layers and are genetically related to faults, the factors controlling fracture spatial distribution are likely related to inhomogeneous strain distribution (e.g. Spyropoulos et al., 1999).

If we consider the *χ*^{2} goodness-of-fit (GOF) test, we see that
fractures in each sector show a spatial distribution apparently compatible
with a random Poisson distribution (*p* values = 0.50 ÷ 0.83; Fig. 5c). This should have implied an exponential
spacing distribution that is strongly excluded by the Lilliefors test. We
conclude that the *χ*^{2} GOF test for the Poisson distribution is not
reliable in this case, and we confirm the regular spatial distributions
implied by normal and log-normal spacing distributions.

We have introduced an innovative workflow for the statistical analysis of fracture data collected along scanlines. The workflow is composed of two major stages, each one with alternative options for the analysis of data showing different behaviour and spatial arrangement. Navigating across different options is based on geological hypotheses on the data that can be validated or falsified by quantitative statistical tests; hence, the workflow is both adherent to the geological goals of the analysis and objective under the statistical point of view.

A prerequisite in our analysis is the assessment of the stationarity of the dataset. From the statistical point of view, this is motivated by the fact that calculating statistics on non-stationary samples can be meaningless, particularly if the goal is to estimate parameters of the underlying population. From the geological point of view, the normalization and sub-setting required to obtain stationary segments of scanline improve our understanding of the deformation processes very much.

For instance, in the first case study normalizing the fracture spacing distribution by bed thickness and observing that the residuals show an almost-saturated log-normal distribution yield an extremely strong demonstration of the effect of layering on jointing (sensu Bai and Pollard, 2000), even stronger than in many other published examples for which bed thickness is constant.

In the second case study, sub-setting the scanline to obtain stationary and homogeneous domains improved the understanding of fracturing in a damage zone, quite like in Choi et al. (2016) and in Martinelli et al. (2020). Our methodology based on the analysis of CSF and CSD allows setting the boundaries of stationary domains in an objective way, and repeating the non-parametric tests for large-scale and local spacing correlation on the sub-scanlines allows for confirmation that the subsets are really stationary. If applied routinely, this will help shed new light on the internal structure of fault zones (i.e. fault zone architecture studies) and probably in situations in which there is a lithological control on brittle deformation.

Once the stationarity requirement has been demonstrated for the dataset
or subset, many statistical methods already known in the literature can be
applied. In this contribution we mainly discussed methods aimed at
understanding the degree of saturation of fracturing based on the type of
spacing distribution (i.e. exponential, log-normal, or normal; e.g. Rives et al., 1992; Tan et
al., 2014). This is a simple achievement, but we would like to recall that it
is only possible to use this approach once stationarity has been
established; otherwise, errors cannot be avoided since the relationship
between spatial distribution and spacing distribution is *not* biunivocal. For instance, it
is possible to observe an exponential spacing distribution in the case of
spacing distributions with a trend (Fig. 1e) or
repeating pattern (Fig. 1d); hence, observing an
exponential distribution of spacing is not a valid proof of a random spatial
distribution. This also impedes using simplified approaches, such as
that of the correlation coefficient by
Gillespie et al. (1999), without double-checking the spatial distribution with other
approaches.

On the other hand, advanced analyses such as the normalized correlation count (NCC) by Marrett et al. (2018) can be performed on stationary datasets defined as proposed here and will improve the understanding of fracturing in many interesting case studies. If using NCC, whether the non-parametric test for local spacing correlation is really necessary must be evaluated, but we feel that demonstrating large-scale stationarity is still fundamental, at least to be able to use sample distributions to model the underlying population distributions.

Finally, we have also conducted a goodness-of-fit test aimed at directly
detecting the random Poisson spatial distribution. This seemed a logical
choice since the discrete Poisson distribution is the model of every random
distribution of discrete events in 1D, 2D, and 3D, and Poisson processes are
used to distribute fractures in discrete fracture network (DFN) simulations
(e.g. Dershowitz et al., 2003; Elmo and
Stead, 2010; Bonneau et al., 2016). Unfortunately,
the *χ*^{2} GOF test seems to return a lot of false negatives in the
sense that it fails to reject distributions that are not random but are, for
instance, log-normal (see second case study). For this reason, we prefer to
follow the approach presented above instead of using the *χ*^{2} GOF
test.

The MATLAB^{®} App DomStudioFracStat1D
(Bistacchi, 2020), with a user-friendly graphical user
interface that we developed to perform this analysis, is available for
download at https://github.com/bistek/DomStudioFracStat1D (last access: 19 October 2020).

The methodology presented in this paper was inspired by FS and developed by AB, who also developed the code used for the analysis. FS provided the data for the first case study that was analysed by AB. SM provided the data and performed the analysis for the second case study. MM contributed with fruitful discussions in all stages of the analysis while he was developing a similar analysis in a different context (Martinelli et al., 2020).

The authors declare that they have no conflict of interest.

This article is part of the special issue “Faults, fractures, and fluid flow in the shallow crust”. It is not associated with a conference.

We warmly acknowledge the editor, Roger Soliva, and the reviewers, Rodrigo Correa and an anonymous reviewer, for their very useful suggestions. We also acknowledge Angelo Borsani and Andrea Succo, who, in addition to collaborating on data collection, shared many fruitful and happy days in the field with us!

This paper was edited by Roger Soliva and reviewed by Rodrigo Correa and one anonymous referee.

Bai, T. and Pollard, D. D.: Fracture spacing in layered rocks: A new explanation based on the stress transition, J. Struct. Geol., 22, 43–57, https://doi.org/10.1016/S0191-8141(99)00137-6, 2000.

Bistacchi, A.: DomStudioFracStat1D, available at: https://github.com/bistek/DomStudioFracStat1D, last access: 19 October 2020.

Bistacchi, A., Massironi, M., and Menegon, L.: Three-dimensional characterization of a crustal-scale fault zone: The Pusteria and Sprechenstein fault system (Eastern Alps), J. Struct. Geol., 32, 2022–2041, https://doi.org/10.1016/j.jsg.2010.06.003, 2010.

Bistacchi, A., Balsamo, F., Storti, F., Mozafari, M., Swennen, R., Solum, J., Tueckmantel, C., and Taberner, C.: Photogrammetric digital outcrop reconstruction, visualization with textured surfaces, and three-dimensional structural analysis and modeling: Innovative methodologies applied to fault-related dolomitization (Vajont Limestone, Southern Alps, Italy), Geosphere, 11, 2031–2048, https://doi.org/10.1130/GES01005.1, 2015.

Bonneau, F., Caumon, G., and Renard, P.: Impact of a stochastic sequential initiation of fractures on the spatial correlations and connectivity of discrete fracture networks, J. Geophys. Res.-Sol. Ea., 121, 5641–5658, https://doi.org/10.1002/2015JB012451, 2016.

Bonnet, E., Bour, O., Odling, N. E., Davy, P., Main, I., Cowie, P., and Berkowitz, B.: Scaling of fracture systems in geological media, Rev. Geophys., 39, 347–383, https://doi.org/10.1029/1999RG000074, 2001.

Borradaile, G.: Statistics of Earth Science data: their distribution in space, time, and orientation, Springer, Berlin and Heidelberg, Germany, 2003.

Caine, J. S. and Forster, C. B.: Fault zone architecture and fluid flow: Insights from field data and numerical modeling, in: Geophysical monograph, vol. 113, edited by: Haneberg, W. C., AGU, Washington, D.C., USA, 101–127 1999.

Choi, J.-H., Edwards, P., Ko, K., and Kim, Y.-S.: Definition and classification of fault damage zones: A review and a new methodological approach, Earth-Science Rev., 152, 70–87, https://doi.org/10.1016/j.earscirev.2015.11.006, 2016.

Cosgrove, J. W.: The association of folds and fractures and the link between folding, fracturing and fluid flow during the evolution of a fold-thrust belt: a brief review, Geol. Soc. London, Spec. Publ., 421, 41–68, https://doi.org/10.1144/SP421.11, 2015.

Crider, J. G. and Pollard, D. D.: Fault linkage: Three-dimensional mechanical interaction between echelon normal faults, J. Geophys. Res.-Sol. Ea., 103, 24373–24391, https://doi.org/10.1029/98JB01353, 1998.

Davis, G. H., Reynolds, S. J., and Kluth, C. F.: Structural geology of rocks and regions, 3rd ed., John Wiley and Sons, 2012.

Davis, J. C.: Statistics and Data Analysis, in: Geology, 3rd ed., John Wiley and Sons, 2002.

Dershowitz, W. S., Pointe, P. R., and La Doe, T. W.: Advances in Discrete Fracture Network Modeling Convergence of DFN and Continuum Methods, in 2004 US EPA/NGWA Fractured Rock Conference:State of the Science and Measuring Success in Remediation, Portland, Maine, USA, 882–894, 2004.

Elmo, D. and Stead, D.: An integrated numerical modelling-discrete fracture network approach applied to the characterisation of rock mass strength of naturally fractured pillars, Rock Mech. Rock Eng., 43, 3–19, https://doi.org/10.1007/s00603-009-0027-3, 2010.

Faulkner, D. R., Mitchell, T. M., Rutter, E. H., and Cembrano, J.: On the structure and mechanical properties of large strike-slip faults, Geol. Soc. Spec. Publ., 299, 139–150, https://doi.org/10.1144/SP299.9, 2008.

Gillespie, P. A., Howard, C. B. B., Walsh, J. J. J., and Watterson, J.: Measurement and characterisation of spatial distributions of fractures, Tectonophysics, 226, 113–141, https://doi.org/10.1016/0040-1951(93)90114-Y, 1993.

Gillespie, P. A., Johnston, J. D., Loriga, M. A., McCaffrey, K. J. W., Walsh, J. J., and Watterson, J.: Influence of layering on vein systematics in line samples, Geol. Soc. London, Spec. Publ., 155, 35–56, https://doi.org/10.1144/GSL.SP.1999.155.01.05, 1999.

Gleeson, T. and Ingebritsen, S. E.: Crustal Permeability, in: edited by: Gleeson, T. and Ingebritse, S. E., John Wiley and Sons, Chichester, UK., 2012.

Gross, M. R.: The origin and spacing of cross joints: examples from the Monterey Formation, Santa Barbara Coastline, California, J. Struct. Geol., 15, 737–751, https://doi.org/10.1016/0191-8141(93)90059-J, 1993.

Hobbs, D. W.: The Formation of Tension Joints in Sedimentary Rocks: An Explanation, Geol. Mag., 104, 550, https://doi.org/10.1017/S0016756800050226, 1967.

Hoek, E.: Underground excavations in rock, Institution of mining and metallurgy, London., 1980.

Hooker, J. N. and Katz, R. F.: Vein spacing in extending, layered rock: The effect of synkinematic cementation, Am. J. Sci., 315, 557–588, https://doi.org/10.2475/06.2015.03, 2015.

ISRM: Suggested methods for the quantitative description of discontinuities in rock masses, Int. J. Rock Mech. Min. Sci. Geomech. Abstr., 15, 319–368, https://doi.org/10.1016/0148-9062(78)91472-9, 1978.

Jaeger, J. C., Cook, N. G. W., and Zimmerman, R. W.: Fundamentals of Rock Mechanics, 4th ed., Wiley-Blackwell., 2007.

Korbar, T.: Orogenic evolution of the External Dinarides in the NE Adriatic region: a model constrained by tectonostratigraphy of Upper Cretaceous to Paleogene carbonates, Earth-Science Rev., 96, 296–312, https://doi.org/10.1016/j.earscirev.2009.07.004, 2009.

Laubach, S. E., Lamarche, J., Gauthier, B. D. M., Dunne, W. M., and Sanderson, D. J.: Spatial arrangement of faults and opening-mode fractures, J. Struct. Geol., 108, 2–15, https://doi.org/10.1016/j.jsg.2017.08.008, 2018.

Lilliefors, H. W.: On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown, J. Am. Stat. Assoc., 62, 399, https://doi.org/10.2307/2283970, 1967.

Lilliefors, H. W.: On the Kolmogorov-Smirnov Test for the Exponential Distribution with Mean Unknown, J. Am. Stat. Assoc., 64, 387–389, https://doi.org/10.1080/01621459.1969.10500983, 1969.

Marrett, R. A., Gale, J. F. W., Gómez, L. A., and Laubach, S. E.: Correlation analysis of fracture arrangement in space, J. Struct. Geol., 108, 16–33, https://doi.org/10.1016/j.jsg.2017.06.012, 2018.

Martinelli, M., Bistacchi, A., Mittempergher, S., Bonneau, F., Balsamo, F., Caumon, G., and Meda, M.: Damage zone characterization combining scan-line and scan-area analysis on a km-scale Digital Outcrop Model: The Qala Fault (Gozo), J. Struct. Geol., 140, 104144, https://doi.org/10.1016/j.jsg.2020.104144, 2020.

Mitchell, T. M. and Faulkner, D. R.: The nature and origin of off-fault damage surrounding strike-slip fault zones with a wide range of displacements: A field study from the Atacama fault system, northern Chile, J. Struct. Geol., 31, 802–816, https://doi.org/10.1016/j.jsg.2009.05.002, 2009.

Mittempergher, S., Succo, A., Bistacchi, A., Storti, F., Bruna, P. O., and Meda, M.: Geological and structural map of the southeastern Pag Island, Croatia: field constraints on the Cretaceous – Eocene evolution of the Dinarides foreland, Geol. F. Trips, 11, 1–19, https://doi.org/10.3301/GFT.2019.06, 2019.

Ogata, K., Storti, F., Balsamo, F., Tinterri, R., Bedogni, E., Fetter, M., Gomes, L., and Hatushika, R.: Sedimentary facies control on mechanical and fracture stratigraphy in turbidites, Geol. Soc. Am. Bull., 129, 76–92, https://doi.org/10.1130/B31517.1, 2017.

Olson, J. E.: Predicting fracture swarms – the influence of subcritical crack growth and the crack-tip process zone on joint spacing in rock, Geol. Soc. London, Spec. Publ., 231, 73–88, https://doi.org/10.1144/GSL.SP.2004.231.01.05, 2004.

Pollard, D. D. and Aydin, A.: Progress in understanding jointing over the past century, Geol. Soc. Am. Bull., 100, 1181–1204, https://doi.org/10.1130/0016-7606(1988)100<1181:PIUJOT>2.3.CO;2, 1988.

Priest, S. D. and Hudson, J. A.: Estimation of discontinuity spacing and trace length using scanline surveys, Int. J. Rock Mech. Min. Sci. Geomech. Abstr., 18, 183–197, https://doi.org/10.1016/0148-9062(81)90973-6, 1981.

Prioul, R. and Jocker, J.: Fracture characterization at multiple scales using borehole images, sonic logs, and walkaround vertical seismic profile, Am. Assoc. Pet. Geol. Bull., 93, 1503–1516, https://doi.org/10.1306/08250909019, 2009.

Rives, T., Razack, M., Petit, J.-P., and Rawnsley, K. D.: Joint spacing: analogue and numerical simulations, J. Struct. Geol., 14, 925–937, https://doi.org/10.1016/0191-8141(92)90024-Q, 1992.

Sanderson, D. J. and Peacock, D. C. P.: Line sampling of fracture swarms and corridors, J. Struct. Geol., 122, 27–37, https://doi.org/10.1016/j.jsg.2019.02.006, 2019.

Scholz, C. H.: The Mechanics of Earthquakes and Faulting, 3rd ed., Cambridge University Press, Chambridge, UK, 2019.

Schultz, R. A.: Geologic Fracture Mechanics, Cambridge University Press., Cambridge, UK, 2019.

Spyropoulos, C., Griffith, W. J., Scholz, C. H., and Shaw, B. E.: Experimental evidence for different strain regimes of crack populations in a clay model, Geophys. Res. Lett., 26, 1081–1084, https://doi.org/10.1029/1999GL900175, 1999.

Spyropoulos, C., Scholz, C. H., and Shaw, B. E.: Transition regimes for growing crack populations, Phys. Rev. E, 65, 056105, https://doi.org/10.1103/PhysRevE.65.056105, 2002.

Swan, A. R. H. and Sandilands, M.: Introduction to geological data analysis, Wiley-Blackwell, 1995.

Tan, Y., Johnston, T., and Engelder, T.: The concept of joint saturation and its application, Am. Assoc. Pet. Geol. Bull., 98, 2347–2364, https://doi.org/10.1306/06231413113, 2014.

Tavani, S., Storti, F., Fernández, O., Muñoz, J. A., and Salvini, F.: 3-D deformation pattern analysis and evolution of the Añisclo anticline, southern Pyrenees, J. Struct. Geol., 28, 695–712, https://doi.org/10.1016/j.jsg.2006.01.009, 2006.

Tavani, S., Storti, F., Lacombe, O., Corradetti, A., Muñoz, J. A., and Mazzoli, S.: A review of deformation pattern templates in foreland basin systems and fold-and-thrust belts: Implications for the state of stress in the frontal regions of thrust wedges, Earth-Science Rev., 141, 82–104, https://doi.org/10.1016/j.earscirev.2014.11.013, 2015.

Terzaghi, R. D.: Sources of Error in Joint Surveys, Géotechnique, 15, 287–304, https://doi.org/10.1680/geot.1965.15.3.287, 1965.

Turcotte, D. L.: Fractals and Chaos in Geology and Geophysics, 2nd ed., 1997.

Twiss, R. J. and Moores, E. M.: Structural Geology 2nd ed., W. H. Freeman, 2006.

Vermilye, J. M. and Scholz, C. H.: The process zone: A microstructural view of fault growth, J. Geophys. Res.-Sol. Ea., 103, 12223–12237, https://doi.org/10.1029/98JB00957, 1998.

Wang, Q., Laubach, S. E., Gale, J. F. W., and Ramos, M. J.: Quantified fracture (joint) clustering in Archean basement, Wyoming: application of the normalized correlation count method, Pet. Geosci., 25, 415–428, https://doi.org/10.1144/petgeo2018-146, 2019.

Wasserman, L.: All of Statistics, Springer, New York, USA, 2004.