We present an innovative workflow for the statistical analysis of fracture data collected along scanlines, composed of two major stages, each one with alternative options. A prerequisite in our analysis is the assessment of stationarity of the dataset, which is motivated by statistical and geological considerations. Calculating statistics on non-stationary data can be statistically meaningless, and moreover the normalization and/or sub-setting approach that we discuss here can greatly improve our understanding of geological deformation processes. Our methodology is based on performing non-parametric statistical tests, which allow detecting important features of the spatial distribution of fractures, and on the analysis of the cumulative spacing function (CSF) and cumulative spacing derivative (CSD), which allows defining the boundaries of stationary domains in an objective way. Once stationarity has been analysed, other statistical methods already known in the literature can be applied. Here we discuss in detail methods aimed at understanding the degree of saturation of fracture systems based on the type of spacing distribution, and we evidence their limits in cases in which they are not supported by a proper spatial statistical analysis.

The analysis of fracture systems is a traditional topic in structural geology and rock mechanics (e.g. Pollard and Aydin, 1988; Twiss and Moores, 2006), and it has been recognized for a long time that the quantitative characterization of fracture networks is necessary in both fundamental studies aimed at understanding brittle deformation processes in different geological environments (e.g. Jaeger et al., 2007; Scholz, 2019; Schultz, 2019) and in applications such as engineering rock mechanics (e.g. Hoek, 1980) and the characterization and modelling of subsurface fluid flow (e.g. Gleeson and Ingebritsen, 2012).

For “fractures” in this contribution we use the broad definition given e.g. by Twiss and Moores (2006), Davis et al. (2012), and Schultz (2019), who include in the definition of fractures or “cracks” all kinds of brittle discontinuities, such as joints, veins, shear fractures, (micro-faults), and in some cases even stylolites and other “anti-cracks”. Following the same authors, a fracture set is defined as a cogenetic set of fractures showing the same kinematics and orientation (with some variability), while a fracture system or network includes all fracture sets that are present in a volume of rocks.

For years, the common way of collecting quantitative fracture data has been based on scanlines: drawing a line, or laying a tape measure, on an outcrop and collecting all intersections between this line and fractures (e.g. Terzaghi, 1965; Hobbs, 1967; Priest and Hudson, 1981). Equivalent 1D datasets are also obtained from boreholes from both direct observation of cores and from geophysical imaging methods (e.g. Prioul and Jocker, 2009). The main advantage of scanline surveys is that they can be easily carried out in relatively short times and in a multitude of outcrops; hence, this technique has also become a standard in engineering applications (ISRM, 1978).

The minimum goal of scanline surveys is the measurement of fracture spacing – the distance between two neighbouring fractures – and the analysis of derived parameters such as fracture density (number of fractures per unit length) or intercept (inverse of density; e.g. Priest and Hudson, 1981).

Schematic examples of different spatial distributions of fractures
sampled along a scanline or borehole. In all examples the fracture density is
the same. For each example we show the following, where applicable: a barcode plot of all
fracture intersections along the scanline; a rank-order position vs. rank-order spacing plot; rank-order spacing at

Other fundamental information that can be obtained from scanlines is the 1D spatial distribution (e.g. Swan and Sandilands, 1995; Laubach et al., 2018), i.e. whether fractures are randomly distributed in space or they show some form of spatial organization, defined as any form of departure from complete randomness (Fig. 1a). We can consider two main and opposite types of organization: (i) regular, uniform, periodic arrangements (Fig. 1b) and (ii) arrangements with some form of clustering (Fig. 1c) or repeating pattern (Fig. 1d) (e.g. Swan and Sandilands, 1995; Bonnet et al., 2001; Marrett et al., 2018).

The spatial organization of fractures can be interpreted in the function of the processes responsible for their formation and evolution. Fractures nucleate at randomly distributed weakness points in the host rock (e.g. Schultz, 2019) and then propagate, increasing their length, initially without interacting with each other (e.g. Spyropoulos et al., 2002). When the number of fractures and their dimensions increase (e.g. fracture density and intensity increase), fractures start interacting, with two possibly opposite effects (e.g. Scholz, 2019). If the material is effectively weakened by fractures (e.g. at fracture tips where stress concentrations occur or due to diffuse damage in the material), fractures tend to propagate towards each other or to nucleate close to each other in an “attractive” process (e.g. Crider and Pollard, 1998). When instead stress shadows develop, fracture nucleation or propagation is inhibited close to other fractures in a “repulsive” process (e.g. Spyropoulos et al., 1999). Finally, Hooker and Kat (2015) show how vein cementation partly restores rock continuity, hence reducing repulsive effects related to stress shadows.

Mechanical and statistical simulations have shown that repulsive processes result in regular, uniform, and/or periodic spatial distributions for which the mean spacing might reflect the width of stress shadows (Rives et al., 1992; Bai and Pollard, 2000; Tan et al., 2014). On the other hand, attractive processes result in clustering of fractures or in the development or repeating patterns, which sometimes, but not always, show fractal distributions (e.g. Gillespie et al., 1993; Turcotte, 1997; Marrett et al., 2018). As also shown by Olson (2004), subcritical crack growth can result in clustered fracture distributions, which can be described as fracture corridors (sensu Sanderson and Peacock, 2019).

A different type of spatial organization, depending on external tectonic and/or lithological controls, is represented by large-scale trends in fracture density and spacing (Fig. 1e) that can develop e.g. in fault damage zones, with fractures that are more densely spaced close to the fault core and more dispersed as the distance increases (e.g. Vermilye and Scholz, 1998; Faulkner et al., 2008; Bistacchi et al., 2010; Choi et al., 2016) or when fracturing is controlled by folding (Tavani et al., 2006; Cosgrove, 2015; Tavani et al., 2015).

Following the fundamental paper by Hobbs (1967), many authors investigated the relationships between the spatial distribution of fractures and the statistical distribution of their spacing (e.g. Rives et al., 1992; Gross, 1993; Tan et al., 2014). These relationships are based on sound statistical grounds. For instance, if we consider a Poisson distribution in 1D (a perfectly random distribution of events along a line), the spacing between neighbouring events will be characterized by a negative exponential distribution, with a numerically equal mean and standard deviation (Swan and Sandilands, 1995). If we consider a regular spatial distribution, the mean spacing will be sharply defined, the spacing of the standard deviation will be much smaller, and the statistical distribution of spacing will be close to a symmetrical normal distribution (Rives et al., 1992). Intermediate situations will show spacing distributions with intermediate skewness, such as gamma or log-normal distributions (Rives et al., 1992).

These observations lead many authors using the distribution of spacing to
characterize the spatial distribution as if the above-mentioned
relationships can be taken as

However, Wang et al. (2019), for instance, warned against using this methodology that is not able to “capture
mixtures of highly clustered and more regularly spaced patterns”. We will
demonstrate in the following that the problem is more general and lies in
the fact that the relationship between spatial distribution and the
distribution of spacing is

A simple yet promising approach for studying spatial organization is based on cumulative distributions. Gillespie et al. (1999) plotted the cumulative vein thickness vs. vein position along a scanline to characterize spatial distribution. Choi et al. (2016) plotted the cumulative number of fractures along scanlines and used marked changes in the slope of this curve to reveal fault zone architecture features, such as the boundary of the damage zone. However, neither developed quantitative statistical criteria in their analysis.

In this contribution, we will first focus on a rigorous framework to measure spatial and spacing distributions in scanlines, on the relationships between these distributions, and particularly on the debated possibility to infer spatial distribution from spacing distribution (is it a biunivocal relationship?). An important point, not addressed by previous authors, will be dedicated to defining the stationarity of a statistical sample within a given spatial sampling domain – a prerequisite to detect any meaningful statistics of the sample itself and to extrapolate it to the underlying population. We will then propose non-parametric and parametric statistical tests that can be used to characterize the distributions, and we will propose a rigorous workflow that can be applied to the practical analysis of scanline data.

To completely characterize the spatial distribution of fractures and their
spacing, we need to define two stochastic variables: position and spacing.
Having collected

If local outcrop or borehole conditions do not allow collecting the scanline
perpendicular to the mean plane of the fracture set, we will apply the
Terzaghi (1965) correction to obtain the true position
from the apparent one:

In statistics, a stationary process is a stochastic process whose characteristic probability distributions do not change when the domain of the analysis (e.g. in space or time) is varied (e.g. shifted, shrunk, enlarged; e.g. Wasserman, 2004). Stationarity is a prerequisite in many kinds of analyses performed on time series, and it is also a fundamental concept for spatially distributed variables. In geostatistical studies on spatially distributed variables, a variable is said to be stationary if there is no significant drift or trend within a specified spatial domain (distance, area, or volume), with drift or trend defined as the component of a regionalized variable resulting from large-scale processes that can be defined with deterministic analytical functions of the spatial variables (Swan and Sandilands, 1995; Borradaile, 2003). An example of a trend in fracture studies is the exponential decay in fracture density that can be found in some fault damage zones (e.g. Caine and Forster, 1999; Mitchell and Faulkner, 2009). Within this framework, residuals represent the completely random component of the regionalized variable that, if a trend is present, can be obtained by normalizing the regional variable with the trend (Swan and Sandilands, 1995).

Even if this analysis is not routinely performed, understanding if our dataset is affected by a trend (Fig. 1e) is fundamental in every kind of statistical analysis on spatial variables since, if a trend is present, all statistics will be affected by the choice of the sampling domain. If, for instance, we calculate the sample mean of fracture spacing in a fault damage zone showing an exponential trend, the mean will change depending on the position and length of the scanline. Fracture spacing is not stationary in this case and its mean is meaningless. The same happens at a smaller scale (e.g. a small segment of the scanline) if fractures are clustered (Fig. 1c) or arranged in a pattern (Fig. 1d).

According to the general definition given above, a stationary process is a process whose statistics do not change when the sample is changed, moved, shifted, or resized in space (or time; Wasserman, 2004). With reference to Fig. 1, this restrictive condition is met only in cases when fractures are randomly distributed (Fig. 1a) or regularly spaced (Fig. 1b).

Non-parametric statistical methods discussed here are those in which (Davis, 2002) (i) data are represented with ordinal or rank-order scales instead of interval or ratio scales and (ii) data are not required to fit parametric statistical distributions (e.g. the normal or exponential distributions). Since they are independent from any assumption or statistical model, they are quite useful, particularly in the first phases of analysis (Swan and Sandilands, 1995).

The non-parametric Spearman's rank correlation coefficient measures the
correlation between two rank-order variables. These are variables obtained
by sorting numerical data (interval- or ratio-scale variables) from the
smaller to the larger value and then replacing each value with an integer
representing its position in the sequence. If we take a dataset with

We can use the Spearman's rank correlation coefficient to test the null
hypothesis of no correlation vs. the alternative hypothesis of correlation,
considering critical values of

Scanline reference frame with the definition of position and spacing
variables. L: scanline length;

Comparison of the Spearman's rank correlation coefficient with respect to the parametric Pearson's correlation coefficient. The non-parametric Spearman's correlation is more robust in the case of outliers and is also able to detect non-linear correlation in addition to standard linear correlation (Swan and Sandilands, 1995).

The advantage of the Spearman's rank correlation coefficient with respect to its parametric counterpart – the Pearson's correlation coefficient – is graphically explained in Fig. 3, where we see that the non-parametric correlation is more robust in the case of outliers and is also able to detect non-linear correlation in addition to standard linear correlation (Swan and Sandilands, 1995).

We have found that, to study the spatial distribution of spacing, it is very useful to plot the cumulative function of spacing normalized by scanline length and its first derivative (Fig. 1). This approach can be seen as a development of those proposed by Gillespie et al. (1999), Choi et al. (2016), and Sanderson and Peacock (2019), with the important difference that we have developed a new quantitative and objective method to detect segments of a scanline that show a stationary behaviour.

The cumulative spacing function (CSF) corresponds to a plot of the relative
proportion of scanline length associated with each block limited by a pair
of fractures. The reference CSF, associated with a perfectly uniform spatial
distribution of fractures, corresponds to a constant-slope line with
slope

The first derivative of the CSF (i.e. the CSD, cumulative spacing
derivative) follows the same behaviour. The reference CSD for a perfectly
uniform distribution plots as a horizontal line with height

By comparing the CSF and CSD, it is possible to define stationary scanline
segments that show a constant slope and hence a constant fracture density.
If more than one stationary segment occurs in a scanline, the boundary
between different segments is found by comparing the data CSD with the
reference CSD. The only difficulty with this approach is that the CSD of
natural datasets can be very noisy, with many spikes (see the random
distribution in Fig. 1a). This problem is solved
by smoothing the derivative with an

Our workflow for the analysis of fracture data collected on scanlines can be subdivided into two major stages. First, we use non-parametric statistics to check whether the dataset is stationary, and if it is not stationary, we create subsets and/or we rescale or normalize the data in order to obtain stationary sets or subsets. Then, we analyse the datasets and/or subsets with parametric statistics, following well-established methods to characterize the type of spatial distribution and its parameters.

The MATLAB^{®} App DomStudioFracStat1D
(Bistacchi, 2020), with a user-friendly graphical user
interface that we developed to perform this analysis, is available for
download at

We apply the Spearman's rank correlation coefficient to test the correlation between two pairs of rank variables; the first is designed to detect large-scale trends and the other one better suited to detect distributions with clusters or patterns. Our usage was inspired by Swan and Sandilands (1995), with some original variations and developments.

To test the data for large-scale trends (Fig. 1e),
we test the correlation between the position of each block bounded by two
fractures

Even if large-scale trends are not detected, local small-scale trends could
reveal clustering (Fig. 1c) or repeating patterns
(Fig. 1d). We therefore compare spacing between
pairs of fractures taken in a sequence along the scanline using this
formulation for the correlation coefficient:

The spacing–spacing correlation test used to detect local clustering and/or patterns must be performed after the position–spacing test used to detect large-scale trends because a large-scale trend is seen by the local test as a single large-scale cluster; hence, the second test alone is not discriminant for local vs. large-scale correlation in spacing.

If the null hypotheses of no large-scale trend and no local clustering and pattern are retained, we can move on to the parametric analysis stage (next section). However, if this is not the case, the dataset can be rescaled, normalized, and/or segmented into subsets to obtain stationary sets.

Rescaling and normalization are best suited when we detect a smooth deterministic trend in spacing; i.e. the fracture density varies continuously along the scanline. In this case we must define the deterministic function of the trend and normalize the data with this function; the residuals can be considered a stationary set that can be analysed with the methods presented in the next section. The methods to find the deterministic trend function are various, and we feel that they must also be guided by geological and tectonic observations, so we think that there is no “general method” to complete this task. We will see an example in the first case study (Sect. 4.1), in which we study the variable spacing of joints in a laterally tapering layer of sandstone.

Sub-setting is very useful when we observe a stair-stepping pattern, e.g. where we observe lower-fracture-density domains intercalated by fracture corridors showing markedly higher fracture intensity, or in some fault damage zones (e.g. case study 2; Martinelli et al., 2020). These segments can be recognized and objectively classified by observing the CSF and CSD, particularly by comparing the data curves to the reference ones. In our workflow, we (i) select subset scanline segments (i.e. sub-scanlines) that are generally bounded by points of intersection of the data CSD with the reference CSD, corresponding to changes in the slope of the data CSF. (ii) We then test each subset once again for stationarity, and if the result is positive, (iii) we pass it to the parametric analysis stage (next section).

We also conducted an alternative test for randomness by comparing the spatial
distribution of

Once we have segments of scanline that show a stationary behaviour, we can use standard parametric methods to define the type of spacing distribution (e.g. normal vs. exponential) and fit its parameters (e.g. population mean and standard deviation) to the sample fracture spacing data. Then, following a rich literature (e.g. Rives et al., 1992; Tan et al., 2014), various parameters that correlate with the evolution of a fracture system with increasing deformation can be discussed. In this contribution, we test the spacing distribution for normal (Gaussian), log-normal, and negative exponential distributions.

The maximum likelihood estimate (MLE) of the parameters of these distributions
can be obtained directly in closed form from sample statistics. For
instance, the mean spacing

The Kolmogorov–Smirnov (K-S) test is a non-parametric test that was
developed to check if two samples come from the same distribution, either
empirical or theoretical, and it can be used as a goodness-of-fit (GOF) test
for the null hypothesis that the empirical distribution function (EDF) is
not different from the model cumulative distribution function (CFD). This
test is used by many authors, but it is biased if the parameters of the
theoretical distribution are not fixed but estimated from the data
themselves (Wasserman, 2004). For this reason we
apply the Lilliefors test instead of K-S when possible (Lilliefors, 1967, 1969), which is not subject to this bias, but
unfortunately the Lilliefors test is not available for log-normal
distributions. In both cases the results can be expressed as

In our workflow, if only one type of distribution passes the test, we retain
this as the best-fit one. If more than one distribution is retained and the

Analysis for the case study on bed-confined joints in turbiditic sandstones. See the discussion in the main text.

The scanline studied in this case comes from a single turbiditic sandstone
bed in the Langhian to Tortonian Marnoso-Arenacea Formation (Val Santerno,
northern Apennines of Italy; Ogata et al.,
2017). The scanline was collected on a photogrammetric digital outcrop model
(e.g. Bistacchi et
al., 2015) with a 5 mm pixel resolution. The sandstone bed has a variable
thickness decreasing from ca. 32 cm to ca. 16 cm over a distance of ca. 95 m. Fractures are bed-confined tensional joints and the hypothesis that we
are testing is whether the bed thickness controls joint spacing (as
predicted by e.g. Bai and Pollard, 2000). In
Fig. 4a the barcode plot shows the position

Based on these results, we cannot calculate any meaningful statistics on this scanline, since, for instance, the average fracture spacing is not stationary. The option to subset the scanline to obtain stationary subsets is not feasible, since the trend appears very continuous in the CSD. If we go back to the geological model to be tested – fracturing is controlled by bed thickness – it is natural to try and normalize the data with bed height.

Bed height

The second stage of analysis shows that normalized spacing is log-normal
distributed (Fig. 4c), with a

From these statistics we conclude that fracture spacing in this turbiditic sandstone layer is controlled by bed thickness and that the joint system is tending towards joint saturation (see discussion in Bai and Pollard, 2000.

Analysis for the case study on fault-related fracturing in platform carbonates. See the discussion in the main text.

The scanline discussed in Fig. 5 was collected in the south-eastern part of the island of Pag (Croatia). In this region, Cretaceous carbonate platform successions are folded and imbricated by NW–SE-trending thrusts of the External Dinarides (Korbar, 2009; Mittempergher et al., 2019). The scanline was collected along a mudstone–wackestone bed in a high-angle anticline forelimb between two minor dextral E–W-trending strike-slip faults with horizontal offsets of 10.3 and 0.5 m, respectively. Closely spaced subvertical extensional fractures and veins, striking nearly parallel to the faults, cross-cut folded bedding planes and are therefore associated with the last stages of fold tightening or postdate folding. Extensional fractures, which are generally longer than veins, bear patches of blocky calcite having the same appearance as that cementing the veins. This evidence suggests the hypothesis that fractures and veins were cogenetic and both associated with the activity of strike-slip faults. To test the relations between faults, veins, and fractures, here we considered extensional fractures and veins to be a unique fracture set.

The barcode plot (Fig. 5a) highlights that, as
expected, fracture spacing increases with distance from the fault with the
highest offset, i.e. left to right. The Spearman's correlation coefficient
test for large-scale trends yields

As in the previous example, fracturing is not stationary along this scanline and calculating parameters such as average spacing or fracture density over the entire scanline would have been meaningless. We therefore test the hypothesis that the fault damage zone can be subdivided into internally homogeneous sectors. These should have relatively constant slopes in the CSF, and we selected their boundaries by recognizing plateaus in the CSD, separated by sharp transitions at 5.0 and 12.8 m (Fig. 5a).

When segmented, the three sectors do not show trends or patterns (non-parametric tests for large-scale and local spacing correlation; Fig. 5b) and instead have a CSF and CSD compatible with a random or uniform distribution (Fig. 5b). We conclude that fractures behave as stationary in the three sectors; hence, we can calculate sample statistics such as mean spacing and its inverse – fracture density (Fig. 5d). This suggests that, moving away from the largest fault, fracturing does not decrease continuously but stepwise.

Within each stationary sector, a parametric distribution can be fitted to the data. According to the Lilliefors test, no sector has an empiric EDF compatible with an exponential CDF (with significance nearing 100 %). Sector 1 is consistent with a log-normal distribution, sector 2 is compatible with a normal distribution, and sector 3 is compatible with both log-normal and normal distributions. Log-normal and normal distributions indicate a certain degree of organization and fracture saturation, i.e. an evolution towards regularly spaced fractures and the effect of some repulsive process as discussed in the Introduction. In this case, in which fractures cross-cut several mechanical layers and are genetically related to faults, the factors controlling fracture spatial distribution are likely related to inhomogeneous strain distribution (e.g. Spyropoulos et al., 1999).

If we consider the

We have introduced an innovative workflow for the statistical analysis of fracture data collected along scanlines. The workflow is composed of two major stages, each one with alternative options for the analysis of data showing different behaviour and spatial arrangement. Navigating across different options is based on geological hypotheses on the data that can be validated or falsified by quantitative statistical tests; hence, the workflow is both adherent to the geological goals of the analysis and objective under the statistical point of view.

A prerequisite in our analysis is the assessment of the stationarity of the dataset. From the statistical point of view, this is motivated by the fact that calculating statistics on non-stationary samples can be meaningless, particularly if the goal is to estimate parameters of the underlying population. From the geological point of view, the normalization and sub-setting required to obtain stationary segments of scanline improve our understanding of the deformation processes very much.

For instance, in the first case study normalizing the fracture spacing distribution by bed thickness and observing that the residuals show an almost-saturated log-normal distribution yield an extremely strong demonstration of the effect of layering on jointing (sensu Bai and Pollard, 2000), even stronger than in many other published examples for which bed thickness is constant.

In the second case study, sub-setting the scanline to obtain stationary and homogeneous domains improved the understanding of fracturing in a damage zone, quite like in Choi et al. (2016) and in Martinelli et al. (2020). Our methodology based on the analysis of CSF and CSD allows setting the boundaries of stationary domains in an objective way, and repeating the non-parametric tests for large-scale and local spacing correlation on the sub-scanlines allows for confirmation that the subsets are really stationary. If applied routinely, this will help shed new light on the internal structure of fault zones (i.e. fault zone architecture studies) and probably in situations in which there is a lithological control on brittle deformation.

Once the stationarity requirement has been demonstrated for the dataset
or subset, many statistical methods already known in the literature can be
applied. In this contribution we mainly discussed methods aimed at
understanding the degree of saturation of fracturing based on the type of
spacing distribution (i.e. exponential, log-normal, or normal; e.g. Rives et al., 1992; Tan et
al., 2014). This is a simple achievement, but we would like to recall that it
is only possible to use this approach once stationarity has been
established; otherwise, errors cannot be avoided since the relationship
between spatial distribution and spacing distribution is

On the other hand, advanced analyses such as the normalized correlation count (NCC) by Marrett et al. (2018) can be performed on stationary datasets defined as proposed here and will improve the understanding of fracturing in many interesting case studies. If using NCC, whether the non-parametric test for local spacing correlation is really necessary must be evaluated, but we feel that demonstrating large-scale stationarity is still fundamental, at least to be able to use sample distributions to model the underlying population distributions.

Finally, we have also conducted a goodness-of-fit test aimed at directly
detecting the random Poisson spatial distribution. This seemed a logical
choice since the discrete Poisson distribution is the model of every random
distribution of discrete events in 1D, 2D, and 3D, and Poisson processes are
used to distribute fractures in discrete fracture network (DFN) simulations
(e.g. Dershowitz et al., 2003; Elmo and
Stead, 2010; Bonneau et al., 2016). Unfortunately,
the

The MATLAB^{®} App DomStudioFracStat1D
(Bistacchi, 2020), with a user-friendly graphical user
interface that we developed to perform this analysis, is available for
download at

The methodology presented in this paper was inspired by FS and developed by AB, who also developed the code used for the analysis. FS provided the data for the first case study that was analysed by AB. SM provided the data and performed the analysis for the second case study. MM contributed with fruitful discussions in all stages of the analysis while he was developing a similar analysis in a different context (Martinelli et al., 2020).

The authors declare that they have no conflict of interest.

This article is part of the special issue “Faults, fractures, and fluid flow in the shallow crust”. It is not associated with a conference.

We warmly acknowledge the editor, Roger Soliva, and the reviewers, Rodrigo Correa and an anonymous reviewer, for their very useful suggestions. We also acknowledge Angelo Borsani and Andrea Succo, who, in addition to collaborating on data collection, shared many fruitful and happy days in the field with us!

This paper was edited by Roger Soliva and reviewed by Rodrigo Correa and one anonymous referee.