Uncertainty assessment for 3D geologic modeling of fault zones based on geologic inputs and prior knowledge

Characterizing the zone of damaged and altered rock surrounding a fault surface is highly relevant to geotechnical and geo-environmental engineering works in the subsurface. Evaluating the uncertainty associated with 3D geologic modeling of these fault zones is made possible using the popular and flexible input-based uncertainty propagation approach to geologic model uncertainty assessment – termed probabilistic geomodeling. To satisfy the automation requirements of probabilistic geomodeling while still preserving the key geometry of fault zones in the subsurface, a clear and straightforward modeling approach is developed based on four geologic inputs used in implicit geologic modeling algorithms (surface trace, structural orientation, vertical termination depth and fault zone thickness). The rationale applied to identifying and characterizing the various sources of uncertainty affecting each input are explored and provided using open-source codes. In considering these sources of uncertainty, a novel model formulation is implemented using prior geologic knowledge (i.e., empirical and theoretical relationships) to parameterize modeling inputs which are typically subjectively interpreted by the modeler (e.g., vertical termination depth of fault zones). Additionally, the application of anisotropic spherical distributions to modeling disparate levels of information available regarding a fault zone’s dip azimuth and dip angle is demonstrated, providing improved control over the structural orientation uncertainty envelope. The probabilistic geomodeling approach developed is applied to a simple fault zone geologic model built from historically available geologic mapping data, allowing for a visual comparison of the independent contributions of each modeling input on the combined model uncertainty, revealing that vertical termination depth and structural orientation uncertainty dominate model uncertainty at depth, while surface trace uncertainty dominates model uncertainty near the ground surface. The method is also successfully applied to a more complex fault network model containing intersecting major and minor fault zones. The impacts of the model parameterization choices, the fault zone modeling approach and the effects of fault zone interactions on the final geologic model uncertainty assessment are discussed.

Abstract. Characterizing the zone of damaged and altered rock surrounding a fault surface is highly relevant to geotechnical and geo-environmental engineering works in the subsurface. Evaluating the uncertainty associated with 3D geologic modeling of these fault zones is made possible using the popular and flexible input-based uncertainty propagation approach to geologic model uncertainty assessment -termed probabilistic geomodeling. To satisfy the automation requirements of probabilistic geomodeling while still preserving the key geometry of fault zones in the subsurface, a clear and straightforward modeling approach is developed based on four geologic inputs used in implicit geologic modeling algorithms (surface trace, structural orientation, vertical termination depth and fault zone thickness). The rationale applied to identifying and characterizing the various sources of uncertainty affecting each input are explored and provided using open-source codes. In considering these sources of uncertainty, a novel model formulation is implemented using prior geologic knowledge (i.e., empirical and theoretical relationships) to parameterize modeling inputs which are typically subjectively interpreted by the modeler (e.g., vertical termination depth of fault zones). Additionally, the application of anisotropic spherical distributions to modeling disparate levels of information available regarding a fault zone's dip azimuth and dip angle is demonstrated, providing improved control over the structural orientation uncertainty envelope. The probabilistic geomodeling approach developed is applied to a simple fault zone geologic model built from historically available geologic mapping data, allowing for a visual comparison of the independent contributions of each modeling input on the combined model uncertainty, revealing that vertical termination depth and structural orientation uncertainty dominate model uncertainty at depth, while surface trace uncertainty dominates model uncertainty near the ground surface. The method is also successfully applied to a more complex fault network model containing intersecting major and minor fault zones. The impacts of the model parameterization choices, the fault zone modeling approach and the effects of fault zone interactions on the final geologic model uncertainty assessment are discussed.

Introduction
Three-dimensional (3D) geologic models are becoming the state of the art for the prediction and communication of subsurface geology in a wide range of projects (Turner and Gable, 2007;Wellmann and Caumon, 2018) including regional geologic characterization (Stafleu et al., 2012;Waters et al., 2015), natural resource exploration (Zhou et al., 2007(Zhou et al., , 2015Zhou, 2009;Anderson et al., 2014), structural geology Ailleres et al., 2019), geotechnical site characterization (Thum and De Paoli, 2015;Zhu et al., 2013), geophysics Høyer et al., 2015;Anderson et al., 2014), hydrology (Watson et al., 2015), and mining Yang et al., 2019). The recent widespread adoption of flexible, implicit 3D geologic modeling algorithms (Cowan et al., 2003;Guillen et al., 2008;Jessell et al., 2014;Hillier et al., 2014Hillier et al., , 2017 is leading the field of 3D geologic modeling away from the creation of static models based on a single, best interpretation and towards stochastic geologic modeling with quantified uncertainty (Caumon, 2010). Understanding the uncertainty of a 3D geologic model not only provides a measure of model quality to an end user (Turner and Gable, 2007;Walker et al., 2003;Stamm et al., 2019) but also aids the geologist during model creation by analyzing the quality of input data and highlighting the impacts of subjective prior knowledge and interpretations Wood and Curtis, 2004;Jessell et al., 2018). As the use of 3D geologic modeling continues to grow, novel methods for assessing the uncertainty of various aspects of geologic models is pertinent.
Because a single model conveys no information regarding its uncertainties (Wellmann and Caumon, 2018), multiple realization approaches are becoming a popular method for assessing geologic model uncertainty (Wellmann et al., 2010;Wellmann and Regenauer-Lieb, 2012;Pakyuz-Charrier et al., 2018aJessell et al., 2014;Jessell et al., 2018;Lindsay et al., 2013;de la Varga and Wellmann, 2016;Schweizer et al., 2017;Thiele et al., 2016;Yang et al., 2019;Schneeberger et al., 2017). Recently, the well-established Monte Carlo simulation method has been adopted into a widely used method for 3D geologic model uncertainty assessment by way of uncertainty propagation of geologic model inputs into the 3D geologic model space (Wellmann and Caumon, 2018). The method, henceforth termed "probabilistic geomodeling", focuses on the impact of uncertainty in geologic modeling inputs on a 3D geologic model by generating a set of model realizations based on perturbations in selected modeling inputs, sampled using Monte Carlo simulation algorithms. Probabilistic geomodeling is flexible, allowing for a wide variety of uncertainty sources affecting various geologic modeling inputs to be quantified by the user and propagated into the 3D geologic model.
While growing in popularity, the field of geologic model uncertainty assessment remains a developing one, and the application of probabilistic geomodeling to new, practical problems requires unique model formulations. The development of novel probabilistic geomodeling approaches to address specific aspects of 3D geologic modeling will lead to growth in the field not only by broadening the usability of the method but also by advancing the understanding of the method's strengths and limitations. In addition to assessing the uncertainty in a single geologic model, probabilistic geomodeling using Monte Carlo sampling naturally fits into Bayesian inference schemes (de la Varga and Wellmann, 2016;Salvatier et al., 2016;Scalzo et al., 2019;Thiele et al., 2019), allowing for future refinement of model uncertainty as new information is made available.
This study expands the use of probabilistic geomodeling to a new aspect of geologic modeling -fault zones, the localized volume of fractured and displaced rock surrounding a finite fault surface, typically composed of a fault core and a damage zone (Caine et al., 1996;Childs et al., 2009;Peacock et al., 2016;Choi et al., 2016). Fault zones introduce regions of altered geotechnical strength and hydraulic permeability into the surrounding intact rockmass and are there-fore of major importance to geological engineering projects that rely on accurate assessments of subsurface rock properties (e.g., tunnels, mines). While faults have been the focus of a significant amount of recent geologic modeling research (Røe et al., 2014;Cherpeau et al., 2010;Cherpeau and Caumon, 2015;Aydin and Caers, 2017), these works have focused on modeling fault surfaces directly rather than modeling the 3D geometry of fault zones. Detailed modeling of the 3D geometry of fault zones can improve the understanding of faults' impacts on geotechnical and reservoir engineering projects due to the fact that variations in fault zone thickness or composition can greatly alter the mechanical and hydrological behavior of a fault, e.g., its sealing potential (Caine et al., 1996;Fredman et al., 2008;Manzocchi et al., 2010). Building on the existing literature on understanding the uncertainties about faults in the subsurface (Choi et al., 2016;Shipton et al., 2019;Torabi et al., 2019b), this study develops a novel, dedicated approach to leveraging probabilistic geomodeling to characterize the uncertainty in fault zones using 3D geologic models.
Fault zones may be irregular in shape, creating complex geometries which are difficult to characterize quantitatively (Torabi et al., 2019a, b). Peacock et al. (2016) provide a detailed list of the various types of damage zones and intersecting fault networks that comprise the general term "fault zone". The inherent complexity of fault zone structure makes their precise modeling intractable in an automated geologic modeling application, such as that required by probabilistic geomodeling. A simplified approach to modeling fault zones in 3D geologic models is developed in this study based on the key elements defining fault zone geometry at a practical level of detail.

Model implementation
The proposed workflow for modeling fault zones is provided in Fig. 1. The workflow combines observations from a geologic map with prior knowledge from the literature to approximate the 3D geometry of subsurface fault zones. The implicit geologic modeling software Leapfrog Works, a software specially designed to support creation of subsurface geological models, is used in this study. The 3D fault zone is modeled in Leapfrog Works from four inputs -surface trace (polyline), structural orientation (dip/dip azimuth vector), fault zone thickness (scalar distance function) and vertical termination depth (discretized surfaces). The modeling approach preserves the essential 3D geometry of fault zones in the subsurface while providing sufficient generalization to fit into an automated implicit modeling workflow for uncertainty propagation.
The popular implicit geologic modeling method was chosen for modeling the uncertainty in fault zones due to its ability to directly incorporate structural orientation data to modeling geologic structures. Leapfrog Works uses radial basis Figure 1. The proposed fault zone modeling workflow implemented includes (a) modeling the central fault surface from a polyline and structural orientation, (b) terminating the fault surface on a predefined vertical termination surface, and (c) defining the 3D fault zone volume using a distance function from the central fault surface (Krajnovich et al., 2020a). functions (RBFs) to efficiently interpolate the scalar fields describing implicit geologic surfaces (Seequent, 2014). Implicit geologic modeling using RBFs is comparable in quality to modeling using popular co-kriging approaches (Cowan et al., 2003;Hillier et al., 2014Hillier et al., , 2017.

Probabilistic geomodel setup
Setting up the probabilistic geomodel begins with the careful selection of key geologic modeling inputs for perturbation (Fig. 1). The set of geologic modeling inputs is characterized using probability distributions chosen and parameterized based on the believed and/or observed uncertainties in the input variables. Monte Carlo simulations independently explore the uncertainty space of each input, generating a set of input realizations which are propagated into 3D geologic models through the use of an automated implicit geologic modeling algorithm. From the set of geologic model realizations, the commonly used Shannon information entropy metric (Shannon, 1948) allows for quantifying the uncertainty about modeled geologic structures (Wellmann and Regenauer-Lieb, 2012). This overview is provided merely to introduce the reader to the general idea of probabilistic geomodeling as applied to implicit geologic modeling; the reader is referred to the book by Wellmann and Caumon (2018) for a more thorough review of the probabilistic geomodeling and implicit modeling methods' conceptual bases.
The same flexibility that allows probabilistic geomodeling to be effectively formulated for nearly any geologic modeling problem is also a potential susceptibility -the model formulation and input uncertainties must be predefined by the user. This can lead to potential over-or underestimation and biases in the uncertainty assessment performed with probabilistic geomodeling due to inappropriate selection of modeling inputs or incorrect parameterization of input probability distributions (de la Varga and Wellmann and Caumon, 2018;Pakyuz-Charrier et al., 2018b). Following the school of thought reviewed by Nearing et al. (2016), the modeler must ask questions along the lines of the following: -What inputs control the geometry of the modeled structure?
-How can these inputs be defined probabilistically?
-What information is available to characterize each source of input uncertainty?
For hard data inputs (e.g., observed contacts, structural orientation measurements), the answers to these questions are relatively well-established using measures of variance and deviation to directly characterize uncertainties (Caers, 2011;Wellmann and Caumon, 2018). On the other hand, for subjective inputs to geologic models (e.g., interpreted fault terminations), there are generally two approaches with which the subjective uncertainties can be characterized. The first is describing and quantifying the uncertainty associated with prior knowledge by utilizing believed theoretical or empirical relationships to quantify the reasoning behind a subjective geologic interpretation (Wood and Curtis, 2004). The second method, operating within a Bayesian inference scheme, is to incorporate additional geological knowledge or observations to validate -or rather, as Tarantola (2006) states, invalidatemodel realizations. This study focuses on the first method of applying prior knowledge from published structural geology literature (Torabi et al., 2019a) to parameterize the reasoning behind subjective inputs used for probabilistic geomodeling of fault zones. This approach is demonstrated effectively in this study when considering the vertical termination depth of fault surfaces (Sect. 4.3).
Having identified the geologic modeling inputs controlling the geometry of the modeled fault zone and appropriate methods for defining these inputs probabilistically, Fig. 2 shows an illustration depicting possible uncertainty envelopes due to key sources of uncertainty affecting each input. In the 3D geologic model, the geometry and extent of fault zones in the subsurface often are approximated from limited information available. While useful for conceptualizing subsurface geology and guiding future investigations, the lack of detailed data contributes significant sources of uncertainty to the 3D geologic model (Fig. 2). This study focuses on and provides guidelines for performing realistic uncertainty assessments when creating 3D geologic models of fault zones from limited, preliminary investigation data (e.g., a geologic map), further demonstrating how prior knowledge is used to characterize uncertainty about inputs which are typically subjective (i.e., vertical termination depth). While formulated for a case with limited data, the developed probabilistic geomodeling approach allows for accommodating additional observations of the fault zone geologic modeling inputs (e.g., via a modern outcrop study) through a straightforward reparameterization of the input probability distributions to include the new data.
Moving forward, Sect. 3 details the appropriate selection and parameterization of probability distributions to accurately characterize input uncertainties for various the data types used in 3D geologic modeling of fault zones. Section 4 highlights the considerations for characterizing each modeling input's uncertainty and references the Python code written for performing and assessing the input perturbations on a single fault zone model built from a geologic map in the Rocky Mountains of Colorado, USA. Section 5 demonstrates the probabilistic geomodeling approach applied to a more complex fault network model to assess how the method scales and investigate the interaction of intersecting fault zones. Section 6 discusses the observed results of each model, highlighting key contributions of the probabilistic geomodeling approach including the impact of different uncertainty parameterizations and guidelines for future model refinement. Section 7 reviews the probabilistic geomodeling implementation for fault zones and reiterates the importance of the rationale used, concluding with recommendations for future work.

Probability distributions
The probabilistic geomodeling approach developed requires additional advancements in the selection and parameterization of probability distributions used for characterizing the uncertainty in objective and subjective geologic modeling inputs. The selection of an appropriate probability distribution type from the various distributions available for modeling involves the consideration of two factors: the data type and the level of knowledge about the input. Geologic data may be discrete (e.g., lithological categories) or continuous (e.g., thickness), with various probability distributions available to characterize both data types (e.g., normal, uniform, log-normal, binomial -see Gelman et al., 2013). All of the modeling inputs used for modeling the 3D geometry of fault zones are described using continuous data types.
An additional consideration in the case of continuous data types is the distinction between scalar and vectorial data (e.g., structural orientations). A probability distribution describing orientation data resides on the surface of a unit sphere in 3D and can be characterized using spherical probability distributions (Fisher et al., 1987;Mardia and Jupp, 2000). The benefit of using spherical probability distributions to describe structural orientation uncertainty in 3D geologic modeling is clearly stated by Pakyuz-Charrier et al. (2018b), and their application in probabilistic geomodeling continues to develop (Pakyuz-Charrier et al., 2018b, a;Carmichael and Ailleres, 2016). To remain concise, the following section focuses on the new contributions made to the use of spherical probability distributions utilizing the Rfast open-source package available in the R language (Papadakis et al., 2018).

Spherical probability distributions
Fault orientations are vectors described by dip and dip azimuth components. Stereographic projection is often used to describe the fault plane using its pole (i.e., normal), defined by a unit vector or a trend and plunge. For handling orientation data in the probabilistic geomodel, distributions of the Fisher-Bingham family (Bingham, 1964;Kent, 1982) provide a wide variety of choices for modeling varying degrees of uncertainty. Pakyuz-Charrier et al. (2018b) recently showed that scalar distributions are inadequate for modeling the uncertainty in structural orientation data, providing an example using the von Mises-Fisher (vMF) distribution (spherical analogue to the isotropic bivariate normal distribution). This research continues the exploration into the use of spherical distributions in probabilistic geomodeling by implementing the more general Bingham and Kent distributions (Fisher et al., 1987) from the Fisher-Bingham family to characterize anisotropic uncertainty in structural orientations used in 3D geologic modeling.
In structural geology, the use of spherical distributions to understand the uncertainty about structural orientation measurements is well established (Mardia, 1981;Cheeney, 1983;Davis and Titus, 2017;Roberts et al., 2019). The open-source Orient and Stereonet softwares by Vollmer (2018) and Allmendinger (2015) provide uncertainty estimates of structural data using spherical statistics, while Davis and Titus (2017) and Roberts et al. (2019) used spherical distributions to estimate confidence intervals about uncertain structural orientation measurements of folds and foliations. The analysis of anisotropic orientation uncertainty in structural geology is well established with Zhou and Maerz (2002), Peel et al. (2001), Carmichael and Ailleres (2016), and Davis and Titus (2017) applying it to joint set identification, structural data clustering and foliation-lineation characterization. While Pakyuz-Charrier et al. (2018a) showed clearly that the dip angle and dip azimuth should not be simulated independently as scalar values, it is apparent that the uncertainty affecting each of these aspects of the structural orientation need not be equal.
Two spherical distributions were reviewed in this study: the Bingham and Kent distributions. As opposed to the isotropic vMF distribution, the Bingham and Kent distributions are capable of modeling potentially more realistic anisotropic uncertainty envelopes. This study explores a novel application of anisotropic spherical distributions in probabilistic geomodeling: characterizing subjective bias in the structural orientation uncertainty in fault zones (i.e., differing levels of information regarding the dip angle and dip azimuth of faults modeled in implicit 3D geologic models). The distributions, typically characterized from a series of input measurements, can also be characterized by directly controlling the input parameters themselves. This method allows for the modeler to assume the size and shape of the structural orientation uncertainty envelope in the probabilistic geomodel setup.
The Bingham distribution is an antipodally symmetric distribution for axial data, defined explicitly in R 3 (p = 3) by a set of orthogonal eigenvectors (e 1 , e 2 , e 3 ) and corresponding eigenvalues (λ 1 ≥ λ 2 ≥ λ 3 = 0). The eigenvector and eigenvalue pairs, respectively, detail the direction and degree of maximum, intermediate and minimum variance in the Bingham distribution. The distribution is described by Eq. (1), where A = diag(λ 1 , λ 2 , λ 3 ), and c(A) is the corresponding normalization constant (Fallaize and Kypraios, 2016). Setting λ 3 = 0 merely ensures that the distribution shows maximum variance in the axial direction (i.e., across the unit sphere), allowing λ 1 and λ 2 to fully control the shape of the distribution when projected onto the lower hemisphere.
The Kent distribution (or Fisher-Bingham five-parameter distribution) is a more generalized form of the Bingham distribution, defined for vectorial data focused around a known mean vector with an anisotropic uncertainty envelope. It is characterized by a mean vector (x = x 1 , x 2 , x 3 ), the concentration parameter κ and an ovalness parameter β. Its density is described by Eq.
(2) for κ ≥ 0, β ≥ 0, where is an orthogonal p × p matrix that can be likened to the eigenvector matrix used in simulating the Bingham distribution. The reader is referred to Appendix C of Pakyuz-Charrier et al.
(2018a) for a more thorough explanation of the parameterization of the Kent distribution.
When modeling structural geologic orientation data, the distinction between axial (i.e., undirected) and vectorial (i.e., directed) data is irrelevant following the application of lowerhemisphere stereographic projection. The stereographic projection is applicable because in reality the orientation of geologic structures is defined by an axis laying within the plane of the structure. Regardless of the method of characterizing and sampling the orientation data, a stereographic projection to the lower-hemisphere will return the conventional downdip orientations as defined by dip/dip-azimuth or righthandrule systems.

Sampling
Sampling of scalar data is straightforward and well established through the use of Monte Carlo sampling algorithms, easily accessible through the open-source Python package PyMC3 (Salvatier et al., 2016). The PyMC3 library is designed to facilitate Bayesian inference using computational sampling algorithms, though the inclusion of likelihood functions is not required, thereby allowing for utilization of the package functions for Monte Carlo sampling alone. The use of PyMC3 has been demonstrated successfully in the context of 3D geologic modeling by de la Varga and Wellmann (2016)  the open-source geologic modeling platform GemPy (de la . This study focuses solely on the step of probabilistic geomodeling based on 3D geologic modeling inputs, leveraging only the Monte Carlo sampling capabilities of PyMC3. Simulating samples from spherical distributions requires dedicated algorithms separate from those used for scalar data types, due to the transformation between rectangular and spherical coordinates creating nonuniform areas of angular trend and plunge increments on the unit sphere. Several solutions have been demonstrated to simulate random samples from spherical distributions, which are comprehensively documented in Kent et al. (2018). Simulation algorithms for distributions of the Fisher-Bingham family have been implemented by Papadakis et al. (2018) in an open-source R package Rfast. Open-source tools for statistical simulation in the R and Python environments (including their combined usage through the rypy2 package), provide convenient, welldocumented tools for applying established statistical techniques to novel fields in geoscience.
The algorithm for simulating random points from the Bingham and Kent distributions included in Rfast uses the acceptance-rejection method, inspired by Kent et al. (2013) and Fallaize and Kypraios (2016). The method uses a central angular Gaussian (CAG) distribution as an envelope to approximate the Bingham distribution. The algorithm uses only the first two eigenvalues for identifiability, resulting in the need to use a rotation to align the sampled points with the desired orientation. The rotation of data sampled from spherical distributions to any new set of axes is possible due to the rotation independence of the dispersion of spherical distributions.
Two rotations using the Euler-Rodrigues formula (Dai, 2015) are useful for properly aligning the data simulated using the Rfast algorithms. The first, necessary for the Bingham distribution, interchanges two axes by a rotation of π about an axis defined by the sum of the two axes to be interchanged, k = v 1 + v 2 . The second rotation is necessary with either the Bingham or Kent distribution to correct for the arbitrary alignment of the orientation uncertainty envelope (Fig. 3a), which is not desired when characterizing anisotropic uncertainty in dip angle and dip azimuth of a fault. This occurs in the probabilistic geomodel setup when characterizing the distributions from input parameters directly rather than through eigen-decomposition or maximum likelihood estimation. An effective approach to rotating anisotropic spherical distributions was developed that rotates iteratively by small increments about the input orientation vector until an accuracy threshold on the desired alignment of the e 3 plane is satisfied (Fig. 3b).

Uncertainty assessment
After determining the appropriate probability distributions to use in the probabilistic geomodel, the uncertainty in chosen geologic modeling inputs is quantified based on geologic data and prior knowledge available. This essential step requires a thorough consideration of the types of uncertainty and methods for quantification available for each modeling input. To minimize potentially unwanted bias in the probabilistic geomodel, careful attention must be paid to understanding the geologic nature of the objective and subjective sources of uncertainty affecting each modeling input. Two models were generated for this study from data derived from a historical geologic map in the Rocky Mountains of Colorado, USA (Fig. 4). The first model contains a single fault zone (Fig. 4a) and is analyzed to clearly illustrate the impacts of different geologic modeling inputs and uncertainty parameterizations on a 3D geologic model. The second model contains a network of major and minor fault zones (Fig. 4b) and is provided to demonstrate how the methodology developed can be generalized to more complex models and to discuss the implications of fault zone interactions on geologic model uncertainty. The models follow the workflow illustrated in Fig. 1 to simplify the 3D geometry of fault zones to satisfy the flexibility and automation required for uncertainty propagation.
The geology in the project area consists of uplifted, Precambrian crystalline igneous and metamorphic rocks. The most recent period of tectonic activity occurred from ca. ∼ 70 to 40 Ma. during the Laramide Orogeny, which formed the modern-day Rocky Mountains. During this time, brittle faulting occurred pervasively throughout the study area as part of the regional Loveland-Berthoud Pass fault zone which passes just east of the study site, where it trends NNE for nearly 50 km (Lovering, 1935). A large number of fault zones of varying widths cross through the study area and were mapped at a 1 : 12 000 scale by Robinson et al. (1974). The geologic map (Fig. 4) contains approximate traces and boundaries of fault zones in the study area, showing fault zones with thickness ranging from 5 to 50 ft (∼ 1.5-15 m) as lines and fault zones greater than 50 ft (∼ 15 m) thick as hatched zones. Robinson et al. (1974) provided contoured stereonets of all faults mapped in the area, revealing that the majority of fault zones mapped dip steeply to the eastnortheast. However, specific information on the dip angle of individual faults is missing from the geologic map. In addition to this significant source of orientation uncertainty, the historic map also includes significant geographical uncertainty resulting from georeferencing and drafting errors.
Model realization creation is handled by custom Leapfrog Works back-end support developed for this study to allow for automated updating of geologic modeling parameters from an initial model containing input fault zones. The initial model must be created in Leapfrog using the workflow provided in Fig. 1, and naming conventions for the geologic model, fault zones, polylines and termination surfaces are specified in a user-generated text file. The custom version of Leapfrog Works uses this text file in conjunction with the Leapfrog model and input realization files to automatically step through creation of each geologic model realization based on the simulated data, following the workflow from Fig. 1. Model realizations generated in this manner are automatically evaluated onto a grid of cells defined by the text file for subsequent analysis. The method put forward by Wellmann and Regenauer-Lieb (2012) is implemented to calculate the probability of occurrence of fault zone lithology in each cell. The probability of occurrence is then used to compute information entropy to describe the uncertainty in fault zones in the geologic model. In a binary system (e.g., fault zone vs. intact rock), information entropy is maximal when the probability of occurrence of a fault zone is 50 %, which as discussed in Krajnovich et al. (2020a) can introduce potential for misinterpretation of the geologic model uncertainty envelope if an inappropriate color map is used. The method implemented in Leapfrog can be made available to other researchers on the basis that they contact the developers of Leapfrog (Seequent) independently to inquire about access to the unique functionality (which is built on top of a default Leapfrog installation).
A set of 1000 realizations for each modeling input was propagated into the 3D geologic model independently and compared with the combined geologic model uncertainty assessment from all four modeling inputs (Fig. 5). The significance of orientation and vertical termination depth uncertainty on the combined model uncertainty is clearly apparent at depth, while the uncertainty in the fault zone near the ground surface is dominated by surface trace uncertainty. Uncertainty about the fault zone thickness appears to be largely overshadowed by that of the surface trace, which is considered to be a consequence of the significant georeferencing and drafting errors arising from the use of the historic geologic map (Sect. 6.1).
The following sections provide a detailed description of the methods implemented for parameterizing each geologic modeling input propagated into the model shown in Fig. 5. The input perturbation script, which is compatible with virtually any 3D geologic modeling software, has been developed using open-source code of the R and Python languages and is published at the code and data supplement. It is suggested that the interested reader refers to the script alongside Sect. 4.1-4.4 for exact results, figures and parameters of the probabilistic geomodel setup.

Structural orientation
The structural orientation of a fault zone varies along its surface and is implemented in modeling as a single dip azimuth and dip angle vector applied to the implicit modeling interpolant. Natural variability in the fault surface and error of individual measurements contribute objective uncertainty to the orientation used for modeling (Whitmeyer et al., 2019;Stigsson, 2016). Additional, subjective sources of uncertainty are present that may affect the fault orientation in nonrandom ways , including the application of prior knowledge (e.g., regional structural analysis) or measurement bias from difficulty interpreting fault slip surfaces. Stigsson (2016) shows that the objective uncertainty in measurement imprecision often underestimates the true uncertainty associated with a geologic structure's orientation, suggesting that a more robust characterization of structural uncertainty should consider potential sources of subjective bias. In the example of modeling fault zones from a geologic map (Fig. 4), anisotropic uncertainty untold by any single measurement is present due to the inconsistent information available regarding fault zone dip azimuth and dip angle. An anisotropic Bingham distribution is simulated to capture the uncertainty in the input orientation data from the geologic map. Observed variability in the surface trace and the regional structural analysis provided in Robinson et al. (1974) were used to establish the uncertainty space used in the simulation. Section 6.3 continues the discussion regarding the use of an anisotropic structural orientation uncertainty envelope.
As the dataset from Robinson et al. (1974) lacks specific measurements of fault zone orientations to generate a Bingham distribution directly using the orientation matrix (Fisher et al., 1987), the parameters of the Bingham distribution were assigned manually to generate a distribution covering the expected range of orientations. Varying the parameterization of the Bingham distribution ultimately resulted in determining an appropriate parameterization for the fault zone in the single-fault model using an input orientation of 75 • /123 • (dip/dip azimuth), maximum eigenvalue of λ 1 = 200 (matching the observed azimuth variation in the surface trace) and intermediate eigenvalue of λ 2 = 22.5 (providing an approximate ±20 • dip angle variation).

Surface trace
The surface trace of a fault zone is a polyline along the topography which follows an approximate central fault surface. In ideal conditions, the fault trace would follow the centerline of the fault zone and reach along the whole length of the fault, ending at the fault tip points. However, in reality, the centerline of the fault zone can rarely be determined exactly (Childs et al., 2009). Additionally, when extracting the surface trace from a scanned geologic map, digitization error and geographical errors are likely to be present as well. In the proposed probabilistic geomodel setup, the fault centerline inherits uncertainty from its possible position within the perceived fault zone as well as from the relevant geographical errors.
The uncertainty affecting the surface fault trace results in changes in the trace location and shape. Independent perturbations of the trace's endpoints are applied and linearly propagated along the fault trace to arrive at a smoothly varied location and shape. The three primary sources of uncertainty are quantified using the available information listed in respective order: average fault zone thickness, published metrological studies (Zhong-Zhong, 1995) and approximate geographical error of known landmarks (e.g., mountain tops).
First, a bounded uniform distribution is parameterized to simulate a random direction of perturbation for each trace endpoint due to geographical error (i.e., drafting and georeferencing error). A normal distribution representing the total bound on geographical error is converted to respectivex andŷ components using the directional cosine of the angle sampled from the uniform distribution. This conversion to unit components is used similarly with the fault zone centerline definition uncertainty and digitization uncertainty using the acute angle θ between the orientation of the fault trace with the northing and easting directions. An additional logical check for the strike quadrant of the surface trace is required to implement this approach.
The three individual sources of uncertainty affecting the surface trace endpoint locations are combined into a derived distribution using a deterministic function to determine the total uncertainty affecting the location of each endpoint, given by Eq. (3). P (x|σ centerline , σ dig , σ geo , θ ) = cos(θ ) The average fault zone thickness was used to characterize the fault zone centerline definition uncertainty affecting each surface trace endpoint. The geographical error was calculated to be approximately 40 m based on the average distance measured between known landmarks (e.g., mountain tops) on the geologic map and modern satellite imagery data. For both of these sources of uncertainty, the maximum error range described is treated as a 95 % confidence interval, allowing a normal distribution to be parameterized with a mean of zero and a standard deviation equal to maximum error divided by 3.92. The digitization error for a 1 : 12 000 map was represented by a normal distribution with a standard deviation of 3.666 m based on (Zhong-Zhong, 1995).

Vertical termination depth
In implicit geologic modeling, by default, all faults extend through the entire model domain. To arrive at a more realistic approximation of the 3D fault geometry, a series of surfaces at fixed elevations are defined and used to terminate faults. The depth at which a fault is expected to terminate is based on prior knowledge obtained from past works into approximating the 3D geometry of faults (Walsh and Watterson, 1988;Nicol et al., 1996;Schultz and Fossen, 2001;Torabi et al., 2019a). These works established empirical relationships for fault surface geometry using an aspect ratio of fault length along strike from tip to tip (f length , henceforth length) vs. fault height along dip (f height , henceforth height): Aspect ratio = f length f height . The fault surface aspect ratio, while highly variable (ranging from ∼ 1.5 to 16 in the sedimentary basin rocks studied by Torabi et al., 2019a), has been demonstrated to be an effective measure for quantifying the 3D geometry of fault surfaces. While studies into the aspect ratio of faults in crystalline rock are scarce and typically focus on major thrust faults, a reasonable assumption is that the aspect ratio will lie in the lower range of that for sedimentary basin rocks (Torabi et al., 2019a) due to the lack of mechanical stratigraphy; an aspect ratio of 1 to 5 was assumed to be appropriate for this study.
The vertical termination depth used in modeling inherits uncertainty from the aspect ratio and persistence of the fault trace used. Lacking specific information on the location of fault tip points, an assumption of length must be made using the persistence of a fault trace at the surface as a proxy for the fault length. However, the endpoints of the fault trace may not be the true tip points due to artificial truncation by overburden or lack of outcrop. Lacking a detailed study of fault tip points, this uncertainty can be characterized by averaging the fault trace lengths of a number of faults of the same group (i.e., similar orientation).
Sampling the uncertainty in the fault zone vertical termination depth involves combining multiple probability distributions using a deterministic function to generate an empirically derived probability distribution. In the derived distribution for fault zone vertical termination depth, f length and aspect ratio are characterized as independent probability distributions and combined using a deterministic function based on the empirically derived description of 3D fault surface geometry, z term = f height · sin(θ ); f height = f length Aspect ratio . In this manner, the vertical termination depth (z term ) is calculated by converting the fault height to the vertical height using the average dip angle (θ ) and subtracting this from the average elevation of the fault outcrop (z outcrop ). Sampled vertical termination depths are subsequently discretized onto the predefined termination surfaces created during the initial geologic modeling step (Fig. 1b). The interval of these termination surfaces can be adjusted based on the end-user needs of the geologic model; 50 m intervals were used in this study to balance illustrative quality with model processing time.
f length is characterized as a normal distribution with a mean of zero and a standard deviation of 20 m based on assumed deviance of the surface trace length vs. the true tip-tip fault length. Aspect ratio was characterized using both a uniform and a log-normal distribution, respecting the expected maximum and minimum values of 1 and 5. Section 6.2 explores the impact of these two different parameterizations on the shape of the derived distribution for vertical termination depth.

Fault zone thickness
In the ideal case, a detailed study of fault zone thickness at outcrop provides a measure of the average thickness and its variability. Given the thickness range of 5-50 ft (1.5-15 m) on fault zones mapped as lines in Fig. 4, a conservative estimate on fault zone thickness in the case study project area was characterized using a normal distribution with µ = 30 m and σ = 2 m.
While not directly applied in the model shown in Fig. 5, when lacking direct observations, fault zone thickness can also be approximated based on prior knowledge of a fault's estimated displacement using an established displacement to thickness (D : T) relationship appropriate for the project's geology (Torabi et al., 2019b;Childs et al., 2009). This approach would inherit uncertainty from the subjective interpretation of the fault's historical displacement and the empirical derivation of the fault's D : T relationship. With a mea-sure of displacement, a power-fit relationship allows for approximating the fault zone thickness. The input perturbation script includes functions for exploring the use of the D : T relationship based on a linearized power-law function provided by Torabi et al. (2019b), with curve-fitting parameters log 10 (b) and m to approximate fault zone thickness (f t ) from displacement (f d ), and a modifier f CoreVsZone used to model different sections of the fault zone based on the work by Childs et al. (2009). Uncertainty arises in the parameters of the D : T power-fit relationship, log 10 (b) and m and the assumed fault displacement.

Simulation quality assessment
The quality of the probabilistic simulation is a product of the size of the uncertainty space, the simulation method used and the number of samples drawn. For any simulation, the realizations generated can be plotted in the data space and visually examined for appropriate coverage and shape (termed a realization plot). For scalar data types, histograms of the Monte Carlo draws provide an intuitive method for independently assessing the quality of simulation for each input. Visual analysis of the shape of the histogram compared to the expected shape of the distribution and a comparison between the input distribution parameters (e.g., mean and standard deviation for a normal distribution) and their values calculated from the samples can quickly determine whether the samples drawn have sufficiently explored the uncertainty space. Figure 6 shows an example of the realization plot sample histograms generated for the simulation of vertical termination depths from Sect. 4.3. This figure allowed the identification of a strong tailing behavior in the output realizations, leading to a reparameterization discussed in Sect. 6.2.
For spherical data simulations, histograms may be replaced by exponential Kamb contouring (Vollmer, 1995) or Rose diagrams to visualize the density of sampled poles across the surface of the unit sphere (as projected onto a lower-hemisphere projection). This visual assessment provides a semiquantitative evaluation of the shape and distribution of the sampled structural orientations. Additionally, a recalculation of the eigenvector decomposition from the set of simulated samples provides a measure of the accuracy of the distribution of samples with respect to the input parameter values. Tools for generating figures for simulation quality assessment are provided and detailed in the input perturbation script.
Based on the assessment of simulation quality and consideration of compounding factors during uncertainty propagation, probabilistic geomodeling of the single-fault model was run for a number of various realization counts (100, 300, 500, 1000, 2000 and 3000). The processing time generally increases linearly with realization count, reaching many hours to several days for high realization counts on the single-fault mock model containing 2.5 million cells. The vast majority of processing time is taken up by the model updating and block model calculation in Leapfrog. For the single-fault mock model with 1000 realizations and 2.5 million cells the sampling benchmark time was 87 s, while the model processing benchmark time was 38.5 h. This study is intended to introduce and expand on the use of probabilistic geomodeling for specific geologic modeling problems, and work regarding optimizing the efficiency of model processing is not a focus. The experiments conducted do highlight the need to understand (i) the realization requirement for exploring modeling inputs independently and its relationship to the size of the independent uncertainty spaces, (ii) the interactions of various related parameters during the uncertainty propagation step, and (iii) identification of a balance between final model resolution, coverage, complexity, and processing time.

Fault network model
The fault network model shown in Fig. 4b contains a major fault zone (thickness = ∼ 175 m) with five minor fault zones (thickness ≤ 16 m) branching out of it. The parameterization of modeling inputs was conducted following the methods described in Sect. 4. The input orientations of each fault were determined based on the measured surface trace strike and a dip angle assigned initially by a static random value based on the data published by Robinson et al. (1974). The resultant uncertainty model is shown in Fig. 7, with cross sections highlighting the intersection of three minor fault zones with the major fault zone. The modeling formulation scaled effectively to the more complex fault network model, which included two deterministic horizontal terminations of minor faults into the major fault zone. Focusing on the intersection of fault zones reveals the interaction between fault zone uncertainty envelopes. In Fig. 7, the slice (a) shows the uncertainty envelope of the major fault zone being deflected by a zone of lower entropy as the nearby minor fault zone approaches it. Slices (b) and (c) show the uncertainty envelopes of major and minor fault zones merging, showing how the deflection from (a) transfers into a hotspot of high uncertainty at the peak intersection point (overlapping boundaries). Just as the uncertainty in a single fault zone is maximal at its boundaries, the overlapping of two fault zone boundaries produces an entropy hotspot.

Historic dataset sensitivity
The observed sensitivity of the model uncertainty to the surface trace perturbation may be due to the relative significance of geographical sources of uncertainty in the case of modeling from a historic geologic map. These errors include inherent error in the geologic map stemming from its accuracy and georeferencing errors arising from the conversion between geographical and projected coordinate systems. Georeferencing typically uses a rubber-sheet method, moving, rotating and stretching the map -in an ArcGIS platformto optimally translate from the geographical coordinate system of the map (lat-long) to the projected coordinate system required by modeling (UTM northing and easting), and the error typically reported by these methods may underestimate the true error associated with the georeferenced map. Quantifying the true geographical error is difficult but can be approximated by comparing the location of known landmarks (e.g., intersection of roads, mountain tops, corners of buildings) between the georeferenced map and modern, digital datasets. From Fig. 4, a maximum bound on the error in the location of known features across the geologic map was approximated to be 40 m by visual examination of the georeferenced map. While the same methodology would apply to a surface trace obtained from modern mapping methods (e.g., global positioning systems), it is apparent that the contribution to model uncertainty could be drastically reduced.
Reapplying the methodology to a modern dataset may highlight different sensitivity.

Model reparameterization
Based on the inherent variability in fault aspect ratios, initially a bounded uniform distribution was determined to be appropriate (Fig. 6a). However, the empirically derived distribution of vertical termination depths resulting from a bounded uniform parameterization of fault aspect ratio showed a strong tailing effect (right skewed). This could be interpreted as being unrealistic, arising as an artifact due to the shortening of vertical termination depth intervals for equivalent increases in fault aspect ratio. A reparameterization using a custom log-normal distribution of aspect ratio respecting similar maximum bounds of 1 and 5 is illustrated in Fig. 8b, showing a significant reduction in the tailing effect present in the vertical termination depth realizations. This reparameterization highlights the key strength -and susceptibility -of probabilistic geomodeling based on geologic modeling inputs, the reliance on a user-defined characterization of input uncertainty. Again, it is necessary to reiterate that the modeler must take into consideration not only field observations and theoretical prior knowledge when assessing a geologic modeling uncertainty formulation but also their informed expectation of what is geologically realistic for their chosen modeling problem. Furthermore, reparameterizing individual aspects of the probabilistic model may prove to be insufficient due to the presence of inherently unknown relationships between the chosen model parameters.

Parameter relationships
While many probabilistic geomodeling approaches -including this study -have focused on independent perturbations of geologic modeling inputs, relationships between modeling inputs are apparent in the modeling formulation developed. Intuitively, there is a relationship between the average orientation of a fault's surface trace and the structural orientation applied to the fault surface modeling interpolant's global trend. Deviations in the modeled fault surface from the input orientations can occur when the two inputs are significantly different, typically arising in Leapfrog Works by way of overestimation of fault surface dip by up to 10 • when the surface trace azimuth (i.e., average normal to the fault surface trace) and global trend dip azimuth differed by greater than 20 • (Krajnovich et al., 2020a). Comparing two geologic models generated with orientations sampled from anisotropic and isotropic Bingham distributions with equivalent maximum uncertainty ranges (Fig. 9) showed a skewing of the geologic model uncertainty envelope when generated from the isotropic Bingham distribution. The results show a consistent skew towards near-vertical dips of the modeled fault zone realizations. This issue is alleviated when the structural orientation is parameterized with an anisotropic Bingham distribution, allowing for increased variability in the dip angle without compromising the certainty of the dip azimuth.
The observed interaction between the surface trace and structural orientation inputs to the implicit 3D geologic model suggests that coupling and correlation may be present between different geologic modeling inputs.
Despite performing a thorough exploration of each, independent parameter's uncertainty during Monte Carlo sampling (Sect. 4.5), undersampling of the combined geologic model uncertainty space can still occur during uncertainty propagation. An example of this arises when considering the vertical termination depth and structural orientation. Truncation of fault zone realizations at any given termination interval effectively reduces the number of realizations available for sampling the full range of structural orientation uncertainty at deeper intervals. This is evidenced in Fig. 5b by the increasing prevalence of "stair-stepping" artifacts in the combined model uncertainty with depth.
Coupling of the surface trace and structural orientation or structural orientation and vertical termination depth is not implemented in the current formulation due to inconsistencies in the sampling methods, which required independent sampling of spherical and nonspherical distributions. In-  . Visual comparison of geologic model entropy generated using (a) anisotropic and (b) isotropic Bingham distributions. The entropy difference between these two models in (c) highlights slight skewing of the uncertainty envelope towards steeper dipping fault zones when parameterized using samples from an isotropic Bingham distribution (d). Entropy difference is shown such that negative values indicate higher entropy in the isotropic orientation model. creasing the number of model realizations mitigated some of the effects of input correlations, though this comes at the expense of increased processing time. A treatment of these relationships through parameterizing previously independent input probability distributions using a joint distribution (and an appropriate sampling scheme) could potentially generate more realistic and efficient assessments of model uncertainty. While the correlation between parameters in a probabilistic model is typically unknown, in some cases -such as the correlation between fault trace azimuth and structural orienta-tion azimuth -the correlation can be assumed based on available prior geologic knowledge of the parameter's real-world relationships. However, until standard methods for simulating spherical and scalar data types in a single system are made available, creative model parameterizations -such as constraining the dip azimuth uncertainty to observed surface trace variability using an anisotropic spherical distributioncan circumvent some of the issues associated with these relationships.

1470
A. Krajnovich et al.: Uncertainty assessment for 3D geologic modeling of fault zones

Anisotropic spherical distributions
For modeling anisotropic uncertainty in structural orientation data, the Bingham and Kent distributions generate practically equivalent uncertainty envelopes, differing primarily in their parameterization. For the Bingham distribution, variations in the magnitudes of λ 1 and λ 2 allow for independently varying the size of the uncertainty space in orthogonal directions, while for the Kent distribution variations in the values of κ and β can generate distributions with different overall size and ellipticity. To achieve independent uncertainty ranges of the dip angle and dip azimuth, both distributions require a series of rotations (described in Sect. 3.2) to properly align the distribution such that the major ellipse axis is aligned with a great circle with 90 • dip (Fig. 3). Once aligned, the range of the Bingham distribution in the dip azimuth and dip angle directions can easily be varied independently through direct changes in the magnitudes of λ 1 and λ 2 . The Kent distribution, however, introduced difficulty in varying these uncertainties independently due to coupling of the κ and β parameters. Compared to the Kent distribution, the Bingham distribution was observed to be more efficient at modeling strongly girdle-shaped distributions, which as discussed in Sect. 4.1 can be particularly useful for the limited data available from a geologic map. For these reasons, the Bingham distribution was chosen for the analysis in this study, though methods for simulating and rotating the Kent distribution are still provided for thoroughness.

Additional complexity for fault zone geometry
The authors acknowledge that in reality fault zone geometry includes horizontal terminations. The mechanisms affecting the location of horizontal terminations are numerous and varied, including fault zone anastomosing, abrupt termination in intact rock and false trace termination due to obscuring by overburden. While several works have investigated the nature of fault-fault terminations using stochastic simulation (Aydin and Caers, 2017;Cherpeau and Caumon, 2015), due to the presence of other poorly defined sources of uncertainty, the placement of horizontal terminations in this study remained deterministic. Future work supplementing the limited dataset used in this study with detailed outcrop studies will be required for defining the nature and uncertainty in horizontal fault zone terminations.
Aside from horizontal fault terminations, fault zones present other interesting complexities that introduce additional levels of refinement for the developed modeling workflow. For example, internal fault zone composition is heterogeneous, and modeling the different fault zone components (fault core, transition zone and damage zone) could be implemented using the developed methods if desired. Asymmetry of fault zone structure between the hanging wall and footwall has also been documented by Choi et al. (2016), which would require a new method of defining the distance function for generating the 3D fault zone volume. Similarly, variations in fault zone thickness along the area of the central fault surface are realistic, though would similarly require a new method for defining the distance function. These research questions enter into the realm of implicit geologic modeling theory and are introduced merely to shed light on where refinements in model creation can benefit the modeling of fault zone structure.

Conclusions
The flexible probabilistic geomodeling method should continue to be leveraged to model novel problems in geologic modeling, such as the uncertainty in fault zones in 3D geologic models based on limited data from a historic geologic map and available prior knowledge. The setup of the probabilistic geomodel should make full use of open-source statistical packages in the R and Python languages, many of which are experiencing ongoing development at the time of writing. While a unique modeling case was presented, the rationale applied to assessing uncertainty in poorly constrained, preliminary geologic models sheds light on new ways to implement varied types of information in probabilistic geomodeling formulations.
Practical guidelines for developing probabilistic geomodels are also provided, reinforcing (i) the importance of developing a clear yet robust modeling workflow for the structure of interest, (ii) consideration of varied sources of geologic uncertainty and (iii) creatively exploring the methods available for characterizing both objective and subjective modeling inputs. Prior knowledge in the form of established empirical and theoretical relationships from structural geology present an opportunity to quantify and parameterize geologic modeling inputs that are usually interpretive, allowing for their inclusion in the probabilistic geomodel. The Bingham distribution, while only moderately impactful on model uncertainty when comparing anisotropic and isotropic parameterizations, is recommended to replace the vMF distribution for modeling structural orientations due to the increased flexibility of its parameterization. While the Bingham distribution was preferred in this study, the use of the Kent distribution appears to be practically equivalent.
Future work stemming from this input-based probabilistic geomodeling formulation may include incorporating new information in a Bayesian inference scheme to further refine the geologic model, by following the methodology introduced either by de la Varga and Wellmann (2016) to infer additional information about the model parameters (requiring a full integration of the probabilistic modeling with the automated geologic modeling approach) or by Schneeberger et al. (2017) to validate the initial model in light of its generated uncertainty. Additionally, refining the modeling of fault zone internal structure and variability is recommended not only to further the usefulness of 3D geologic models in prac-tical applications (e.g., subsurface construction, fluid flow) but also to expand the understanding of the geometry and characteristics of these complex geologic structures.
Code and data availability. The necessary code and data for generating realizations of the geologic modeling inputs (including example results) is available at https://github.com/ajkran2/ Geologic-Model-Input-Uncertainty-Characterization (last access: