Interpretative modelling of a geological cross section from boreholes : sources of uncertainty and their quantification

We conducted a designed experiment to quantify sources of uncertainty in geologists’ interpretations of a geological cross section. A group of 28 geologists participated in the experiment. Each interpreted borehole record included up to three Palaeogene bedrock units, including the target unit for the experiment: the London Clay. The set of boreholes was divided into batches from which validation boreholes had been withheld; as a result, we obtained 129 point comparisons between the interpreted elevation of the base of the London Clay and its observed elevation in a borehole not used for that particular interpretation. Analysis of the results showed good general agreement between the observed and interpreted elevations, with no evidence of systematic bias. Between-site variation of the interpretation error was spatially correlated, and the variance appeared to be stationary. The between-geologist component of variance was smaller overall, and depended on the distance to the nearest borehole. There was also evidence that the between-geologist variance depends on the degree of experience of the individual. We used the statistical model of interpretation error to compute confidence intervals for any one interpretation of the base of the London Clay on the cross section, and to provide uncertainty measures for decision support in a hypothetical routeplanning process. The statistical model could also be used to quantify error propagation in a full 3-D geological model produced from interpreted cross sections.


Introduction
Three-dimensional (3-D) models are now the state of the art for presenting geologists' knowledge and interpretation of subsurface structures, and are supplied to varied users of ge-ological information.There is no single methodology for the production of models, and the method will reflect the geological setting and the nature of the information available to the modeller, which may include geophysical imagery, boreholes and surface observations.Models can be produced by geostatistical interpolation (e.g.Lark and Webster, 2006) or by a combination of geostatistical methods with expert intervention to ensure geologically realistic results (e.g.Gunnink et al., 2013).Models may also be based on inversions of geophysical data, constrained by geological knowledge and interpretation (Jessell et al., 2010).The approach of particular interest here is based on expert interpretation of boreholes along interlocking sets of cross sections with subsequent interpolation from the interpreted sections to produce models of volumes in 3-D.This is exemplified by the GSI3D software (Kessler and Mathers, 2004;Kessler et al., 2009).Expert interpretation of a cross section entails the interpretation of boreholes and the sequential construction of the basal contact of each geological unit in the stack.This process depends on the expert interpretation of boreholes in line with rules, explicit or tacit, which control the shapes of surfaces and the circumstances in which faults must be invoked to explain their observed positions.Because these rules encapsulate geological knowledge, they provide a sound basis for modelling, particularly when limited observations are available.However, the interpretation of the cross sections inevitably has an attendant uncertainty, and this is propagated when the interpreted cross sections are combined to model volumes in 3-D by interpolation.
The uncertainty in a 3-D model is of interest to data users who will apply it for decision making.For this reason, there has been considerable interest in the development of quantitative or semi-quantitative operational methods to characterise the uncertainty in 3-D models and the variation of this uncertainty in space (e.g.Lelliott et al., 2009).
If information in 3-D is produced by geostatistical interpolation, then the uncertainty can be quantified directly on the basis of the geostatistical model (Lark and Webster, 2006).In the study reported by Gunnink et al. (2013), the geostatistical predictions were modified to ensure geological consistency.The original geostatistical measures of uncertainty no longer hold for the modified values, so Gunnink et al. (2013) used a cross-validation method to quantify uncertainty.However, this is feasible only if many borehole observations are available.Bistacchi et al. (2008) present a case study where the uncertainty in the modelled position of planar surfaces in the 3-D space could be computed from information about the uncertainty of the angular observations on which the model was based, and the distance over which these observations were projected.Tacher et al. (2006) used the simple kriging variance as a measure of uncertainty for the position of modelled geological surfaces, the parameters of the variogram being informally elicited to reflect expert judgement about uncertainty and its spatial dependence.In many cases 3-D modelling is supported by interpretation of geophysical data.Bond et al. (2007) and Torvela and Bond (2011) examine the uncertainties in expert interpretation of seismic imagery, and particularly how uncertainties in the conceptual geological model which underly the interpretations contribute to the final uncertainty.Aitken et al. (2013) discuss a measure of "data richness" to quantify the extent to which the geological interpretability of geophysical data, the complexity of these data and their quality determine the uncertainty of resulting models, and the variation of this uncertainty in space.
In this paper, we are particularly interested in the uncertainty of models produced by the cross-section interpretation methodology.Lark et al. (2013) made a direct empirical assessment of the quality of one such model in a designed experiment.They compared the predicted heights of units with observed heights at a set of validation boreholes.This gave a quantitative measure of uncertainty.However, Lark et al. (2013) concluded that it is necessary to understand how error enters into the initial interpretation of cross sections prior to interpolation, since the errors in the cross sections may be predictable from factors such as the distance to boreholes or crop lines, but propagated into the 3-D model by the interpolation step in a complex way.If we can understand and quantify the uncertainty in cross-section interpretation, then it may be possible to develop quantitative models of the uncertainty for different "benchmark" geological settings, and to use this to develop uncertainty measures for application to geological models.
To this end, we undertook, and report here, an experiment to study the error in cross-section interpretation, hypothesizing that the variability of the interpretation error changes along the section in ways that can be described by a statistical model.We considered statistical models in which the variance of the interpretation error at some location depends on two factors.The first factor was the distance from the location to the nearest borehole available to support the interpretation of the cross section.Our hypothesis was that the variance of interpretation error would increase with the distance to the nearest available borehole.The second factor was the experience of the geologist making the interpretation; our hypothesis was that the variance of interpretation error would diminish with increasing geologist experience.
If our hypothesis is verified, then we could compute confidence intervals for the interpreted height of a contact along a cross section, and model how this uncertainty may propagate in the subsequent interpolation from the interpreted cross section into a 3-D geological model.If statistical models of the uncertainty in cross-section interpretation could be estimated for a variety of geological settings, then these could be used to compute uncertainty measures for new geological models, and so to calculate, for example, decision-theoretical measures of the value of the model information (Howard, 1966) or other criteria by which model users can make rational decisions that account for model uncertainty.

Geological context of the cross section
This study is based on an 8 km cross section in London which roughly follows the A12 road from Hackney northeast across the Lea Valley to Wanstead.The local geology (Fig. 1) consists of Quaternary deposits comprising alluvium along the valleys of the rivers Lea and Roding, with river terrace deposits at several levels beneath and flanking the alluvium and capping the low interfluvial ridge.
The Quaternary deposits are generally less than 5 m in total thickness, except beneath the Lea Valley, where up to 10 m are encountered.They rest everywhere on Palaeogene bedrock units.In order of increasing age and depth, these are the London Clay Formation, the Lambeth Group and the Thanet Formation.The Quaternary deposits rest on the London Clay Formation along part of the section, but cut down beneath the Lea Valley into the underlying Lambeth Group (Fig. 1).The Palaeogene deposits are underlain by the Chalk Group (Upper Cretaceous), which is several hundred metres thick and is the lowest unit considered that is encountered here in approximately 10 % of the 143 available boreholes along the cross section (Fig. 1).
The Palaeogene strata in this region are affected by the Alpine Orogeny, and underwent gentle folding, faulting and tilting in Oligocene-Miocene times (Sumbler, 1996).In this study, our interest lies in the definition of the base of the London Clay Formation.In the London area, the London Clay comprises a grey marine silty clay with thin interbeds of sandy clay, sand and pebble beds (Ellison et al., 2004).The whole sequence locally exceeds 100 m in thickness, although due to erosion, considerably less is preserved along the line of the cross section discussed here.It rests conformably on the Lambeth Group, which consists of about 15 m of interbedded colour mottled clays, sands and silts arranged in a complex and variable vertical sequence of facies (Ellison, 1983).The London Clay Formation is present over about 60 % of the length of the cross section, and the base of this unit, the surface of interest, is proven by 51 of the boreholes.Along the section, the elevation of the base of the London Clay Formation observed in the boreholes varies from nearly 10 m to −13 m relative to the Ordnance Datum (Fig. 1).
In the London area, the London Clay Formation is a relatively thick firm clay without significant water flow, and it is therefore regarded as a good medium for tunnels and excavations (Ellison et al., 2004).The Lambeth Group, by contrast, contains thin layers of alternating clay and water bearing soft sands and silts, and the clays are also characterised by a strong propensity to shrink-swell during cycles of wetting and drying.As such, it consists of a very difficult medium for excavation and tunnelling, and is best avoided wherever this is possible (Ellison et al., 2004).As the cross section demon-strates, parts of London are underlain at a few tens of metres in depth by these two units (Fig. 1).Hence, the position of the base of the London Clay Formation is critical, as it separates these two units of radically different engineering behaviour, and the measures of uncertainty derived in this study have considerable potential for application in this context.

Data subsetting, geologists' self-assessment, and modelling
The key idea of the experiment was that each of a set of participating geologists would make an interpretation of the three Palaeogene bedrock units on the cross section, drawing continuous (if occasionally interrupted) basal contacts of the units as interpretations of the information in a set of boreholes.Any one participant would use a subset of all available boreholes, so that their interpretation could be compared directly with each of a complementary validation subset.The difference between the interpreted and observed elevations of the base of the London Clay, the cross-section error, would then be treated as a variable for statistical analysis to identify important features of its variability.Note that, while we only examined the base of the London Clay, the participants interpreted this in the wider stratigraphical context by also drawing the bases of the other Palaeogene units.
The 51 available boreholes which prove the base of the London Clay were subdivided by independent random sampling without replacement into ten non-overlapping subsets of five validation boreholes.We call each of these subsets a validation batch; each is paired with its corresponding interpretation batch -the complementary subset of 46 boreholes.In this way, ten different although overlapping interpretation batches, each with 46 boreholes which proved the base of the London Clay, were prepared for use by geologists in the experiment.Any one participant would use just one interpretation batch.His or her interpretation of the cross section could then be compared with the five boreholes in the corresponding validation batch, boreholes not used in the interpretation of the cross section, to generate five observations of crosssection error.
A total of 28 geologists participated in the experiment.Of these, 22 were delegates at the GSI3D workshop which took place at the British Geological Survey (BGS), Keyworth, from 17 to 18 October 2012, and the GSI3D software was used for the experiment.Some of the workshop participants were staff of BGS, others were geologists from a variety of organisations and countries, with varying levels of experience in geological modelling, but all with some interest and experience, if rudimentary, in the use of the GSI3D software, which was used for this experiment by all participants.The remaining six geologists were BGS staff who participated in the experiment after the workshop.
Each participant was asked to complete a questionnaire before undertaking the exercise.Their unique number was recorded on the form.They had the option of recording their name and contact details on the form, or of remaininganonymous.In the questionnaire, each participant was asked to record a self-assessment of their experience of geological modelling in 3-D by identifying the most appropriate of four general descriptions.The descriptions and responses are presented in Table 1.Note that there was some variation in experience among the participants: two were novices in 3-D modelling, and eight had limited experience.This allows us to quantify the effect of increasing experience on the variability of interpretation error.
The key principle of the experiment was explained to all delegates, who were also provided with an explanation of the units in the cross section.Each participant in the experiment, on presenting at the workstations, was given a unique number, and an interpretation batch of boreholes.In addition to the boreholes, a standard interpretation of the superficial material (as a single unit) was provided, so that all participants were working on a common rockhead surface.The intersections of outcrops, as mapped in 2-D, with the cross section were also provided to all participants.A set of guidance notes on the GSI3D software was available, and at all times a staff member experienced with the software was available to help.When the interpretation was complete it was saved with a code which indicated the participant's unique number and the number of the interpretation batch and complementary validation batch of boreholes which had been allocated.As each geologist presented to participate, they were allocated one of the interpretation batches of boreholes, so that a more or less even distribution of participants over batches was achieved.
Once each geologist had completed and saved their interpretation, this was compared with the corresponding batch of validation boreholes, and the observed and interpreted elevation of the base of the London Clay was extracted.One modeller's interpretation was not correctly saved, so this was lost, and in some cases the London Clay was not present in the interpretation at the location of a validation borehole.Over all validation batches, we were able to make a total of 129 comparisons between an interpreted elevation of the base of the London Clay at the location of a borehole in a validation batch observed elevation in that validation borehole (i.e. in a borehole which had not been available to the geologist who made the particular interpretation).As described in Sect.3.1 below, and formalised in Eq. ( 1), an observation of interpretation error is the difference between the interpreted and observed elevation of the base of the London Clay for one such comparison.Between 10 and 20 interpretation errors could be calculated for any validation batch.

Overview of models and analyses
This section provides an overview of the analyses undertaken to test our hypothesis, avoiding the statistical detail.The reader will find technical information about the statistical models and their estimation in Sects 3.2-3.3,and these can be ignored by the reader who requires only a summary of the statistical methods.Section 3.4 explains how the selected statistical model for cross-section errors was interrogated to represent the cross-section uncertainty with confidence intervals and an analysis of the implications of this uncertainty for a hypothetical application.
As reported in the previous section, the experimental results consist of a set of 129 comparisons between the interpreted and observed elevations of the base of the London Clay, where each interpretation in the set had been made without access to that particular observation.The variable for statistical analysis is the cross-section error, obtained for each of the 129 comparisons by subtracting the interpreted elevation of the base from the observed elevation.An error of zero therefore means that the observed and interpreted elevations were the same in the particular comparison.A negative error means that the interpreted base was higher than the observed base in that comparison.
The statistical analysis of these values was done with linear mixed models.These treat the cross-section errors as a combination of a fixed effect (here a constant, the mean cross-section error) with random effects.The random effects represent sources of variation in the observed errors, and here account for differences between batches of validation boreholes (are the mean errors for the different batches significantly different?), between the sites of validation boreholes within batches (are the mean errors for different locations within each batch significantly different from each other?), and between the geologists.The means of the random effects are zero; their variances are interesting because they quantify the uncertainty introduced into the interpretation of the cross section by the factors which they represent (differences between modellers, differences between locations).In some of the more complex models, we used the variance of a random effect that was modelled as a function of some covariate.For example, in one case, the variance of the effect of location was modelled as a function of the distance from the location to the nearest borehole available for interpretation (i.e. the nearest borehole in the interpretation batch allocated to the particular geologist).Such models could be used to predict how interpretation uncertainty varies along a cross section.
We considered seven linear mixed models which were fitted in order, so in some cases a statistical inference about one model (i.e.showing that a particular random effect was not significant) determined the form of subsequent models (that effect was dropped).
The random effects which we considered can be defined with respect to two properties.The first is dependency.If a random effect is independent, then the value that it takes for one instance tells us nothing about the value that it takes in other instances.In the first model, 1a, the random effect that models differences between batches was independent, because the batches were formed by independent random sampling.In other models, a random effect may not be independent, but may have a correlation structure.In all models, the random effect that models differences between sites had a spatial correlation structure: one might expect cross-section errors at two nearby sites to be more similar than errors at two sites which are far apart.In models 1a and 1b, the random effect which accounts for variability of geologist interpretations was independent within any site (the effect for one geologist is independent of the effect for another), but the cross-section errors for any one geologist at different sites were modelled as correlated (a geologist who tends to interpret the base too high at one site might make a similar error at other sites).
The second property of random effects is stationarity in the variance (stationarity hereafter).A stationary random effect has a constant variance.However, the variance of a non-stationary random effect may be modelled as a variable which depends on some other factor.For example, in model 2a, the variance of the geologist random effect de-pends on the level of experience that each geologist recorded in the questionnaire (Table 1).
Table 2 summarises the differences between the models.Mode 1a is a general one in which there are stationary random effects for batch, site and geologist differences.The batch effect is also independent, the site effect is spatially correlated (as in all models) and the geologist effect shows correlation between errors made by the same geologist.Models 1b and 1c were fitted to test, respectively, whether the variance of the batch effect could be assumed to be zero and whether the geologist random effect could be modelled as independent.The final model in group 1, 1d, was meant to see whether the variance of the site effect was non-stationary, depending on the distance to the nearest available borehole.
In all the models in a second group of three, the batch effect was dropped, and the site effect was spatially correlated and stationary.The geologist effect was independent, but we considered non-stationary alternatives in which the variance depended on (2a) the distance to the nearest borehole available for interpretation, (2b) modeller self-identified experience, and (2c) both these factors.
We compared models in two ways (details in Sect.3.2).In some cases, it was possible to compare models by a loglikelihood ratio statistic L. These are presented in Tables 4  and 5 for comparisons where they can be made.In each case the compared models are indicated and the statistic presents the strength of evidence for the effect of additional terms in the more complex model.The recorded p value is the probability of finding evidence as strong or stronger than the value of L if the simpler model were true.If p is larger than 0.05, we retain the simpler model.Not all models can be compared this way, and where the log-likelihood ratio statistic could not be used, we compared models by Akaike's information criterion (AIC, details in Sect.3.3).In any comparison, the model for which AIC is smallest was selected.The AIC is not a formal significance test, but by selecting the model with smaller AIC, one minimises the expected information loss through the selection decision (Verbeke and Molenberghs, 2000).A summary of the key comparisons between models, and the inferences made from them, is provided in Table 6.

Statistical methodology: linear mixed models, the general model (1a), and three variants
The results from this experiment were analysed by the fitting and comparison of linear mixed models (LMM) (Verbeke and Molenberghs, 2000) for the cross-section errors.One observation of cross-section error corresponds to a particular geologist's interpretation at one of the sites in the validation batch corresponding to the interpretation batch to which that geologist had been allocated.The interpretation at that site had therefore been made without access to the information in the borehole there.This gives us a total of N = 129 observations of cross-section error.k within batch i is z s (b i , s k , g m ), and the corresponding observed elevation in the validation borehole is z o (b i , s k ), then the correspondingobservation of cross-section error is defined as (1) A negative error therefore means that the geologist's interpretation is higher than the observed elevation of the base of the London Clay.
The fixed effect in all LMM that were considered here was the mean cross-section error.The random effects modelled the contribution of differences between batches, differences between sites and differences between geologists.In an LMM, the random effects are modelled as Gaussian random variables with mean zero and a variance.The variance may be stationary, a parameter of the LMM, or it may be a variable expressed as a parametric function of some covariate with parameters to be estimated (e.g.Nelder and Lee, 1998;Lark, 2009).The random effects and their parameters are of interest because they may be informative about sources of cross-section error, and allow us to predict cross-section error variance in similar settings.Once an appropriate model for the random effects has been selected, one may use generalised least squares to estimate the overall mean model error and test whether it appears to be significantly different from zero.
Model 1a takes the following form for a set of observations of cross-section error in a vector ε of length N: where M is an N × p design matrix that associates each observation of cross-section error in ε with a value of a fixed effect variable, contained in the vector α of length p.In all models considered in this paper, the fixed effect is a constant, the mean cross-section error, so p = 1, α contains the mean and M is an N × 1 vector of ones.Other terms in the model are explained in the following paragraphs.
The matrix X b is an N ×N b design matrix for the betweenbatch random effect where N b is the number of batches.Row n of X b corresponds to the nth observation of cross-section error.If the nth observation of cross-section error belongs to the mth batch out of N b , then the element in column m of row n of X b is one, and all other elements in the row are zero.The vector β b is an N b × 1 vector which contains the mean errors for the batches, which are treated as random variables.One may write down an expression for the covariance matrix of the N between-batch components of the observed crosssection errors, C b .
Because the sites are randomly allocated to batches, it is assumed that the batch effects are independent, and so where σ 2 b is the variance of the batch effect, and R b denotes the correlation matrix of batch effects which is obtained, given the assumptions of independence, as the product of the batch design matrix with its transpose (denoted by the superscript T).
The term X s is an N × N s design matrix which associates each of the N observations of cross-section error with one of the N s validation sites.These sites are not assumed to be independent of each other, since they were chosen by purposive sampling, and not by an independent random sampling design.Since the sampling design does not allow us to treat the between-site effect as an independent random variable, we rather invoke a random statistical model of the betweensite effect ( de Gruijter et al., 2006).The random variable, which is contained in the length-N s random vector Z s is assumed to be N s -variate Gaussian with mean zero and N s ×N s covariance matrix S: where 0 N s is a vector length N s of zeroes.We assume that Z s is a second-order stationary random variable, so that the covariance of the values at any two sites depends only on the interval in space between those sites (Stein, 1999).Here we use a standard covariance function from geostatistics, the Matérn function (Matérn, 1986).Under this model, the covariance between two locations separated by distance d is where K κ (•) is a modified Bessel function of order κ, κ is a smoothness parameter -see Diggle and Ribeiro (2007) for a discussionφ is a distance parameter, and c 0 and c 1 are, respectively, the spatially uncorrelated and correlated components of variance of the between-site variable.Note that, while in principle, the covariance can be modelled as a function of the direction as well as the length of the separation vector between locations, when our observations of crosssection error are aligned on an almost-straight cross section, we consider distance only.
If the distance between site k in batch i and site l in batch j is denoted by d {i,k},{j,l} , then one may compute a betweensite covariance matrix S, which is an N s × N s matrix.If the rth out of N s sites is site k in batch i, and the cth out of N s sites is site l in batch j , then and the N × N between-sites effect covariance matrix for the LMM for all N observations of cross-section error is given by where X s is the N × N s design matrix for sites.Given the site design matrix, and the distances among the observations of cross-section error, this covariance matrix is determined by the four parameters of the Matérn covariance function: c 0 , c 1 , κ and φ.
The geologist effect in model 1a, the term η g in Eq.
(2), is somewhat more complex.At each site within a batch, a cross-section error is observed for each geologist who was allocated the corresponding batch of boreholes.The term η g is the difference between the cross-section error for a particular geologist at a particular site, and the mean cross-section error at that site.It is therefore the between-geologist withinsite effect, but we call it the geologist effect for brevity.
If each geologist had one and only one validation borehole, then the geologist effect would be simply nested within sites as an independent random error (regardless of whether there was one or more observations of cross-section error at each validation site).However, in the current experiment, each of the geologists was allocated all validation boreholes in a particular batch, and so we must choose an appropriate statistical model for the between-geologist effect observed at each of a set of boreholes.In model 1a, we treat the geologist effects as correlated random variables within batches.If we denote by ε (b i , s k ) the mean cross-section error at site k in batch i, the geologist effect for geologist m at the same site is In the random effects component of model 1a, we assume that the correlation of the geologist effects is In words, the geologist effects for observations of crosssection error at two different sites are uncorrelated if the geologists are different (which includes all between-batch comparisons), and have a correlation of ρ if the geologist is the same.The covariance matrix for the geologist effect in model 1a is therefore where R g is an N × N correlation matrix of geologist effects with values 1 on the main diagonal, ρ on off-diagonal elements which correspond to pairs of cross-section errors corresponding to the same geologist, and zero in all other elements.The variance of the between-geologist effect is σ 2 g .
The random effects of the model in Eq. ( 2) are characterised by the between-batch variance, σ 2 b , the four parameters of the Matérn covariance model for the between-site variable (c 0 , c 1 , κ and φ), the between-geologist within site variance σ 2 g and the correlation parameter ρ.We used residual maximum likelihood (REML) to estimate these parameters (Patterson and Thompson, 1971;Smyth and Verbyla, 1996).This proceeds on the assumption that ε in Eq. ( 2) is a realisation of a multivariate Gaussian random variable, E: where V is the covariance matrix given by Under this model the residual log-likelihood, ignoring constants, is given by where The Gaussian assumption can not be tested strictly, because it is an assumption about a multivariate distribution of which we have a single realisation.However, its plausibility can be tested by examining a histogram and summary statistics of the residuals of an ordinary least squares fit of the fixed effects model, equivalent to the statistics of the data in this case where a uniform mean is the only fixed effect.Where necessary, data may be transformed to a new scale of measurement to make the assumption more plausible.We used the optim procedure in the R package (R development core team, 2013) to find REML estimates of the random effects parameters, the values that maximise the likelihood as defined in Eq. ( 13).The L-BFGS-B optimisation method was selected, a quasi-Newton optimiser in which upper and lower bounds are supplied for the parameters to be estimated (Byrd et al., 1995).
In the proposed model, there are P = 7 random effects parameters (or variance parameters) to be estimated by REML.One may consider the "null hypothesis" that one of these parameters can be set at a fixed value, to simplify the model.For example, if one assumed that the cross-section errors for the same geologist at two sites within a batch are uncorrelated, then ρ = 0.In general a "null" model with P − g parameters is said to be nested within a more complex "full" model with P parameters if the null model can be regarded as a particular case of the full model with the g additional parameters taking fixed values.The maximised residual likelihood for the full model R,F is at least as large as that for the null model, R,N .To test whether the improvement of fit from the g additional parameters is large enough to justify their inclusion within the model one may compute the log-likelihood ratio statistic (Verbeke and Molenberghs, 2000) We call a comparison between two models a "standard case" if the g additional parameters in the more complex model all take definite values in the null model, and these parameter values are not on the boundary of the parameter space in the null model.In a standard case where the null model is the true model, L is distributed as χ 2 with g degrees of freedom (Stram and Lee, 1994).Note that this procedure is valid for residual likelihoods only when the models have the same fixed effects structure.
One may use this procedure to compare the LMM in Eq. ( 2) with one in which the geologist effects are regarded as uncorrelated between sites within batches.In the full model, ρ ∈ [−1, 1], so the fixed value, ρ = 0, in the null model is not at a boundary.The comparison is therefore a standard case with L ∼ χ 2 (1) under the null hypothesis.
However, if we consider a null model in which the between-batch variance is zero this is not a standard case since zero is the lower bound for a variance.A more general criterion for comparing models of differing complexity, although not a formal test, is to compute for each model Akaike's information criterion -AIC (Akaike, 1973): where is the maximised log likelihood (natural logarithms) and P is the number of parameters.That model is preferred for which A is smallest, so the term 2P is, in effect, a penalty for model complexity.Model 1b is a variant of 1a in which the between-batch variance is dropped.Since the batches were formed at random, one may expect that the mean error does not differ between the batches, except for random sample variation.However, in a comparison between these two models, the null (1b) is formed by fixing the between-batch variance at zero, which is a boundary in parameter space (variances cannot be negative).The models are therefore compared on the AIC.
Model 1c is a variant of 1a in which the correlation ρ = 0.As noted above, this comparison can be made by computing the log-likelihood ratio statistic L and testing it against χ 2 (1).
Having selected one model from among 1a-1c, a variant was considered in which the correlated variance of the between-site random variable, c 1 in Eq. ( 5), depends on the distance from that site to the nearest borehole available for interpretation (i.e.not in the validation set for the batch).We considered the possibility that this variance is a linear function of distance to the nearest borehole.The intercept and slope of this function, α s,0 and α s,1 , respectively, are therefore substituted for c 1 in model 1d.The comparison of between the null model selected from among 1a-1c and the more complex variant 1d can be made using the loglikelihood ratio, assumed to be distributed as χ 2 (1) under the null model since model 1d has one more parameter than the null.

Statistical methodology: refining the model to explain the geologist variance (models 2a, 2b and 2c)
Here we consider the possibility that the between-geologist variance can be replaced by a parametric function.In principle this is compatible with any variant of the models considered so far.The expression for the between-geologist covariance matrix in Eq. ( 10) is modified to where R g is defined as for Eq. ( 10), and where σ g is a vector of length N which contains the standard deviation of the between-geologist effect for each observation of cross-section error, predicted from some parametric function.The operator "diag" denotes that the elements of this vector are put in order on the main diagonal of an N × N matrix, with off-diagonal elements equal to zero.Three parametric functions were considered.In the first, the between-geologist variance for the rth observation of cross-section error depends on the distance from the site which corresponds to the rth observed cross-section error and the nearest borehole available for interpretation to the corresponding site.Again, a linear function was considered, so the parameter σ 2 g in the first and second group of models was replaced by the intercept and slope of this predictive relationship, α g,0 and α g,1 , respectively.These parameters, along with the remaining ones, were estimated by REML.
The second parametric model considered used the geologist's self-assessment of experience in 3-D geological modelling.There were four levels of experience to choose from, so the parameter σ 2 g in the first and second group of models was replaced by four parameters, variances for each level of experience: σ 2 g,1 , σ 2 g,2 , σ 2 g,3 , σ 2 g,4 .A final model was considered which combined the last two variants, with separate intercepts and slopes of the linear function for the geologist standard deviation being specified for each level of experience (i.e.eight new parameters replacing σ 2 g in the first and second group of models.
Note that the parametric functions in these three models return variances, which may vary from one observation of cross section to another.The terms in σ g are standard deviations, i.e. the square roots of the corresponding variances.

Simulating from the selected model to represent cross-section uncertainty
We used the selected model (model 2a as described in the results section below) to simulate realisations of the random component of cross-section error along a part of the cross section (from 4000 m from the start of the section to the end).We considered a situation where all the boreholes along the cross section were available to the geologist.We assumed that the cross-section error is zero at the location of a borehole, and then simulated the components of the error under model 2a conditional on this at regularly-spaced locations along the cross section.The between-site component was simulated as a multivariate normal random variate by Cholesky decomposition of the joint covariance matrix of the regularly spaced sampling locations and the borehole locations.This is described in detail by Goovaerts (1997).We used the CHOL R procedure (R development core team, 2013).To simulate the between-geologist component, we evaluated the variance of this component at each regularly spaced location on the cross section from the parameters of model 2a as a function of the distance to the nearest borehole.
A realisation of the between-geologist component of model error at each location was then simulated as a normal random variable, with mean zero and variance set to this computed value.We used the rnorm R procedure to do this (R development core team, 2013).The overall cross-section error was then simulated by the sum of these two components.A total of 10 000 independent realisations of cross-section error were simulated this way.By finding the 2.5th and 97.5th percentiles of the simulated cross-section errors at any location, we approximate the 95 % confidence interval for model error.This can be used to visualise the uncertainty.The simulations can also be used to answer other questions.Consider, for example, an engineer who wishes to dig a tunnel through the London Clay along the length of this part of the cross section.We assume that the engineer wants to put the route of the tunnel as close as possible to the base of the London Clay, but wants to avoid intruding on the underlying Lambeth Group.The conditional simulations can be used to assess the risk of intruding on the Lambeth Group if the tunnel route is k m above the interpreted base of the London Clay everywhere along the route.Assume that the engineer specifies that the tunnel should enter the Lambeth Group over no more than 1 % of its length.What is the smallest value of k consistent with this?One could examine the 10 000 realisations of cross-section error and find, for increasing values of k, the number of realisations for which the engineer's specification is met: n k .The probability of meeting the specification given some k can then be estimated as n k /10 000.

Summary statistics on model error from all validation sites
Figure 2 shows a scatter plot of interpreted and observed heights of the base of the London Clay for all observations of cross-section error.The points are scattered around the bisector (where observed and interpreted heights are equal), and there is no visual evidence of a systematic bias.Table 3 shows the summary statistics of cross-section error, and Fig. 3 shows the histogram of this variable.The symmetrical form of the histogram and the weak skewness and kurtosis values suggest that an assumption of normality is plausible for the analysis of these data.They also suggest that, if there is any systematic tendency for the base of the London Clay to be interpreted too high or too low, then this effect is small.

Model comparisons
The results for model 1a and its variants are shown in Table 4.Note that the estimated between-batch variance is zero.When a REML estimate of a parameter is at the boundary of parameter space, as here, it is advisable to examine the likelihood profile in the vicinity of the estimate.To compute the likelihood profile for a model parameter, that parameter is fixed at a series of values and, for each, the remaining parameters are estimated by maximum (residual) likelihood.The maximised likelihoods are then plotted against the values of the parameter of interest.The profile likelihood should increase smoothly towards the estimated value.The profile likelihood for the batch variance satisfied this requirement.This is not unreasonable; because the batches were formed at random, we would hope that the between-batch variation is purely explicable in terms of sampling error.The comparison of models 1a and 1b can be done by examining the AIC, which is smaller for the latter model, in which the batch effect is dropped.Model 1b is therefore selected over 1a.The profile likelihood for the uncorrelated between-site variance in these models, c 0 , also approached the estimated value, 0.0, smoothly.In model 1c, the correlation between within-site effects for particular geologists is dropped (set to zero).The maximum likelihood is slightly smaller than for model 1b, in which this parameter is estimated.However, the log-likelihood ratio statistic, L, for the comparison of (null) model 1c with (the full) model 1b is small, and the probability of obtaining a value of L this large or larger under the null model is large, and so the more complex model is rejected in favour of the null one.This is also consistent with the small estimated value of this correlation, −0.09.
In model 1d, a stationary correlated variance for the between-site effect (as in model 1c) is replaced by two parameters for a linear function which expresses this variance as a function of distance to the nearest borehole available for interpretation.This (full) model can be compared with a (null) model (1c) with a stationary variance by the loglikelihood ratio test.Once again, L is too small to support a choice of the more complex model.
In summary, the consideration of model 1a and its variants in Table 4 leads us to the selection of model 1c (smallest AIC in the table), in which the batch effect and the correlation parameter ρ for geologist effects are dropped, and the between-site variation is modelled as a stationary correlated random variable.
Table 5 shows results for model 2a and its variants.These models are based on 1c, but differ in that, rather than assuming a stationary geologist effect, the between-geologist within-site variance is modelled as a function of covariates.In model 2a, the geologist variance is modelled as a linear function of distance to the nearest borehole available to the geologist for interpretation.The zero value of the intercept, α g,0 , is plausible, under the assumption implicit in our analysis that the borehole data are correct, and the cross-section error should be zero at the location of a borehole.The positive value of α g,1 implies that the geologist variance increases with increasing distance from a borehole, which is also plausible.Model 1c can be regarded as nested within 2a, a null model with α g,0 equivalent to σ 2 g and α g,1 = 0.The models can be tested by the log-likelihood ratio statistic; Table 5 shows that the null model (1c) can be decisively rejected in favour of the full model 2a.
Model 2b is an alternative to 2a, in which the geologist variance depends on the self-identified experience of the geologist in 3-D modelling.The estimated parameters in Table 5 are plausible in that the variance is largest for geologists who identified themselves as having "no experience of modelling in 3-D" and smallest for those who identified themselves as having "substantial experience of more than 2 years of modelling independently."Once again, this model could be compared with 1c by a log-likelihood ratio test, and the null model (1c) can be rejected, indicating that there is significant evidence for differences in geologist variance, related to geologist experience.However, the evidence for this model is weaker than for 2a.
In model 2c different relationships between geologist variance and distance to nearest borehole were fitted for the four levels of geological experience.In the fitted model the intercepts were all zero, the smallest slope is for the geologists with the highest experience level.However, while the loglikelihood ratio test shows that model 2c is significantly better than model 2b (i.e.adding the information on distance to nearest borehole to a model with geologist experience gives  a significant improvement), the comparison of model 2c with 2a leads to the conclusion that adding geologist experience to a model which already has the distance to nearest borehole incorporated does not give a significant improvement.On the basis of the AIC, model 2a is preferred among all those considered in this study.Table 6 summarises all the key comparisons between models and the inferences which arise from these comparisons.
Table 7 shows the estimated mean cross-section error and its standard error, under model 2a.The Wald statistic (e.g.Dobson, 1990) is a test of the null hypothesis that the mean error is zero, and the large p value shows that this cannot be rejected, so the data provide no evidence for systematic bias in the interpretation.Note that the estimate of mean error is rather less than the average for all observations reported in Table 3.That is because (i) the mean reported in Table 6 is the model mean, the fitted effect in the underlying statistical model for cross-section error, and (ii) the original data were not a random sample in space, and show some local clustering which is likely to bias the arithmetic average as an estimate of the spatial mean of cross-section error.
Fig. 4 shows the 95 % probability interval for cross-section errors along the section, approximated by the 2.5th and 97.5th percentiles of the conditionally simulated errors.The red symbols show the locations of the boreholes.There are two features of the interval.First, there is a rapid narrowing near the boreholes (the interval is zero at the boreholes, but this is only seen if the borehole coincides with a point where the error is sampled).This arises from the spatial correlation of the between-site component of cross-section error.The second feature is a gradual widening of the interval to a local maximum at the midpoint between successive boreholes.This is particularly apparent in the second half of the plot.This arises from the dependence of the between-modeller effect on the distance to the nearest borehole, showing how the constraint of the borehole on model error decays with distance.In Fig. 5, the confidence intervals are added to the interpretation of the base of the London Clay by one of the modellers.
Fig. 6 shows a plot of the estimated probability that a tunnel built k m above the interpreted base of the London Clay will intrude on the underlying Lambeth Group over no more than 1 % of its length for different values of k.This shows that the engineer can be 90 % confident that this specification will be met if the route is a little less than 8 m above the interpreted base.

Conclusions
Both the summary statistics and the scatter plot (Fig. 1), and the estimate of the mean cross-section error from the selected model 2a (Table 7), show that the data obtained in this study provide no evidence that there is any bias in the interpretation of the base of the London Clay by the geologists in this study; i.e. the mean error is not significantly different from zero.
We established this experiment to test the hypothesis that the variability of the error of interpretations of cross sections varies spatially.This hypothesis has been supported.First, we found that there is spatial dependence in the variability of the between-site component of cross-section error.This is to say that the cross-section error at one location is likely to be more strongly correlated with the error at a nearby location than at one farther away.This is reasonable, since if, for example, a surface tends to be interpreted as being too high above the Ordnance Datum at a site, perhaps because of faulting, then it is likely that a similar error will occur at nearby sites.There was no evidence, however, that the between-site variance depends on the distance to the nearest borehole.
The between-geologist variance is rather smaller than the between-site variance (compare c 0 with σ 2 g in model 1c).However, there was evidence that the variance of this error depends on geologist experience and also on the distance to the nearest borehole available for interpretation.The results for these two models are consistent with our hypothesis, and also make intuitive sense in that the variance of cross-section error declines with the geologist's experience, and increases with increasing distance from the borehole.However, the preferred model for the data, given a penalty on model complexity, considers only the distance to nearest borehole.It is interesting to note that our results on how model uncertainty increases with distance to constraining interpretation boreholes, and the effect of modeller experience, are consistent with the opinions on sources of uncertainty that have been elicited in published studies (e.g.Lelliott et al., 2009).This study provides empirical evidence for these opinions, and a direct quantification of the effects.
The fitted model can be used to simulate cross-section errors, conditional on a distribution of boreholes.One may use this procedure to compute confidence intervals around the interpreted cross section which quantifies uncertainty in this interpretation and shows how this changes in space.One could also use this simulation method to study the propagation of cross-section error in further processing to interpolate the surface into 2-D, and so produce 3-D volumes.
The methodology presented in this paper could be deployed in a wider range of geological settings in order to generate statistical models of cross-section error for those settings.These could then be used to compute confidence intervals for new models or measures of uncertainty specific to the requirements of particular data users, such as the example for the London Clay illustrated in Fig. 5.
The experimental design used in this study allowed us to make best use of somewhat sparse boreholes by examining multiple geologist interpretations at each validation site.However, if there had been a significant correlation between within-site effects for the same geologist, then subsequent modelling of the geologist variance would have been complicated.Alternatively, one might use an experimental design in which validation sites are nested within modellers (so each modeller has a unique batch of validation sites).This requires there to be many boreholes available, however, since each validation borehole is compared with just one interpretation.It also reduces the information that we obtain on betweenmodeller differences.
One way to get around the problem of insufficient validation observations is to generate synthetic cross sections, perhaps conditioned on geophysical data such as interpretations from seismic lines.These synthetic cross sections can then be notionally sampled at as many locations as we want to provide synthetic borehole data for interpretation and validation.In such an experiment, the syntheticvalidationboreholes should be sampled according to an optimised design (e.g.Lark, 2002) to ensure good estimation of the spatial variance parameters and to give good coverage of possible covariates, e.g.spanning a range of distances to the nearest borehole available for interpretation.

Figure 1 .
Figure 1.Map of surface geology (superficial and exposed bedrock) in the study area, with the line of the cross section shown.Map coordinates are in km on the British National Grid.One interpretation of the cross section is shown below, with the position and depth of the full borehole set indicated.

Figure 2 .
Figure 2. All validation observations of the interpreted and observed height of the base of the London Clay AOD.The red line is the bisector.

Figure 4 .
Figure 4. 95 % probability interval for simulated cross-section errors conditional on the location of the nearest borehole (red symbol) and model 2a.Note that these are evaluated at discrete locations.

Figure 5 .
Figure 5.One geologist's interpretation of the base of the London Clay (red) with 95 % confidence intervals (blue).

Figure 6 .
Figure6.How close to the modelled base of the London Clay could you build a tunnel (over the last 4 km of the cross section) and have a specified probability (ordinate) that the tunnel will stray into the underlying Lambeth Group for no more than 1 % of its length?

Table 1 .
Questionnaire on modelling experience and responses received.

Table 2 .
Summary of statistical models.In all cases the form of the random effects component for between-batch, between-site and betweengeologist effects is indicated.For each term the dependency is given (independent or an indicated correlation structure) and it is indicated whether the variance is constant (stationary) or modelled as a variable quantity.A ↓ indicates that a term is dropped from the model.
1 Errors of interpretations by the same geologist are correlated. 2ance depends on distance to nearest available borehole for interpretation.3Variancedependsongeologists self-identified experience of 3-D modelling (Table1).

Table 3 .
Summary statistics of cross-section error.

Table 4 .
Model 1 and variants, parameter estimates and inferences.The first-named model is the null model.* * A ↓ indicates that a term has been dropped from the model. *

Table 5 .
Model 2 and variants, parameter estimates and inferences.

Table 6 .
Summary of model comparisons.In each case, the first-named simpler "null" model is compared with a more complex alternative, either on the log-likelihood ratio L or Akaike's information criterion (AIC).The key conclusion from the comparison is indicated.

Table 7 .
Estimate of the mean cross-section error conditional on model 2a.