Towards plausible lithological classification from geophysical inversion: honouring geological principles in subsurface imaging

.


Introduction
The idea of geoscientific integration is not new and has been advocated since the inception of quantitative geoscientific studies involving geophysics and during the advances of geophysics as a discipline (see for instance Wegener, 1915, Nettleton, 1949, Towles, 1952, Jupp and Vozoff, 1975, Lines et al., 1988, Li and Oldenburg, 2000, ).In the natural resource sector, the exploitation of the fundamental complementarity between geology and geophysics in modelling the same object (the earth) has been recognized as one of the pre-requisites to exploration success as early as the 1940's during the early years of the Society of Exploration Geophysicists (Eckhardt, 1940, Green, 1948).Numerous authors have since tackled the issue of integrating petrophysical and geological information to model geophysical quantities (seismic velocities, mass-density, etc.) through inversion, with an increasing trend in the past 15 years or so (see for instance references reviewed in Lelièvre and Farquharson 2016;Meju and Gallardo 2016;Moorkamp et al. 2016, andGiraud et al., 2017).In contrast, the recovery of geological quantities from geophysical inversion has seen much less effort.Recent studies have started to rectify this by proposing the idea of lithological differentiation of inversion results (Paasche et al. 2010, Sun and Li 2015, Paasche and Tronicke, 2007), which consist of the identification of lithologies from inversion results.While lithological differentiation is expected to hold much potential in mineral exploration it still remains underexplored (Li et al., 2019).
In oil and gas exploration scenarios, seismic facies analyses and classification using techniques developed for what is commonly called machine learning (neural networks, support vector machine algorithms, etc.) have become popular in recent years (Zhao et al., 2015, Chopra and Marfurt, 2018, Wrona et al., 2018, Zhang et al., 2018).This was driven, by the need for quantitative interpretation methods in the geosciences, and by the 'renaissance' phase machine learning went through after 2006 (Chap.5, Goodfellow et al., 2016).Once recovered, spatial facies distribution can be used for geological interpretation and downstream decision-making.However, like all modelling results, the identification of facies or lithologies using machine learning relies upon statistical models and is affected by ambiguity and uncertainty.One reason for this is that validation datasets are usually treated as the ground 'truth' while they are fraught with uncertainty.For instance, the interpretation of borehole data or outcrops with their uncertainty can lead to significantly different models honouring geological measurements equally well (Wellmann et al., 2010, de la Varga et al., 2018, Pakyuz-Charrier et al., 2018a, b, c).
Lithologies (or facies) can be characterised by a broad range of rock properties that are the result of geological processes, such as weathering, compaction, metamorphism and deformation.These physical processes are usually non-linear, especially when in combination, and produce complex representations of different lithotypes which are difficult to discriminate from geophysical data.In this context, one possible solution is to use neural networks for lithological classification as they are "universal approximators" (van der Baan and Jutten, 2000 ).As a consequence, however, lithological classification is affected by uncertainty from the data used to train and validate the algorithm.Such uncertainty is difficult to quantify and is rarely estimated or even considered.
To date, whether it be in oil and gas or mining exploration, uncertainty in recovered lithologies is a research avenue, which, to the best of our knowledge, only a few authors have addressed.Sun and Li (2019) assess uncertainty by varying the number of clusters in their lithological differentiation scheme, and Bauer et al. (2003) classify lithologies and estimate the resolution of their results using synthetic data.As a result of the lack of comprehensive uncertainty analyses, practitioners often lack quantitative, robust uncertainty modelling necessary to inform interpretation or risk evaluation (Jessell et al., 2018).In addition, apart from Zhao et al., (2017) who account for seismic data-driven stratigraphy in their seismic facies classification, established workflows relying on neural networks to identify facies or lithologies in three-dimensions give little to no consideration of geological information and rules for their classification.
To complement existing methodologies, we propose a solution that partially addresses the lack of consideration given to geological information during classification.We introduce a general post-processing (i.e., post-inversion) workflow for the recovery of lithologies from geoscientific modelling results and the estimation of the related uncertainty.For this purpose, we complement existing classification techniques by ensuring the geological and geophysical consistency of, and estimating the confidence in, the recovered lithologies.Using an artificial neural network trained in a fully controlled environment (all variables in the model used for training being perfectly known) with attributes characterizing the inversion results, we perform lithological classification applying plausibility filters relying on geological principles which we refer to as 'geological post-regularisation'.The application of geological post-regularisation is to reduce the non-geological character of models obtained through classification.After classification, we calculate the frequentist probability of the different lithologies (i.e., apparition frequency relative to all lithologies) associated with each unit of the SOM and report it in each modelcell discretising the studied area for interpretation.
The methodology we propose can serve two main objectives.Our first objective is to introduce a methodology that is made efficient by leveraging existing geoscientific inputs and prior information, and cost-effective by imposing requirements that do not exceed the computational power available on a personal computer.Secondly, our aim is to complement inversion workflows by providing a general, automated method to derive a nondeterministic lithological interpretation of inversion results.Thirdly, we propose a real-world application based on a case study in the Yerrida Basin (Western Australia) where we build upon recent work by Giraud et al. (2019a) and Lindsay et al. (2018) who performed the geophysical inversion and geological modelling of data collected in the area, respectively.
In this work, lithologies are identified though a classification technique relying on a simple artificial neural network.We chose 'self-organizing maps' Kohonen (1982aKohonen ( ), (1982b)), a well-established algorithm that has been successfully applied to seismic facies classification and geological mapping purposes (Chang et al., 2002, Klose, 2006, Köhler et al., 2010, Bauer et al., 2012, Carneiro et al., 2012, Du et al., 2015, Roden et al., 2015).We first train and test the self-organizing maps (SOM) using data extracted from a semi-synthetic dataset (i.e., a geophysical inversion feasibility study based on geological and petrophysical field data) assumed to represent the geophysical characteristics of the studied area.We utilize this controlled environment to estimate the accuracy our predictions for each class identified in the studied volume without the errors associated with well positioning or lithology interpretation errors.We then use the trained network to perform classification using field data only.
We obtain, for each model cell, a suite of frequentist probabilities for each lithology observed in the area.In both cases, geological post-regularisation is applied before the calculation of uncertainty metrics to ensure the geological consistency of the results.
The rest of this paper develops as follows.Section 2 provides the theoretical background necessary to reproduce the work presented.It first briefly describes the geophysical and geological modelling schemes (subsection 2.1) used to obtain the models that are used as input for classification using SOM (subsection 2.2).Post-regularization as applied to such classified lithologies (subsection 2.3) and the related uncertainty analysis in terms of prediction accuracy and geophysical consistency are then detailed (subsection 2.4).Following this, Section 3 presents an application case using data from the Yerrida Basin (Western Australia), which was investigated using gravity data, petrophysical and geological information.Geological and geophysical modelling results are first summarized and the rules defining post-reguralisation operator in the area are introduced (subsection 3.1).The classification of results from geological and geological modelling and post-regularization are then presented alongside the related uncertainty analysis, supporting a potential re-interpretation of the geological model of the area (subsection 3.2).The discussion and conclusion sections follow and complete this contribution.

Methodology for geoscientific modelling and classification
In this section, we first introduce essential information about the geophysical and geological modelling used as pre-requisite to this study.We then introduce the utilisation of SOM and the tools we developed in sufficient detail to allow the reproducibility of the procedure.

Geophysical and geological modelling
Inverse geophysical modelling was performed using the least-square inversion platform Tomofast-x.This inversion platform enables the use of a series of constraints as detailed in Martin et al. (2018), Giraud et al. (2019aGiraud et al. ( ), (2019b)).Constraints are enforced through a minimum-structure gradient regularisation approach where weights vary locally accordingly with geological uncertainty (Giraud et al., 2019a) The matrix   is calculated following Giraud et al. (2019a), who use the probabilistic geological modelling approach described in Pakyuz-Charrier et al. (2018c, 2018b, 2019).In the case of gravity inversion as presented here, the complete Bouguer anomaly of density contrast model  is calculated as the product of the Jacobian matrix  with model .Therefore, we have () = .
Geological uncertainty is estimated from probabilistic geological modelling.During this process, and ensemble of geological models is generated using the Monte-Carlo Uncertainty Estimator (MCUE) of Pakyuz-Charrier et al. (2018a, 2018b, 2018c, 2019).MCUE relies on the perturbation of orientation measurements (interfaces and foliations) defining structures of a referencegeological model accordingly with their uncertainty.From this series of models, geological uncertainty can be estimated (Wellmann and Regenauer-Lieb, 2012)through calculation of Shannon's entropy (Shannon, 1948) for the simulated geological models.Shannon's entropy, which can be used as a proxy for geological uncertainty, indicates how well geological information constraints the model locally.It can be used to constrain inversion in a structural sense when integrated in   as per equation (1) (Giraud et al., 2019a).
More detailed information about the usage of MCUE results in geophysical inversion can be found in Giraud et al. (2017Giraud et al. ( , 2018Giraud et al. ( , 2019aGiraud et al. ( , 2019b)).

Classification using SOM
The SOM artificial neural network relies on competitive, non-supervised learning.The relative simplicity and the efficiency of the SOM algorithm has made it a popular tool for classification, data imputation, visualization and dimensionality reduction (Vesanto and Alhoniemi, 2000, Kalteh et al., 2008, Miljkovic, 2017, Klose, 2006, Kohonen, 1998, 2013, Roden et al., 2015, and Martin and Obermayer, 2009).In essence, it consists in the projection of the SOM's latent space onto a manifold of superior dimension (i.e., our dataset).This map, which can be 2D or 3D, is made of a predefined number of interconnected neurons (also referred to as 'nodes' or 'units') that have a fixed network configuration.Projection occurs during the training phase, where the locations of the neurons in the manifold are iteratively adjusted so approximation is optimal.
In this study, we follow common practice by training two-dimensional (2D) maps using a hexagonal lattice topology and applying a Gaussian-shaped neighbourhood function.We chose to use a 2D map for the sake of simplicity after our testing revealed that other configurations did not improve results significantly.The hexagonal lattice topology seemed to provide better results than square lattice topology using the dataset we present here.
Ideally, the SOM should be trained in a controlled environment where all the variables used are perfectly known, which motivates the utilisation of synthetic geophysical data.We calculate such data from a geological structural framework derived from real-world field geological and petrophysical field measurements data in the same During training, we examine SOM quality using quantization error  and lithology prediction accuracy.
Quantization error "measures the average distance between each data vector and its best matching unit [BMU]" (Uriarte and Martín, 2005), thereby indicating how well the different BMUs approximate the dataset.It can be interpreted as analogous to a misfit between calculated and observed data.The mean quantization error  of the SOM is expressed as follows for  data vectors   : where   is the BMU of   .In the application of SOM to field data, the trained map is used for the classification of inversion results where input 1 through 4 listed above are obtained from previous modelling and lithologies are the quantity sought for.In our case, the utilisation of SOM for partitioning the input models allows the recovery of lithology, which is a geological quantity reflective of all input data.It is also useful in that, as we will see later, the consistency of the recovered lithological model can be analysed from a geophysical point of view.
In the approach we follow, the optimum number of neurons (or units) is determined using the elbow curve of the mean quantization error Q (equation 2) of the trained SOM.Note that we apply the same principle as the wellknown L-curve principle (Hansen and O'Leary, 1993, Hansen and Johnston, 2001, Santos and Bassrei, 2007) for the determination of optimum weights in least-square geophysical inversion.Here, we train the SOM using functions from the SOM Matlab toolbox implemented by Vatanen et al. (2015).

Geological post-regularisation
This subsection introduces the post-regularization scheme used in this work and details its implementation and usage in the workflow introduced here.

Motivations
Geological rules have the potential to provide an important constraint on the classification of lithologies recovered from inversion.Such rules, like adjacency (Egenhofer and Herring 1990) define which rock bodies can be in contact with each other, and which cannot.These rules are typically expressed in geological terms as stratigraphy, where the relative age and event classification of geological units are stated.For example, a sedimentary depositional event of five separate units may define a simple sub-horizontal layer cake configuration where the oldest unit is never adjacent (or in contact) with the youngest unit.A magmatic event that follows may result in a vertical dyke that intrudes all sedimentary layers adjacent to all other rock units.Using geological rules as a constraint relies on finding those that are restrictive (such as the youngest unit never being in contact with the oldest), rather than permissive (such as the intruding dyke).The process of postregularisation, which consists in the application of spatial-contextual filters to the classification results to eliminate geologically unrealistic features, has been shown to increase prediction accuracy in surface (2D) geological mapping (Tarabalka et al., 2009, Stavrakoudis et al., 2014, Cracknell and Reading, 2015).

Implementation
The post-regularisation scheme we develop for the recovery of lithologies in 3D relies on two hypotheses.Firstly, we assume that the presence of isolated lithologies contradicts the geological principle of continuity.Although such post-regularisation has been used mostly in 2D or shallow 3D, there is no theoretical obstacle to the extension of this methodology to the purely 3D classification case we present here.Secondly, we introduce the utilisation of adjacency relationships between the different lithologies in post-regularisation to ensure that base topological rules are respected across the entirety of the three-dimensional volume.This is particularly important for structural geological interpretation (Freeman et al., 2010, Godefroy et al., 2019).Here, we extend existing postregularisation approaches (i.e., Tarabalka et al., 2009, Stavrakoudis et al., 2014, Cracknell and Reading, 2015) by integrating geological information in the classification analysis in the form of topological relationships (see Egenhofer and Herring 1990, Zlatanova, 2000, and Thiele et al., 2016, for the different topologies) defined by geological principles.
The general formulation of post-regularisation is as follows, for a given model-cell: where arg min returns the argument  satisfying the conditions it precedes.Here, we set: where  is   × 1 column vector of ones, and    ,  is the   ×   matrix of zeros;   and   are adjacency matrices, and   ∘   is the Hadamard product of   and   , such that (  ∘   )  = (  )  (  )  , and    ,  is the zero matrix of dimension   ×   ;   encapsulates geological knowledge and principles about contacts between lithologies;   contains the adjacency relationships between the considered cell and its neighbourhood .The derivation of   and   is detailed below.
The first part of the condition in equation ( 4) is enforced using morphological closing where isolated lithologies are replaced by the most prevalent one in their neighbourhood (see Benavent et al., 2012, Ackora-Prah et al., 2015).In such cases, the BMU is updated as follows.Isolated cells (in terms of their lithology) are identified through examination of the 26-cell 3D cubic Moore neighbourhood  of every model-cell.A cell is considered isolated if, and only if, at least 25 cells of  have a lithology that differs from it.Once such a cell is identified, its BMU is updated using the closest neuron ensuring continuity between adjacent cells in the neighbourhood , where the lithology to be assigned is determined by a majority vote in  subject to adjacency conditions.These conditions are determined by geological knowledge (as explained below).This process is repeated for all locations until the lithological model stops changing.The general principle of post-regularisation is illustrated in Figure 1.
We point out that in contrast to Tarabalka et al. (2009) and Stavrakoudis et al. (2014) who use the first and second Chamfer neighbourhoods in 2D around the considered model cell, we do not follow the same approach in 3D.
Our implementation of the extension of their approach to 3D showed that, in our application case study, the adjustments of the recovered lithological model it imposes are detrimental to the consistency of the classification with geophysical measurements.That is, the perturbation of the corresponding geophysical response of the model it generates exceeds noise level and compromises the geophysical validity of the recovered model (see geophysical validation subsection below for more details).The same remark applies to the utilisation of a mode filter with a 3×3×3 kernel.The conditions relating to adjacency relationships forces the model to honour adjacency relationships extracted from surface geology (Burns, 1988, Thiele et al., 2016) in the recovered lithological model.
We determine lithological topology by identifying the contacts between adjacent model-cells and represent the topological signature of lithological models using the adjacency representation of Godsil and Royle (2001).Let the adjacency matrix  of a given cell be defined as: where   is the number of contacts between lithologies  and .Similarly, geological laws and knowledge allow the derivation of a matrix   defined as follows: From there, it is straightforward to identify occurrences of forbidden contacts by calculating   ∘   .Therefore,     ∘    = 0 (equation 4) indicates that no contact violating the condition imposed by   is observed.The last condition in equation ( 4) can be used to prevent the local stratigraphy (in the Moore neighbourhood  of the considered cell) from violating geological rules such as "lithology B must be in conformable sequence between lithologies A and C".
After identification of configurations forbidden by the conditions set in equation ( 4), its BMU is updated using the closest neuron honouring the set of conditions (Error!Reference source not found.below, box 4-a).
The next stage of the methodology we introduce is the calculation of the apportionment of each neuron in terms of the lithologies of the testing data vectors (from the synthetic survey) they predict (Error!Reference source not found., box 5-a).For instance, in a two-lithology scenario, a given node may be found to predict lithology A using the validation dataset correctly 80% of the time (80% accuracy) and lithology B with correctly 20% of the time (20% accuracy).This process is described below.
The methodology is summarized in Error!Reference source not found.below.

2.4
Uncertainty analysis

Prediction accuracy of the recovered lithologies
The prediction accuracy   of lithology   [1, … ,   ] (with   the total number of lithologies) is the ratio of correct predictions to the total number of predictions.It is obtained from the 'matching matrix' of the recovered lithologies   .We remind that the 'matching matrix' (or 'confusion matrix' in supervised learning) is a matricial representation of the number of occurrences of true/false positives/negatives.
We use the prediction accuracy   as a metric measuring the capability of the node  of the trained SOM to recover lithologies.Let   be expressed as: which is a particular case of the overall accuracy : From equation ( 7), it appears that   is equivalent to the frequentist probability of the  th lithology over the entire SOM.When considering a specific node , it becomes the relative frequentist probability of the lithology  for cells classified as having the  th node as their BMU, noted   .

Geophysical consistency
The consistency of the classification performed using SOM after application of post-regularisation with field geophysical measurements might be altered by both the classification itself and by post-regularisation.It is therefore necessary to ensure that the approximation of the dataset by values from the units of SOM is consistent with geophysical measurements.To this end, we verify that the geophysical response of the density contrast model   corresponding to the BMUs of each cell in the studied area fits the field measurement  within a certain tolerance assumed to approximate noise level.Consequently, we ensure that the difference between   and   , ∆ =   −   , satisfies the following condition:

Synthetic geophysical inversion
where  is the threshold depending on noise levels in the data above which   and   are not considered geophysically equivalent.
The implication of equation ( 9) is that the difference ∆ belongs to the null-space of the inverse problem considered.The null-space is characteristic of geophysical inversion's non-uniqueness.It is defined as the ensemble of models that reproduce geophysical data with a comparable misfit.The models honouring equation ( 9) can therefore be considered equivalent from a geophysical data point of view (Muñoz and Rath, 2006, Chen et al., 2007, Deal and Nolet, 1996).For qualitative assessment of geophysical consistency, we complement the utilisation of equation ( 9) with the visual comparison of data misfit maps corresponding to the model corresponding to   and   .

Survey setting
This subsection introduces and summarises the geological and geophysical context of the application case presented here.More details about the geology of the area and the initial geophysical inversion can be found in Giraud et al. (2019a) and Lindsay et al. (2018), (2020).

Geological and geophysical setting
The Paleoproterozoic Yerrida Basin is located in the southern part of the Capricorn Orogen (WA), and covers approximately 150km N-S and 180 km E-W (Pirajno and Adamides (2000)) (Figure 3a).The structures of interest in this work are Archean greenstone belts (Figure 3) as they are prospective for Au and Ni and underlie the younger basin rocks.Basement to the Yerrida Basin is considered to be Archean granite-gneiss or greenstone rocks of the Yilgarn Craton.Lithospheric extension initiated the formation of the Yerrida Basin at c. 2200 Ma to 1990 Ma with deposition of the Windplain Group (Occhipinti et al., 2017, Pirajno andAdamides, 2000).The Goodin Inlier remains exposed in the central part of the basin and is in unconformable contact with the Windplain Group.A hiatus ensued, followed by deposition of the younger Mooloogool Group, which was then overlain in the east by the Tooloo Group of the Earaheedy Basin.The density contrast of the lithologies observed in the area range between 0 and 330 kg/m³, making it appropriate for gravity modelling and inversion (Giraud et al., 2019a, Lindsay et al., 2018).While basin rocks exhibit some density contrast, the greenstones are conspicuous in gravity data with a density contrast expected to lie between 190 and 270 kg/m³ making them an attractive subject for gravity inversion.Field geological measurements (orientation data in the form of interfaces and foliations) and petrophysical data were used to build the reference geological model.Airborne geophysical data, Landsat and Aster 8 satellite data were also used to support the interpretation of geological measurements.
The gravity anomaly dataset we consider (Figure 3b) is comprised of a total of 4882 measurement points.The model is discretized into 100×100×42 cells of dimensions 2.335 km ×1.875 km ×1.0475 km down to approximately 44km depth.Weights and parameters used for the inversion of synthetic data follow the settings of Giraud et al. (2019a) on field data.

Geological modelling and synthetic geophysical survey
This subsections introduces the semi-synthetic survey we performed for the training of SOM.
The volume of most probable lithology,   and the starting model are shown in  We use the modelling results shown in Figure 4 to calculate a synthetic geophysical dataset (Figure 5d).The 10 model used to generate the synthetic geophysical measurements is shown in Figure 5a.The corresponding inverted model and its gradients are shown in Figure 5b

Field geophysical data inversion 5
The density contrast model obtained from inversion of field geophysical data and its gradients are shown in Figure 6a and b.Visual comparison of inverted models shown in Figure 6a and Figure 5b reveals that mesoscale structures are similar with the exception of large structures presenting low density contrasts at depth (darker shades of blue in Figure 6a).This is reflected in Figure 6b, which exhibits low gradient values in these areas.

Geological rules for post-regularisation
As mentioned above, for clarity in this demonstration, only adjacency relationships are considered.The utilisation of simple relationships such as adjacency is also chosen because in areas of sparse data, a full description of geological rules (fault relationships with fault and stratigraphy) is often not known.Given the complexity of the Yerrida Basin and its magmatic and deformation history, several base geological rules can be derived to assess the plausibility of recovered lithological models.Using fundamental geological principles (such as uniformitarianism, superposition, Walther's Law, cross-cutting relationships and original horizontality), the two most likely restrictive adjacency rules are as follows.We assume that the mafic greenstone bodies cannot be in contact with the Killara Formation (in the Mooloogool Group) since our field data suggests that the Killara Formation is a volcanic unit that is restricted to the Yerrida Basin and thus not in contact with the mafic greenstones.In addition, we assume that the mafic greenstones cannot be in contact with the Goodin Inlier and background (or basement) as the mafic greenstones are modelled to be enveloped by the felsic component of the greenstone.
The matrix   defining the contacts forbidden by geology as described above is given in Figure 7. Figure 7. Matrix defining forbidding contact between lithologies in the Yerrida basin. 1 means that two units may be in contact with each other, 0 means that they may not and  represents symmetric relationships or when the same unit is adjacent to itself (which geologically may occur across a fault, but cannot be resolved by the geophysical data available).

3.2
Geologically-constrained SOM classification and uncertainty analysis

Training the neural network
In this work, we use approximately 500 neurons (units) for the training of SOM.This number is inferred from the analysis of the elbow curve we calculated using the validation datasets (see Figure 8) and approximately matches the proposed value of 5√ (with  the total number of observation) proposed by Vesanto and Alhoniemi (2000) and commonly used since (Shalaginov and Franke, 2015).The chosen number of units is corroborated by the lithology prediction accuracy (Figure 8) as approximating the point of diminishing returns, i.e., the number of nodes beyond which additional nodes are becoming nearly redundant.The trained map presents a mean quantization Q equal to 0.075, which indicates relatively good approximation of the datasets by the trained SOM.This is illustrated by Figure 8 where all lithologies are recovered with a prediction accuracy superior to 90%.

Classification and post-regularisation
After classification of the recovered model, we performed post-regularisation to remove geologically unrealistic features from the classified lithological volume.Figure 9 show the classification results before and after postregularisation, along with the associated adjacency matrix.As can be inferred from the adjacency matrices plotted in Figure 9, the number of contacts between units with indices 1 and 2, respectively, is reduced by the application of post-regularisation.One reason for this is the presence of a number of inclusions of lithologies 1 in 2 and vice versa; a total of 2561 such inclusions was identified.Overall, the number of contacts between lithologies 5 and 2 increased slightly due to post-regularisation because a large percentage of contacts between lithologies 5 and 1 have been re-assigned as contacts between 5 and 2. The elimination of contacts between 5 and 6 is also visible in Figure 9.

Estimated confidence in recovered lithologies
For each node of the SOM we calculate the prediction accuracy   (equation 3) for the different lithologies observed in the area using the cross-validation dataset.After application of the trained SOM for the classification of inversion results obtained from the inversion of field geophysical data, we obtain a frequentist probability volume for each lithology.Figure 10 shows the resulting frequentist probability volumes for the 6 lithologies present in the Yerrida basin.

Geophysical null space validation and implications for geological interpretation
Applying equation ( 9), we obtain ‖  ∆ ‖ 2 2 = 6 ×10 -4 , indicating that the model can be considered geophysically equivalent overall.This is illustrated by the map of the corresponding data misfits (see Error! Reference source not found.in appendix B), which indicates that we can consider the recovered lithological model after application of post-regularisation as reflective of both geophysical and geological information.
Focusing on the mafic greenstone belts of interest in the area (Figure 11), the classification results allow us to propose the following geological interpretations.

Discussion
The application of the technique presented here is not restricted to usage of the particular geophysical or geological modelling schemes generating the modelling inputs to this study.The methodology we introduced is general and any different standalone geophysical and geological modelling schemes could also be used.
The work presented here relied on SOM, which can be seen as an extension of the k-means and c-means clustering algorithms used for lithological differentiation by (Paasche and Tronicke, 2007, Carter-McAuslan et al., 2015, Sun and Li, 2015, 2016b, Maag and Li, 2018, Ward et al., 2014, and Singh and Sharma, 2018), with which it shares a number of characteristics.We can therefore assume that our findings may hold true for these techniques.
We have shown that the utilisation of post-regularisation can be effective to increase geological realism in the recovered lithological models while preserving the geophysical validity of the corresponding model.The geological principles we used to design our post-regularisation operator apply to lithological topology and focus on adjacency relationship between cells.Ideally, post-regularisation should also consider the surface area of contacts and their topology.This could be followed by, for instance, a 3D extension of the geological modelediting approach of Anquez et al. ( 2019) to produce genuine geological models honouring age relationships, stratigraphic principles, etc. Provided that the resulting models honour equation ( 9), this approach would ensure that while they are geophysically valid, they can be readily used for interpretation or by commercial or noncommercial geological modelling engines, reservoir simulations, etc. without further processing.
We also believe that post-regularisation can be successfully applied to other clustering techniques.In addition, the implementation of post-regularisation presented here can be readily applied to existing classification, regardless of the classification algorithm used as it only adjusts the classification using spatial-contextual features in the classified model, and could assist the geological characterization of inversion results (Melo et al., 2017).
The example we have shown uses a covariance matrix   (equation 1) that results from geological modelling.It is used as a proxy for the uncertainty about our knowledge in terms of structural geology.Such prior information could also be derived from techniques other than geological modelling such as, for instance, prior geophysical modelling, be it using the same or different geophysical methods.While we do not address the uncertainty in the density model directly, we assume that non-uniqueness and measurement uncertainty affect both field data and synthetic data in the same manner due to the noise component and parameterization of each being the same.
An important result produced here involves the identification of regions which do not adequately conform to the initial model parameters (Figure 11).While this issue remains unresolved, the capability of our method to identify problematic regions is useful to drive reinterpretation of data, consideration of additional models and, eventually, increased geological knowledge of the target.
Future work may include the generation of multiple lithological models using the trained SOM and the frequentist probability volume associated to it.By selecting models belonging to the null-space of the geophysical data (i.e., satisfying equation 9), we expect that this would allow the identification of a series of a few archetypes that would be representative of the various datasets used in the geoscientific modelling workflow.

Conclusions
We have introduced a post-inversion classification technique relying on SOM that enables the recovery of lithologies, the corresponding frequentist probability voxet thereby remediating to some of the limitations of deterministic inversion.The proposed technique utilizes a post-regularisation scheme enforcing elementary geological principles to the recovered lithological model while maintaining geophysical validity.We have applied this new methodology to the Yerrida basin (Western Australia) and shown how it improves the geological plausibility of the recovered model.Results allowed us to confirm previous results and to bring new insights into possible reinterpretation of the geometry of prospective greenstone belts.
fashion as for a geophysical feasibility study.The training and tests datasets are comprised of the following variables: 1. Starting model for inversion   ; 2. Inverted model   ; 3. Geological uncertainty   ; 4. Spatial gradient in the inverted model ‖∇  ‖ 2 ; 5. Most likely lithology obtained from geological modelling (training lithological model).The starting model is obtained from prior information.Here, it is the expected petrophysical property model from geological modelling.Each datum from the training and tests datasets is a vector   ℝ   , with a number of variables   = 5, where (5) is the lithology assigned to this unit.This choice of training variables was motivated by the necessity to account for the available information in terms of geological modelling and measurements, uncertainty and structural setting.The starting model for inversion encapsulates the pre-inversion state of knowledge.The inverted model comprises the update of this model using information extracted from geophysical measurements and translated into a 3D model.The lithological model refers directly to the interpreted geological observations of the area.The spatial gradients of the inverted models provide structural information about the location of the geological units that can be recovered by interpretation from the inverted density contrast model.

Figure 1 .
Figure 1.Summary of topological filtering used during post-regularisation.

Figure 3 .
Figure 3. (a, to the left) Geological map of the area and (b, to the right) complete Bouguer anomaly (reproduced from Giraud et al., 2019a).The red dashed line outline the modelled area.Capital letters 'A', 'B' and 'C' symbolize the possible outlines for the greenstone belts in the area.

Figure 4 .
Figure 4a, b, and c are used for the training and validation dataset for SOM training as explained in 2.2.

Figure 5 .
Figure 5. True density contrast model for calculation of synthetic geophysical data (a); inverted model obtained from geophysical inversion of synthetic geophysical data (b), corresponding spatial gradient of density contrast (c) and synthetic geophysical data (d).

Figure 6 .
Figure 6.Inverted model obtained from geophysical inversion of field geophysical data (a), corresponding spatial gradient of density contrast (b).The classification of lithologies using SOM is performed applying the trained network to volumes shown in Figure 4b, c and Figure 6a, b.The next subsections describe the geological laws used for post-regularisation (subsection 3.1.4)and the classification process (subsection 3.2.2).

Figure 8 .
Figure 8.(a) Elbow curve of the quantization error for the determination of the optimum number of neurons (or units) in SOM and (b) Prediction accuracy for the different lithologies present in the training and validation datasets.Note that after 750 units, the quantization error for the different lithologies stabilizes and oscillates around its maximum values.

Figure 9 .
Figure 9. Recovered lithologies before post-regularisation (a) and after (b) next to the corresponding adjacency matrix (right hand side).The indices used to designate lithologies are indicated in the legend of the volume (left hand side).

Figure 10 .
Figure 10.Frequentist probability volumes of recovered lithologies calculated as per equation (3).The arrows are drawn to support interpretation.

Figure 10
Figure 10 exhibit probabilities between 0.3 and 0.6 in the area marked by the two arrows.This suggests that these these zones are the least-well constrained.Note that from Figure 10, we can interpret the presence of mafic greenstone with confidence as it shows high frequentist probability nearly everywhere classification suggest its presence.For completeness, assessment of the prediction accuracy of the different lithologies is shown by the corresponding boxplot in Appendix B.

Figure 11
Figure11shows that the northern portion of the greenstone belts A and B recovered by geophysical inversion and SOM classification is thinner in their Northern part than was proposed by the initial geological model.Likewise,

Figure 11 .
Figure 11.Mafic greenstone belts and their surroundings following geological modelling only (a) and after SOM classification (bottom).The cells shows in (a) and (b) have the same geographical location, and are coloured according to lithology.The vertical arrows show areas where mafic greenstone is thinner than suggested by geology only.The elliptical shapes shows zones where expected non-mafic greenstone is replaced by the background lithology.
is the forward operator calculating the predicted data  produces;   is the prior model.In equation (1), subscripts ,  and  refer to data, model and smoothness, respectively.In this contribution,   is a diagonal matrix where each element is equal to the inverse of the sumof-squares of the geophysical measurements;   and   are diagonal covariance matrices; here,   is the identity matrix.The scalars   and   are weights controlling the relative importance of the different terms in the equation;  is the spatial gradient operator.The last term of the equation (1), the smoothness term, constraints the structural features of the inverted model.The values in diagonal matrix   are determined from prior information.In the presented work,   is obtained from geological modelling results and is a proxy for geological uncertainty