Evaluating porosity estimates for sandstones based on X-ray micro-tomographic images

We compare experimentally determined porosity with values derived from X-ray tomography for a suite of eight sandstone varieties covering a porosity range from about 3 to 25 %. In addition, we performed conventional stereological analysis of SEM images and examined thin sections. We investigated the sensitivity of segmentation, the conversion of the tomographic gray-value images representing the obtained X-ray attenuation coefficients into binary images, to (a) resolution of the digital images, (b) denoising filters, and (c) seven thresholding methods. Images of sandstones with porosities of 15 5 to 25 % exhibit a bimodal intensity distribution of the attenuation coefficients, enabling unambiguous segmentation that gives porosity values closely matching the laboratory values. For samples with lower porosities, pores and grains do not separate well in the skewed unimodal intensity histograms. For these samples, all tested thresholding methods tend to miscalculate porosity significantly. In addition to absolute porosity, the ratio between pore size and resolution, and mineralogical composition of the rocks affect the biases of the global segmentation methods. 10 Copyright statement.


Introduction
Fluid transport and storage properties of rocks are related to their microstructure (e.g. Bear, 1988;Dullien, 2012;Head and Vanorio, 2016). Porosity, the ratio between void volume and total volume, constitutes one of the most important microstructural parameters owing to its relevance for fluid transport and associated processes but also for elastic and inelastic deformation 15 behavior (e.g. Buryakovsky et al., 2005). Non-destructive structural characterization plays a crucial role in the modeling of effective rock properties, such as permeability (Shah et al., 2015), elastic moduli (Madonna et al., 2012), and electric conductivity (Andrä et al., 2013), relevant for numerous geological applications, such as two-phase flow in hydrocarbon reservoirs (Berg et al., 2013;Singh et al., 2016), freshwater provision (Taina et al., 2008;Ott et al., 2012) or geothermal energy recovery (Gualda and Rivers, 2006;Pamukcu et al., 2012). Specifically, digital rock physics relies on performing numerical experiments on dig- 20 itized microstructures of real rocks (Dvorkin et al., 2011). Before it can be applied with confidence any proposed procedure 2013; Wildenschild and Sheppard, 2013, for reviews). These limitations include partial-volume effects, instrumental noise, reconstruction artifacts and effects due to the use of polychromatic X-rays.
The distribution of X-ray attenuation in a sample is computed by combining a number of projection images taken from various angles and using inversion algorithms, such as filtered back-projection, to reconstruct a volume image (Kak and Slaney, 2001). The result is a discretized volume consisting of small volume elements or voxels. An intensity value representing the 5 local X-ray attenuation coefficient is computed for each voxel. Thus, X-ray attenuation coefficients are averaged over small discrete volume elements. A homogeneous material or a mixture of heterogeneous features significantly smaller than the voxel size might therefore yield a voxel with the same intensity. This non-uniqueness is called the partial-volume (PV) effect, which results in a blurring of all phase boundaries in a CT image. The spatial resolution of the tomographic image produced by conventional X-ray CT is limited by the diameter of the X-ray source, the X-ray photon energy, the absorption of the sample 10 as well as the number and size of pixels on the detector. The nominal spatial resolution is usually defined from measurements of the modulation transfer function or from imaging a series of stationary phantoms e.g. lattices with sharp contrast differences and decreasing size under ideal conditions (e.g. Bouxsein et al., 2010). However, the (isotropic) voxel size is the calculated size of a discrete cubic volume element in the reconstructed image, which can actually be lower than the actual spatial resolution of the image. Still we use the isotropic voxel size as resolution measure because the value is independent of system factors, 15 such as detector noise and reconstruction algorithm. The term resolution is gaining significance, when set into relation to the size of features of interest, in our case the pore size. A resolution higher than the size of pores is mandatory a prerequisite for resolving features accurately.
Limitations for the accuracy of X-ray CT images also arise from beam fluctuations or off-focal radiation (Boone et al., 2012), non-trivial interactions of (polychromatic) X-rays with the sample material, such as beam hardening (van de Casteele et al., automatically using statistical information from the image data (Sezgin and Sankur, 2004b;Iassonov et al., 2009;Wildenschild and Sheppard, 2013). Due to the non-physical units of the image data, this process has to be applied to every image individually.
For porosity estimations, usually only two phases are considered -solid material and pore space -separated by a single threshold value.
A number of more complex methods for phase separation incorporate multiple or local threshold values based on e.g. spatial 5 information or gradients of the intensity distribution such as the watershed algorithm (Vincent and Soille, 1991) or converging active contours (Sheppard et al., 2004). To account for the partial-volume effect and sub-resolution porosity, some authors assign multiple thresholds to phases and their mixtures. Then voxels are identified as "sub-voxel porosity" in addition to "pore" and "solid" (Sok et al., 2010;Teles et al., 2016). Advanced image-acquisition techniques, such as dual energy X-ray CT (Alves et al., 2015;Teles et al., 2016) or injection of contrast agents into the sample and multiple scanning (Akin and Kovscek, 2003; 10 Klobes et al., 1997;Lu et al., 2000;Mayo et al., 2015), are used to enhance the phase contrast. The latter technique requires a differential imaging or so called registration and subtraction procedure. In addition, synchrotron radiation facilities offer high resolution, fast image acquisition of a few nanometer resolution, but are usually restricted to small sample sizes in the range of tens of microns.
3 Material and methods 15 We selected eight sandstone (sst) varieties representing a wide range in composition as well as grain and pore sizes ( Table 1).
The set comprises well-characterized, rather homogeneous, and readily available rock types, including Berea sst, Bentheim sst, Crab Orchard sst, Fontainebleau sst, Oberster sst, Pfaelzer sst, Ruhr sst, and Wilkeson sst, mostly devoid of significant macroscopic evidence for bedding or anisotropy. Three cylindrical cores with diameters of about 3, 10 and 30 mm were diamond-drilled from each of the ten sandstone blocks including three different Fontainebleau varieties. The samples were 20 cut and precision ground to a length-to-diameter ratio of about 2.5.

Petrographic and microstructural sample characterization
For petrographic and microstructural characterization, we analyzed polished thin sections ( Figure A1) of all samples using plane and cross-polarized light microscopy as well as scanning electron microscopy, the latter after carbon coating the thinsections ( Figure 1). Backscattered electron (BSE) images were captured with a Zeiss LEO 1530 Gemini field emission scanning 25 electron microscope (SEM). To constrain porosity we performed point counting (Carver, 1971) on an area of 6.6 mm 2 of the SEM images, which included 130 measurement points. The measurements were repeated ten times at several locations. We determined the volumetric fractions of the major constituents from counting at least 1000 points on a regular grid under crosspolarized light. The results presented in Table 1 assume that the samples are isotropic and no additional biases such as plug-outs are introduced during the preparation process. 30 Quartz is the main constituent of all our sandstone samples. Quartz content ranges from less than 50 % for Wilkeson sst to almost 100 % for Fontainebleau sst (Figure 2). Feldspars and clay minerals are the most important secondary components where A denotes the void area. For the calculation, the SEM images were automatically segmented into binary images using the method by Otsu (1979) to determine a global threshold (see Section 3.3.4). Continuous areas of pixels with the void characteristics are labeled and for each of these features (2) is used. We refrained from separating the labeled voids into specific 10 pores and throats for instance by applying a watershed algorithm (e.g. Rabbani et al., 2014). Thus, our void-size distributions provide a suitable basis for the assessment of feature size rather than a precise characterization of the pore space geometry.

Laboratory measurements of porosity
We determined two measures of sample porosity, total porosity and effective porosity. The total porosity φ tot,lab is calculated from the ratio between bulk density ρ b and matrix density ρ m according to Bulk density was derived from the samples with 30 mm diameter by weighing after drying at 60°C and geometrical estimation 5 of bulk volume V b according to where d denotes the sample diameter and h the sample height. Matrix or grain density ρ m was derived from measurements with a capillary pycnometer according to DIN 18124 using powders prepared by crushing and grinding pieces of the sandstone blocks and distilled water as working fluid: where m powder , m pyc , m pyc+fl , m pyc+fl+grain and V pyc denote the masses of crushed sample powder, empty pycnometer, mass of pycnometer filled with water, mass of pycnometer filled with water and powder and the volume of the pycnometer, respectively. We report the average of three measurements for bulk and matrix densities. Effective porosity φ eff,imb was deduced from the mass gain due to water imbibition. We evacuated and then immersed the samples in deionized water for at least 24 hours. We measured the water content by weighing samples after removing excess water from their surfaces. The effective porosity φ eff,imb calculates from the difference in mass before (m dry ) and after saturation (m sat ) according to 5 We also determined the effective porosity using the gas expansion method (Washburn and Bunting, 1922). First samples were placed in a pressure chamber of volume V 2 and the system was evacuated reaching a pressure level p 2,in close to 10 2 Pa.
Pressurized Argon was then released from a second chamber of volume V 1 with a pressure p 1,in of around 1 MPa. After expansion of the gas, a pressure level p final is reached in the two connected chambers. Employing the ideal gas law, the effective matrix volume V eff,matrix is calculated according to The effective porosity then is To assess the quality of our porosity measurements we calculated the propagation of uncertainties (Taylor, 1997). A detailed description of the uncertainties for our laboratory measurements is given in Appendix A2. The tomographic datasets were recorded with a CT scanner manufactured by ProCon X-Ray based on their CT-Alpha model.
The CT scanner is equipped with a multifocal X-ray tube with a maximum acceleration voltage of 225 kV and with a maximum electron-beam power on target of 2 W, 5 W and 50 W for spot diameters of 1 µm, 5 µm, and 25 µm, respectively. The transmitted 20 X-ray intensities are captured using a 190 by 240 mm Varian flat-panel detector with 1516 by 1900 pixels. Each pixel in the detector has an edge length of 127 µm and a gray-value resolution of 14 bit resulting in a nominal system resolution of around 1 µm. In our set-up, the X-ray source and detector are fixed during imaging process, while the sample is continuously rotated.
We recorded a total of 1600 images corresponding to rotation steps of 0.2°.
The sample diameter determines the scanning parameters. We positioned the sample and detector in such a way, that the 25 projection of the object almost completely covers the detector. The resolution is controlled by sample position between X-ray tube and detector. The geometrical magnification of an X-ray system is defined by the source-detector-distance (SDD) divided by the source-object-distance (SOD) (Voland et al., 2010). Consequently, the volume resolution (voxel size) increases with decreasing sample diameter of the cylindrical  (Table B1).
We adapted acceleration voltage, current and exposure times to make optimal use of the detector sensitivity. The measurements were performed at acceleration voltages between 70 to 160 kV and target currents between 25 to 180 µA depending on 5 core size and material (Table B). We implemented metal filters that fit exactly on the collimator cover of the X-ray tube to pre-harden the X-rays and to reduce beam hardening artifacts. We recorded the majority of samples with 3 and 10 mm diameter using an aluminum filter of 1 mm thickness. A 0.5 mm thick copper filter was implemented for all samples with 30 mm diameter. Exposure times were 100 ms, 300 ms, and 450 ms for 3 mm, 10 mm, and 30 mm samples, respectively. Some 3 mm samples were scanned with a focal spot diameter of 1 µm at 80 kV and 25 µA. To counterbalance the reduced beam power, we 10 then increased the exposure time to 1000 ms and omitted the pre-hardening of the X-rays.

Reconstruction and image processing
The reconstruction process for the conical X-ray beam, based on a Radon transform in the form of a convolution and back filter (Feldkamp et al., 1984;Turbell, 2001), was performed with the Volex reconstruction software package (Fraunhofer-Allianz Vision, 2012). The software also corrects for ring-artifacts. The voxel intensities are derived from the loss of energy of the 15 X-rays and are mapped to a 16-bit integer range between zero, corresponding to the background, in our case air, and 65535, corresponding to the detection limits of the CT-Scanner.
After the 3D volume reconstruction, we processed and filtered the images using the software package Avizo Fire, Version 9.0.1 © (FEI Visualization Sciences Group, 2016). In a first step, we divided the reconstructed datasets into nine cuboid subvolumes with a size of 350×400×500 voxels located well inside the sample volume to reduce computation time and to 20 exclude non-representative surface areas and avoid the rims affected most by beam hardening artifacts. We selected the subvolumes to overlap by 50 voxels along one edge ( Figure 3).
We tested the influence of image filtering on segmentation results by processing four different datasets: unfiltered data and data filtered with a median filter (Median), a non-local means filter (NLM), and a combination of both (Median+NLM). The median filter is a morphological filter that replaces a specific entry of the data matrix with the median of specified neighboring 25 entries (Huang et al., 1979;Ohser and Schladitz, 2009). We applied the median filter in three dimensions, comparing each voxel with 26 neighbor voxels by using a 3x3x3 cube as structuring element. We applied the filter in three successions. Apart from random noise, the filter supposedly also reduces ring artifacts originating from the sample's rotation during the scanning process (Ketcham and Carlson, 2001).
The NLM filter employs an algorithm in which a voxel of interest is weighted by the similarity to its surrounding voxels 30 (Buades et al., 2005). We used two different windows, with a diameter of 21 voxels for the determination of the similarity weights and a diameter of 5 voxels for the actual weighted median filtering. x-z (left) and (right) z-direction parallel to core-axis.

Image segmentation and porosity determination
To visualize the distribution of voxel intensities in the image data, we use histograms that represent the frequency of each intensity value of the reconstructed image data. The histogram thus provides a general overview of the image characteristics, such as gray-level distribution, which is a function of the material densities, the average luminance of the images as well as contrast and noise. The intensity distributions can be described using statistical terms including mean, which is the average 5 intensity, median, which divides all intensity values into two equal halves, and mode, which is the most frequently occurring value. When these three measures fall on the same intensity value, the intensity distribution is a normal distribution. In contrast the mean and median are not identical for asymmetrical distributions, which are called skewed. For a dense mono-mineralic, continuous and homogeneous material we would expect a more or less normal distribution of intensities due to image noise.
When several of such materials compose a sample, we might be able to identify several classes of voxel intensities by the 10 occurrence of local maxima, i.e., mulitmodal distributions. These classes can be overlapping yielding skewed distributions and complicating separation of the classes.
For a quantitative characterization of pore space, the intensity image is converted into a binary image. In this binary image all voxels are either assigned to pore or solid (Iassonov et al., 2009;Andrä et al., 2013). This process is called image segmentation.
Thus, quantitative estimates of phase contents rely on an appropriate segmentation of the phases of interest (Stock, 2013). 15 Generally, one distinguishes global and local segmentation methods (Sezgin and Sankur, 2004a;Kaestner et al., 2008;Iassonov et al., 2009;Schlüter et al., 2014). Global segmentation methods define absolute thresholds for the entire image. An initial image Img ini , containing the intensity field I ini (x, y, z), is converted to a segmented or binarized image with the threshold T at a certain voxel intensity. This approach is easily extended to multiple classes by defining multiple thresholds. Thresholds are either determined -albeit somewhat arbitrarily -by visual inspection or by algorithms that are in most cases based on statistical or geometrical evaluation of the image data histogram. In contrast, local segmentation accounts 5 for neighborhood characteristics usually in combination with intensity thresholding as a starting point.

Implemented thresholding algorithms
We apply seven common global segmentation methods to each of the nine subvolumes extracted from the tomographic datasets.
Binarization was carried out for unfiltered and filtered images. Some of the algorithms are designed for bimodal distributions, while others are applicable to multimodal threshold problems with n classes and n-1 thresholds. We expect two distinct classes 10 in the intensity distributions according to the significant density contrast of air with 1.2 kg/m 3 and rock forming minerals e.g.
quartz with a density of about 2500 kg/m 3 (Deer et al., 2011). Open pore space should always be associated with lower intensity values than the minerals composing the investigated rocks. We therefore consider all voxels with intensities below the threshold as pore space and all voxels with higher intensities as solid material. In some samples with significant amount of high density minerals, e.g. oxides and sulfides, the intensity classes of these minerals were much more prominent than the porosity class. 15 Thus, to simplify thresholding, we ignored intensities larger than the intensity with the largest frequency g mode1 , which we attribute to solid quartz because quartz is the major constituent of the examined sandstones.

Fixed ratio thresholding
For fixed ratio (FR) thresholding, we assume that an optimum threshold is fixed relative to the intensity of (quartz) grains (i.e. g mode1 ) and the intensity level g dark separating solid and pore space. We somewhat arbitrarily define the background intensity 20 to be smaller than 0.001 times the counts of g mode1 . The intensity level g dark is then placed at the first intensity value exceeding the background intensities and which can not be attributed to the pore space. The FR algorithm sets the threshold between g dark and g mode1 using a user-defined ratio. From a visual comparison of binarized images and the intensity images, we choose a threshold level of 25

Maximum variance
The maximum variance (MV) algorithm proposed by Otsu (1979) searches for the threshold T that minimizes the variance in each class while maximizing the variance between the classes. For two classes, the intra-class variance σ 2 w is defined as a weighted sum of the variances σ 2 0 and σ 2 1 within the two classes: The weighting factors ω 0 and ω 1 are the probabilities of the two classes separated by the threshold level. The variance between the classes calculates 5 where µ T denotes the average of all values and µ 0 and µ 1 denote the mean values within each class defined. The algorithm to find the threshold. We applied the MV thresholding algorithm only to the intensity range between zero and g mode1 to avoid misclassification by threshold levels higher than g mode1 in samples with significant volume fractions of high gray-value (bright) 10 voxels.

Unimodal thresholding
Unimodal (UM) thresholding was designed for images with a unimodal intensity distribution (Rosin, 2001), i.e., where the intensity values of grains and pores substantially overlap in the histogram. The minor secondary population corresponds to the broad, gentle slope in the skewed intensity histogram. A straight line is drawn from the bin with the highest intensity to the first 15 bin above the background noise level g dark . The threshold is then determined by finding the maximum perpendicular distance between this straight line and the histogram curve.

Minimum error thresholding
Minimum error (ME) thresholding introduced by Kittler and Illingworth (1986) assumes that the histogram is composed of normal distributions, i.e. Gaussian modes with mean µ i and standard deviation σ i (Drobchenko et al., 2011). The criterion is minimized.

Iterative k-means clustering
Like the ME, the iterative k-means clustering (KM) (Ridler and Calvard, 1978) assumes Gaussian modes albeit with uniform 25 standard deviations for every class (Birchfield, 2016). The algorithm partitions the voxels of an image into clusters by using a similarity feature, like intensity of voxels and/or distance of voxel intensities from centroids (Luo et al., 2014;Jaroš et al., 2017). An arbitrary initial threshold, e.g. 50 % of the maximum intensity, defines the initial classes. Then a new threshold T is determined according to where µ 0 and µ 1 denote the mean values of the two classes defined by the initial threshold (Ridler and Calvard, 1978). The optimal threshold is calculated by iteratively repeating this step until the threshold level does not change any more. 5 Shape thresholding and mid-point thresholding Shape (SH) thresholding by Tsai (1995) is a histogram-based approach for the segmentation of multiple phases in multimodal or unimodal histograms. The original histogram is first smoothed with a Gaussian window function, then local peaks and valleys are employed as thresholds. For unimodal histograms, the algorithm detects and selects the point of maximum curvature as threshold level. However, for many of the unimodal histograms recorded here the algorithm selected the point of maximum 10 curvature at a threshold level with an intensity larger than g mode1 . Therefore, it failed to separate pore intensities from grain intensities and we did not employ this algorithm. Instead we slightly modified the SH algorithm using the mid-point (MP) thresholding method of Saenger et al. (2016). While the position of the major and minor mode of a bimodal histogram are determined as described above for the SH algorithm, a threshold T is determined according to 15 4 Results

Laboratory measurements of porosity
Bulk densities of the investigated sandstones range from 2000 to 2600 kg m -3 , while grain densities range from 2600 to 2680 kg m -3 (Table 2). Bentheim sst exhibits the lowest and Ruhr sst the highest bulk density of the sample collection. The uncertainty is in the same range as the standard deviation for the three repeat-measurements of bulk density, while the un-20 certainty is two orders of magnitude lower than the standard deviation for grain density. The mismatch between uncertainty and standard deviation in case of grain density might indicate user-dependent inaccuracies in the experimental procedure such as unrecorded changes in the environmental conditions or insufficient evacuation of the pycnometers, due to e.g. clumping of sample powder.
Total porosity φ tot,lab covers a range of about 3 to 25 % ( Figure 4, Table 2). In most cases, effective porosity φ eff,imb derived 25 from water saturation and effective porosity φ eff,gas derived from gas pycnometer measurements either match total porosity φ tot,lab within uncertainty or fall below it (Figure 4). The effective porosities φ eff,gas deduced from gas pycnometer measurements are greater than the effective porosity φ eff,imb from water imbibition except for FS9. Standard deviations for several measurements are usually between 0.01 and 0.5 % for total porosity while error propagation yields absolute errors of ± 1.9 %.
Total porosity derived from analyzing SEM images, φ tot,SEM , shows large deviations compared to φ tot,lab and generally also 30 13  (Table B1). The standard deviations, deduced from analyzing ten randomly selected images of each sample, range from 0.7 to 4.6 %. Based on the results of the laboratory measurements we classify the investigated sandstones in three groups with respect to porosity and mineralogical composition: A. Highly porous (15 -25 %

Voxel-intensity distribution in raw images
Images obtained using X-rays pre-hardened by filters have a higher signal to noise ratio than those taken without a filter. The quality of images taken with a focal spot diameter of 1 µm and without X-ray filters ( Figure 6, FS9) is similar to images recorded with a focal spot diameter of 5 µm using aluminum filters (Figure 6, FS8).  Some general trends emerge from the intensity histograms of the reconstructed images ( Figure 6). As all histograms in Figure 6 are scaled equally, higher average voxel intensities indicate higher signal to noise ratios. In general, images with a voxel size of 2.5 µm have a lower signal to noise ratio than images with lower resolutions due to reduced X-ray energies. The low-resolution images of FS yield the highest signal to noise ratios. Because the absolute voxel intensities are not normalized to a common range of values but depend on the CT system configuration, including sample size and X-ray beam energy, the 5 absolute voxel intensities cannot be directly related to physical units.
We can distinguish four classes of voxel-intensity distributions for the images: 1. Distinctly bimodal distributions, which occur only for the high-resolution images of group A rocks.
2. Bimodal distributions with a negative skewness of the major peak and a rather small secondary minor peak. These distributions are characteristic for CT images of group A at intermediate resolution as well as high-resolution images of 10 rock samples from group B.

Low resolution images of group A rocks as well as some intermediate resolution images of group B rocks yield unimodal
distributions with negative skew and long tails.
4. All images of group C rocks as well as some low-resolution images of group B rocks correspond to unimodal intensity distributions without a distinct skewness. 15 Filtering generally alter these voxel-intensity histograms. The class modes increase and the widths at half maximum decrease ( Figure 7). When the distributions of multiple classes overlap, filters separate these classes in some cases, i.e. single skewed peaks in the unfiltered data are then transformed to multiple peaks (Figure 7a,b). However, small classes as well as minor local maxima are usually smoothed out and disappear from the histograms. In addition, median filtering results in blurred phase boundaries in the images (Figure 7f). The combined Median+NLM filter yields nearly identical results to the median filter 20 only.
Compared to the median filter, the NLM filtering increases the contrast at phase boundaries significantly. The filter removes statistical noise without considerably altering the geometry of small features (compare Figure 7e and 7g). Valleys between modes become more pronounced in the intensity histogram (Figure 7c).
The appearance of histograms, (1) distinct bimodal, (2) skewed bimodal with small secondary peak , (3) skewed unimodal 25 and (4) symmetric unimodal depends on the resolution of the image as well as the texture and mineralogy of the sample (see Figure 8). A symmetrical peak appearance is characteristic for FS samples and GBS sample predominantly composed of quartz. The major mode is represented by a normal distribution of voxel intensities, which are attributed to solid grains i.e.
quartz. The smaller, secondary peak in the bimodal intensity histograms is considered to represent the pore space. However, in most cases the different phases are not represented by well separated, normal distributed X-ray coefficient classes. Instead, 30 the histograms represent a superposition of overlapping distributions. A distinction between the voxel intensities of pore space and solid grains is not straightforward for multi-phase samples like GR and PS characterized by a prominent skewness of the maximum peak to the right, which seems to reflect compositional variability. The significant amount of high intensity voxels size (Figure 8). The amount of pores larger than 5 µm is limited in the samples of group C resulting in unimodal distributions ( Figure 8). Hence, at a voxel size of 2.5 µm a large amount of unresolved pore space is incorporated as partial-volume (PV) voxels in the reconstructed gray-value images.

Global thresholding and porosity calculation
For comparison, all thresholding algorithms were applied to the unfiltered and filtered images. Images with distinct bimodal We calculated the mean value for total porosity φ tot,CT and the corresponding standard deviations σ tot,CT by averaging over 10 nine subvolumes for each of the applied thresholding methods (Figure 9). In most cases, porosity φ tot,CT derived from the CT images significantly deviates from porosity determined in the laboratory (Figure 9). Moreover, the different thresholding algorithms yield highly variable porosity estimates. The discrepancy between laboratory measurements and CT-derived porosities is up to more than δ φ = 20 %. Images of GBS (φ tot,lab = 22.5 %) yield the highest variability in φ tot,CT ranging from around 37 down to 2 % at resolutions of about 8 and 22 µm. Rocks of group C yield φ tot,CT close to zero comparable to φ tot,lab from 15 laboratory measurements. The higher degree of secondary quartz cementation in the monomineralic samples of Fontainebleau In general, the difference between lab and CT porosity increases with increasing sample size and decreasing porosity.
Particularly, the total porosities φ tot,CT of high porous sandstones such as GBS, PS and B with a diameter of 3 mm are in good agreement with experimentally determined porosities for either FR, MV, UM or ME algorithm. Especially, the MV method yields the overall smallest average deviation from the total porosities measured in the laboratory. Binarization using the SH and MP methods underestimates porosities φ tot,CT for the majority of samples, while the KM algorithm tends to overestimate

Discussion
The link between the shape of the intensity distribution and the resolution observed by us has previously been inferred from numerical binning (e.g., Guan et al., 2018). In image processing, binning refers to a technique for combining a cluster of several pixels to a single pixel with an averaged intensity value. The amount of unresolved pore space incorporated in the reconstructed gray-value images as partial-volume (PV) voxels increases for decreasing resolutions. By reducing the sample 5 diameter we decreased the voxel size (for a given material) and thereby increased the ratio between pore size and voxel size allowing us to monitor the influence of the partial-volume effect on the intensity distribution. Results for the monomineralic rocks, such as GBS, B or the iron bearing PS, illustrate that a reduction of resolution directly corresponds to an increasing number of PV voxels associated with a transition from bimodality to (left) skewed unimodality (Figure 8). The increasing number of PV voxels tends to smear out the modes of bimodal distributions associated with the solid grains and pores resulting 10 in skewed, unimodal intensity histograms or a plateau bridging both peaks in the bimodal intensity histograms. These skewed unimodal intensity distributions are provoked by insufficient resolution rather than by micro-structural characteristics of the solid phases in the rocks. Fitting two normal distributions, one for pores and one for the solids, to such bimodal histograms gives a rough estimate of the amount of PV voxels ( Figure 11). As evidenced by FS samples, overall porosity plays a subordinate role for the uncertainty associated with porosity estimates using tomographic images for a given void size distribution; the shape of 15 intensity distributions ( Figure 6) remains nearly identical for the porosity values of 9.9, 6.9 and 5.0 % that do not differ in their pore-size distributions ( Figure 5). Only the height of the secondary mode associated with open pores is reduced as a function of decreasing porosity within the intensity distribution (see also Lindquist et al., 2000).
Global thresholding methods can only assign one phase to each voxel, which inherently limits the accuracy of these methods (van Leemput et al., 2003;Kato et al., 2013). Accordingly, the uncertainty for porosity increases with the amount of pore space 20 imaged in PV voxels. In contrast to the tomographic images, our highly resolved SEM images have a maximum resolution of around 20 nm per pixel and enable the identification of features small compared to the X-ray resolution, but at the disadvantage of being two-dimensional and with a limited field of view. Ratios of pore to voxel size close to or even smaller than one result in large measurement errors of about 20 % when porosities are determined by global thresholding algorithms. Porosity estimates are reliable when the size of a statistically representative quantity of pore-space features exceeds the resolution by at least 25 an order of magnitude (compare Figure 5 with Figure 9). However, the exact limit for the ratio of pore to voxel size varies depending on factors such as total porosity, mineralogical composition, shape of pore space features (e.g. elongated, spherical) etc.
Performing laboratory measurements such as mercury intrusion to determine the amount of pore-throats below the resolution limit, which can be regarded as an upper limit for unresolved, effective porosity has previously been recommended (Lu et al.,30 2000; Tanino and Blunt, 2012;Saenger et al., 2016). Mercury intrusion porosimetry measurements can be used as an additional indication for increased uncertainties of CT based porosity estimates. Generally, any comparable laboratory measurements should be performed on the samples being scanned to validate the quantitative porosity estimates, but should not be used to constrain segmentation methods. Filtering generally introduces a bias due to the removal of minor features resulting in a higher chance of underestimation of porosity. We therefore conclude that filtering generally does not improve porosity determination. Typical reservoir rocks such as carbonates, shales and breccias were also analyzed in recent studies (Table B2). These rocks are characterized by multimodal pore-size distributions and complex pore geometries. In agreement with our observations, these previous studies found that porosity estimates based on the analysis of tomographic data clearly differ from the labora-5 tory measurements by relative deviations up to more than 50 % (e.g. Fusi and Martinez-Martinez, 2013;Lai et al., 2015;Voorn et al., 2015;Teles et al., 2016). A great majority of the applied evaluation methods for tomographic images tend to underestimate porosity compared to laboratory measurements. Only minor differences between the different lithologies exist. Absolute deviation of porosity is largest for dual energy CT (Teles et al., 2016). Porosity estimates based on tomographic images using synchrotron radiation are slightly better than those based on single energy CT, due to the increased resolution. However, the 10 performance of methods is not only determined by the CT system itself, but is also influenced by the subsequent global and adaptive segmentation methods. The diversity of applied methods hampers the comparison between different studies. Segmen- tation workflows based on registration and subtraction techniques designated to identify sub-resolution pore space outperform the other methods slightly, but porosity results are still erratic with a high uncertainty (e.g Mayo et al., 2015;Lin et al., 2016).

Conclusions
We determined the porosity of eight different sandstones using standard laboratory experiments and tomographic images.
We acquired the images with varying resolution and pre-processed the data with denoising filters. Image segmentation was 5 performed using seven different tomography-based thresholding methods. We show by the aid of reconstructed gray-value images and corresponding intensity histograms, that the tomographic pore-volume quantification is unreliable in the light of values gained from conventional laboratory methods. Even for the highest resolution, porosity quantification is imprecise due to (partly) unresolved pore structures, as indicated by analyzing SEM images. Consequently, every digital dataset deduced from X-ray CT is characterized by rock and resolution specific uncertainties, that are usually greater than the uncertainties of 10 comparable laboratory measurements.
Fairly accurate quantification of porosity from tomographic images can be expected for a ratio of at least 1:10 between voxel size and the size of features of interest, here void size. However, even a relatively high resolution cannot overcome the fundamental shortcomings concerning the determination of porosity from computerized X-ray imaging methods. For example, polychromatic X-rays, X-ray scattering and system noise contribute and interact with each other to a variable degree in de- 15 pendence of X-ray energy and material properties. The complex, non-linear relationship of these physical processes hampers an analytical conversion of the measured intensities to density values. The inaccuracy of porosity determination due to technical limitations can be reduced in part by using advanced experimental methodologies such as monochromatic synchrotron radiation or optimized noise reduction methods. These methods are however time-consuming and cost expensive as well as challenging concerning technical implementation and/or computing power. 20 In addition to these technical difficulties, general physical characteristics, such as partial-volume effects and sample inhomogeneity, cause errors. It is not possible to distinguish between a non-resolved boundary between two phases or an intermediate material density of a single phase. The uncertainty related to partial-volume effects could only be effectively minimized by increasing the resolution as our data indicates, but the ambiguity of remaining partial volumes remains. All subsequent segmentation and evaluation techniques for determining geophysical rock properties quantitatively are based on certain assumptions 25 about the amount and manifestation of these partial volumes. Generally, the applied global segmentation methods result in a fairly good accuracy for porosity estimates with relative deviations of about 10 % compared to laboratory measurements in case of distinct bimodal intensity distributions. The uncertainty is increased for multiphase systems with overlapping distributions.
The shape of the intensity distributions is determined by the overall porosity and its size e.g. macro-, micro-or nano-porosity and could be used as an indicator for the accuracy of porosity determinations by X-ray computed tomography. 30 Though quantitative investigations are erroneous when resolution limits are reached, X-ray computed tomography yields unprecedented constraints on the spatial distribution of pore-space features useful for the evaluation of anisotropy, for example.
It is recommended to complement the evaluation of tomographic images with (FIB-) SEM analysis to ensure a match between sylvanian age. The sandstone consists of quartz (approx. 80 to 85 %), feldspar and some lithic fragments. Mica and sericitic clay minerals formed during early diagenesis largely destroyed the primary porosity (Benson et al., 2005).  French and Worden, 2013). It is fine grained, grain supported, very well sorted and was only buried to shallow depths. The sub-angular grains have grain diameters of about 150 µm. Rarely, kaolinite occurs and secondary amorphous or microcrystalline quartz can be distinguished from primary quartz (French and Worden, 2013). Our suite includes samples with various degrees of siliceous cementation and differing porosities.
Bentheim sst (GBS) is a homogeneous, well sorted, usually yellow colored quartz arenite from the Romberg quarry next 15 to Bad Bentheim (Germany). The average size of the sub-rounded grains is 250 µm. The sandstone was deposited during the lower Cretaceous in a basin north of the London-Brabant Massif and the Rhenish Massif in a shallow marine environment (Kemper, 1976). It consists mainly of quartz (around 90 %), minor K-feldspar or microcline and rare clay minerals. Siliceous cement occurs between quartz grains. Rare opaque minerals occur representing iron-rich minerals, such as hematite or pyrite.
The gray Ruhr sst (GR) is an arkose of upper carboniferous age, deposited in the Variscan foredeep in a fluvio-deltaic 20 environment, and buried and folded during Variscan orogeny (Karg et al., 2005;Jasper et al., 2010).

A2 Error propagation and calibration details
The uncertainty of a function q results from the uncertainties from any number of independent and random measured quantities x, ..., z according to The uncertainty of the laboratory based measurements of total porosity φ tot,lab and effective porosity φ eff,imb was calculated 5 using an instrument and machining precision of 0.10·10 -5 m for the sample dimensions and a weight uncertainty of 3·10 -7 kg.
In addition, we assumed an uncertainty of 30 kg/m 3 for fluid density and of 5·10 -10 m 3 for the pycnometer volume, respectively.
We used argon porosimetry for the determination of effective porosity, which requires an accurate determination of the volumes of the two pressure vessels. Several gas-tight acrylic volumes were used for the calibration procedure. The linear system of equations for the volumes V 1 and V 2 can be deduced from relation for pressure after connecting both pressure where p 1,in , p 2,in and p final denote the initial pressure in vessel one, the initial pressure in vessel two and the final pressure of the connected system, respectively. Measurements at various combinations of the standard acrylic volumes placed inside the vessels were performed to solve the over-determined linear equation system. This method enables the calculation of the 15 dimensionless ratio of pressure changes P K0 according to Plotting P K0 as a function of volume enables the calculation of the slope m and intercept ∆y ( Figure A2), which can be used to calculate the volume of the two pressure vessels The reported uncertainties () Table 2