Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Among the characteristics used in physical volcanology, the density and porosity of juvenile clasts are some of the most frequently used to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data to statistical methods and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using the statistical tools presented here, the meaningfulness of a conclusion can easily be checked for any data set. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a data set is large enough to allow for reproducible results. Graphical statistics, similar to those used for grain-size analysis, are used to describe density and porosity distributions. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology, we chose two large data sets: (1) directed blast deposits of the 3640–3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990–1995 eruption of Unzen volcano (Japan). We propose the incorporation of this analysis into future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.

Pyroclast density and porosity are commonly used to reconstruct eruptive
dynamics and feed numerical models. The pyroclast density

The mass of a pyroclast

If the density DRE (dense rock equivalent,

It is important to note that measuring the density and the porosity of
irregularly shaped pyroclasts is not straightforward. In particular, the
parameter

Another important aspect of density/porosity analysis is that pyroclastic
deposits commonly present a large range of density values, so sample sets
must comprise a large number of clasts. Additionally, the results must be
checked for bias introduced by preferential sampling during fieldwork. The
density and porosity results are then generally treated statistically using
frequency analysis, including averages and distribution histograms. These
analyses are often used to interpret volcanic structures
or explosivity (Kueppers et al., 2005, 2009; Belousov et al., 2007;
Shea et al., 2010; Mueller et al., 2011; Farquharson et al.,
2015). The main issue with this approach is that density and porosity are
intensive properties in the thermodynamic sense: unlike extensive properties
such as mass or volume, they are not additive (White, 2012). As a
consequence, because intensive properties cannot be summed, they should not
be averaged arithmetically (sum divided by the number of measurements). Pyroclast
density is size dependent even for samples with a homogeneous bubble
distribution (increase in density for particles smaller than the average
bubble size, e.g., Eychenne and Le Pennec, 2012). This effect can be even
stronger for heterogeneous pyroclastic material that commonly shows bubble
gradients. Therefore, the average density

The non-additive property of density and porosity also limits the use of frequency histograms. For statistical analysis on the density/porosity distribution, the measurements must be weighted adequately to be physically meaningful.
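The contrast between frequency and weighted statistics can be illustrated with a minimal Python sketch. It is not the authors' R script: the function names, the three hypothetical clasts and the bin edges are assumptions made for illustration, and the weight of each clast is taken to be its share of the total sample-set volume, which is one reading of the volume weighting described in this paper.

```python
import numpy as np

def volume_weights(volume):
    """Share of the total sample-set volume carried by each clast (assumed weighting)."""
    volume = np.asarray(volume, dtype=float)
    return volume / volume.sum()

def frequency_and_weighted_mean(mass, volume):
    """Return (frequency average, volume-weighted average) of clast density."""
    mass = np.asarray(mass, dtype=float)
    volume = np.asarray(volume, dtype=float)
    density = mass / volume
    freq_mean = float(density.mean())          # sum divided by number of clasts
    weights = volume_weights(volume)
    # Volume-weighted mean; algebraically equal to total mass / total volume.
    weighted_mean = float(np.sum(weights * density))
    return freq_mean, weighted_mean

def weighted_histogram(values, weights, bin_edges):
    """Sum the weights falling in each bin instead of counting measurements."""
    heights, _ = np.histogram(values, bins=bin_edges, weights=weights)
    return heights

# Hypothetical clasts (units are arbitrary but must be consistent).
mass = np.array([10.0, 30.0, 8.0])
volume = np.array([5.0, 20.0, 10.0])
density = mass / volume

freq_mean, weighted_mean = frequency_and_weighted_mean(mass, volume)
heights = weighted_histogram(density, volume_weights(volume), [0.5, 1.0, 1.5, 2.0])
print(freq_mean, weighted_mean, heights)
```

With these made-up values the two averages differ because the largest clast dominates the weighted result, whereas every clast counts equally in the frequency average. The weighted histogram bars sum to 1 by construction.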

The purpose of this paper is to present a simple method to obtain weighted
statistics in order to analyse density and porosity data. We also propose a
stability analysis that allows the quantification of the quality of the
sampling and the relevance of the results. In order to standardize the
description of grain-size distribution of sediments, Inman (1952) proposed a
set of graphical parameters based on statistical analysis. The new
parameters, such as the graphical standard deviation and graphical skewness,
made it possible to quantify descriptive terms such as poor or good sorting. A few
years later, Folk and Ward (1957) proposed revised parameters that better
describe natural material, in particular polymodal distributions. They also
introduced the kurtosis, which helps to describe the shape of the mode. These
parameters have been used ever since to characterize and distinguish
volcanic deposits (Walker, 1971). We propose to adapt those equations to
describe density and porosity distribution. This methodology is incorporated
in an open-source R script (

We chose two large data sets from different pyroclastic deposits in order to assess the validity of our approach. The Chachimbiro data set (Bernard et al., 2014) is made of 32 sample sets from different outcrops of the 3640–3510 BC directed blast from Chachimbiro volcano, Ecuador (Supplement A). Each sample set contains between 15 and 103 clasts of the 16–32 mm fraction measured using the methodology of Houghton and Wilson (1989). The Unzen data set (Kueppers et al., 2005) is made of 31 sample sets from block-and-ash-flow deposits from the 1990–1995 eruption of Unzen volcano, Japan (Supplement B). Each sample set contains 24–33 large pyroclasts (> 64 mm) measured according to the methodology presented in Kueppers et al. (2005).

Abundance histograms (

In order to perform a thorough statistical analysis of density and porosity
data, each clast measurement in a sample set with a number of measurements

Here we chose to present the weighting by volume but the same resolution can
be used to weight by mass. Equation (1) can be reformulated as follows:

Then Eq. (6) can be inserted in Eq. (4):

Using Eqs. (5) and (7):

Abundance histograms and cumulative plots are typical graphical
representations of density and porosity data (Fig. 1). The
representativeness can be used to create weighted graphs. For the abundance
histogram, in each interval we sum the representativeness of the
measurements instead of counting the number of measurements and dividing it
by

One of the main questions when performing a density and porosity analysis on
pyroclastic deposits is the following: how many measurements are required to have a
statistically representative sample set? The sample set size, here expressed
as the number of measurements

Each run with random ordering leads to a different AE after a certain number
of measurements. We chose to represent the 95th quantile (

As frequency analysis is not suitable for density and porosity data,
some useful statistical parameters, such as the standard deviation, are
difficult to obtain. Building on the work done to better characterize
grain-size distributions (Inman, 1952; Folk and Ward, 1957), we propose for
the first time a similar approach to calculate the graphical statistics of
density and porosity using the cumulative plots (Fig. 1b and d). The main
difference between graphical statistics for grain-size distributions and for
density data lies not in the equations but in the data themselves. Grain-size data
obtained through sieving are partial data as the grain-size distribution
inside each size class (

Stability curves obtained after 1000 runs for two samples from Chachimbiro and Unzen data sets. Note the constant slope below the 5 % threshold.

Inman (1952) defined three parameters:

the graphical median Md is a proxy of the average:

where

the graphical standard deviation

the graphical skewness Sk characterizes the asymmetry of the data distribution:

Folk and Ward (1957) proposed different parameters intended to be more
representative of natural distributions, in particular for bimodal or
polymodal distributions. The main difference from Inman's parameters is the
inclusion of a 1-

the graphical mean Mz:

the inclusive standard deviation

the inclusive skewness SkI:

the graphical kurtosis K:

It is important to note that the values of the graphical median and mean should be relatively close to the weighted average. Nevertheless, as the weighted average is physically the most accurate value, we propose to use it for graphical representation. Standard deviation, skewness and kurtosis have not yet been used to characterize density and porosity distributions, but they could help to distinguish between pyroclastic deposits.
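As an illustration of how such parameters could be computed, the Python sketch below applies the standard published forms of the Inman (1952) and Folk and Ward (1957) formulas to percentiles read from a weighted cumulative curve. Several points are assumptions rather than the authors' implementation: the percentiles are obtained by linear interpolation of the cumulative weights, the formulas are applied directly to density (or porosity) values instead of phi units, and the function and key names are hypothetical.

```python
import numpy as np

def weighted_percentiles(values, weights, percents):
    """Read percentiles off the weighted cumulative curve by linear interpolation."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    sorted_values = values[order]
    cumulative = np.cumsum(weights[order]) / weights.sum() * 100.0
    # np.interp clamps to the smallest/largest value outside the curve.
    return np.interp(percents, cumulative, sorted_values)

def graphical_statistics(values, weights):
    """Inman (1952) and Folk and Ward (1957) graphical parameters (standard forms)."""
    p5, p16, p25, p50, p75, p84, p95 = weighted_percentiles(
        values, weights, [5, 16, 25, 50, 75, 84, 95]
    )
    return {
        # Inman (1952)
        "Md": p50,
        "sigma_G": (p84 - p16) / 2.0,
        "Sk_G": (p16 + p84 - 2.0 * p50) / (p84 - p16),
        # Folk and Ward (1957)
        "Mz": (p16 + p50 + p84) / 3.0,
        "sigma_I": (p84 - p16) / 4.0 + (p95 - p5) / 6.6,
        "Sk_I": (p16 + p84 - 2.0 * p50) / (2.0 * (p84 - p16))
                + (p5 + p95 - 2.0 * p50) / (2.0 * (p95 - p5)),
        "K_G": (p95 - p5) / (2.44 * (p75 - p25)),
    }

# Hypothetical, perfectly uniform sample set with equal weights.
values = np.arange(1.0, 102.0)        # 1, 2, ..., 101
weights = np.ones_like(values)
stats = graphical_statistics(values, weights)
print(stats)
```

For this uniform test case both skewness values are essentially zero, as expected for a symmetric distribution. Real data would use the per-clast weights rather than equal weights.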

An open-access R code has been created to automate the calculations
presented above. Additionally it facilitates the automatic creation of
abundance histograms, cumulative plots, and stability curves. The input file
must be in the format

first column: pyroclast mass (in kg or g);

second column: pyroclast volume (in m

third column: pyroclast density (in kg m

fourth column: pyroclast porosity (in decimal from 0 to 1).
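A quick way to check that a file follows this four-column layout before running the analysis is sketched below in Python. This is only an illustration and not part of the authors' R code: the comma delimiter, the function name `read_sample_set`, the header labels and the example values are all assumptions.

```python
import csv
import io

def read_sample_set(text, delimiter=","):
    """Parse a four-column table: mass, volume, density, porosity (with header)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # the columns are expected to carry a header
    clasts = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            # A decimal comma in a comma-delimited file typically ends up here.
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        mass, volume, density, porosity = (float(cell) for cell in row)
        if mass <= 0 or volume <= 0:
            raise ValueError(f"line {line_no}: mass and volume must be positive")
        if not 0.0 <= porosity <= 1.0:
            raise ValueError(f"line {line_no}: porosity must be a decimal from 0 to 1")
        clasts.append((mass, volume, density, porosity))
    return header, clasts

# Hypothetical file content; header names and values are made up.
example = """mass,volume,density,porosity
10.0,5.0,2.0,0.20
30.0,20.0,1.5,0.40
"""
header, clasts = read_sample_set(example)
print(header, len(clasts))
```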

The columns should have a header. All values must use a point as the decimal
separator for the R code to run properly. The name of the file should
correspond to the name of the sample set to avoid confusion when compiling
large data sets. The R code is provided in the Supplement
(Supplement C), and only three commands are required to run it in R:

set the working directory where the R code and the input file are
located: setwd(“

load the code:

run the code:

Comparison between frequency and weighted analyses.

For large data sets it is possible to create a list of csv files and treat
them with a loop:

create the list:

run the code for the list:

The R code generates a text file with the statistical results and the
figures in pdf format. Compiling the Chachimbiro (32 sample sets, 1492 clasts)
and Unzen (31 sample sets, 922 clasts) data sets with the R code, with
1000 runs for the stability analysis of each sample set, takes 36 and 22 s,
respectively, on a 4 GB RAM computer (

The absolute difference between frequency and weighted density/porosity
averages for the Chachimbiro and Unzen data sets is up to 4 and 2 % (Fig. 3a,
Supplement D), respectively, which is close to the analytical error
(< 5 %). This difference is less important than the relative
difference between individual sample sets for each volcano. To highlight this, we
chose two sample sets from the Chachimbiro data set, 021-B and 089-A. These samples
have almost the exact same frequency density average (1961 and 1960 kg m

Results of the stability analyses for the Chachimbiro and Unzen data sets. Note the large scatter for the Chachimbiro data set below 40 measurements, whereas the Unzen data set has much less dispersed values.

Graphical parameters for the Chachimbiro and Unzen data sets. Only
high stability (slope < 0.5 %) sample sets are used in this
figure. Note that the two data sets show lower superposition with the Folk
and Ward parameters than with the Inman parameters, in particular when using
the Skewness

For both of our case studies, the number of measurements and the number of samples per deposit are large enough for the difference between the two methods to be minimal (a few percent of deviation). However, laboratory experiments have shown that porosity is one of the main parameters controlling fragmentation during explosive eruptions in the presence of bubbles with gas overpressure (Alidibirov and Dingwell, 1996; Spieler et al., 2004). Therefore a change of only a few percent in porosity might induce a large error in the calculation of pre-eruptive conditions such as overpressure, fragmentation depth, permeability and rock strength (Mueller et al., 2005; Heap et al., 2014). It is difficult to assess the effect of the statistical method based on the literature, as most publications only provide the final density and porosity data sets and not the raw data (mass and volume).

The stability analysis (see Sect. 2.3) can be used to assess the quality of the sampling and also to estimate the minimum number of measurements required to obtain meaningful results. When comparing the slope of the stability curve below the 5 % threshold with the number of measurements in the Chachimbiro data set, it appears that sample sets with more than 40 clasts have a high stability (Fig. 4, Supplement D). Below 40 measurements the results are scattered (from high to low stability), probably because of differences in the standard deviation. The Unzen data set exhibits a much smaller spread, with a high stability for most of the sample sets. This difference indicates that the natural heterogeneity of pyroclasts, together with eruption, transport and deposition dynamics, requires a deposit-adapted sampling strategy. Houghton and Wilson (1989) propose a minimum of 30 clasts per sample set. Our analysis shows that the minimum number of measured clasts per sample set must be established according to the characteristics of the deposit itself. When more raw data are available on different deposits, stability analysis results could be used to suggest a minimum number of measurements for future investigations. Moreover, the stability analysis might be used to select only high-stability, and therefore more representative, samples for further analysis such as laboratory experimentation or permeability measurements (Fig. 5).
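The general idea of a stability curve can be sketched in Python as follows. This is a hedged illustration and not the authors' R implementation: AE is interpreted here as the absolute error, in percent, of the running volume-weighted average density relative to the average of the full sample set, and the function name, the random seed and the synthetic clasts are assumptions. The slope of the curve below the 5 % threshold is not computed in this sketch.

```python
import numpy as np

def stability_curve(mass, volume, runs=1000, quantile=95, seed=0):
    """Quantile of the relative error (%) of the running weighted average.

    For each run the clasts are shuffled, the weighted average density of the
    first n clasts is computed for every n, and its absolute deviation from
    the full-sample weighted average is stored. The returned array holds the
    chosen quantile of that deviation over all runs, for n = 1..N.
    """
    mass = np.asarray(mass, dtype=float)
    volume = np.asarray(volume, dtype=float)
    n_clasts = mass.size
    final_average = mass.sum() / volume.sum()
    rng = np.random.default_rng(seed)
    errors = np.empty((runs, n_clasts))
    for run in range(runs):
        order = rng.permutation(n_clasts)          # random measurement order
        running = np.cumsum(mass[order]) / np.cumsum(volume[order])
        errors[run] = np.abs(running - final_average) / final_average * 100.0
    return np.percentile(errors, quantile, axis=0)

# Synthetic sample set of 40 hypothetical clasts (values are made up).
data_rng = np.random.default_rng(1)
volume = data_rng.uniform(5.0, 50.0, size=40)
density = data_rng.normal(1.9, 0.3, size=40)
mass = density * volume

curve = stability_curve(mass, volume, runs=200)
below_threshold = np.nonzero(curve < 5.0)[0]
print("first n with 95th-quantile error below 5 %:", below_threshold[0] + 1)
```

The error necessarily drops to zero once all clasts are included, so the informative part of the curve is how quickly it falls below the chosen threshold as measurements are added.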

Graphical statistics for grain-size analysis have been commonly used to
identify the nature of volcanic deposits (Walker, 1971). The same might be
applied to density analysis. Figure 5 highlights the differences between
the Chachimbiro and Unzen data sets. For similar values of density/porosity
averages, the Chachimbiro data set almost systematically shows a higher
standard deviation than the Unzen data set (Supplement D). The two data sets
also display only a small degree of overlap in their skewness and kurtosis
parameters. The Unzen deposits mostly have a symmetric porosity
distribution (SkG and SkI around 0), while the Chachimbiro deposits have a
clearly asymmetric distribution (SkG and SkI mostly positive and up to 0.4).
The porosity distribution for Unzen deposits is typically mesokurtic (KG

This study presents a new methodology to treat density and porosity measurements from pyroclastic deposits. It presents weighting equations that allow a more robust statistical analysis. The evaluation of the Chachimbiro and Unzen density/porosity data sets indicates that frequency analysis alone can lead to misinterpretations and that weighted analysis should be used to avoid analytical bias. The stability analysis provides a tool to assess the quality of the sampling, while the graphical parameters allow for a better characterization of the deposits than the classical approach using only averages and histograms. The results obtained show that for small numbers of measurements the Chachimbiro sample sets are less stable than the Unzen ones. This can be attributed either to the sampling method or to the density/porosity distribution of the deposit. Finally, we propose the use of graphical statistics to represent density/porosity data. The differences observed between the two data sets indicate that such representations can be useful to distinguish pyroclastic deposits.

The authors thank J. Anzieta and S. Hidalgo for useful discussions on the methodology and E. Gaunt for English proofreading. U. Kueppers acknowledges funding from DFG grant KU2689/2-1 that allowed for personal discussion during fieldwork in Ecuador. This research has been completed in the framework of the Laboratoire Mixte International “Séismes et Volcans dans les Andes du Nord”, which links the IGEPN and three research entities in France including the Laboratoire Magmas et Volcans (Blaise Pascal University, Clermont-Ferrand). The authors thank the reviewers Jamie Farquharson and Thomas Giachetti and the editor Michael Heap for their comments that helped to improve the original manuscript. Edited by: M. Heap