The stochastic quantization method and its application to the numerical simulation of volcanic conduit dynamics under random conditions

The stochastic quantization method and its application to the numerical simulation of volcanic conduit dynamics under random conditions E. Peruzzo, M. Barsanti, F. Flandoli, and P. Papale Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy Dipartimento di Matematica Applicata, Università di Pisa, Via Buonarroti 1c, 56127, Pisa, Italy Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Pisa, Via della Faggiola 32, 56126, Pisa, Italy Invited contribution by E. Peruzzo, one of the EGU Young Scientists Outstanding Poster Paper (YSOPP) Award winners 2009. Received: 16 February 2010 – Accepted: 24 February 2010 – Published: 4 March 2010 Correspondence to: E. Peruzzo (e.peruzzo@sns.it) Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Since the demand for eruption scenario forecast in the world is pressing, there is a strong need for using complex physical models and numerical codes in order to get information about the possible eruptive conditions at many hazardous volcanoes (e.g., Sparks, 2003;Neri et al., 2007).Such models can describe volcanic processes thoroughly, but this ability often results in high computational costs: a single simulation can require a time of the order of days to weeks to be completed.
Correspondence to: E. Peruzzo (e.peruzzo@sns.it)On the other hand, since volcanic systems are largely inaccessible to direct observation, the models which describe them often involve intrinsically uncertain quantities.As a consequence, some of the input data required by the numerical codes should be considered as random variables rather than as fixed parameters.Therefore, the most direct way of obtaining information about the probability of possible eruptive scenarios would be to implement a Monte Carlo (MC) method.As a drawback, this would require a large number of numerical simulations (e.g. 10 4 ), while often a number of the order of 10 simulations cannot be exceeded.
It is thus fundamental to be able to choose the input data values in such a way that the simulations provide the maximum amount of information possible about the output quantities.Furthermore, the selection of the "best" sets of values of input data should be guided by fairly general principles, which could be applied to a large class of models and numerical codes and are not devised for a particular situation.The strategy presented in this paper is indeed general, since the choice of the optimal sets of input data does not involve the numerical code at all.In this respect, a different (and somewhat complementary) approach to the problem would be that of using the simulations to construct a function which is a reasonably good approximation to the complex code and, at the same time, can be evaluated with a low computational effort (Sacks et al., 1989;Currin et al., 1991).The simplified function produced could then be used in a Monte Carlo simulation.In this case, the choice of the sets of input data used Published by Copernicus Publications on behalf of the European Geosciences Union.
E. Peruzzo et al.: The SQ method for the numerical simulation of volcanic conduit dynamics under random conditions to find the approximating function is less critical, while the focus is on the approximation of the numerical code.A possibility that has been explored in the mathematical literature is that of employing Bayesian inference to select a function with the required properties (Kennedy and O'Hagan, 2001).Our approach, on the other hand, aims at defining an optimal set of input data, that sufficiently describes the output distribution.Future developments may involve a mixed approach, whereby some gross properties of the numerical code are exploited to guide the choice of the optimal sets of input data and corresponding output distribution.
In the first section of this paper we present our approach to the problem, showing that the stochastic quantization (SQ) method provides a possible solution.In Sects. 2 and 3 we briefly sketch the basic principles of one-dimensional and multi-dimensional quantization.In Sect. 4 we show the application of our strategy to an ideal case, while in Sect. 5 we turn to a more realistic situation, involving the onedimensional steady model of magma flow in a volcanic conduit described in (Papale, 2001).We show that, given a maximum number N of simulations, the results obtained using the SQ method to choose N sets of values of input data are better than those produced by N MC simulations.
The origins of the theory of quantization of probability distributions date back to the three fundamental articles (Oliver et al., 1948;Bennett, 1948;Panter and Dite, 1951); the field of application was that of signal processing, in particular modulation and analog-to-digital conversion.See (Gray and Neuhoff, 1998) for a survey of the method and of its engineering applications, besides a huge list of references (mainly in the field of engineering); the book (Graf and Luschgy, 2000) gives a mathematically rigorous treatment of the subject.

Outline of the strategy
In order to illustrate our strategy, consider a practical situation: suppose that a numerical code for the simulation of some volcanic processes has the random variable X among its input data.Likewise, X can be a collection of input random variables (X 1 ,...,X d ), i.e. a d-dimensional random vector.Let Y be one relevant output quantity of the numerical code.We denote by f (x 1 ,...,x d ) and g(y) the probability density functions of X and Y , respectively; f is assumed to be known, while nothing is known about g.We also suppose that the numerical code has such a high degree of complexity that the maximum number N of affordable simulations is very small, of the order of 10.It is thus not possible to collect information about g using a MC method.
Our strategy (see Fig. 1) consists in three main steps and an optional fourth step.
1. Find N values of the random vector X, and N corresponding weights w (1) ,...,w (N ) , with N i=1 w (i) =1, such that the discrete probability distribution f which assigns the weight w (i) to the point (for i=1,...,N) is the optimal approximation of f , among all the discrete probability distributions concentrated in N points.The meaning of optimality is to be precised later.2. Perform N numerical simulations to compute the N corresponding values y (1) ,...,y (N ) of the random variable Y .We represent the action of the numerical code on the input data through a function ϕ, so that for i=1,...,N.
3. Build a discrete approximation ĝ of g by assigning the weight w (i) to the point y (i) for i = 1,...,N.
4. Use ĝ to build a continuous probability distribution.
The core of the problem is thus to find the "best" discretization of a fixed continuous probability distribution f (step 1).Stochastic quantization is a mathematical theory which allows to accomplish this task by giving a definition of the optimal discretization and providing an algorithm to find it.Once this is done, a discrete approximation of the unknown output probability distribution is automatically produced (steps 2 and 3).Concerning step 3, it is a rigorous fact that the transformation of a discrete probability distribution by a function ϕ is the discrete probability distribution having the same weigths on the image points.Thus step 3 is the most natural choice.If we would know the output density g, a better choice would be the weigths given by the SQ algorithm applied to g and the given output points.But we do not know g.It is an open problem to improve step 3 in this direction.
Though the discrete probability distribution ĝ can be used to estimate the values of some parameters of g (e.g. its mean, its variance, some quantiles), its graphical representation gives poor insight into the main qualitative features of g.This is one of the motivations of step 4, which can be carried out through a kernel smoothing algorithm (e.g., Wand and Jones, 1995).Moreover, it may be useful for other purposes, like random number generation, to have a continuous distribution in output.The main idea of kernel smoothing algorithms is to smear out each weight w (i) around the corresponding point y (i) according to a fixed rule and then to sum up all the contributions.This leads to the continuous probability distribution x (1)  x (10) y (1) y (10)   . . . . . . . . . . . .
w (10)  w (1) w (10)   x  where K is a smooth probability distribution and h is a measure of the width of the interval over which each weight is spread.h is chosen according to an optimality criterion, based on the minimization of some kind of error resulting from the substitution of g with ĝKS .K is usually chosen to be a unimodal probability density symmetric about zero, but its exact expression does not affect very much the result.

Quantization of univariate probability distributions (d=1)
As a first step, in order to give a precise meaning to the expression "best approximation", we introduce a distance between probability distributions.Consequently, the optimal discretization of f can be defined as the discrete probability distribution f which has the minimum distance from f .Since we have to compare discrete and continuous probability distributions, it is easier to rely on the cumulative distribution functions, especially in the case d=1, in which there is only one input random variable X.Let F be the cumulative distribution function associated with the density f and F the one associated with f : namely (see Fig. 2), where x min and x max are the minimum and maximum values of X, x (i) are the points in which f is concentrated and w (i)  are the corresponding weights.
For instance, we can define a distance between f and f as (see Fig. 3).Therefore, the procedure consists in searching for the discrete probability distribution f which minimizes the quantity d(f, f ), among all the discrete distributions concentrated in N points.

Quantization of multivariate probability distributions (d>1)
A more complicated situation arises when there are several random variables X 1 ,...,X d in input, or equivalently a single random vector X=(X 1 ,...,X d ).If X 1 ,...,X d are independent and f 1 (x 1 ),...,f d (x d ) are their probability density functions, then f (x 1 ,..., w (10)   x x  probability density function of X.If X 1 ,...,X d are not independent, the probability density function f (x 1 ,...,x d ) of X is not related to f 1 (x 1 ),...,f d (x d ) in an obvious way.However, the independency hypothesis is not necessary for the SQ algorithm.
A definition of distance similar to the one in Eq. 2 could be given, but it is easier and more appropriate to use another definition of distance, based on the random variables rather than on their cumulative distribution functions.
Let X be a discrete random vector with probability distribution f , approximating the continuous random vector X; it seems quite natural to choose, as a measure of the distance between f and f , the mean value of the error X− X resulting from the substitution of X with X: (3) The distance defined by Eq. ( 3) could be computed numerically, but the calculation is easier and faster (especially in high dimension) via a MC method; Appendix A shows in detail how it can be performed.The MC simulations involved in the algorithm use only samples of X and X, which can be generated easily, and not samples of Y , whose generation is out of reach.The randomness on the value of d(f, f ) (and, as a consequence, on the values of the "optimal" points x (1) ,...,x (N ) ) due to the choice of a stochastic algorithm can be made negligible, provided that the number M of MC simulations is large enough.Moreover, the stochastic algorithm easily provides the weights w (1) ,...,w (N ) (see Appendix A).
In appendix A we also show that, for univariate distributions, the distances in Eqs. ( 2) and (3) lead to the same optimal discretization: therefore the definition given by Eq. ( 3), besides having an intuitive motivation, is a natural generalization of that given by Eq. ( 2).

Testing SQ in a simple case
In order to assess the quality of the approximation of the output probability distribution given by the SQ, we first consider a simple test case.
-There is a 2-dimensional random vector X=(X 1 ,X 2 ) in input; X 1 and X 2 are independent and their probability distributions are both gaussians, truncated at 0 and at 1; the distribution of X 1 has mean 0. deviation 0.2, that of X 2 has mean 0.6 and standard deviation 0.1.
-The relation between X and the output random variable Y is known and has a simple analytical expression: In such a simple case it is possible to implement a MC method with very high numerosity (e.g. 10 5 ), producing an estimate of g which can be considered exact for practical purposes.This version of g can be directly compared to the results produced by the SQ method and by MC methods with variable numerosity, in order to establish which one gives the best approximation of g: for instance, the SQ method can be compared to a MC with the same numerosity, or to other low numerosity MC simulations.For each numerosity, the MC can be repeated several times, thanks to the simplicity of the function ϕ; consequently, it is possible to estimate the probability that the performances of the SQ are better than those of the MC (see Fig. 5). Figure 4 compares some quantiles of g, whose values are very well known thanks to the high numerosity MC, to their estimates produced by the SQ method.These estimates are obtained via a linear interpolation of the cumulative distribution associated to ĝ. Figure 4 shows that, as the numerosity N of the SQ grows, the estimates of the percentiles of g globally improve.N=15 is a first good compromise; then the improvement is not so strong, until N =40 or 50, where the result is almost perfect.On the other hand, the central quantiles do not become better and better but fluctuate around the true values in an unpredictable manner.This is the reason why the estimate of a parameter sometimes gets worse even if N increases, as is shown in Fig. 5: for instance, the estimate of the median of g (red curve) gets worse as N passes from 5 to 10.This makes it possible for MC simulations to rapidly achieve better estimates of the median than 10 points SQ as their numerosity increases.Comparison between the SQ and MC simulations.On the x axis there is the numerosity of the MC, in logarithmic scale; on the y axis there is the probability that err SQ <err MC , where err SQ = p−p SQ and err MC = p−p MC , p is the real value of a parameter of the output probability distribution g, p SQ and p MC are its estimates given by SQ and MC, respectively.The curves represent the results for various parameters: mean, standard deviation and the 5%, 25%, 50%, 75%, 95% quantiles.The horizontal dashed line marks the value 0.5 of probability.Different graphs refer to different values of the numerosity N of the SQ.In each graph, a vertical dashed line is drawn in correspondance of the MC numerosity equal to N.
However, Fig. 5 also shows that there is a general tendency to improvement as N grows.The whole bundle of curves, in fact, moves gradually towards the upper right corner of the graph: this means that the probability of the estimates given by the SQ being better than those given by the MC is generally increasing as N grows.Furthermore, it is evident that, if N is sufficiently high (N>10 in this case), the SQ method gives better results than MC simulations with the same numerosity with probability greater than 0.5.

Application of SQ to volcanic conduit dynamics
As a test application of the SQ approach to a volcanologically relevant case, we consider a one dimensional steady model of magma flow in a cilindrical conduit with fixed diameter and uniform temperature (Papale, 2001).This model is ideal for testing SQ, since it provides a set of volcanologically relevant, strongly non-linear equations relating input and output distributions in a complex, unpredictable way, despite keeping the computational time small enough (order of minutes for each simulation) to allow a MC simulation with N=10 3 .The output distribution given by this MC is reasonably close to the exact one and can be used for comparison with SQ.Hence, this first application of SQ to a volcanologically relevant case is also a further test of the method.
Among the several input quantities that are intrinsically uncertain we choose two of them, namely, the diameter D of the conduit and the total mass fraction of water w H 2 0 .These two quantities are known to largely control the conduit flow dynamics and the associated mass flow-rate (e.g., Wilson et al., 1980;Papale et al., 1998).D and w H 2 0 are therefore considered as random variables.We assign to each of them a probability distribution (truncated gaussian) and we study the corresponding probability distribution of the logarithm of the mass-flow rate ṁ.The latter is a volcanologically relevant quantity, as it defines the intensity of an eruption and as it largely affects the impact on the surroundings (Valentine and Wohletz, 1989;Todesco et al., 2002).
Figure 6a shows the assumed continuous distribution of the two random variables w H 2 0 and D, while Fig. 6b illustrates the result of the application of the SQ method to discretize the distribution in 20 w H 2 0 -D pairs.In order to test the SQ method, the discretization has been done also with 15 and 10 w H 2 0 -D pairs.The results have been compared, in terms of output probability density (Fig. 6c) and cumulative probability distribution (Fig. 6d).
The three SQ cases with N=10, 15 and 20 reproduce well the mode of the distribution at a mass flow-rate of about 10 8 kg/s, but fail in predicting correctly the shape of the distribution, resulting in a larger and less skewed curve with respect to the MC case.The larger values of density predicted by SQ at the high mass flow-rate tail largely explain the lower values of the mode with respect to the MC case.On the contrary, the left tail of the distribution, corresponding the minimum mass flow-rates, predicted accurately by the SQ.The improvement due to increased numerosity of SQ from 10 to 20 is clearly visible from the cumulative plots in Fig. 6d. Figure 7 compares the estimates of the quantiles of the output distribution of mass flow-rates given by MC and SQ with N =10, 15 and 20.The quantiles are obtained by means of a linear interpolation of the cumulative distributions in Fig. 6d.As for the analogous Fig. 4 (referring to the polynomial map at Eq. 4), the bulk of the distribution is predicted well by the SQ method, but the tails of the distribution are not.While a significant improvement clearly emerges from N =10 to N =15, there is no significant gain in accuracy when moving Table 1.Comparison between the estimates of some parameters of the output distribution given by the SQ with N =10,15,20 points and by a MC with 10 3 simulations.The considered parameters are the mean, the standard deviation and the 5%, 25%, 50%, 75%, 95% quantiles.mean st.dev.
q 5 q 25 q 50 q 75 q 95 MC 7.74 0.50 from N=15 to N=20.Table 1 shows the same results numerically for a few selected quantiles.
As for the ideal case discussed above, it is possible to compare the performances of the SQ with those of some low numerosity MC simulations (Fig. 8).This time, the discrete distributions generated by the SQ and by the low numerosity MC simulations are both compared to the one obtained from the MC simulation with N=10 3 .In all cases, and for any quantity investigated, the SQ provides a better approximation of the output distribution than the MC with equal numerosity.For many quantities a MC with at least hundreds of simulations is required in order to exceed the accuracy given by SQ simulations with numerosity up to 20.

Conclusions
The performance of the SQ method has been analyzed both for an artificial polynomial map with random input and for a more complex set of non-linear equations.The latter case also represents an application of the SQ method to the volcanologically relevant case of steady multiphase magma flow along a volcanic conduit.Our analysis includes both a comparison between SQ and pure MC (Monte Carlo) method, and the accuracy of SQ in itself.In both cases the SQ method Solid Earth, 1, 49-59, 2010 www.solid-earth.net/1/49/2010/E. Peruzzo et al.: The SQ method for the numerical simulation of volcanic conduit dynamics under random conditions 57 provides substantially better estimates of output distributions than the MC method with the same number of simulations.This property is already clear with values of N around 15-20, and it becomes stronger with the increase of N. The results are less definite for very small N, like N=5 and sometimes 10, where further ideas and research are needed.In general, MC gives results comparable to SQ only when the number of MC simulations is much higher, sometimes by one or more orders of magnitude, than that of SQ.Therefore, the SQ method results in a considerable computational saving for the same degree of accuracy of the estimates.With values of N of 15 or 20, in general the estimates obtained by SQ are very close to the true ones (or to our better estimates of the true ones).
In conclusion, the SQ method allows the introduction of uncertainties in the deterministic approach without requiring exceeding CPU time.This result is promising for the capability of estimating future volcanic scenarios and volcanic hazards by means of a merged deterministic-probabilistic approach, whereby complex deterministic models are employed by taking into account the intrinsic uncertainties involved in the definition of the conditions characterizing the volcanic systems.

The numerical algorithm
This appendix describes in detail how the distance d(f, f ), defined in Eq. (3), is approximately computed via a stochastic algorithm.Morevorer, it shows that the optimal discretization of f is found by moving the points x (1) ,...,x (N) in such a way that d(f, f ) is minimized, i.e. by minimizing a function of N d-dimensional vectors (function h in Eq.A3 below).
The starting point is represented by a fundamental result of the SQ theory (Graf and Luschgy, 2000, Lemma 3.1), which states that, if the N possible values of the random vector X, i.e. the N points in which f is concentrated, are fixed, then the corresponding optimal weights w (1) ,...,w (N) are uniquely determined.
More precisely, the weights are defined as follows.
-For i=1,...,N, let V i be the region of the d-dimensional space such that where for k=1,...,N.V i is called the Voronoi region of x (i) with respect to the set x (1) ,...,x (N ) ; it contains the points which are closer to x (i) than to any other element of the set x (1) ,...,x (N ) (see figure A1).
-The optimal approximation X of the random vector X is defined as follows: X=x (i) if and only if the value of X belongs to V i .Namely, X is obtained by rounding off X to the nearest vector among x (1) ,...,x (N ) .
-Correspondingly, w (i) (i.e. the probability that X=x (i) ) is the weight assigned to V i by the probability distribution f : If X is defined as just described, it can be shown (Graf and Luschgy, 2000, Lemma 3.4) that, in the case d=1, moreover, if X is another random variable, with the same possible values x (1) ,...,x (N ) but defined in whatever way, and F is its cumulative distribution function, it can be shown that This means that, in the case d=1, minimizing x max x min F (x)− F (x) dx is the same as minimizing E |X− X| , so that the criterion used when there are several parameters in input is indeed a generalization of that used when there is only one parameter.
If the points x (1) ,...,x (N ) are fixed, an approximate calculation of the minimum value of the distance in Eq. (3) and of the corresponding optimal weights w (1) ,...,w (N ) can be carried out through the following steps (see Fig. A1): 1. generate a large number M (e.g.M=10 5 ) of ddimensional random vectors z 1 ,...,z M with probability distribution f ; 2. for each vector z j , select the index i j such that z j belongs to V i j , i.e. z j −x (i j ) =min k=1,...,N z j −x (k) ; 3. for k = 1,...,N, assign to x (k) the weight x (3) x (4) x (5)   x (6)   x (7)   Fig. A1.Implementation of the stochastic quantization method with = 2 and N=7.The blue points are a sample of the input random vector X=(X 1 ,X 2 ); the orange points x ,...,x (7) are the possible values of X, which is a discrete approximation of X.The orange lines define the Voronoi regions generated by the set x (1) ,...,x (7) : the region associated to x (i) contains the sample points which are closer to x (i) than to any other of the orange points.
where m (k) is the number of vectors z j into the Voronoi region V k , i.e. the number of indexes j such that i j =k; In our situation, in which the points (x (1) ,...,x (N ) ) are not fixed, the function h x (1) ,...,x (N) = 1 M M j =1 z j − x (i j ) (A3) must be minimized.The minimization is performed using Powell's method (Powell, 1964;Press et al., 2001), which moves the points x (1) ,...,x (N ) , starting from an initial guess; for each new choice of x (1) ,...,x (N ) , the algorithm evaluates h x (1) ,...,x (N) going through the steps 1-4 above.The set of points which produces the minimum value of h is just the optimal set of points we are searching for.In order to minimize the risk of finding local minima, the minimization is repeated 10 times, varying the initial guesses, and the lowest minimum is taken as the best estimate of the true minimum.Note that the error in the estimate (Eq.A2) of d(f, f ) is proportional to 1 √ M , so that, for sufficiently high values of M, it becomes negligible and minimizing is the same as minimizing d(f, f ).

Fig. 1 .
Fig.1.Graphical representation of the first three steps of our strategy, with d=1 and N =10.The lower graphs represent the discrete approximations of the input and output probability distributions produced by the SQ method.The weight w(i) is the area of the rectangle associated to the i-th point for i = 1,...,N.

Fig. 3 .
Fig.3.The distance between f and f is defined as the shaded region area in the graphs.Optimal discrete distributions look like the one in graph (a), while graph (b) represents a sub-optimal discrete distribution.

Fig. 4 .
Fig.4.Quantile-quantile plots.On the x-axis there are the true values of the quantiles of g, on the y-axis their estimates given by the SQ.The orange daggers represent the percentiles, from 1 to 99.If a dagger is close to the dashed line y=x, the SQ gives a good estimate of the corresponding percentile.Different graphs refer to different numerosities of the SQ.

E
Fig. 5.Comparison between the SQ and MC simulations.On the x axis there is the numerosity of the MC, in logarithmic scale; on the y axis there is the probability that err SQ <err MC , where err SQ = p−p SQ and err MC = p−p MC , p is the real value of a parameter of the output probability distribution g, p SQ and p MC are its estimates given by SQ and MC, respectively.The curves represent the results for various parameters: mean, standard deviation and the 5%, 25%, 50%, 75%, 95% quantiles.The horizontal dashed line marks the value 0.5 of probability.Different graphs refer to different values of the numerosity N of the SQ.In each graph, a vertical dashed line is drawn in correspondance of the MC numerosity equal to N.

Fig. 6 .
Fig.6.Graph (a) represents the probability distribution of the input random vector D,w H 2 O , while in graph (b) there is its discretization produced by the SQ with N=20.Graph (c) shows a comparison between the approximation of g resulting 10 3 MC simulations and ĝKS , which is obtained through the application of a kernel smoothing algorithm to the discrete distribution ĝ given by the SQ with N=10,15,20.Graph (d) represents a comparison between the cumulative distribution function resulting from 10 3 MC simulations and those resulting from the SQ with N=10,15,20, without application of kernel smoothing.

Fig. 7 .Fig. 8 .
Fig. 7. Quantile-quantile plots, analogous to Fig. 4. On the x-axis there are the estimates of the quantiles of g given by a MC simulation with N = 10 3 , which are close to the true values.