Improving the quality of empirical Green’s functions, obtained by cross-correlation of high-frequency ambient seismic noise

. Studying the uppermost structure of the subsurface is a necessary part of solving many practical problems (ex-ploration of minerals, groundwater studies, geoengineering, etc.). The practical application of active seismic methods for these purposes is not always possible for different reasons, such as logistical difﬁculties, high cost of work, and a high level of seismic and acoustic noise. That is why developing and improving passive seismic methods is one of the important problems in applied geophysics. In our study, we describe a way of improving the quality of empirical Green’s functions (EGFs), evaluated from high-frequency ambient seismic noise, by using the advanced technique of cross-correlation function stacking in the time domain (in this paper we use term “high-frequency” for frequencies higher than 1 Hz). The technique is based on the global optimization algorithm, in which the optimized objective function is a signal-to-noise ratio of an EGF, retrieved at each iteration. In comparison to existing techniques, based, for example, on weight stacking of cross-correlation functions, our technique makes it possible to signiﬁcantly increase the signal-to-noise ratio and, therefore, the quality of the EGFs. The technique has been tested with the ﬁeld data acquired in an area with a high level of industrial noise (Pyhäsalmi Mine, Finland) and in an area with a low level of anthropogenic noise (Ku-usamo Greenstone Belt, Finland). The results show that the proposed technique can be used for the extraction of EGFs from high-frequency seismic noise in practical problems of mapping of the shallow subsurface, both in areas with high and low levels of high-frequency seismic noise.

Abstract. Studying the uppermost structure of the subsurface is a necessary part of solving many practical problems (exploration of minerals, groundwater studies, geoengineering, etc.). The practical application of active seismic methods for these purposes is not always possible for different reasons, such as logistical difficulties, high cost of work, and a high level of seismic and acoustic noise. That is why developing and improving passive seismic methods is one of the important problems in applied geophysics. In our study, we describe a way of improving the quality of empirical Green's functions (EGFs), evaluated from high-frequency ambient seismic noise, by using the advanced technique of crosscorrelation function stacking in the time domain (in this paper we use term "high-frequency" for frequencies higher than 1 Hz). The technique is based on the global optimization algorithm, in which the optimized objective function is a signal-to-noise ratio of an EGF, retrieved at each iteration. In comparison to existing techniques, based, for example, on weight stacking of cross-correlation functions, our technique makes it possible to significantly increase the signal-to-noise ratio and, therefore, the quality of the EGFs. The technique has been tested with the field data acquired in an area with a high level of industrial noise (Pyhäsalmi Mine, Finland) and in an area with a low level of anthropogenic noise (Kuusamo Greenstone Belt, Finland). The results show that the proposed technique can be used for the extraction of EGFs from high-frequency seismic noise in practical problems of mapping of the shallow subsurface, both in areas with high and low levels of high-frequency seismic noise.

Introduction
Seismic methods as tools for studying the shallow subsurface structures in exploration geophysics have been developed for many years. Traditionally, seismic surveys (reflection and refraction) have been carried out using active sources. The reflection and refraction controlled-source seismic sounding methods are widely applied in exploration for oil and gas but less commonly in mineral exploration in crystalline bedrock areas. The reasons for this have been the traditionally high cost of seismic surveys and logistical difficulties (Malehmir et al., 2012). Seismic methods as a mineral exploration tool are very good for delineation of the boundaries of certain types of mineral deposits as well as for estimating their ore potential (Kukkonen et al., 2009;Malehmir et al., 2012). There are, however, challenges in exploration of new deep targets in the vicinity of active mines, that is, in brownfield exploration. In our paper, brownfield means exploration near active mines or at the previously studied area with the purpose of getting new mineral reserves, while greenfield means exploration of new mineral deposits. Due to the large amount of heavy machinery, the active mines themselves produce strong seismic and acoustic noise. This continuous noise overlaps in frequencies with the signals of the controlled seismic sources, creating a problem for the highresolution active-source seismic experiments in a brownfield exploration (Place et al., 2015).
In our paper, we describe results of investigating the possibility of using passive ambient seismic noise interferometry with the noise with frequencies higher than 1 Hz (hereafter, we use the term "high-frequency" for this seismic noise) for extracting information about the shallow subsurface in greenfield and brownfield exploration projects. In our study, the shallow subsurface means depths from ground surface down to several hundreds metres. For this, we develop a new method of improving the quality of empirical Green's function (EGFs) evaluated from high-frequency industrial, anthropogenic or natural seismic noise. We partly use algorithms described in Shapiro et al. (2005), Campillo (2006), Bensen et al. (2007), Groos et al. (2012), Poli et al. (2012aPoli et al. ( , b, 2013, and Afonin et al. (2017) for ambient noise preprocessing and implement a new algorithm of stacking crosscorrelation functions in the time domain.
At present, there are several advanced algorithms, working in the time domain, in the frequency domain or in the time-frequency domain. One group of algorithms tries to improve the quality of the resulting EGFs using the evaluation of cross-correlation functions according to certain criteria prior to stacking them. For example, in the methods described in Baig et al. (2009), a denoising procedure, based on S transform, is applied to cross-correlation functions before their stacking. In the "time-frequency domain phaseweighted stacking" method, which may use either S transform (Schimmel et al., 2011) or wavelet transform (Ventosa et al., 2017), phases of signals are analysed prior stacking them. The errors that inverse S transforms may introduce to subsequent phase-velocity measurements were analysed in Li et al. (2018). Another approach, based on stacking only cross-correlation functions of highly coherent signals, was used in global-scale coda wave interferometry studies (Boué et al., 2014). These algorithms do not use signal-to-noise ratios (SNRs) of cross-correlation functions for improving the final EGF, and it is assumed that signal coherence by itself is a guarantee that all non-suitable cross-correlation functions are either excluded from the final stack or minimized by using weights, and hence the SNR is automatically improved. This may be true for teleseismic coda wave interferometry (Phm et al., 2018), in which source location is a priori known and it is easy to control that only signals within the so-called "stationary phase" area are cross-correlated (Wapenaar et al., 2010). However, in the ambient noise studies with noise sources that are stochastically and non-evenly distributed in time and space their azimuthal distribution is not known a priori. In this case, one would need to take into account this distribution, in order to satisfy the stationary phase condition. The other group of methods, such as "root mean square stacking" or weight stacking (Shirzad and Shomali, 2015;Nakata et al., 2015;Cheng et al., 2015;Li et al., 2018) are aiming mainly at increasing a signal-to-noise ratio of the resulting EGF, but they do not take into account the coherence of the cross-correlation functions in the stack. That is why incoherent cross-correlation functions are not totally excluded from the stack in these algorithms, and this can decrease the quality of evaluated EGFs.
To overcome the limitations of existing techniques, we develop a new algorithm that makes it possible not only to ex-clude incoherent cross-correlation functions from the EGF stacking process but also to keep control of the azimuthal distribution of noise sources and the condition of the stationary phase. In our paper the term "coherent" is used to define cross-correlation functions with the same time lags of signal maxima and the same dominant frequency. We do not use this method in the frequency domain because for stationary phase condition to be satisfied, it is important to stack crosscorrelation functions with the same time lags and dominant frequencies, in other words, functions that are coherent with each other. As a main criterion for selecting cross-correlation functions to stack, we use an increase in the SNR of extracted EGFs after stacking. Moreover, we use the global optimization algorithm for obtaining the best solution for the SNR. In this case, an SNR, calculated on each iteration, is an objective function that is optimized.
In our paper, we present details of this algorithm and illustrate its performance using passive seismic ambient noise data acquired in two areas of Fennoscandia: Pyhäsalmi mine (as an example of area with a high level of industrial noise) and the Kuusamo Greenstone Belt area (a quiet area prospective for new mining projects (Weihed et al., 2005;Lehtonen and O'Brien, 2009).

Advanced technique of cross-correlation function stacking
For solving the problems described in the "Introduction" section, we suggest our method of time-domain stacking of cross-correlation functions calculated for different time windows. We call this method SNR stacking. The general purpose of this method is to select for stacking only those crosscorrelation functions that are not only coherent with each other but also correspond to the stationary phase area. Let us assume that ambient noise in a particular frequency band is recorded simultaneously at two different points with Cartesian coordinates r 1 and r 2 . For each frequency, the stationary phase area for the receiver located at the point r i , i = 1, 2 corresponds to the Fresnel zone of the wave propagating from the source to the receiver with a particular apparent velocity. In this case, the maximum of the crosscorrelation function at a particular time lag would correspond to the minimum of apparent velocity, and hence, the cross-correlation function would be close to the "true" EGF. We assume that noise sources are partly located in a stationary phase area while other noise sources are distributed outside it. For the selection of cross-correlation functions corresponding to the stationary phase area, it is possible to use criteria of minimum apparent velocity and of the signal-tonoise ratio increasing after stacking. We consider the SNR of the EGF after stacking as a particular, generally non-linear function of apparent velocity and a back-azimuth of noise sources and an initial time window used to start the selection of cross-correlation functions to the stack. In this case, the global optimization of this objective function would allow us to retrieve EGFs of high quality.
We assume again that the ambient seismic noise is recorded simultaneously at two different points with Cartesian coordinates r 1 and r 2 , r = [xyz], and continuous recordings are split into n time windows with the same durations. Let a i (r 1 r 2 t) be the cross-correlation function of these seismic records for the time window i, i = 1. . .n, where t is a time lag of the seismic records. Let t m be the maximum time lag in a cross-correlation function (length of crosscorrelation); t ds is a maximum time of wave propagation between the two points; |t m | |t ds | and −t m ≤ t ≤ t m . Let −t ds ≤ t e ≤ t ds be the time lag on the cross-correlation function corresponding to the expected seismic phase (body or surface wave) and t e = t e ± T , where T is the period of expected signal. Negative values of the time lags correspond to the anti-casual part of the evaluated EGF. In this case, the selection of t ds and t e is based upon a priori information about seismic velocities in the studied area. The value of t e is at least two periods of the expected signal dominant frequency. In the case of the evaluation of surface wave parts of EGFs, this frequency usually corresponds to the frequency of noise with the largest amplitude that can be estimated by timefrequency analysis of the seismic noise records.
Let a max i (r 1 r 2 t e ) be the maximum value of crosscorrelation function in the time interval t e . Then, the signalto-noise ratio of the cross-correlation function calculated for the ith time window (SNR(a i (r 1 r 2 t))) is . (1) Let a i (r 1 r 2 t) and a j (r 1 r 2 t) be cross-correlation functions calculated for two different time windows i ∈ (1. . .n), and j ∈ (1. . .n) and c (r 1 , r 2 , t) = a i (r 1 , r 2 , t) + a j (r 1 , r 2 , t) are an EGF retrieved from these two cross-correlation functions. If a i (r 1 , r 2 , t) and a j (r 1 , r 2 , t) are coherent with each other and i = j , then the expressions SNR(a i (r 1 , r 2 , t)) < SNR(c (r 1 , r 2 , t)) and SNR(a j (r 1 , r 2 , t)) < SNR(c (r 1 , r 2 , t)) have to be true, according to the principle of interference. The condition i = j is necessary in order to avoid stacking of functions with themselves. Therefore, increasing the SNR of the retrieved EGF after stacking with a particular cross-correlation function can be used as a criterion for the selection of this function to the stack, excluding incoherent functions from the stack and building up the EGF with a high signal-to-noise ratio.
Based on the criteria described above, an expression for the calculation of the EGF for the kth iteration can be written as where k = 1. . .n is the number of initial function; n is the number of time windows; i = 1, . . ., n; G k i (r 1 , r 2 , t) is the EGF corresponding to kth initial function and evaluated in previous iterations: The operator of selection can be written as . (4) As a result of this algorithm we obtain n candidates for the EGF that can be considered as solutions to the optimization problem in a particular parameter space. Let us consider the signal-to-noise ratio as a particular function f (k), where k is the index of initial functions: SNR(G k (r 1 , r 2 , t)) = f (k) , k = 1, . . ., n. Then the condition for the final EGF selection can be written as m = argmax(f (k)), where m denotes the index of the EGF selected to the stack. Following this condition, the EGF with the maximum signal-to-noise ratio will be selected as the final one. As the function f (k) may have several local maxima in the parameter space k, k = 1, . . ., n, the condition for the final EGF selection ensures that the global maximum of this function is obtained in the parameter space considered.
In the proposed algorithm, maximizing the signal-to-noise ratio of the retrieved EGF is ensured by stacking only crosscorrelation functions coherent with each other and the selection of the EGF with the maximum signal-to-noise ratio from all calculated candidate EGFs. In other words, the proposed algorithm is analogous to the direct search methods of global optimization. It is necessary to remember, however, that the EGF with the maximum signal-to-noise ratio does not correspond to a true EGF if the dominant noise sources are located outside the stationary phase area. Therefore, it is important to use the system of observations that allows estimating the azimuthal distribution of noise sources. Moreover, the method is based on the assumption that sources of the ambient seismic noise produce a signal with relatively broad bandwidth and cannot produce an ideal harmonic signal of single frequency.
The method also makes it possible to keep control over an a priori unknown azimuthal distribution of noise sources.
For this, a 2-D array of seismic recording stations is necessary. In this case, the time lags, corresponding to the expected signal t e in Eq. (1) have to be a function of apparent seismic velocity and back-azimuth: t e = f (vϕ). Then the signal-to-noise ratio for each pair of stations of the array is the function of the initial function index, velocity, and back-azimuth: Limits of apparent velocities have to be calculated according to a priori information about seismic velocities in the studied area. A global maximum of the function corresponds to the strongest or the most coherent wave field. Therefore, the method allows estimating azimuth to the strongest source of noise wave field.
We suggest that this method can be used for the extraction of EGFs from high-frequency industrial, anthropogenic, or natural seismic noise. Moreover, this method does not require that only a diffuse field is used for calculating EGFs. Therefore, the application of this method to the data of optimally selected seismic recording array might significantly decrease the time necessary for the registration of ambient seismic noise, which is very important for practical applications of passive seismic interferometry. For studying the possibilities of using this method for the extraction of EGFs from high-frequency seismic noise, we use the data from two passive seismic experiments carried out in areas with different seismic noise characteristics. The first area is characterized by a high level of industrial noise (Pyhäsalmi underground mine site) that is usually observed in brownfield exploration areas, while the second area is seismically very quiet and is characterized by a limited amount of local anthropogenic (roads) and natural (rivers) high-frequency seismic noise sources. Such noise characteristics are typical of greenfield exploration areas.
3 Experimental data 3.1 Pyhäsalmi mine area As an example of using a high-level industrial seismic noise for the estimation of EGFs, we used the seismic noise at the site of Pyhäsalmi mine, Finland. For this purpose we installed 24 three-component DSU-SA MEMS (microelectromechanical system) seismic sensors with the autonomous RAUD eX data acquisition units manufactured by Sercel Ltd (France). The instruments were installed along a 10 km long line crossing the mine area with interstation distances of about 100 m (for PLB03-PLB13 and PLB14-PLB22) and 2 km (PLB01, PLB02, PLB23, PLB24) (Fig. 1). The seismic stations recorded continuous seismic data from 1 to 5 November 2013 with a sampling frequency of 500 samples per second (sps).
The profile configuration was selected on the basis of the test measurements of ambient noise in the Pyhäsalmi area made by the authors. These studies showed that the mine is the main source of seismic high-frequency noise at distances about several kilometres from the mine.
The profile crossing the mine area consists of two parts, and each of these consists of 12 sensors: the western part has direction from the mine to the west (PLB01-PLB13), and the eastern part has direction from the mine to the east (PLB14-PLB24). Each part of the profile includes one sensor closest to the mine (PLB13 and PLB14). The horizontal components were oriented to the true north and east (N-S and E-W components, respectively). Thus, rotation of the horizontal components before seismic noise analysis was not necessary.
For testing our method of cross-correlation function stacking, we use the data collected during a passive seismic experiment in the KuGB area in August and September 2014. One of the aims of this experiment was to investigate the possibility of high-frequency EGF extraction from anthropogenic or natural seismic noise in regions with a low ambient noise level.
The temporary seismic array (Fig. 2) consisted of five three-component velocimeters Trillium Compact produced by Nanometrics (Canada) and 24 three-component accelerometers DSU-SA MEMS with autonomous RAUD eX data acquisition units manufactured by Sercel Ltd.
As one can see in Fig. 2, the seismic array represents a triangle. The sides of this triangle are about 4-6 km long. The broadband (BB) sensors were installed in the vertices of this triangle and collocated with MEMS accelerometers. In addition, each of these large triangle vertices was surrounded by a circular array with a small aperture (about 1400-1500 m), consisting of six accelerometers. The whole array recorded continuous seismic data from 28 August to 10 September 2014 with a sampling rate of 500 sps. Such an array configuration makes it possible to estimate the azimuthal distribution of the high-frequency noise sources and also to extract high-frequency EGFs from records of small aperture arrays. One of the most important steps of the data preparation before the extraction of EGFs is the time-frequency analysis. It is necessary for the selection of a frequency band with high amplitudes of ambient noise. For this, we analysed characteristics of seismic noise recorded at different distances from the potential noise sources. In the Pyhäsalmi experiment, the most probable noise sources are located inside the underground mine and in the open pit. For the time-frequency analysis of the seismic noise, we used records of sensors installed at different distances from the mine and from the open pit (PLB24 and PLB14; Fig. 1). Figure 3a, b show the results of this analysis.
From Fig. 3a, b, one can see two main frequency bands with high amplitudes of the seismic noise recorded closest to the mine: about 3-4 and about 10-100 Hz, respectively. Moreover, the amplitudes of the noise in these frequency bands decrease with distance from the mine. Therefore, we can assume that the sources of the noise for these frequency bands are located inside the underground mine and in the open pit. Based on this analysis, we selected the frequency band of 1-100 Hz for pre-filtering of the noise prior to the calculation of cross-correlation functions.
In the KuGB experiment, a temporary seismic network was installed in a quiet area without any significant industrial activity; therefore, we can assume that the high-frequency seismic noise might be produced by multiple natural (for example, rivers) and/or anthropogenic (for example, roads) sources. In this case, analysis of time-frequency characteristics of the seismic noise is a necessary step. For this, we calculated time-frequency diagrams in the frequency band of 0.1-100 Hz, and examples of these diagrams for two stations are presented in Fig. 3c, d. Figure 3c, d show that noise records of both stations have amplitude maxima in the frequency band of 0.1-1 Hz. Seismic noise recorded by the KU05 station is also characterized by periodically high amplitudes in the frequency band of 40-100 Hz (Fig. 3c). This noise may be caused by anthropogenic (transport) or natural (for example, wind) sources. Station KU02 is located close to the river that can be a source of con-   tinuous seismic noise with high amplitudes in the frequency band of 40-80 Hz (Fig. 3d). Therefore, for the estimation of a high-frequency EGF, we pre-filtered the data with the bandpass filter of 1-100 Hz.

Analysis of the azimuthal distribution of the noise sources
Classical methods of passive seismic interferometry are based on diffuse field approximation (Wapenaar et al., 2008(Wapenaar et al., , 2010. One of the most important conditions for using this approximation is an isotropic and homogeneous azimuthal distribution of noise sources (Mulargia, 2012). That is why the second important procedure of data preparation before the estimation of EGFs is analysis of the azimuthal distribution of the noise sources during the experiment's period. In our study, we considered two cases. In the case of the Pyhäsalmi area, the main sources of high-frequency seismic noise are most probably located inside the mine and in the open pit. Thus, the assumption about an isotropic and homogeneous azimuthal distribution is not valid. As shown in Wapenaar et al. (2010), in such cases one cannot assume diffuse field approximations. That is why the measurements of the noise were made along a profile (linear array) consisting of two parts crossing the mine site and oriented E-W. However, signals from other noise sources outside the stationary field area can also be present in the wave field ac-quired during the data acquisition period. That is why we carried out additional analysis of the azimuthal distribution of noise sources. For the calculation of the azimuthal distribution, the well-known methods are frequency-wavenumber (f -k) analysis (Neidell and Turhan Taner, 1971;Douze and Laster, 1979) and beam forming in the time domain (Rost and Thomas, 2002;Schweitzer et al., 2012). The linear configuration of the Pyhäsalmi array does not allow the application of the f -k analysis and beam forming, however. To understand the directivity of the seismic noise wave field in different frequency bands, we applied the horizontal-to-vertical ratio rotate method proposed in Nakamura (1989), investigated in Barazza et al. (2009), and implemented into Geopsy software (http://www.geopsy.org, last access: 31 January 2018). In our study, we analyse records of seismic noise with a duration of 10 min for each hour of records. We applied this procedure to records from stations which are the most distant from the mine and located in both parts of the profile (PLB01 and PLB24). We have selected two frequency bands (2-5 and 5-10 Hz) for analysis because they correspond to strong and stable seismic noise, from which it is possible to retrieve surface waves. The result is shown in Fig. 4 as a percentage of record time during which the recorded wave fields approached from certain azimuths with respect to the total time of the record. In Fig. 4, the azimuth of 0 • corresponds to the true north and shadowed sectors denote the azimuths to the noise sources. Radial sizes of these sectors are propor- Figure 5. Azimuths to main noise sources: dots -stations of the temporary seismic array; black arrows show azimuths to noise sources in the frequency band 10-50 Hz; grey arrows show azimuths to noise sources in the frequency band 5-10 Hz.
tional to the relative source-acting time calculated as a percentage of the total measurement time while angular sizes of the sectors correspond to errors of the azimuth calculation.
In Fig. 4a, b one can see strong directivity of the noise wave fields from the east. This proves that the main noise source for the eastern part of the profile and for the frequency bands of 2-5 and 5-10 Hz is the mine. Considering the western part of the profile, there is no such clear directivity of the noise wave fields as revealed for the eastern part. One can see a near-homogeneous azimuthal distribution of the noise sources for azimuths between about 250 and 300 • . This could be explained by the location of the profile close to the open pit that occupies a larger area than the underground mine. Because of this, the point-source approximation of noise sources is not valid. From these results we can conclude that if we simply stack all calculated cross-correlation functions for a pair of stations (in particular, in the eastern part of the profile), the final EGF would be biased. Therefore, for the estimation of the EGF with minimum bias, we need to apply the advanced method of stacking described above.
In the second case, we considered the KuGB area with a low level of high-frequency noise. In order to investigate the spatial and azimuthal distribution of the strongest noise sources, we applied the procedure described above to the data of each of the small-aperture arrays. The cross-correlation functions were calculated between the central sensor and the other sensors of the array. Figure 5 presents results of the calculations of the azimuths to the strongest seismic noise sources.
In Fig. 5, one can see that for the different small-aperture arrays there are also different azimuths to the sources in the different frequency bands and the directions to the sources depend on frequency. Taking into account the size of our temporary array (aperture of the large array is 3 km and apertures of each small array are 0.7 km), we can assume that the sources of high-frequency seismic noise are located at distances larger than 0.7 km but less than 3 km from the centres of the small-aperture arrays.

Empirical Green's functions estimation
For the estimation of EGFs, it is necessary to apply a procedure for data preparation. This procedure includes several steps, such as spectral whitening, removing parts of records with earthquakes, blasts, and missed data. This procedure is applied to the data of both experiments in our study. In the previous parts of our paper, we have demonstrated that the Pyhäsalmi mine is the source of continuous and strong seismic noise in the frequency band of 1-10 Hz. Therefore, we extract EGFs separately for the eastern and western parts of the profile.
Each part of the profile includes one sensor installed in the closest vicinity of the mine, and we calculated crosscorrelation functions between those sensors and each of the other sensors in both parts of the profile. Industrial seismic noise may consist of surface and body waves because of different types of noise sources.
There are several methods of stacking the cross-correlation functions in the time domain, for example, the root-meansquare method of Shirzad and Shomali (2015) and the weighted stack by Cheng et al. (2015). We compare the SNR of EGFs estimated by our method for the Pyhäsalmi experiment to the SNR of EGFs estimated by root-mean-square and weighted stacking methods, respectively. The SNR was calculated with respect to the surface wave signal seen in EGFs. Results of this comparison are presented in Fig. 6.
In Fig. 6, one can see that after the application of the SNR-stacking method we obtained the EGF with the highest signal-to-noise ratio of surface waves, compared to the other two methods of stacking. This is because we used only cross-correlation functions coherent with each other in our stacks. As one can see from Fig. 7, the algorithm selects only about 10 % of the total number of cross-correlation functions for the final stack. Nevertheless, it does not mean that there are only few cross-correlation functions coherent with each other. It means that after some iterations the signal-to-noise ratio did not increase any more by adding new functions to the stack. In other words, the algorithm has found a global maximum of the objective function described in Sect. 2. We analysed the apparent velocities obtained from the maxima of each of the cross-correlation functions and the apparent velocities from the cross-correlation functions selected by our algorithm of stacking (Fig. 8). This figure shows that most of the retrieved EGFs have group velocities of about 4500 m s −1 . After applying a simple stacking procedure to these cross-correlation functions, the group velocity of the surface wave part of the resulting EGF is about 4500 m s −1 . This cannot be true velocity, as it is too high for surface waves propagating in the uppermost bedrock. As can be seen from Fig. 8, our SNR-stacking algorithm has selected only EGFs with group velocity of about 3400 m s −1 . This velocity is close to the minimal value from all group velocities, and it is in agreement with group velocities of surface waves and S-wave velocities in the uppermost part of the bedrock in Fennoscandia (Kobranova, 1986;Dortman, 1992;Silvennoinen and Kozlovskaya, 2007;Janik et al., 2009;Poli et al., 2013, etc.). Therefore, after applying our stacking method, we can retrieve EGFs with true group velocity and a maximum SNR.
We apply our method of stacking to the cross-correlation functions calculated for the eastern and western parts of the profile for the frequency bands of 2-5 and 5-10 Hz separately. After stacking, we analysed particle motion diagrams of the waves retrieved from the seismic noise. Figure 9 shows the result of stacking and particle motion analysis of EGFs. In Fig. 9, we present only EGFs that probably also contain body waves because other EGFs, namely those calculated for the western part in the band of 5-10 Hz and for the eastern part in the band of 2-5 Hz, contain only surface wave parts. Figure 9a shows that the seismic noise recorded in the western part of the profile retrieves mainly Rayleigh waves with group velocity of about 3400 m s −1 . The other wave is marked in Fig. 9a as an S wave because the particle motion diagram corresponds to this type of a wave. Nevertheless, this wave has an apparent velocity of 5700 m s −1 , which is too high. Therefore, we speculate that this can be an artefact and that phase cannot be used for further analysis. In the frequency band of 5-10 Hz, the EGFs calculated for the eastern part of the profile (Fig. 10b) consist of Rayleigh wave. The other arrivals could correspond to one reflected P wave and three reflected S waves. Apparent velocities of reflected P, S1, S2, and S3 waves are about 4480, 3192, 3261, and 2543 m s −1 , respectively. Our assumption that these phases may correspond to retrieved body waves is based solely upon a comparison of their travel times with the travel times of body waves recorded during previous active source experiment in Pyhäsalmi (Heinonen et al., 2012). Alternatively, the extracted waves may correspond to other phases, for example, to direct waves generated by sources inside the mine. Unfortunately, these assumptions cannot be proved using our data, and it would be necessary to use the higher-density array for the precise phase identification of body waves. In our study, the error in velocity estimation is assumed to be equal to a quarter of the wavelength of an extracted signal. The error of the polarization calculation is about 1-3 • .
For the KuGB experiment, we calculate cross-correlation functions for each small-aperture array and apply the SNRstacking algorithm for the EGF evaluation. Cross-correlation functions are calculated between the central sensor and each other sensor of the corresponding small-aperture array. In Fig. 10b, we present results of the EGF calculation by the SNR-stacking method for one of the small-aperture arrays (SK1-SK8 in Fig. 2).
In Fig. 10, one can see that after the application of simple stacking, there are many implicit maxima in the retrieved EGFs. Due to this, it is not possible to calculate the azimuth to noise sources and apparent seismic wave velocities. However, the application of SNR stacking allows the retrieval of the EGFs with maxima corresponding to surface waves propagating from a virtual source with an apparent velocity of about 320-350 m s −1 . These waves could be Rayleigh waves or acoustic waves propagating in the air. This assumption is based on the fact that a velocity of 350 m s −1 is close both to the velocity of sound in the atmosphere and to the veloc-ity of surface waves propagating in the shallow quaternary sediments in the uppermost subsurface. For a precise determination of the wave type, it would be necessary to have a more dense observation network. Nevertheless, using our SNR-stacking algorithm we extracted surface waves from the high-frequency seismic noise. As we noticed in the previous section, the noise sources were distributed stochastically, both in space and in time, and the intensity of these noise sources was small. The body waves are not seen in Fig. 10b because a higher density array is necessary for their proper identification.

Discussion
Classical passive seismic interferometry is based on diffusefield approximation because of the equivalence of correlation properties of the multiple scattering and resulting wave fields. Therefore, it is possible to evaluate EGFs from averaged cross-correlation functions (Campillo, 2006;Shapiro and Campillo, 2004). In practice, one needs to average over long time intervals (more than 1 year) because of heterogeneous and anisotropic distributions of ambient seismic noise sources during short time intervals (Wapenaar et al., 2010). This is a serious limitation for the practical application of passive seismic interferometry as a method of applied geophysics because it is not always possible to have long-term data acquisition experiments for solving applied problems (mining exploration, microseismic zonation, etc.). In such applied problems, the alternative may be to use ballistic waves, not scattered from heterogeneities but produced by some localized sources of seismic noise (Mulargia, 2012). The major challenge in this case is retrieving body waves from seismic noise. Recently, some techniques of body-wave extraction have been proposed in Vidal et al. (2014) and Panea et al. (2014). The main idea of these techniques is separating ambient seismic noise into a body-wave part and a surface wave part. One could expect that a combination of these separation methods with our technique of stacking would significantly increase the quality of the retrieved body waves. Nevertheless, for this it would be necessary to have the data of dense high-resolution seismic arrays, so making new experiments would be a next step in the development of our technique.
As discussed in the "Introduction" section, there are several methods based on weighted stacking of cross-correlation functions in the time, frequency, and time-frequency domains, which allow us to increase the quality of extracted EGFs (Schimmel et al., 2011;Cheng et al., 2015;Liu et al., 2016;Li et al., 2018, etc.). We showed in previous sections that our time-domain algorithm based on the global optimization of the signal-to-noise ratio makes it possible to exclude incoherent cross-correlation functions from stacking and generally allows obtaining EGFs of even better quality. Of course, it is possible to use the signal-to-noise ratio in-crease criterion in the frequency or in the time-frequency domains, but this would make the algorithm significantly more complicated. It is necessary to remember, however, that this algorithm can be applied because ambient noise sources are generally characterized by a relatively wide frequency band. In this case one can expect that increasing the signal-to-noise ratio for the dominant frequency would result in an increase in the signal-to-noise ratio of all other frequencies of the signal, as shown in Bensen et al. (2007).
The algorithm proposed in this paper has several limitations and drawbacks and hence has the potential for improvement. Our technique of stacking allows increasing the signalto-noise ratio, but it has to be applied in combination with array configuration, which allows us to keep control over the azimuthal distribution of noise sources and to guarantee that the stationary phase condition is satisfied. It can be also envisaged that using the mode of noise level distribution instead of the average would make the algorithm more robust with respect to outliers. Another important limitation is the relationship between the time of seismic wave propagation between neighbouring sensors and the dominant period of the retrieved EGFs. If the time of surface wave propagation is about one or two periods of this wave, then it would not be possible to separate body and surface waves, similar to the situation in seismic experiments with active sources. Moreover, the increase in signal-to-noise ratio of one event, for example, a retrieved surface wave, might lead to a decrease in the signal-to-noise ratio of other phases, in particular, body waves.
In certain situations, increasing the SNR after the addition of a new function to the stack is not always achieved due to the coherence of the stacked functions. For example, if the coda wave part in a cross-correlation function gets smaller, then the SNR increases, although stacked functions might not be coherent with each other. The results of testing the algorithm with real data demonstrated, however, that the algorithm is robust and works fine with the high-frequency seismic noise acquired in two completely different areas. The results obtained for the KuGB area demonstrate that the SNRstacking method might be useful for building up an EGF by stacking ballistic surface wave signals retrieved from the ambient noise. In our study the high quality of surface waves in EGFs was achieved both for brownfield and greenfield exploration areas. Experimental data used in our study are insufficient to make a detailed evaluation of how this technique works with body-wave signals, and it will be the subject of our research in the future.

Conclusion
Results of our study suggest that classical approaches for the EGF evaluation from ambient seismic noise (Campillo, 2006) cannot be considered as a universal tool for extracting high-frequency EGFs. In particular, in quiet areas with a low level of anthropogenic and industrial noise the method would require a long registration time because sources of high-frequency wave fields are weak and their distribution is non-stationary both in space and time. One of the ways to treat the problem is to use ballistic waves and develop and improve methods for the selection of coherent parts of the ambient noise wave field. The study of the azimuthal distribution of ambient noise sources using array techniques is necessary prior to passive seismic experiments, both in greenfield and brownfield exploration areas.
The presented algorithm of cross-correlation function stacking in a time domains allows us to significantly increase the signal-to-noise ratio of retrieved EGFs. In our study we demonstrated that under certain conditions the body waves could be extracted from high-frequency industrial seismic noise using the proposed algorithm. This was illustrated with the data collected during a passive seismic experiment near the Pyhäsalmi underground mine. Nevertheless, for more detailed testing of the possibility of extracting body waves, it would be necessary to analyse the data collected with a higher-density seismic array near the mine.
The presented algorithm of stacking makes it possible to extract EGFs from ambient seismic noise with frequencies higher than 1 Hz recorded in quiet areas without strong sources of industrial noise using 2-D seismic arrays. This has been demonstrated by the application of our technique to the data collected in the Kuusamo Greenstone belt area that is characterized by a low level of anthropogenic seismic noise and has no industrial sites located nearby.
Data availability. The raw data and processed data of the SEIS-LAB project are embargoed until 31 December 2019 and will be published in open access after the end of embargo period. Please contact Elena Kozlovskaya (elena.kozlovskaya@oulu.fi) concerning access to the data.
Author contributions. EK and NA have written the majority of the text: introduction, algorithm description, description of results, and discussion conclusions sections; JaN, JoN, and EK planned the experiments in Pyhäsalmi and Kuusamo, organised field work and data acquisition, and performed raw data processing.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Advances in seismic imaging across the scales". It is not associated with a conference.