Constraints on Fracture Distribution in the Los Humeros Geothermal Field From Beamforming of Ambient Seismic Noise

. Faults and fractures are crucial parameters for geothermal systems as they provide secondary permeability allowing fluids to circulate and heat up in the subsurface. In this study, we use an ambient seismic noise technique referred to as three-component (3C) beamforming to detect and characterise faults and fractures at a geothermal field in Mexico. We perform 3C beamforming on ambient noise data collected at the Los Humeros Geothermal Field (LHGF) in Mexico. The LHGF is situated in a complicated geological area, part of a volcanic complex with an active tectonic fault system. Although 5 the LHGF has been exploited for geothermal resources for over three decades, the field has yet to be explored at depths greater than 3 km. Consequently, it is currently unknown how deep faults and fractures permeate, and the LHGF has yet to be exploited to its full capacity.

3D) recording at 100 Hz, and was sub-divided into two sub-networks. An inner and denser (∼1.6-2 km inter-station distance) pseudo-rhomboidal array, consisting of 27 stations, was laid out over the producing zone to retrieve the local seismicity mainly 145 associated with the injection and production operations (Toledo et al., 2019). An outer and sparser array was also developed; however, this array was not used for the 3C beamforming technique. The resulting data that was collected was waveform data and associated metadata, available from the GEOFON data centre under network code 6G (Toledo et al., 2019).
Following Löer et al. (2020), up to 17 stations of the dense broadband (DB) array were used with a frequency sensitivity down to below 0.01 Hz and a sampling rate of 200 Hz. These stations were all centred around the previously located micro-150 seismic events within the inner caldera (Löer et al., 2020). The data was pre-processed following Riahi et al. (2013) and Löer et al. (2018); it was downsampled to 10 Hz, bandpass filtered between 0.01-1 Hz, and cleared from linear trends. Spectral whitening and one-bit normalisation were applied in the time domain, normalising the frequency spectrum and retaining the phase information (Nakata et al., 2019), both of which are necessary for the beamformer, whilst absolute amplitudes are not required. A single time window's length corresponds to four times the minimum period; this was rounded up to the next power 155 of two to speed up Fourier transformation (Löer et al., 2020).
However, not all of these 17 DB stations were used for each day due to a lack of data availability. Only days with 8 stations or more were used, resulting in a total of 65 days between 27 Oct 2017 and 30 Dec 2017; a longer period of time was not used due to computational ability. Over this time range, the sources of ambient noise were some local microseism events, the Caribbean Sea and injection/production activity of the geothermal field. The array can potentially have an effect on the results; however, 160 the array response ( Fig. 2) suggests this is not the case and that the array has minimal effect on the results.

Three-component Beamforming
3C beamforming is an array method proposed by Riahi et al. (2013) and used here to measure the polarisation, phase velocity and azimuth of the seismic noise wavefield as a function of frequency. Therefore, retro-and prograde Rayleigh waves, as well as Love waves, can be identified, as polarization defines the particle movement with respect to the propagation direction (Löer 165 et al., 2018). Love waves have particle movement in the horizontal plane or orthogonal to the propagation direction (Löer et al., 2018) whereas Rayleigh waves are described as being an ellipse in the vertical plane, parallel or antiparallel to the direction of propagation. Therefore, Rayleigh waves are recorded on the vertical and radial components (Riahi et al., 2013). Anisotropy parameters were then fitted to the velocity versus azimuth histograms for both Love and Rayleigh waves.
A bootstrap algorithm (Riahi et al., 2013) was used to evaluate the uncertainty in the fit of these parameters to the histograms 170 that were plotted. These were assessed along with their statistical significance by recomputing the fitting process on bootstrap resamples of data. The bootstrap attempts to estimate the sampling distribution of the actual anisotropy parameters depending on the observed variability in the velocity estimates (Riahi et al., 2013). By starting from N azimuth and phase velocity pairs at a given frequency bin and polarization and randomly sampling (with replacement) an equally large set of N data points.
(number of resamples) times (Riahi et al., 2013). The exact methodology can be found in appendix A.
The 3C algorithm was later tested using ambient seismic noise to extract information about both isotropic and anisotropic surface wave velocities (Löer et al., 2018). Dispersion curves for Rayleigh and Love waves were computed, and anisotropy parameters were estimated for Love waves; the azimuthal source coverage was too limited to perform anisotropy analysis for Rayleigh waves (Löer et al., 2018).

Anisotropy curves as a Function of Depth
3C beamforming was performed on ambient seismic noise data recorded on 65 days in 2017. This provided seismic surface wave velocities for retro-and prograde Rayleigh waves as well as Love waves. However, analyses were not done on prograde Rayleigh waves, which are of a higher mode and have a much lower signal compared to Love and retrograde Rayleigh waves, giving a worse source coverage.

185
The beamforming algorithm was used to detect velocity variations with azimuth in the ambient noise wavefield. The data in the resulting histogram was fitted with an anisotropy curve, as in Riahi et al. (2013) and Löer et al. (2018). More details are provided in the Appendices. The bootstrap resampling was done 1000 times, allowing for a curve to be plotted for each resample in the background of the overall mean anisotropy curve, acting as an uncertainty, and improving the reliability of the results. This was plotted as both a histogram with the number of detections conveying the direction of the noise sources and as 190 polar plots to better visualise the fastest directions (fastest velocities' corresponding direction, which was extracted from the anisotropy curves). The direction of propagation is anti-clockwise from east, making an azimuth of 90 degrees equal to North, which was required due to the beamforming method.
Frequency has been related to depth before (Löer et al., 2020) using sensitivity kernels for different wave types obtained using Computer Programmes in Seismology (CPS) surface-wave inversion kernels (surf96) by Herrmann (2013). These kernels 195 indicate the depth that is predominantly sampled by a surface wave at a given frequency. The velocity model from Löer et al.
(2020) (shown in Appendices) and CPS was used to produce sensitivity kernels against depth for both Rayleigh and Love waves, thus, allowing us to relate differing frequencies of seismic waves to depth (Fig. 3). The depth values were picked where the kernel has half the amplitude of the peak (Rayleigh waves having two such depths and Love waves having one value being the surface), and then the middle of the two depths was taken to be the peak sensitivity for that frequency. The fastest directions 200 were then related to these depths along with the magnitude of apparent anisotropy (a mag ), which indicates how large the variability of the velocity is over the whole range of azimuths. For example, 5% anisotropy states that the fastest velocity is 5% larger than the average velocity. Most anisotropy values tend to be below 5%.

Estimating array-induced anisotropy in seismic noise beamforming
We measure velocity versus azimuth in surface wave ambient noise to investigate the azimuthal anisotropy of the subsurface.

205
However, as indicated by Löer et al. (2018) and Lu et al. (2018), for example, these anisotropy measurements can be affected by the geometry of the array used to perform the beamforming analysis.
We use a workflow that estimates the effect of the array geometry on the observed anisotropy by modelling a synthetic, isotropic wavefield in terms of phase shifts corresponding to waves propagating in different directions across the array. Because velocities are isotropic in our model (synthetic wavefield), that is, they do not vary with azimuth, any anisotropy that we observe 210 is a result of the distribution of stations or, in fact, sources. The effect of an uneven source distribution is mitigated by using a large number of time windows, each containing a different normal distribution of sources with a random mean but constant standard deviation as defined in the following. Considering each frequency of interest individually, we superimpose waves travelling in different directions by summing the corresponding phases at each station of the array. The resulting synthetic noise wavefield is then analysed using beamforming, and the detected velocities (v = f/k) are plotted against their azimuth 215 in a histogram. This is repeated n = 10000 times (emulating 10000-time windows) to populate the histogram sufficiently.
Every time, the dominant direction d of the wavefield is chosen randomly from between 0 and 360 degrees. We then use a normal distribution of m = 90 sources with d as the mean of the distribution and a standard deviation of w = 45 degrees. All these sources are assumed to act simultaneously in one-time window, generating plane waves that superimpose at the receiver locations ( Fig. 4a).

220
This synthetic wavefield is analysed following the beamforming method for real data, with the only difference that we consider phase shifts across stations only (not components), like in conventional (1C) beamforming, as these are decisive for measuring horizontal wave velocities across the array (and we are not modelling different wave types simultaneously). More details are provided in appendix A. For each time window, the maximum of the beam response is extracted (Fig. 4b), converted from wavenumber to velocity and plotted in a histogram that shows velocity as a function of azimuth. The anisotropy curve (1) (Smith and Dahlen, 1983) is fitted to the histogram (Fig. 5), using least absolute deviation. Because the initial synthetic wavefield model was isotropic; the resulting curve should be flat with b 0 corresponding to the isotropic model velocity and the anisotropy parameters b 1 to b 4 being equal to zero. If these are non-zero, however, and the curve is not flat, apparent anisotropy has been introduced due to the array geometry (or source distribution). The magnitude and fast direction of apparent 230 anisotropy for a given frequency and (isotropic model) velocity can be estimated from the fitted curve. As for the real data, to get an estimate of uncertainty, the histogram is resampled with a bootstrap algorithm B=1000 times, and a curve is fitted to the resampled data. In the end, the mean from all bootstrap curves is taken as an estimate for apparent anisotropy.
We repeat this process for all frequency-velocity pairs obtained from real data histograms to get the apparent anisotropy curve for each such pair. Finally, the anisotropy parameters b 1 to b 4 are subtracted from those found in real data a 1 to a 4 to 235 correct for the effect of the array: ( Fig. 6) 3 Results

Anisotropy with depth 240
In this study, a frequency range of 0.05 Hz to 0.5 Hz, with a frequency step of 0.05 Hz, was used. The lower limit was chosen due to the spatial aliasing limits of the array, and the upper limit was set to allow for investigation at greater depths, thus not looking at higher frequencies. Hence the following results are only within this frequency range; however, some exclusions were made due to a poor fit between the histogram and the anisotropy, especially at some low frequencies.
The sensitivity kernels for different frequencies of Rayleigh (Fig. 3a) and Love waves (Fig. 3b) show how sensitive the 245 different frequencies are to shear-velocity changes at different depths; the peaks being the peak depth sensitivity for that frequency. Rayleigh waves have sensitivity peaks at a significant range of depths, which is likely due to the elliptical particle motion of Rayleigh waves (Haldar, 2018). Referring to Yin et al. (2014), fundamental mode Rayleigh waves tend to perceive information at depths 1.3 -1.4 times deeper than Love waves; thus, Love waves are sensitive to shallower depths than Rayleigh waves ( Fig. 3b) which is likely due to the horizontal particle motion of Love waves. Yin et al. (2014) also suggests Love waves 250 should be sensitive at deeper depths than what we perceive from the Sensitivity kernels. Therefore, the estimated depth of penetration for Love waves should be deliberated with caution.
Because of their larger and better-constrained depth sensitivity, Rayleigh waves are more beneficial for this study and will be the focus of our analysis.
The initial histogram for retrograde Rayleigh waves at a frequency of 0.25 Hz is shown in Fig. 6. The direction of propagation 255 is anti-clockwise from East; 90 • is North. Different curves show the mean anisotropy (black), the anisotropy for each resamples of the bootstrap (grey) and the anisotropy corrected for the effect of the array (red). There are minimal changes in the anisotropy curve after correction, decreasing the velocity of two troughs at 120 • and 310 • and slightly broadening the azimuth of the two fastest directions at 45 • and 210 • , thus making these the clear fastest directions at this frequency. These slight corrections suggest a marginal interference of the array when source distribution is not accounted for.

260
To better visualise the fast directions, a comparison of the corrected curve, mean anisotropy and uncertainty are shown as polar plots (Fig. 7). The mean anisotropy has been corrected for the array interference (red), whereas the mean anisotropy (black) and the uncertainty (grey) have not been corrected for the effect of the array. It is evident that although the effect of the array is minor, the correction focuses on the fast direction of a Rayleigh wave at a frequency of 0.25 Hz to the NE-SW direction, therefore, indicating a potential anisotropic structure at that strike.

265
The fastest directions for each depth (gained from the sensitivity kernels for the frequencies) for both Retrograde Rayleigh and Love waves are shown in Fig. 8. The fastest directions have been corrected (Fig. 8) with respect to the array using the synthetic wavefield that was generated in 2.4. The black arrows convey the directional of the regional stress acting on the LHVC and thus altering strikes of faults/folds that the fast directions may be corresponding to. The fast directions for Rayleigh waves (Fig. 8a) tend towards the NE-SW strike at depths of 1.5 -2.5 km as well as 3.5 km, while depths of (Fig. 8b) have a 270 similar trend, but for shallower depths.
The stratigraphy of the LHGF is extremely complex and varies with depth with differing anisotropic signals; thus Fig. 9 shows the fast direction and apparent magnitude at different depths superimposed on the known lithology. Fig. 9 focuses on the North orientated fast directions. There are clear changes in the fastest direction with depth, which also correlates with the changes of magnitude. The fastest directions and apparent magnitude have been corrected for the effect of the array. The clear 275 dramatic shifts in fastest direction, especially for the shallower depths shown by Love waves, may be attributed to the differing known lithologies at varying depths, which will be explored further in the discussion.
These shifts in fast direction relate to the different zones (1 and 2), which clearly suggests (Fig. 9a) that zone 1 dominant anisotropy permeates to deeper depths than zone 2. However, Fig. 9b also shows a similar pattern of the shallower depths corresponding to zone 2 and the greatest depth matching the orientation of zone 1; this is rather unusual, which causes speculation 280 in the reliability of the Love waves results. Nonetheless, there is a greater interest in the deeper depths seen by Rayleigh waves.
Consequently, for discussion, analysis of Rayleigh waves will be focused on.

Discussion
The LHGF is a geologically complex area with a variety of features impacting surface wave propagation and possibly causing anisotropic behaviour (e.g., fault systems, magmatic intrusions, lithology, stress history, and hydrothermal flow, among other 285 plausible causations). In our discussion, we try to match our observations with results from other studies focusing on dominant contrasts that are likely to have a strong effect on propagation velocities.

Limitations
The results were corrected for any potential array effect, which, based on the corrections, had a minimal effect on the fast directions. The correction does not consider the effects introduced by the source distribution of the real data; instead, it assumes 290 a random source distribution (as described in section 2.4) to minimise the effect of sources on the apparent anisotropy. We acknowledge that the source distribution can have a similar effect as the station distribution (Lu et al., 2018), and future work will improve the synthetic wavefield generation and investigate source effects further. The lower frequencies were also not used because of the histograms' lower resolution, thus leading to high uncertainties.
Interpretation of the results assumes the seismic velocity is faster along the orientation of a fault based on the fact that 295 seismic waves will slow down when travelling across boundaries of different material through a fault rather than along it.
Another speculation can be linked to different temperature variations within the geothermal field, thus affecting the velocity of the seismic waves; however, this is very speculatory. Furthermore, the thermal state of the field is higher along the fault planes within the LHGF (Norini et al., 2019), more specifically, the Maxtaloya-Los Humeros fault swarm plane (Norini and Groppelli, 2020). Thus suggesting the seismic velocity is slower along faults rather than faster. Therefore, the opposite assumption may 300 be made. The thermal state of the field is higher along the fault planes in the Maxtolya-Los Humeros fault swarm (Norini et al., 2019), suggesting seismic velocity is slower along faults, so the opposite assumption may be made (Norini and Groppelli, 2020). There is also the potential that the fluid state, the known presence of hydrothermal waters flowing through the fractures, slows down the seismic velocity because of the shear component of Rayleigh and Love waves (Haldar, 2018), contradicting the assumption that seismic waves running along faults are faster (Telford et al., 1990). Testing this hypothesis would require 305 detailed numerical studies, which we aim to address in our future work.
The beamforming method takes the average over the whole area covered by the array. To observe lateral variations, for example, across the three main fault swarm sectors mentioned by Norini et al. (2019) and Rodríguez et al. (2012), it would be both interesting and beneficial to apply beamforming on smaller subarrays in the different sectors. However, these subarrays will suffer from poorer azimuthal and velocity resolution due to smaller apertures and the reduced number of stations. Additionally,

310
there are plans to apply this technique to other geothermal-related data sets to further improve the beamforming method and to test the sensitivity of the beamforming method on numerical, anisotropic earth models to better image the geothermal reservoir.
There was a degree of uncertainty when looking at the results for Love waves, as it was originally expected to have a lower phase velocity than Rayleigh waves, yet the results show similar velocities. To investigate this further theoretical dispersion curves for both Rayleigh and Love waves were computed using the 1-D shear velocity model from Löer et al. (2020) and CPS 315 from Herrmann (2013), using a Vp value of 1.73 for Rayleigh waves, which were then plotted alongside the observed dispersion curves, of 23-days worth of data (Fig. 10). Initially, this was done for lower velocity values in the uppermost crust, using the reference profile from Löer et al. (2020); however, this did not fit the data well, thus higher velocities (maximum likelihood from the inversion of Löer et al. (2020)) were used instead (Fig. B1). This indicates that the velocities in the uppermost crust may be high.

320
The days that were included for the observed dispersion curves (multi-coloured) were based on what days out of the 65 days used initially had the best source coverage due to a larger number of stations on those days thus giving more reliable dispersion curves. Meanwhile, some days that did have a large number of stations present were excluded due to erratic fluctuation of velocity at the low frequencies of the surface waves; which did not provide an accurate representation of the dispersion relation between velocity and frequency.

325
Comparing Fig. 10a and b to each other and the other results agree (to some degree of ambiguity, based on uncertainty (black bars) that the Love waves have, on average, a higher phase velocity than the Rayleigh waves do, thus supporting the fast Love waves velocities shown in the main results. Furthermore, the theoretical dispersion curves (red) for both Fig. 10a and b are within the range (to some extent) of the observed data, having an overall slightly lower phase velocity until a frequency of ∼0.34 Hz where the phase velocity of the red curve crosses over the mean observed data (black line), thus having a higher 330 overall phase velocity for the larger frequencies. This overall pattern of the theoretical curve in comparison to the observed data suggests that the upper crust may be fast, which might explain why the Love waves are so fast overall. However, the synthetic data does not fit the observed data that well, yet, the absolute velocities are not altering the anisotropy results. Hence, this is beyond the scope of this paper.
Nevertheless, it is evident that the degree of uncertainty for Love waves is greater than that of Rayleigh waves, due to the 335 lower number of detections for Love waves. This similarity in velocity further indicates that the depth difference between Rayleigh and love waves is not as large as seen in Fig. 9.

Fault Systems and Magmatic Intrusions
As the beamforming method takes the average of the whole area, orientations of faults and fractures for the whole LHGF may be detected. The changing fastest directions at varying depths match the orientation of known surface faults in different 340 sectors (Fig. 1b) as mentioned in 1.1. Two of these sectors' prominent orientations correspond to the fastest directions at certain depths, zone 1 from 2.8 -5.8 km (apart from at 3.5 km) and zone 2 from 0.1 -2.8 km, shown in Fig. 9. Rodríguez et al. (2012), also shows these sectors; their shear-wave splitting method provides lateral zoning; however, it lacks depth resolution. Using beamforming, we retrieve anisotropy related to depth. Combining this new information with Rodriguez/Norini's zones, gives us an indication at which depths which fracture orientations are dominant. Fig. 9 shows a connection between changes in the 345 fastest direction and changes in lithologies within the LHGF, which (as established earlier) have different fracture networks due to their differing compositions and formation stages which will affect the azimuthal anisotropy.
Zones 1 and 2 trends' can be clearly seen at different depths in Fig. 9, with the trends of the faults parallel to the TMVB respectively. The conventional beamforming response function for a three-component case is Equation (A2) (Riahi et al.,460 2013): S3C is the cross-spectral density matrix of the data, † represents the transpose and R(k, ξ) is the beamforming response (Riahi et al., 2013). S3C was calculated using the Fast Fourier Transform (Riahi et al. (2013); Press et al. (2007)).

A2 Surface wave azimuthal anisotropy 465
The surface wave phase velocities were found to vary with azimuth over a wide range of frequencies. This is explained by the anisotropy in the seismic parameters of the subsurface. The surface phase velocity varies due to anisotropy, which is defined in Equation (A3) and originally defined by Smith and Dahlen (1973).
v(θ) = a 0 + a 1 cos(2θ) + a 2 sin(2θ) + a 3 cos(4θ) + a 4 sin(4θ), v is the phase velocity in kms −1 , θ direction of propagation, which was measured as anti-clockwise from East, and a i being the 470 anisotropy parameters that depend on the subsurface. The uncertainty of the anisotropy parameters was then evaluated using bootstrap resamples of the histogram and re-fitting the curve (2.2).

A3 Synthetic Wavefield
The synthetic frequency domain signal at receiver r i corresponding to a single source at x l and wavenumber k l is given by where dependency on frequency has been omitted for brevity. Summing over all sources and wavenumbers provides the final signal at receiver i With this synthetic wavefield, we now follow the beamforming algorithm for real data: we compute the cross-spectral density matrix (SDM) S ij of the synthetic data by cross-correlating (multiplying with complex conjugate in the frequency domain) the 480 data at all stations: Looking at S for the case of a single source l = 1, it becomes clear that S contains the phase shifts between the stations i and j: The SDM is analysed using the conventional beamforming approach (Riahi et al., 2013), resulting in the beam response for 485 wavenumber k 0 (and frequency ω) where, is the array response vector as a function of (tested) wavenumber k 0 and the (2xn)-vector r contains the station coordinates of 490 all n stations. Note that we only consider phase shifts across stations (not components), like in conventional (1C) beamforming, as these are decisive for measuring horizontal wave velocities across the array.
Appendix B: Additional material B1 Velocity Model Fig. B1 shows the 1-D shear wave velocity model produced in Löer et al. (2020).

495
Code and data availability. Waveform data and associated metadata are available from the GEOFON data center under network code 6G (https://geofon.gfzpotsdam.de/doi/network/6G/2017, last accessed October 2021) and are embargoed until January 2023.
The code used to produce the synthetic wavefield and synthetic anisotropy for the array effect can be found at the following URL: https://github.com/HeatherKennedy21/Synthetic_Histograms Author contributions. Design, data processing, analysis, interpretation of data and writing was composed by Heather Kennedy. The original 500 concept and writing/reviewing of substantial sections of the paper was completed by Katrin Löer. Further analysis and interpretation of data was contributed by Amy Gilligan.
The interpretations and analyses were undertaken in the research facility at the University of Aberdeen, the underpinning financial and computer support for which is gratefully acknowledged. We acknowledge the GEMex project for access to and permission to publish examples from their proprietary data on which these interpretations and analyses were made and are grateful to MathWorks for providing academic licenses for their MATLAB software which was used to visualise and interrogate the seismic data.

510
A special thanks to Gianluca Norini, Institute of Environmental Geology and Geoengineering IGAG, for advising us with his expertise on the Los Humeros Geothermal Field and for contributing Fig.1b. Thank you to David Healy, Lecturer at the University of Aberdeen, for providing geological guidance and new ways of looking at the results. I would also like to thank David Cornwell, Lecturer at the University of Aberdeen, for his critical insight and comments, and Sophia Baker, PhD student at the University of Aberdeen, for proofreading the manuscript.      Observed dispersion curves (multi-coloured), mean observed curve (black) and theoretical curves (red), black bars represent the degree of error for the real data dispersion curves.