A systems-based approach to parameterise seismic hazard in regions with little historical or instrumental seismicity: active fault and seismogenic source databases for southern Malawi

Seismic hazard is commonly characterised using instrumental seismic records. However, these records are short relative to earthquake repeat times, and extrapolating to estimate seismic hazard can misrepresent the probable location, magnitude, and frequency of future large earthquakes. Although paleoseismology can address this challenge, this approach requires specific geomorphic settings, is resource intensive, and can carry large inherent uncertainties. Here, we outline how fault slip rates and recurrence intervals can be estimated by combining fault geometry, earthquake-scaling relationships, geodetically derived regional strain rates, and geological constraints on regional strain distribution. We apply this approach to southern Malawi, near the southern end of the East African Rift, where, although no on-fault slip rate measurements exist, there are constraints on strain partitioning between border and intra-basin faults. This has led to the development of the South Malawi Active Fault Database (SMAFD), a geographical database of 23 active fault traces, and the South Malawi Seismogenic Source Database (SMSSD), in which we apply our systems-based approach to estimate earthquake magnitudes and recurrence intervals for the faults compiled in the SMAFD. We estimate earthquake magnitudes of MW 5.4–7.2 for individual fault sections in the SMSSD and MW 5.6–7.8 for whole-fault ruptures. However, low fault slip rates (intermediate estimates ∼ 0.05–0.8 mm/yr) imply long recurrence intervals between events: 10^2–10^5 years for border faults and 10^3–10^6 years for intra-basin faults. Sensitivity analysis indicates that the large range of these estimates can best be reduced with improved geodetic constraints in southern Malawi. The SMAFD and SMSSD provide a framework for using geological and geodetic information to characterise seismic hazard in regions with few on-fault slip rate measurements, and they could be adapted for use elsewhere in the East African Rift and globally.

The attributes associated with each fault in the SMAFD are listed and briefly described in Table 1. These resemble the attributes in the GEM GAF-DB that describe a fault's geomorphic attributes and confidence that it is still active (Styron and Pagani, 2020). To incorporate the multidisciplinary approach we have used to map faults in southern Malawi, we also include a 'Location Method' attribute, which details how the fault was mapped (Table 1). Some fault attributes used in the GEM GAF-DB such as slip rates are not included in the SMAFD, as these data have not been collected in southern Malawi. We instead derive these attributes as outlined in Sect. 4 and incorporate them separately into the SMSSD (Table 2). However, within each database, a numerical ID system is used to make the two databases compatible (Tables 1 and 2).

In the absence of direct on-fault slip rate estimates, we suggest that they can be estimated through a systems-level approach in which geodetically derived plate motion rates are partitioned across faults in a manner consistent with their geomorphology and regional tectonic regime. Although such an approach has been used before over small regions (Cox et al., 2012; Litchfield et al., 2014), it has not been applied to an entire fault system. In addition, we outline how the uncertainties and alternative hypotheses that are inherent to this approach can, in common with seismic hazard practice elsewhere, be explored with a logic tree approach (Fig. 6).

Faults that are closely spaced across strike, but not physically connected, may also rupture together through 'soft linkages' (Childs et al., 1995; Wesnousky, 2008; Willemse, 1997; Zhang et al., 1991). In the SMSSD we follow empirical observations and Coulomb stress modelling that suggest that normal fault earthquakes may rupture across steps whose width is <20% of the combined length of the interacting sections, up to a maximum separation of 10 km (Biasi and Wesnousky, 2016; Hodge et al., 2018b), and we use this as a criterion to assign whether two en echelon faults in the SMSSD may rupture together.

A number of geometrical attributes are then assigned to both individual sections and whole faults in the SMSSD (Table 2). Section length (Lsec) is defined as the straight-line distance between section end points (Fig. 4b). This approach avoids the difficulty of measuring the length of fractal features, and accounts for the hypothesis that small-scale (<km scale) variations in fault geometry in southern Malawi may represent only near-surface complexity (depths <5 km), and that the faults are relatively planar at depth (Hodge et al., 2018a). However, it only provides a minimum estimate of section length. For segmented faults in the SMSSD, fault length (Lfault) is the sum of Lsec, otherwise Lfault is the distance between its tips (Fig. 4b). Since each GIS feature in the SMSSD represents a distinct earthquake source, we consider that Lsec and/or Lfault must be >~5 km, except in the case of linking where Lfault > 5 km, and where C1 and β are empirically derived constants and equal 17.5 and 0.66, respectively, for interplate dip-slip earthquakes (Leonard, 2010). As shown in

Estimating fault slip rates
For a narrow amagmatic continental rift such as the EARS in southern Malawi, the first step to estimate slip rates is to divide the rift along its axis into its basins (Fig. 2b).

(3)

where θ(i) is the fault or fault section slip azimuth, v and φ are the horizontal rift extension rate and azimuth, α is a weighting applied to each fault depending on whether it is a border (αbf) or intrabasin (αif) fault, and it is divided by the number of mapped border faults (nbf) or intrabasin faults (nif) in each basin (Fig. 6). Though Eq. 3 is specific for rifts, it could be adapted in other tectonic settings where there is an a priori understanding of the rate and distribution of regional strain, for example to distribute regional strain between the basal detachment and thrust ramps in a fold and thrust belt (Poblet and Lisle, 2011), between multiple subparallel faults in a strike-slip system, or assess more

In the SMSSD, the horizontal extension rate, v, is taken from the plate motion vector between the Rovuma and Nubia plates at the centre of each individual basin (Table 3, with φ set to 061° and 085°, respectively). An example of these slip rate calculations for the central section of the Chingale Step fault is provided in Fig. 7.
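As a rough illustration of how a relationship like Eq. 3 apportions regional extension, the sketch below resolves the horizontal extension rate onto a fault's slip azimuth, scales it by the border or intrabasin weighting, shares it among the mapped faults, and converts the horizontal component to on-fault slip by dividing by the cosine of fault dip. The dip conversion, the absolute value applied to the azimuth term, the function name, and every input value are assumptions made for this sketch; they are not taken from the SMSSD.

```python
import math


def fault_slip_rate(v_mm_yr, rift_azimuth_deg, slip_azimuth_deg,
                    weight, n_faults, dip_deg):
    """Sketch of a slip rate (mm/yr) for one fault in a rift basin.

    Assumed form: weight * v * |cos(slip azimuth - rift azimuth)|
    / (n_faults * cos(dip)). The absolute value is an assumption so that
    faults facing either way across the rift return a positive rate.
    """
    obliquity = math.radians(slip_azimuth_deg - rift_azimuth_deg)
    horizontal_share = weight * v_mm_yr * abs(math.cos(obliquity)) / n_faults
    return horizontal_share / math.cos(math.radians(dip_deg))


# Hypothetical inputs: 1 mm/yr extension parallel to the slip azimuth,
# half of the strain on intrabasin faults, five intrabasin faults, 60 degree dip.
example_rate = fault_slip_rate(1.0, 70.0, 70.0, 0.5, 5, 60.0)
```

With these hypothetical inputs each fault takes 0.1 mm/yr of horizontal extension, which corresponds to about 0.2 mm/yr of slip on a plane dipping at 60°.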

Earthquake magnitudes and recurrence intervals
We estimate earthquake magnitudes in the SMSSD by applying empirically derived scaling relationships between fault length and earthquake magnitude. Scaling relationships between fault length and average single event displacement (D̄) can then be combined with slip rate estimates to calculate earthquake recurrence intervals (R) through the relationship R = D̄/slip rate (Wallace, 1970). To select an appropriate set of earthquake scaling relationships for the SMSSD, we consider three previously reported regressions, and apply them to its mapped faults: (1) between normal fault

where upper estimates of R are calculated by dividing the upper estimate of D̄ by the lowest estimate of fault/section slip rate and vice versa (Fig. 6). An example of these earthquake source calculations for the central section of the Chingale Step fault is provided in Fig. 7.
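The relationship R = D̄/slip rate, and the pairing of upper and lower bounds described above, can be sketched as follows. The function names and input values are hypothetical; displacement is assumed to be in metres and slip rate in mm/yr.

```python
def recurrence_interval_years(displacement_m, slip_rate_mm_yr):
    """R = average single-event displacement / slip rate, in years."""
    return displacement_m * 1000.0 / slip_rate_mm_yr


def recurrence_bounds(d_lower_m, d_upper_m, rate_lower_mm_yr, rate_upper_mm_yr):
    """Lower R pairs the smallest displacement with the fastest slip rate;
    upper R pairs the largest displacement with the slowest slip rate."""
    lower = recurrence_interval_years(d_lower_m, rate_upper_mm_yr)
    upper = recurrence_interval_years(d_upper_m, rate_lower_mm_yr)
    return lower, upper


# Hypothetical values: 0.5-2.0 m of displacement, 0.05-0.5 mm/yr of slip.
r_lower, r_upper = recurrence_bounds(0.5, 2.0, 0.05, 0.5)
```

Because the bounds combine opposite extremes of displacement and slip rate, a tenfold range in slip rate and a fourfold range in displacement already give a fortyfold range in R.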

Key features of the SMAFD and SMSSD
In this section, we briefly describe the fault mapping collated in the SMAFD, and then present the fault slip rates, earthquake magnitudes, and recurrence intervals in the SMSSD as estimated by our systems-based approach.

Border and intrabasin faults in southern Malawi
The SMAFD contains 23 active faults across five EARS basins. The northernmost faults lie in the NW-SE trending Makanjira Graben, a full graben where two border faults, the Makanjira and Chirobwe-Ncheu, clearly define either side of the rift

(Fig. 9). Slip rates tend to be relatively fast in the Makanjira Graben (Fig. 9c), as the extension rate is higher (Table 3), and its NNW-SSE striking faults are more optimally oriented to the regional extension direction (Fig. 2). The difference between upper and lower slip rate estimates in the SMSSD logic tree is two orders of magnitude: ~0.05-5 mm/yr for the border faults and ~0.005-0.5 mm/yr on the intrabasin faults

intrabasin faults in the SMSSD (Fig. 9).

Sensitivity analysis
Upper and lower estimates of R differ by up to three orders of magnitude in the SMSSD (Fig. 10). To investigate these uncertainties, we performed a multi-parameter sensitivity analysis following the methods presented in Box et al. (1978) and Rabinowitz and Steinberg (1991). Full details of this analysis are given in Appendix A. In summary, 7 parameters that contribute to uncertainty in R for the central section of the Chingale Step fault are considered (Table 5). By exploring all possible combinations in which these 7 parameters are set at their upper or lower estimates, 128 (i.e. 2^7) different values of R can be calculated. However, we instead considered 64 parameter combinations that were chosen following a fractional factorial design (Table S1; Box et al., 1978). In this way, parameter combinations that offer little insight into how a system works are omitted, thereby increasing the efficiency of this analysis at minimal cost to its validity (Rabinowitz and Steinberg, 1991). From these combinations, the average of the natural log of R when a parameter (k) is set at its upper (mean lnR(k+)) and lower (mean lnR(k-)) value is calculated and the difference between these

This analysis indicates that R is most sensitive to uncertainties in the partitioning of strain between border and intrabasin faults in the rift (i.e. αif/nif), the rift extension rate (v), and the C2 parameter in Eq. (5), and least sensitive to uncertainties in the rift's extension azimuth, and the C1 parameter in Eq. (5) (Table 5). If, however, v and its associated uncertainties were estimated using a different Nubia-Rovuma Euler pole solution (Fig. A1, Table 3; Stamps et al., 2008), R estimates are least sensitive to v and most sensitive to C2 (Table 5). There are no interaction effects between two separate parameters that may influence their effect on R (Table S2).
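The main-effect calculation can be illustrated with a deliberately simple toy model. The sketch below is not Eq. A1: it assumes R = D / (strain share × extension rate) with three hypothetical parameters and a full two-level design, and computes each main effect as the difference between the mean ln R at the upper and lower level. All names and level values are assumptions chosen for illustration.

```python
import itertools
import math

# Hypothetical (lower-R level, upper-R level) pairs; not SMSSD values.
LEVELS = {
    "displacement_m": (0.5, 2.0),    # larger displacement -> longer R
    "strain_share": (0.10, 0.02),    # smaller share -> slower slip -> longer R
    "extension_mm_yr": (2.0, 0.5),   # slower extension -> longer R
}
NAMES = list(LEVELS)


def toy_ln_r(signs):
    """ln R for one run of the toy model; each sign is -1 (lower) or +1 (upper)."""
    values = {name: LEVELS[name][0 if sign < 0 else 1]
              for name, sign in zip(NAMES, signs)}
    slip_rate = values["strain_share"] * values["extension_mm_yr"]
    return math.log(values["displacement_m"] * 1000.0 / slip_rate)


def main_effect(runs, k):
    """Mean ln R with parameter k at its upper level minus the mean at its lower level."""
    upper = [ln_r for signs, ln_r in runs if signs[k] > 0]
    lower = [ln_r for signs, ln_r in runs if signs[k] < 0]
    return sum(upper) / len(upper) - sum(lower) / len(lower)


runs = [(signs, toy_ln_r(signs))
        for signs in itertools.product((-1, 1), repeat=len(NAMES))]
effects = {name: main_effect(runs, k) for k, name in enumerate(NAMES)}
```

In this toy model ln R is a sum of one term per parameter, so each main effect is simply the log of the ratio between that parameter's two levels, and no parameter changes the effect of another.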
~0.8 mm/yr) Zomba and Thyolo faults is at a particularly high risk (Fig. 11a).

(Table 5). This factor controls the amount of displacement for a given rupture area (Leonard, 2010). It is therefore likely related to earthquake stress drops, and uncertainty in C2 in southern Malawi will only be reduced by recording more events here or in similar tectonic environments (i.e. normal fault earthquakes in regions with low (~1-10 mm/yr) extension rates and thick (20-35 km) seismogenic crust).

Incorporation of the SMSSD into Probabilistic Seismic Hazard Analysis
The SMSSD contains the attributes (earthquake magnitudes and R estimates) that allow it to be used

to allow each fault to host a range of earthquake sizes that follow a frequency-magnitude distribution that is consistent with its moment rate (Youngs and Coppersmith, 1985), with this moment rate derived from the instrumental record and data incorporated into the SMSSD.

analysis on the logic tree approach used to calculate these recurrence intervals (Fig. 6), it is possible to determine which parameters contribute most to this uncertainty, and therefore guide future research directions that will help constrain them in future iterations of the SMSSD. This analysis is briefly described in the main text (Sect. 5.4, Table 5), and is documented fully below.

Here, we follow the multiparameter sensitivity analysis presented by Rabinowitz and Steinberg (1991). This study conducted sensitivity analysis for the parameters that feed into PSHA, where the output metric is the probability of exceedance of a given level of ground shaking. For the SMSSD, we adapt this method to test the sensitivity of seven parameters that are used to calculate earthquake recurrence intervals (R, Eq. A1, Table 5). This metric is chosen as it fully incorporates the aleatory uncertainties in rupture length, and epistemic uncertainties in fault slip rates and the Leonard (2010) scaling relationships (Fig. 6). This analysis is performed for the Chingale Step fault central section (Fig. 4)

Eq. A1 is essentially a combination of Eqs. 3, 5, and 6 in the main text, and its application with the SMSSD logic tree to calculate R for the Chingale Step fault central section is shown in Fig. 7. There are 5 intrabasin faults in the Zomba Graben where the Chingale Step fault is situated (Fig. 2), and in this analysis, this parameter is not treated as an uncertainty.
However, for simplicity, it is combined with αif to give the 'component of rift extensional strain' parameter, which is defined by αif/nif (Table 5

ruptures or just the central section (Fig. 7). Hence uncertainty in this parameter is not considered here, and it is set at 290°, which is the average value for these two rupture scenarios. When assessing the influence of v, we consider two geodetic models (Fig. A1; Saria et al., 2013; Stamps et al., 2008), and perform this sensitivity analysis for both.

The method presented by Rabinowitz and Steinberg (1991) involves a two-level fractional factorial multiparameter design, where each parameter is restricted to the two levels which will give lower or upper estimates of R (Table 5). Ideally, these levels would be symmetric about the intermediate case; however, in the SMSSD this is not possible for v, L, and C2. Compared to a 'one-at-a-time' (OAT) parameter analysis, a multiparameter analysis allows us to assess how different parameters interact with each other, and so more fully explore the parameter space (Rabinowitz and Steinberg, 1991). This is achieved through a factorial design, which for the seven parameters (k) tested here would generate 128 (i.e. 2^7) possible combinations in a full two-level factorial approach. However, in a fractional factorial design, just a subset of these combinations is assessed. This approach recognises that many of the combinations in a full factorial design offer little insight into how a system works, and that this can instead be achieved at minimal cost to the results by considering a carefully selected subset of these combinations (Box et al., 1978; Rabinowitz and Steinberg, 1991). In this analysis, 2^(k-p) combinations are assessed, where p is the number of generators and is set at 1. This results in the assessment of 64 combinations (Table S1) and a 'resolution' of 5, which means it is possible to estimate the main effects of each parameter (Eq. A2), interactions between two parameters (Eq. A3), but not interactions between three parameters (Box et al., 1978).
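A two-level half-fraction of the kind described above can be generated in a few lines. In the sketch below the first six parameters take every combination of lower (-1) and upper (+1) levels, and the level of the seventh is set to the product of the other six. That generator is an assumption chosen for illustration; the generator behind Table S1 is not stated here.

```python
import itertools


def half_fraction(k):
    """Return the 2^(k-1) runs of a two-level half-fraction design.

    Each run is a tuple of k levels (-1 or +1). The last level is the
    product of the first k-1 levels (a hypothetical generator).
    """
    runs = []
    for base in itertools.product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


design = half_fraction(7)
upper_counts = [sum(1 for run in design if run[j] > 0) for j in range(7)]
```

Every parameter sits at its upper level in half of the 64 runs and at its lower level in the other half, which is what allows each main effect to be estimated from two equally sized groups of 32 combinations.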

The main effect (A) of one parameter (e.g. fault dip, δ) is quantified from the difference between the average of the natural log of recurrence interval (mean lnR) for the 32 combinations in Table S1

By applying a multiparameter approach it is also possible to quantify parameter-parameter interaction effects, for example, if the effect of δ depends on the choice of rift extension azimuth (φ). To do this, the results in Table S1 can be divided into two sets with 2^(k-p-1) combinations each depending on which level of φ was applied.

If there is no interaction effect between these two parameters, then the interaction term is 0. Otherwise, the size of the effect is proportional to the magnitude of this term. In addition, we demonstrate our results in terms of an empirical cumulative distribution function for the values of lnR reported in Table S1 (Fig. A2a), and following Rabinowitz and Steinberg (1991), values of A in a normal probability plot (Fig. A2b).

parameter that contributes most to uncertainties of R in the SMSSD is the component of regional extensional strain that each fault accommodates (A = 3.05, Table 5). This essentially means that lnR is higher by 3.05 when this component is set at its high value compared to its lower, or that R is ~21 times (e^3.05) higher when 10% of regional extensional strain is assigned to the Chingale Step fault as opposed to 2%. The importance of this parameter is also demonstrated by the fact that it does not plot close to the normal distribution line in Fig. A2b. The parameters with the next highest main effect on R are v and C2, whilst estimates of R are least sensitive to uncertainties in φ (Table 5)

considerably less sensitive to uncertainties in rift extension rates, and the C2 parameter has the biggest influence on R (Table 5).
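The conversion from a main effect on lnR to a multiplicative change in R is a one-line calculation, sketched here for the reported value A = 3.05. The function name is hypothetical. Because A is a difference between two means of lnR, its exponential is the ratio between the geometric means of R at the two levels.

```python
import math


def effect_to_ratio(main_effect_ln_r):
    """Convert a main effect on ln R into a ratio of geometric-mean R values."""
    return math.exp(main_effect_ln_r)


ratio = effect_to_ratio(3.05)  # about 21.1
```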
Multiparameter effects are all equal to zero (Table S2)

The results of the sensitivity analysis reported here are specific to estimates of R for the Chingale Step fault central section; however, results should be broadly applicable to all other faults in the SMSSD as R was calculated following the same steps. There will, however, be differences for faults that are not segmented (where L is not an uncertainty) or that have more than the three sections

Author Contributions
JW and LW led the fault mapping from TanDEM-X data, and HM led the fault mapping using aeromagnetic data. All authors participated in the fieldwork. LW conducted analysis of geodetic data. JW designed the method to obtain fault slip rates and earthquake source parameters with input from all co-authors. JB and AF secured the funding for this project. All authors contributed to manuscript preparation, but JW had primary responsibility.

Competing interests
The authors declare that they have no conflict of interest.

(Fig. 6), performed for the central section of the Chingale Step Fault (Fig. 4b). This is an intrabasin fault in the Zomba Graben, where the number of intrabasin faults (nif) is five. A multiparameter sensitivity analysis for these calculations is documented in Appendix A.

Table S1 (blue line). This CDF is compared to a standard normal CDF (red line) with the same mean value and standard deviation as the values in Table S1. (b) Normal probability plot of the parameter effects assessed in the sensitivity analysis and reported in Table 5. The most important effects are those that plot above a standard normal distribution (red line). Line is solid when within first and third quartiles of data and dashed when outside.

Table 1

(Fig. 1b). The velocity, azimuth, and uncertainties of each vector are also reported given the Nubia-Rovuma Euler poles reported in Saria