Impact of upper mantle convection on lithosphere hyper-extension and subsequent convergence-induced subduction

We present two-dimensional thermo-mechanical numerical models of coupled lithosphere-mantle deformation, considering the upper mantle down to a depth of 660 km. We consider visco-elasto-plastic deformation and for the lithospheric and upper mantle a combination of diffusion, dislocation and Peierls creep. Mantle densities are calculated from petrological phase diagrams (Perple_X) for a Hawaiian pyrolite. The model generates a 120 Myrs long geodynamic cycle of subsequent extension 5 (30 Myrs), cooling (70 Myrs) and convergence (20 Myrs) in a single and continuous simulation with explicitly modelling convection in the upper mantle. During lithosphere extension, the models generate an approximately 400 km wide basin of exhumed mantle bounded by hyper-extended passive margins. The models show that considering only the thermal effects of upper mantle convection by using an effective thermal conductivity generates results of lithosphere hyper-extension that are similar to the ones of models that explicitly model the convective flow. Applying a lower viscosity limit of 5×1020 Pa s 10 suppresses convection and generates results different to the ones for simulations with a low viscosity asthenosphere having minimal viscosity of approximately 10 Pa s. During cooling without far-field deformation, no subduction of the exhumed mantle is spontaneously initiated. Density differences between lithosphere and mantle are too small to generate a buoyancy force exceeding the mechanical strength of the lithosphere. The extension and cooling stages generate self-consistently a structural and thermal inheritance for the subsequent convergence stage. Convergence initiates subduction of the exhumed 15 mantle at the transition to the hyper-extended margins. The main mechanism of subduction initiation is thermal softening for a plate driving force (per unit length) of approximately 15 TN m−1. If convection in the mantle is suppressed by high effective thermal conductivities or high, lower viscosity limits, then subduction initiates at both margins leading to divergent double-slab subduction. Convection in the mantle assists to generate a single-slab subduction at only one margin, likely due to mantle flow which exerts an additional suction force on the lithosphere. The first-order geodynamic processes simulated in the geodynamic 20 cycle of subsequent extension, cooling and convergence are applicable to orogenies that resulted from the opening and closure of embryonic oceans bounded by magma-poor hyper-extended passive margins, which might have been the case for the Alpine orogeny. 1 https://doi.org/10.5194/se-2020-88 Preprint. Discussion started: 25 May 2020 c © Author(s) 2020. CC BY 4.0 License.


Convection in the Earth's mantle
In general, the term convection can be used to describe any motion of a fluid driven by external or internal forces (Ricard et al., 1989). Prout (1834) derived this term from the Latin word "convectio" (to carry, or to convey) to distinguish between advection 25 dominated heat transfer and conduction, radiation dominated heat transfer.
On Earth, heat transfer through the lithosphere is dominated by thermal conduction while heat transfer through the underlying mantle is dominated by advection of material (e.g. Turcotte and Schubert, 2014). Convection can occur over the whole mantle, down to the core-mantle boundary, or it can be layered. At temperature and pressure conditions corresponding to a depth of about 660 km, the mineralogy of peridotite changes from γ-spinel-olivine to perovskite + magnesiowüstite. This phase 30 transition is endothermic, which means it has a negative pressure-temperature, so-called Clapeyron slope. This phase transition likely impedes the penetration of cold slabs subducting into the lower mantle and hot plumes rising into the upper mantle . The 660-km phase transition can therefore represent a natural impermeable boundary, that separates two convecting layers. Laboratory experiments, tomographic images and calculations on the Earth's heat budget deliver evidence that a mixed mode of both types best explains convection in the present-day Earth's mantle (Li et al., 2008;Chen, 2016). 35 Any convecting system can be described by a dimensionless number, the so-called Rayleigh number. It is defined as the ratio of the diffusive and the advective time scale of heat transfer (see also, eq. B1). The critical value of the Rayleigh number necessary for the onset of convection is typically in the order of 1000 . Convection in the Earth's mantle can occur at Rayleigh numbers in the range of 10 6 -10 9 depending on the heating mode of the system and whether convection is layered or it includes the whole mantle . The higher the Rayleigh number, the more vigorous is 40 the convection, i.e. advection of material occurs at a higher speed. Vigour of both whole and layered mantle convection is inter alia controlled by the effective viscosity of the mantle. However, compared to the density and thermal structures, the viscosity structure of the mantle is subject to large uncertainty. Viscosity is not a direct observable and can only be inferred by inverting observable geophysical data such as data for glacial isostatic adjustment (e.g. Mitrovica and Forte, 2004)) or seismic anisotropy data (e.g. Behn et al., 2004). Especially at depths of ca. 100-300 km, in the so-called asthenosphere, the inferred in the appendix A. A convection benchmark of the applied algorithm is presented in appendix C. The applied algorithm has already been used to model processes at different scales, such as deformation of eclogites on the centimetre-scale (Yamato et al., 2019), crystal-melt segregation of magma during its ascent in a meter-scale conduit (Yamato et al., 2015), rifting of continental lithosphere (Duretz et al., 2016b;Petri et al., 2019), stress calculations around the Tibetan Plateau  and within and around the subduction of an oceanic plate (Bessat et al., 2020), as well as modelling Precambrian 100 orogenic processes (Poh et al., 2020).

An approach to model coupled lithosphere-mantle deformation
We test the algorithm for capability of reproducing first order features of a long-term (>100 Myrs) geodynamic cycle namely: (i) formation of hyper-extended magma-poor rifted margins during a 30 Myrs extension period applying an absolute extension velocity of 2 cm yr −1 . (ii) Separation of the continental crust and opening of a ca. 400 km wide marine basin floored by 105 exhumed mantle material. (iii) Generation of upper mantle convection during a 70 Myrs cooling period without significant plate deformation. (iv) Subsequent convergence for a period of 20 Myrs and horizontally forced subduction initiation in a selfconsistent way, that means without modifying the simulation by, for example, adding ad-hoc a prominent weak zone across the lithosphere. During convergence, the self-consistently evolved passive margin system is shortened applying an absolute convergence velocity of 3 cm yr −1 .

110
For simplicity, we consider lithosphere extension that generates magma-poor hyper-extended margins and crustal separation leading to mantle exhumation. This means that we do not need to model melting, lithosphere break-up, mid-ocean ridge formation and generation of new oceanic crust and lithosphere. Such Wilson-type cycles, involving only embryonic oceans, presumably formed orogens such as the Pyrenees, the Western and Central Alps, and most of the Variscides of Western Europe (e.g. Chenin et al., 2019). Values for deformation periods and rates in the models presented here are chosen to allow for 115 comparison of the model evolution to the Alpine orogeny.
Further, tomographic images from the Mediterranean show large p-wave anomalies in the transition zone (Piromallo and Morelli, 2003) indicating that the 660-km phase transition inhibits the sinking material to penetrate further into the lower mantle. Therefore, we do not include the lower mantle into the model domain and assume that mantle convection is layered. Figure 1 shows the initial configuration of the models. 120 We present 2D thermo-mechanical numerical simulations. We model a long-term (>100 Myrs) geodynamic cycle of subsequent lithospheric extension, cooling and horizontally forced subduction coupled to upper mantle convection in a single continuous simulation.
As mentioned above, the viscosity structure of the mantle is far from being well constrained. Therefore, we test (1) models with different upper mantle viscosity structures. In model M1, the reference model, the asthenosphere is assumed to be weak 125 with values for viscosity in the order of ≈ 10 19 Pa s resulting from the applied flow laws. In models M2 and M3 the asthenosphere is assumed to be stronger. Values for viscosities in the asthenosphere are limited by a numerical cut-off value of 1×10 20 Pa s in M2 and 5×10 20 Pa s in M3. Coupling of lithosphere-mantle deformation is achieved by resolving numerically both lithospheric deformation and upper mantle convection in M1-3. (2) We further test the impact of parameterizing convection on the cycle by scaling the thermal conductivity to the Nusselt number of upper mantle convection (see app. B). The effective 130 conductivity approach has been used, for example, in mantle convection studies for planetary bodies when convection in the mantle is too vigorous to be modelled explicitly (e.g. Zahnle et al., 1988;Tackley et al., 2001;Golabek et al., 2011). Also, it has been used in models of back-arc lithospheric thinning through mantle flow that is induced by subduction of an oceanic plate (e.g. Currie et al., 2008) and in models of lithosphere extension and subsequent compression (e.g. Jammes and Huismans, 2012). In these models, we also assume both a weak asthenosphere (as in M1) in model M4 and a strong asthenosphere (as in 135 M2) in model M5. (3) We finally investigate the role of the olivine rheology. To this end, we perform an additional model M6 in which the material parameters of the dislocation and diffusion creep mechanism of a dry olivine rheology are replaced by the parameters of a wet olivine rheology (table A1). In M6, values for all the other parameters, both physical and numerical, are initially equal to those set in M1. Within the error range of values for activation volume and energy of the wet olivine rheology, the viscosity is calibrated to the data obtained by Ricard et al. (1989). However, using the highest possible values for the wet 140 olivine flow law parameters, the maximum viscosity in the upper mantle is initially one order of magnitude lower compared to models M1-5. A summary of all simulations is given in table 1 and all material parameters are summarised in table A1.

Model configuration
The model domain is 1600 km wide and 680 km high and the applied model resolution is 801×681 grid points ( fig. 1).
Minimum z-coordinate is set to -660 km and the top +20 km are left free to allow for build-up of topography. The top surface 145 (initially at z = 0 km) is stress free. Thus, its position evolves dynamically as topography develops. Mechanical boundary conditions on the remaining boundaries are set to free slip at the bottom and constant material inflow/outflow velocities at the left and right boundary. The boundary velocity is calculated such that the total volume of material flowing through the lateral boundary is conserved. The transition between inflow and outflow occurs at z = -330 km and not at the initially imposed lithosphere-asthenosphere boundary (LAB). Values for deviatoric stresses at this depth are significantly lower compared to 150 those at the base of the lithosphere. This choice avoids boundary effects close to the mechanical lithosphere and the LAB can develop freely away from the lateral model boundaries. We use material flow velocity boundary condition rather than bulk extension rates to deform the model units. Applying bulk extension rates and deforming the model domain would change the height of the model domain which has strong control on the Rayleigh number of the system (eq. B1).
Also, evolution of passive margin geometries becomes dependent on the model width when using bulk extension rates as 155 mechanical boundary condition (Chenin et al., 2018). It is therefore more practical to use material flow velocity boundary conditions in the type of models presented here.
Initial temperature at the surface is set to 15°C and temperatures at the crust-mantle (Moho) and at the LAB are 600°C and 1350°C, respectively. Assuming an adiabatic gradient of 0.49°C km −1 (see appendix B), the temperature at the model bottom is 1612°C. Thermal boundary conditions are set to isothermal at the bottom and at the top of the domain and the left and right 160 boundaries are assumed to be insulating (i.e. no heat flows through lateral boundaries). Model units include a 33 km thick, mechanically layered crust which overlies an 87 km thick mantle lithosphere on top of the upper mantle. The resulting initial thickness of the lithosphere is thus 120 km. The crust includes three mechanically strong and four mechanically weak layers.
The thickness of the weak layers is set to 5 km each, the thickness of the uppermost and lowermost strong layer is set to 4 km, whereas the strong layer in the middle is 5 km thick. This thickness variation allows to match the total 33 km thickness of the 165 crust without introducing an additional vertical asymmetry. Mechanical layering of the crust was chosen, because it is a simple way of considering mechanical heterogeneities in the crust. The layering leads to the formation of numerous structural features observed in natural hyper-extended passive margins (Duretz et al., 2016b) without relying on pre-defined strain softening.
We consider viscous, elastic and brittle-plastic deformation of material in all models presented here. Viscous flow of material is described as a combination of several flow laws. We use dislocation creep for the crustal units and dislocation, diffusion and 170 Peierls creep for mantle units (see also appendix A). The initial viscosity profile through the upper mantle is calibrated to match viscosity data obtained by Ricard et al. (1989). The applied flow law parameters lie within the error range of the corresponding laboratory flow law estimates (see tab. A1).
The difference between mantle lithosphere and upper mantle is temperature only, i.e. all material parameters are the same.
Density of the crustal phases is computed with a simple equation of state (eq. A3), whereas density of the mantle phase is pre- A detailed description of the phase transitions and how the initial thermal field is calibrated is given in appendix B. Surface processes (e.g., erosion and sedimentation) are taken into account by a kinematic approach: if the topography falls below a level of 5 km depth or rises above 2 km height, it undergoes either sedimentation or erosion with a constant velocity of 0.5 mm yr −1 . In case of sedimentation, the generated cavity between the old and corrected topographic level is filled with sediments, 180 alternating between calcites and pelites every 2 Myrs. This simple parameterization is employed to avoid exaggeratedly high and low topography.  The overall evolution of the extension period in M5 is more similar to M2 than to M1. In M1-5, the 1350°C isotherm does not come closer than 30 km to the surface. Horizontally-averaged vertical temperature profiles are similar in M1-5 ( fig. 2(a) & ≈ +300 km in fig.6(b)). Magnitude of material flow speed is in the order of 1.5 cm yr −1 . The average Rayleigh-number of the convecting system in M2 is approximately 3.92×10 6 and is about a factor 2.5 lower compared to M1. M3 develops four large convection cells, two below each margin, that are active up to depths of approximately 600 km (see arrows fig. 6(c)).

250
Downward directed movement of material occurs with ca. 2 cm yr −1 (darker blue regions at x ≈ -500 km and x ≈ 500 km in fig.6(c)). The average Rayleigh number of the system is ca. 1.18×10 6 , which is about a factor 8.4 smaller compared to M1. In    Like in M1, one shear zone forms below the right margin in M4. At this early stage of subduction initiation, the strain rate in the shear zone is in the order 10 −14 -10 −13 s −1 . In the region of the shear zones, the temperature is increased, which is

Estimates for plate driving forces
The vertically-integrated second invariant of the deviatoric stress tensor,τ II , is a measure for the strength of the lithosphere and twice its value is representative for the horizontal driving force (per unit length, F D hereafter) during lithosphere extension and Currently, the processes and tectonic settings leading to subduction initiation remain unclear (Stern, 2004;Stern and Gerya, 2018;Crameri et al., 2019). Stern (2004) proposed two fundamental mechanisms for subduction initiation: (1) Spontaneous (or vertically forced, Crameri et al. (2020)) subduction initiation occurs, for example, due to densification of the oceanic lithosphere during secular cooling. Cloos (1993) proposed that the density increase of cooling oceanic lithosphere compared to the underlying asthenosphere is in the order of 80 kg m −3 . According to Cloos (1993), this difference is sufficient 305 to initiate subduction spontaneously by negative buoyancy of the oceanic lithosphere (see Stern (2004); Stern and Gerya (2018, & references therein) for detailed explanation). However, McKenzie (1977) and Mueller and Phillips (1991) showed that the forces acting on the lithosphere due to buoyancy contrasts are not high enough to overcome the strength of the cold oceanic lithosphere. Observations of old plate ages (>100 Myrs) around passive margins in the South Atlantic indicate long-term stability of old oceanic lithosphere.

310
In the models presented here, the applied thermodynamic density of the Hawaiian pyrolite leads to density differences between the exhumed mantle in the basin and the underlying asthenosphere of ca. 10-20 kg m −3 after 99 Myrs in model history. These buoyancy contrasts are ca. 4× smaller than the contrast proposed by Cloos (1993). Boonma et al. (2019) report a density difference of +19 kg m −3 for an 80 km thick continental lithosphere and +17 kg m −3 for an oceanic lithosphere of 120 Myrs of age. These values are in agreement with the values we report in our study. In the 2D models presented here, these 315 buoyancy contrasts are insufficient to overcome the internal strength of the cooled oceanic lithosphere and initiate subduction spontaneously. Our results are, hence, in agreement with the stability of old oceanic lithosphere in, for example, the South Atlantic, where even an additional ridge push of approximately 3.9 TN m −1 (Turcotte and Schubert, 2014) was not sufficient to initiate subduction. Figure 7(a) shows the topographic elevation at the end of the cooling period. In M1-5, the difference between the lowest point in the basin and the highest point of the margins is ca. 5 km which is predicted by the principle of 320 isostasy for a ca. 30 km thick crust with an average density of 2800 kg m −3 (Turcotte and Schubert, 2014). Therefore, the topography of the models at the end of cooling is close to the topography predicted by isostatic equilibrium between continent and exhumed mantle.
(2) Induced (or horizontally forced, Crameri et al. (2020)) subduction initiation occurs, for example, due to far-field plate motion. In fact, many numerical studies that investigate subduction processes do not model the process of subduction initiation.

325
In these studies, a major weak zone across the lithosphere is usually prescribed ad-hoc in the initial model configuration to enable subduction (Ruh et al., 2015;Zhou et al., 2020). Another possibility to model subduction is to include a prescribed slab in the initial configuration. This means that subduction has already initiated at the onset of the simulation (Kaus et al., 2009;Garel et al., 2014;Holt et al., 2017;Dal Zilio et al., 2018). In our models, subduction is initiated self-consistently, which means here: (i) we do not prescribe any major weak zone or an already existing slab, and (ii) the model geometry and temperature 330 field at the onset of convergence (100 Myrs) were simulated during a previous extension and cooling phase with the same numerical model and parameters. Subduction is initiated during the initial stages of convergence and subduction initiation is, hence, induced by horizontal shortening. Notably, Crameri et al. (2020) analysed more than a dozen documented subduction zone initiation events from the last hundred million years and found that horizontally forced subduction zone initiation is dominant over the last 100 Myrs. During convergence in our models, shear heating together with the temperature and strain 335 rate dependent viscosity formulation (dislocation creep flow law, eq. A6) causes the spontaneous generation of a lithospherescale shear zone that evolves into a subduction zone (Thielmann and Kaus, 2012;Kiss et al., 2020). However, shear heating is a transient process which means the increase in temperature is immediately counterbalanced by thermal diffusion. Efficiency of shear heating is restricted to the first ca. 2-3 Myrs after shear zone formation in the presented models. After this time span, heat generated by mechanical work is diffused away. Therefore, thermal softening is unlikely the mechanism responsible for 340 stabilisation of long-term subduction zones, but likely important for initiating and triggering subduction zones.
The value of F D (see appendix D) represents the plate driving force. In M1, the maximal value of F D just before stress drop caused by subduction initiation at ≈103 Myrs is ≈17 TN m −1 (3 in fig. 7(d)). However, F D ≈ 2 TN m −1 at the end of the cooling stage, resulting from stresses due to mantle convection and lateral variation of GPE between continent and basin.
Therefore, this value can be subtracted from the ≈17 TN m −1 . The required plate tectonic driving force for convergence- initiation during convergence of a passive margin, whose geometry and thermal structure was generated ad-hoc as initial model configuration. Their initial passive margin structure was significantly less heterogeneous than ours. To initiate subduction, they needed a driving force of ≈37 TN m −1 which is significantly larger than the ≈15 TN m −1 required in our model M1. We suggest that mechanical and geometrical heterogeneity inherited by previous deformation periods and thermal heterogeneity 350 due to mantle convection reduces the required driving force necessary for subduction initiation by thermal softening. We suppose that even more heterogeneous models with higher numerical resolution than ours would require even less driving force than ≈15 TN m −1 to initiate subduction by thermal softening. This is relevant, because the main argument against thermal softening as an important localization mechanism during lithosphere strain localization and subduction initiation is commonly that the required stresses, and hence driving forces, are too high. In nature, more softening mechanisms act in 355 concert with thermal softening, such as grain damage (e.g. Bercovici and Ricard, 2012;Thielmann and Schmalholz, 2020), fabric and anisotropy evolution (e.g. Montési, 2013) or reaction-induced softening (e.g. White and Knipe, 1978), likely further reducing forces required for subduction initiation. Furthermore, Mallard et al. (2016) show with 3D spherical fullmantle convection models that a constant yield stress between 150 and 200 MPa in their outer boundary layer, representing the lithosphere, provides the most realistic distribution of plate sizes in their models. The yield stress in their models corresponds 360 to a deviatoric, von Mises, stress which is comparable to the value of τ II calculated for our models. If we assume a 100 km thick lithosphere, then F D = 15 TN m −1 yields a vertically-averaged deviatoric stress, τ II , of 75 MPa for our model lithosphere.
Therefore, vertically-averaged deviatoric stresses for our model lithosphere are even smaller than deviatoric, or shear, stresses employed in global mantle convection models. Based on the above-mentioned arguments, we propose that ≈15 TN m −1 is a feasible value for the horizontal driving force per unit length. The average Rayleigh number (see appendix eq.B1) in M1 is ca. 9.95×10 6 (see tab. 1) which is close to estimated values for upper mantle convection (Torrance and Turcotte, 1971;Schubert et al., 2001). Potential lateral heterogeneity caused by the convecting cells in the upper mantle is inherited from the cooling period. The vertical velocity field at the end of the cooling period of M1 ( fig. 6(a)) reveals one convection cell at x≈+350 km with flow speeds in the order of the convergence velocity 375 applied later. This convection cell is active directly below the passive margin at which subduction will be initiated later in the model and probably provides an additional suction force on the lithospheric plate. Conrad and Lithgow-Bertelloni (2002) quantified the importance of slab-pull vs. slab-suction force and showed that the slab pull force and the suction force of a detached slab sinking into the mantle induces similar mantle flow fields. They argued that a detached fraction of a slab sinking into the mantle can exert shear traction forces at the base of the plate and drive the plate. Baes et al. (2018) showed with 380 numerical simulations that sinking of a detached slab below a passive margin can contribute significantly to the initiation of subduction. We suggest that the circular convection cell observed in M1 and the associated mantle flow also generates a force similar to a slab suction force. To quantify the suction force induced by the down-welling in the convection cell of M1, we calculated the difference of the density in this region with respect to the corresponding initial model density and integrated the density difference spatially over this region. The resulting buoyancy force per unit length is ca. 40 TN m −1 , which is in the 385 order of the slab pull force (Turcotte and Schubert, 2014). It is therefore likely that the suction force induced by the convection assists to initiate the subduction and to stabilise the single-slab subduction in simulation M1.
Values for the Rayleigh number in M2 and M3 are lower (3.92×10 6 and 1.18×10 6 respectively) compared to M1 (see tab. 1), because of the relatively higher viscosity (see eq. B1). With decreasing Rayleigh-number the asymmetry of convection also decreases . In consequence, the size of the convecting cells becomes larger and more elliptic (M2 390 compared to M1) and the number of active cells decreases (M3 compared to M1). Enhancing the thermal conductivity (included in the denominator in eq. B1) by ca. one order of magnitude in models M4 and M5 also decreases the Rayleigh number by one order of magnitude compared to M1 (see table 1). Decreasing the Rayleigh-number (M2-5) leads to more laterally symmetric convective flow patterns and decreases the speed and distribution of material flow (see fig. 7(b)). In turn, this probably leads to more equally distributed suction forces induced by the cells. Hence, we argue that decreasing the Rayleigh-number, assuming a 395 relatively strong asthenosphere or applying an enhanced thermal conductivity likely favours the initiation of divergent doubleslab subduction. The resulting slab geometries resemble a symmetric push-down (M2-5), rather than a stable asymmetric subduction (M1). This impacts the deformation in the lithosphere, especially in the crust: the crustal layers of both plates are strongly folded (see fig. 9(b)-(e)). In M1, the deformation of the crustal layers only occurs in the overriding plate. Once convection has started, it stabilises the temperature field over large time scales and controls the thickness of the 420 lithosphere (Richter, 1973;Parsons and McKenzie, 1978). Although we allow for material inflow up to z = -330 km, the arrows in fig. 9

Mantle convection, thermal erosion and tectonics in the Archean
Due to significantly reduced viscosities (see fig. 6(c)), convection in the upper mantle occurs at an average Rayleigh number 450 of ca. 9×10 7 in M6. Compared to estimates for present-day Earth's upper mantle convection (Torrance and Turcotte, 1971;Schubert et al., 2001) this Rayleigh-number is an order of magnitude higher.
The lithosphere is recycled rapidly and the resulting values for density directly below the lithosphere are ca. 100 kg m −3 lower in M6 compared to M1-5. The resulting density structure in the upper region of the mantle deviates significantly from the PREM model (Dziewonski and Anderson (1981), see fig. 2(b)). Convection at such high Rayleigh numbers leads to an 455 enhanced temperature field. Resulting values for temperature at the Moho locally reach ca. 1000°C. The scenario calculated in this model does not correspond to present-day observations. In the Archean eon, the mantle potential temperature was probably 200-300°C higher (Herzberg et al., 2010) and therefore convection was more vigorous

Comparison with estimates of Earth's mantle viscosity and thermal structure
To apply our models to geodynamic processes on Earth, we compare several model quantities with measurements and indirect

Formation and reactivation of magma-poor rifted margins: potential applications
Our model can be applied to some first-order geodynamic processes that were likely important for the orogeny of the Alps.
Rifting in the Early to Middle Jurassic (Favre and Stampfli, 1992;Froitzheim and Manatschal, 1996;Handy et al., 2010) lead to the formation of the Piemont-Liguria ocean which was bounded by the hyper-extended magma-poor rifted margins of 480 the Adriatic plate and the Briançonnais domain on the side of the European plate. We follow here the interpretation that the Piemont-Liguria ocean was an embryonic ocean which formed during ultra-slow spreading and was dominated by exhumed our study are in agreement with these observations. We speculate that upper mantle convection might have played a role in the 515 formation of the Alpine orogeny in the form of inducing an additional suction force below the Adriatic margin and assisting in the onset of subduction.

Conclusions
Our 2D thermo-mechanical numerical model of coupled lithosphere-mantle deformation is able to generate a 120 Myrs long geodynamic cycle of subsequent extension (30 Myrs), cooling (70 Myrs) and convergence (20 Myrs) in a continuous simu-520 lation. The simulations capture the fundamental features of such cycles, such as formation of hyper-extended margins, upper mantle convective flow or subduction initiation, with model outputs that are applicable to Earth. We propose that the ability of a model to generate such long-term cycles in a continuous simulation with constant parameters provides further confidence that the model has captured correctly the first-order physics.
In our models, subduction at a hyper-extended passive margin is initiated during horizontal shortening and by shear localiza-525 tion due to mainly thermal softening. In contrast, after 70 Myrs of cooling without far-field deformation, subduction of a 400 km wide exhumed and cold mantle is not spontaneously initiated. The buoyancy force due to the density difference between lithosphere and asthenosphere is too small to overcome the mechanical strength of the lithosphere.
Our models show that the viscosity structure of the asthenosphere and the associated vigour of upper mantle convection has a significant impact on lithosphere dynamics during a long-term geodynamic cycle. In comparison to a strong asthenosphere 530 with minimum viscosities of 5×10 20 Pa s, a weak asthenosphere with minimum viscosities of ca. 10 19 Pa s generates the following differences: (1) More asymmetric hyper-extended margins during extension, (2) locally larger suction forces due to convective flow, which are able to assist in establishing a single-slab subduction instead of a double-slab subduction, and (3) smaller horizontal driving forces to initiate convergence-induced subduction, namely ca. 15 TN m −1 instead of ca. 22 TN m −1 . Therefore, quantifying the viscosity structure of the asthenosphere is important for understanding the actual geodynamic 535 processes acting in specific regions.
The first-order geodynamic processes simulated in the geodynamic cycle of subsequent extension, cooling and convergence are applicable to orogenies that resulted from the opening and closure of embryonic oceans, which might have been the case for the Alpine orogeny.
Data availability. The data presented in this study are available on request from Lorenzo G. Candioti.

545
where v i denotes velocity vector components and x i spatial coordinate components, where (i,j=1) indicates the horizontal direction and (i,j=2) the vertical direction, σ ij are components of the total stress tensor, ρ is density and a i = [0; g] is a vector with g being the gravitational acceleration. Density is a function of pressure P (negative mean stress) and temperature T computed as a simplified equation of state for the crustal phases like where ρ 0 is the material density at the reference temperature T 0 and pressure P 0 , α is the thermal expansion coefficient, β is the compressibility coefficient, ∆T = T − T 0 and ∆P = P − P 0 . Effective density for the mantle phases is pre-computed using the software package Perple_X (Connolly, 2005) for the bulk rock composition of a Hawaiian pyrolite (Workman and Hart, 2005). Figure A1 shows the density distribution for the calculated pressure and temperature range.
The viscoelastic stress tensor components are defined using a backward-Euler scheme (e.g., Schmalholz et al., 2001) as where δ ij = 0 if i = j, or δ ij = 1 if i = j, η eff is the effective viscosity, G is shear modulus, ∆t is the time step,ε ij are the components of the deviatoric strain rate tensor, τ o ij is the deviatoric stress tensor of the preceding time step and J ij comprises components of the Jaumann stress rate as described in detail in Beuchert and Podladchikov (2010). A visco-elasto-plastic Maxwell model is used to describe the rheology, implying that the components of the total deviatoric strain rate tensorε ij 560 are additively decomposed into contributions from the viscous (dislocation, diffusion and Peierls creep), plastic and elastic deformation aṡ A local iteration cycle is performed on each cell/node until eq. A5 is satisfied (e.g., Popov and Sobolev, 2008 where the ratio in front of the pre-factor A results from conversion of the experimentally derived 1D flow law, obtained from laboratory experiments, to a flow law for tensor components (e.g., Schmalholz and Fletcher, 2011). For the mantle material diffusion creep is taken into account and its viscosity takes the following form where d is grain size and m is a grain size exponent. Effective Peierls viscosity is calculated using the regularised form of Kameyama et al. (1999) for the experimentally derived flow law by Goetze and Evans (1979) as where s is an effective, temperature dependent stress exponent: where A P is a pre-factor, γ is a fitting parameter and σ P is a characteristic stress value. In the frictional domain, stresses are limited by the Drucker-Prager yield function and the non-associated potential function: where φ is the internal angle of friction and C is the cohesion. If the yield condition is met (f = τ II − τ y > 0) the equivalent plastic viscosity is computed as At the end of the iteration cycle, the effective viscosity in eq. A4 is computed as the inverse of the quasi-harmonic average of the viscous and the plastic contributions for the corresponding invariants of the strain rate tensor components of the distinct deformation mechanisms as Thermal evolution of the model is calculated with the heat transfer equation where c P is the specific heat capacity at constant pressure, D/Dt is the material time derivative, k is thermal conductivity, H A = Tαv z gρ is a heat source or sink resulting from adiabatic processes assuming lithostatic pressure conditions, H D = τ 2 ij / 2η eff results from the conversion of dissipative work into heat (so-called shear heating) and H R is a radiogenic 595 heat source.
To initiate the deformation, we perturbed the initial marker field with a random amplitude vertical displacement like where z M is the vertical marker coordinate in km, A is a random amplitude varying between -1.25 km and 1.25 km, x M is the horizontal marker coordinate in km and λ = 25 × 10 3 km is the half-width of the curve. The perturbation is applied to the 600 horizontal centre of the domain between -75 km and 75 km.
All physical parameters are summarised in table A1.

Appendix B: Nusselt number scaling laws and phase transitions
Modelling thermal convection beneath an actively deforming lithosphere can be numerically expensive, because the convection velocities can be as high as or even higher than the motion of the lithospheric plates, depending on the vigour of the convecting 605 system. This significantly reduces the maximum time step necessary to ensure numerical stability. In consequence, it takes more time steps to run a simulation to the same physical time when convection is modelled together with deformation in the lithosphere. Hence, the computational time can be twice as long compared to models, where only the deforming lithosphere is modelled. However, it is possible to include the effect of convection in the mantle on the thermal field and keep a constant vertical heat flux through the lithosphere-asthenosphere boundary (LAB) into a numerical model without explicitly modelling 610 convection by using an effective thermal conductivity for the mantle material below the lithosphere. Two dimensionless quan-tities have to be defined, namely the Rayleigh and Nusselt numbers. The Rayleigh number is the ratio of the thermal diffusion and advection time scale where ρ is density, g is gravitational acceleration, α is a coefficient of thermal expansion, ∆T is the temperature difference 615 between the top and the bottom and D is the thickness of the convecting layer, κ = k/ρ/c P is the thermal diffusivity and η eff is the effective viscosity. The Nusselt number can be expressed in terms of the Rayleigh number as where Ra crit is the critical Rayleigh number at which convection starts, typically in the order of 10 3 , and β is a power-law exponent . The Nusselt number is the ratio of advective heat flux, q Adv , which is the vertical heat flux For this study, the standard value for k = 2.75 of the upper mantle material and Nu = 13. The effective conductivity according to eq. B4 is k eff = 36. To avoid a strong contrast of conductivities directly at the base of the lithosphere, we linearly increase the conductivity from 2.75-36 W m −1 K −1 over a temperature range of 1350-1376°C. Applying this effective conductivity approach reduces the number of time steps necessary for computation up to the same physical time by ca. a factor 2 in M4 635 compared to M1.
As mentioned above, the vigour of convection is defined by the Rayleigh number (eq. B1). For Ra»Ra crit , the time scale for thermal diffusion is much larger than the time scale for advection of material. This means that the entropy of the system remains relatively constant in time. By definition, such a system is adiabatic (Kondepudi and Prigogine, 2014). In the presented models, the density and entropy for the mantle phases is pre-computed using Perple_X for a given bulk rock composition. Assuming 640 that the temperature gradient in the upper mantle is adiabatic and stress conditions are close to lithostatic (i.e., deviatoric stresses are negligible), the temperature at any depth can be determined by following an isentrop from the Perple_X database.
Starting coordinates in pressure-temperature space are the (lithostatic) conditions at the base of the lithosphere. From these pressure and temperature values one can follow the closest isentrop (black line Fig. A1) until the (lithostatic) pressure value at target depth (in this study 660 km, red diamond Fig. A1) is reached and extract the corresponding temperature value. Trubitsyn 645 and Trubitsyna (2015) derived an analytical solution to calculate temperatures assuming an adiabatic gradient for given depths.
We determined the temperature at the bottom of the model domain using both approaches and the obtained values that differ by only 0.01°C.
Involving phase transitions necessitates mainly two major assumptions: (1) is compressibility of material due to large density variations important and (2) does latent heat released or consumed at a phase transition significantly change the convective 650 pattern? Bercovici et al. (1992) concluded that compressibility effects on the spatial mantle structure are minor when the superadiabatic temperature drop is close to the adiabatic temperature of the mantle, which is the case for the Earth. Although the net density varies largely in P-T space of the phase diagram used in this study ( fig. A1), the maximum value for the density time derivative computed from M1 is two orders of magnitude lower than the velocity divergence. We assume that density changes due to volumetric deformation are, hence, still negligible and density changes are accounted for in the buoyancy 655 force only. This means the classical Boussinesq approximation is still valid. However, not considering adiabatic heating in the energy conservation equation leads to a significant deviation of the thermal structure from the initially imposed temperature gradient over large time scales (>100 Myrs). The resulting temperature profile is constant throughout the upper mantle and the newly equilibrated constant temperature is equal to the imposed temperature at the bottom boundary. In consequence, the density structure read in from the phase diagram table according to pressure and temperature values is significantly wrong. To 660 avoid these problems, we use the extended Boussinesq approximation, i.e., the adiabatic heating term is included in the energy conservation equation. As a result, the initially imposed adiabatic (or isentropic) temperature gradient can be maintained over large time scales. The resulting density structure agrees well with the PREM model (Dziewonski and Anderson, 1981) as shown in this study. A detailed comparison between different approximations of the system of equations is clearly beyond the scope of this study. Latent heat that is released or consumed by a phase transition can perturb the thermal field by up to 100 K 665 and induce a buoyancy force aiding or inhibiting the motion of cold, especially low-angle, subducting slabs (van Hunen et al., 2001) or hot rising plumes. However, when the lateral differences in temperature do not vary much, the deflection of the phase transition by an ascending plume or a subducting slab has a much bigger impact on the buoyancy stresses than the latent heat released or consumed by the phase transition (Christensen, 1995). Because a detailed parametric investigation of the impact of latent heat on buoyancy stresses is beyond the scope of study, we neglect latent heat for simplicity.

Appendix C: Convection benchmark
In this section, we present the results of a convection benchmark performed by the algorithm used in this study. Equations for continuity and force balance are solved as in eq. A1 and A2, density is a function of temperature only and calculated as and the total stress tensor is decomposed into a pressure and a deviatoric part as Transfer of heat is calculated as in eq. A15. Stresses and strain rates (ε ij ) are related to each other via the viscosity η as Viscosity is computed via a linearized Arrhenius law, also called Frank-Kamenetskii approximation (Kamenetskii, 1969): 680 with γ T = log(η T ) and γ z = log(η z ). By choosing η z = 1, γ z = 0 in eq. C4 and, therefore, the viscosity is only temperature dependent.
The dimensionless equations are discretized over a domain that extends from 0 to 1 in both horizontal and vertical directions and a small amplitude perturbation is applied to the initial temperature profile as As mentioned above, the vigour of the convecting system is described by its Rayleigh number (eq. B1). A local Ra = 10 2 is applied to the top boundary by setting α = 10 −2 , g = 10 4 and all other parameters of eq.B1 are set to 1. The applied viscosity decrease by choosing η T = 10 5 in eq. C4 results in a global Ra = 10 7 . All mechanical boundaries are set to free slip, the thermal boundary conditions are constant temperature at the top (T = 0) and bottom (T = 1) and insulating (i.e., zero flux) at the two vertical boundaries. Tosi et al. (2015) tested several algorithms, including finite element, finite differences, finite volume and 690 spectral discretization, on their capability of modelling distinct rheologies of the mantle, from temperature dependent viscosity only up to visco-plastic rheologies. We have chosen the simplest test, case one in Tosi et al. (2015), and report the results of two distinct diagnostic quantities: the average temperature over the entire modelling domain and the root mean square velocity at the surface The model develops one convection cell below a stagnant lid ( fig.A2(a), (b)). Diagnostic quantities of average temperature ( fig.A2(c)) and root-mean square velocity ( fig.A2(d)) at the top of the domain agree with values that have been produced by the algorithms (grey areas in fig.A2(c) and (d)) tested in Tosi et al. (2015). We tested numerical resolutions of 50 2 , 100 2 , 150 2 and 300 2 . Only for resolutions >100 2 the desired convective pattern developed. The numerical algorithms tested by Tosi et al. 700 (2015) passed the benchmark already for lower resolutions. This is due to the fact that the algorithm presented here uses a uniform grid size across the domain. The algorithms tested by Tosi et al. (2015) used refined meshes. Sufficient resolution of the thermal boundary layers at the top and at the bottom is crucial to develop the desired pattern. Using a refined mesh in these regions, allows for lower total resolution, whereas using a regular mesh necessitates a much higher resolution in total.
Nevertheless, values for the diagnostic quantities reproduced by the presented algorithm lie well within the minimum and where P L is the lithostatic pressure calculated as  (Molnar and Lyon-Caen, 1988;Schmalholz et al., 2019;Bessat et al., 2020) acting in the system during the different stages of the hyper-extension and convergence cycle. We also calculate the vertical integral of the second invariant of the deviatoric stress tensor 720τ The value of F D = 2×τ avg II (x), whereτ avg II (x) is calculated averaging the average ofτ II (x) both over the left and rightmost 100 km of the domain, is also identical to the vertical integral of the difference between the horizontal total stress and the lithostatic pressure, if shear stresses are negligible (Molnar and Lyon-Caen, 1988;Schmalholz et al., 2019). During onset of convergence (100 Myrs), when the lithosphere is more or less homogeneously shortened, shear stresses are most likely negligible compared 725 to normal deviatoric stresses. F D can therefore be used to estimate the plate driving force (see fig.7(d)). For calculation of the suction force per unit length (F S ) induced by the mantle flow in fig. 6(a) & (f) we used the following formula where a = 350 km, b = 150 km, c = −120 km and d = −450 km are the integration bounds and ∆ρ is the difference in density between the initial density field and the density field at the end of the cooling period.  (Dziewonski and Anderson, 1981). In (c) and (f): circles and stars indicate viscosity profiles inferred from Occam-style inversion of glacial isostatic adjustment and convection related data originally by Mitrovica and Forte (2004), diamonds show a four-layer model fit of mantle flow to seismic anisotropy originally by Behn et al. (2004). All profiles taken from Forte et al. (2010) (fig. 2). The median is used as statistical quantity for averaging, because it is less sensitive to extreme values compared to the arithmetic mean.       normal average temperature calculated as in eq. C6 for the time period at which the convection reaches a steady state and (f) root mean square velocity at the surface of the modelled domain calculated as in eq. C7 for the time period at which the convection reaches a steady state. In fig. A2(b) and fig. A2(d)-(f), the dark grey area only shows the range of minimum maximum values for the given diagnostic quantity obtained by the algorithms tested by Tosi et al. (2015) excluding the spectral algorithm. Dark combined with light grey areas indicate the range obtained by all algorithms tested by Tosi et al. (2015) including the spectral algorithm.