Reply on RC1

COMMENT: L 20-21: “Our results highlight that surface water temperatures in reservoirs differ significantly from those found in lakes” is slightly misleading as you did not investigate any lakes. I suggest you reword the sentence to reflect your actual findings, and from that make inferences about lakes if appropriate. RESPONSE: Thank you for your comment. We agree, the sentence has been changed. Line 20 -21 was replaced with the following sentence: “Our results highlight the need to include anthropogenic inflow and outflow controls in regional and global climate models”

RESPONSE: Thank you. It is a good idea. As suggested, we have made changes regarding the description of the scenarios, and we have also changed the designation of the scenarios. Furthermore, we have replaced the term "lake" with "lake (similar to a seepage lake)".
COMMENT: Secondly, the main difference between reservoirs and lakes in my opinion is not the presence/absence of inflows and outflows (though reservoirs are typically dominated by hydrology as riverine systems), but the fact that most reservoirs have deep water outlets, whereas lakes discharge from the surface. This can have profound effects on the heat budget in stratified waters because of potentially large differences in surface (e.g. 25 degrees C) and bottom water temperature (eg 10 degrees C or potentially lower). Deep water withdrawal tends to shrink the hypolimnion, expand the surface mixed layer, decrease vertical temperature gradients, shorten stratification duration, and increase the heat content of reservoirs in comparison to lakes with surface water outlets. This aspect was not addressed in the paper but would significantly influence surface water temperature and I recommend that the authors provide information on whether/how this was accounted for, how the outflows were represented in the model, and what its potential effect on the results would be.

RESPONSE:
Thank you for your comment. We agree that deep-water withdrawal can have a significant effect on surface-water temperature and that this is one of the main differences between lakes and reservoirs. The 2-D model considered in the definition of the baseline scenario includes a selective withdrawal algorithm that calculates a withdrawal zone based on outflow, outlet geometry and upstream density gradients (Cole and Wells, 2008). Hence, the effect of deep withdrawal was accounted for.
During the development of this study, we have included 24 reservoirs with different waterresidence time and morphological characteristics (volume, depth, surface area). We have taken this large number of waterbodies into consideration, including run-of-river hydropower schemes, to also evaluate the effect of deep-versus shallow-water withdrawal outlets. In fact, we have evaluated the relationship between the water level, mixed-layer depth, the thermocline depth and the surface-water temperature variation for each reservoir, between the W2 reservoir and the W2 lake scenarios.
These were the conclusions regarding this specific analysis: the water-level variation is clearly related to surface-water temperature simulation bias, and we show that the outflow (deep abstraction) reduces the volume of hypolimnion and increases the volume of the epilimnion and of the mixed-layer depth (i.e., the thermocline depth increases). The water-level reduction increases the area-to-the-epilimnion volume ratio, which results in an increase in epilimnetic temperature (e.g., Carr et al. 2020). The hypolimnion water temperature (HWT) was generally higher in the reservoirs than in lakes, due to the heat transported by interflow and underflow currents.
These conclusions have already been well accepted by the scientific community. Therefore, after considering the length of the manuscript versus the scientific benefit of the inclusion of this specific analysis, we have decided to exclude this part from the final version of the manuscript. While quantification of the individual contribution of the outflowwithdrawal depth would be relevant to the study, we only focused on the integrated effect of advection/level variation in this study.
Our revised manuscript will include a description of the outflow representation in the 2-D model, and the following sentences will be included: Line 205 -the following sentence was modified: "Surface-heat fluxes, in particular the evaporation rate, are also affected by advection due to inflows and outflows (e.g., deep-water withdrawal) and by water level (WL) fluctuations (Rimmer et al., 2011;Friedrich et al., 2018)." Line 205 -the following sentences were included: "Outlet geometry, outflows and in-pool densities are the input to the selective withdrawal algorithm that calculates vertical withdrawal zone limits. Among the two model options of the withdrawal-line sinks, which are wide in relation to dam width (> 1/10) and point sinks, which are narrow in relation to dam width (< 1/10)-only point sinks were considered. The point-sink approximation assumes the flow is radial, both longitudinally and vertically (Cole and Wells, 2008). Therefore, for the outflow structure definition, the centerline elevation of the structure was included in the model (Table 4). Additionally, as suggested by Cole and Wells (2008), the algorithm was allowed to retrieve water from the top elevation of the computational grid." Line 447 -the following sentence was included: "Overall, the results show that the water-level variations are clearly related to surfacewater temperature simulation bias; besides, the outflow (deep abstraction) reduces the volume of hypolimnion and increases the volume of the epilimnion (mixed layer) by lowering the thermocline. Herewith, water-level reduction increases the area-to-theepilimnion volume ratio, which results in an increase in epilimnetic temperature (e.g., Carr et al., 2020). The hypolimnion water temperature (HWT) was generally higher in the W2 hydrology scenarios than in the W2 scenarios, due to the heat transported by interflow and underflow currents." COMMENT: Thirdly, I found the description of the scenarios somewhat confusing and I have some questions. Could you please describe the rationale behind the scenarios in more detail and perhaps early on (e.g. the baseline scenario is actually 4 scenarioshourly, and daily forcing, with and without hydrology). Please also clearly indicate whether the 1D model scenarios considered inflows and outflows or not (I believe it was mentioned in the introduction but it should be clear in the methods too). Consider adding a table describing the different scenarios and comparisons. You may also consider removing Table  2 as its content can easily be summarized in the text.

RESPONSE:
Thank you for your comment. We agree with the reviewer. Therefore, Table  2, previously included in the methods section, was replaced with a table describing the different scenarios, which clarifies the rationale behind the scenarios.
Line 159 - Table 2 was summarized in the text by the following sentence: "Meteorological datasets considered in the modeling process included: air temperature (ºC); relative humidity (%); wind velocity (m/s); wind direction (rad); cloud fraction (0 to 10) and shortwave-solar radiation (W/m 2 ). These datasets were considered in all models with the following exceptions: wind direction is not considered for 1-D models forcing; the ANN modeling relays in the air-temperature, relative-humidity and wind-velocity datasets only." Line 138 -The following sentence was included: " Table 2 shows a full description of the scenarios considered in the development of this study." Line 161 - Table 2 was replaced. COMMENT: Finally, the model errors were assessed against the observed data (correct in my opinion) but in other analyses model errors were assessed against the baseline scenario W2-Reservoir, as far as I can ascertain. This can be problematic because each model has its own characteristics and does not per se represent the "truth", especially if some parameters are highly calibrated as is the case of the baseline scenarios here. For instance, if the extinction coefficient was calibrated in the baseline scenario with the 2-d model but not in the 1-d models, or if different parameter values were used, are the results comparable and how meaningful are the model errors? I think it may make more sense to calculate the model error always against the observed values, and then to assess the relative changes in error in the different scenarios. E.g. "not accounting for inflows increased the model error by X". If the model error is used mainly as an analytical variable, please describe the rationale here clearly.

RESPONSE:
Thank you for your comment. Indeed, in the first part of the manuscript results, the models' errors were assessed against the observed temperature values (profiles and surface-water temperature values) taking into consideration a period of 20 years.
Note that the error obtained by comparing the models' results with the observed values also includes a partial error that can be associated with the uncertainty of the model forcing data from the downscaled dataset of ERA-interim reanalysis. Since the 1-D simulations do not include inflow/outflow and water-level variations, it was not possible to assess if the model errors were linked to the neglect of inflows and outflows or caused by the forcing errors.
Therefore, at the second stage of the model-performance analysis, model errors were assessed against the baseline scenario ("W2 hydrology"). The reviewer is right, at this point the error should be interpreted as an analytical variable used to answer the following questions: What is the exact variation that is associated with the neglect of inflows and outflows? This also enabled the definition of the 100-days retention time threshold. How well can ANN simulate the evolution of a reservoir SWT? How far apart is the performance of this simplified 1-D model from a state-of-the-art calibrated 2-D model that includes the parametrization of inflows/outflows and waterlevel variation? What is the partial contribution of the neglect of inflows and outflows versus the effect of wind sheltering over the models' meteorological forcing to the final error value; Can we identify differences in the conceptualization of important physical processes (e.g., differences in the conceptualization of diurnal variations of SWT between 1-D and the 2-D models), through the consideration of daily versus hourly meteorological forcing?
Regarding the baseline scenario "quality", it is important to mention the following: One of the two calibration parameters-wind-sheltering coefficient (WSC)-is directly linked to the effect of the watershed morphology on wind forcing and accounts, therefore, for potential uncertainties in meteorological input (Cole and Wells, 2008). The second calibration parameter was the water-extinction coefficient. These two parameters are commonly associated with a high degree of uncertainty in regional and global climate models. It is noteworthy that the WSC had the strongest effect on temperature during calibration. The CE-QUAL-W2 model has been extensively used in the last 20 years to reproduce water temperatures in Portuguese reservoirs, as well as in over 400 waterbodies worldwide under a wide variety of external conditions. The model demonstrated remarkably accurate temperature predictions -provided the accurate geometry and boundary conditions are available (vide Cole and Wells, 2008).
Therefore, we believe that the 2-D model, after calibration, represents a reliable benchmark for testing the representativeness of simplified 1-D models for reservoir parameterization in regional and global models.
To resolve the reviewer's concerns and to clarify our approach, the following changes have been proposed to the revised version: Line 155: the paragraph has been added: "The baseline scenarios (W2 hydrology) were defined to address the following questions: How large is the uncertainty associated with the neglect of inflows and outflows? How adequate is the performance of simplified 1-D models compared with the state-ofthe-art calibrated 2-D model, including parametrization of inflows/outflows and WL variation? What is the relative contribution to the final model error of the in-and outflow neglect vs. neglect of the wind sheltering in meteorological forcing? Can we identify conceptual differences in representation of the fundamental physical processes (such as differences in the conceptualization of diurnal variations of SWT) by 1-D and 2-D models through the comparison of outputs from daily versus hourly forcing? How well can ANN simulate the evolution of a reservoir SWT?
The reliability of the baseline scenarios (W2 hydrology) for representation of the reservoir thermal regime has been demonstrated by the model calibration results and is supported by the outcomes of a large number of successful model applications worldwide (vide Cole and Wells, 2008).
Using 2-D modeling results as a baseline "benchmark" scenario for validating 1-D models allows the isolation of the errors associated with the quality of meteorological forcing and observed data (e.g., water-temperature data sets) while providing the continuity usually unavailable from observational datasets. Hence, the error obtained when comparing 1-D versus 2-D model results is to be regarded as an analytical variable, encapsulating differences among the different scenarios and not the conventional model error (model output versus observed data)." Introduction COMMENT: The aims don't seem to include a comparison of the effect of in and outflows.

RESPONSE:
Thank you for your comment. The reviewer is right: the main objective of the study was to evaluate the combined effect of in-and outflows without their separation.
COMMENT: L 89 -91: This sentence is a little confusing. Try: Maximum daily mean air temperature ranges from 13 ºC in the central highlands to 25 ºC in the southeast region. The minimum daily mean air temperature ranges from 5 ºC in the northern and central regions to 18 ºC in the south.

RESPONSE:
Thank you for your comment. We agree, the sentence has been replaced.
COMMENT: L118 -120: Did you use single linear regressions between air and water temperature? There is typically a seasonal hysteresis in the air-water temperatures that varies in strength depending on some factors (eg Q), so that you would probably get a better estimate of inflow water temperature using a model like air2stream (https://github.com/marcotoffolon/air2stream, see references here). I am not suggesting you do this and repeat all the simulations and analyses, but could you please comment on this effect? RESPONSE: Thank you for the comment and for suggesting an alternative modeling approach. We agree with the reviewer: rivers, reservoirs and lakes reveal thermal inertia with regard to air temperature. During the model calibration process, we have tested some different approaches: Linear regressions between air and water temperature, taking into consideration full datasets, and also considering seasonal datasets; A non-linear regression model (Mohseni O. et al., 1998); An equation developed by Stefan and Preud'homme (1992), which is used by the SWAT watershed model and considers air temperature, a time lag between air and water temperature and the mean annual flow.
We are aware of the importance of the time lag between air and water temperature and its relation to the strength of in-and outflow. While the approximation described in (iii) is the most advanced among the three approaches above, the linear-regression (i) equations produced the best results based on monthly inflow values. We have not evaluated this issue further because we were satisfied with the results. Nevertheless, we agree that the results could have been improved if a model like air2stream was used. Therefore, we have added the following note in lines 136-139 of the manuscript: "A deeper insight into the relationship between the air and surface temperatures may be obtained by application of more detailed semi-stochastic models (Toffolon and Piccolroaz, 2015), while the effects of the reservoir volume (depth) and the flow would require specific attention in this case (Calamita et al. 2021). " COMMENT: L117: Could you please describe what the baseline scenario is? It would be helpful to explain the rationale of the scenarios early on and in a little more detail.

RESPONSE:
Thank you for your comment. We agree with the reviewer. In order to address this problem, we have placed section 3.1 Forcing and calibration data after section 3.2 Models/scenarios. Thus, the scenarios' rationale appears before section 3.1 Line 117 was replaced with the following sentence: "The W2 hydrology scenario was forced…" COMMENT: L131 -134: why did you not use the solar radiation and cloud cover from ERA-Interim?

RESPONSE:
Thank you for pointing this out. In an initial stage of the development of this study, we did not have access to the solar-radiation and cloud-cover datasets obtained from the Weather Research and Forecasting (WRF) model for Portugal. Therefore, to address this limitation we had two alternatives: To include ERA-Interim datasets for cloud cover and solar radiation; To consider mean monthly cloud cover values observed in the closest meteorological station and apply the cloud cover reduction of clear sky solar radiation described by Wunderlich (1972), which is still included in the most recent version of the CE-QUAL-W2 model parameterization (Wells, 2021) and calculate the clear sky radiation with a suitable algorithm (e. g. ).
We chose the second option, because: We could not find validation studies of ERA-Interim cloud cover datasets for Portugal, and it is well known that cloud cover in ERA-Interim is underestimated by at least 10% for all regions except the poles (Free et al., 2016, Stengel et al., 2018. This underestimation implies an overestimation of shortwave incident radiation, which is more significant in summer (Free et al., 2016, Zhang et al., 2020; Cloud-cover datasets from reanalyses are obtained with physical model parameterizations, therefore they are also subject to error (Free, et al., 2016); The spatial resolution of the ERA-Interim data set is approximately 80 km. This is an important limitation when taking into consideration 2-D model spatial resolution and above all, the known difficulties in modeling cloud formation (Jakob 1999, Bedacht et al. 2007, Clark and Walsh 2010, Wu et al. 2012); We have always obtained good results with the solar radiation data sets obtained with the EPA method , with the cloud cover datasets derived from mean monthly values described in the climatological normal of Portugal  and with the Wunderlich (1972) algorithm. Unfortunately, all projects were only written in Portuguese. However, they are available in the following address: https://apambiente.pt/agua/modelacao-da-qualidade-da-agua-em-albufeiras-de-aguaspublicas Considering the known difficulties in obtaining high-resolution and high-quality cloud cover observations, we think that the question from the reviewer is quite relevant. In the near future we will assess the differences between these two options, namely, their effect in the forcing of water-quality models.
COMMENT: Section 3.2: could you please describe how you parameterized the basin morphology for W2 simulations, especially the data sources and horizontal and vertical grid resolution, and outlet heights? RESPONSE: Thank you for your comment. We have used 1:25000 charts to define the reservoirs' bathymetry. This information was included in the revised version of the manuscript on line 213: "The reservoirs' bathymetry was defined from 1:25000 topographic charts of the watersheds. Hence, each reservoir computational grid is described by a specific number of branches, segments, and layers (Table 4)." Additionally, we have included Table 4, which describes the computational grid dimensions of each reservoir (number of branches, layers, segments) and the outflow-centerline elevation regarding each outflow structure.
COMMENT: L 143-145: Please describe the calibration procedure. Which parameters were calibrated in which ranges? Was calibration automatic? Which extinction values did you use for the uncalibrated models?

RESPONSE:
This comment is much appreciated. Details on the calibration procedure were included in the revised version of the manuscript. As described earlier in this text, we have calibrated two parameters (wind sheltering and water-extinction coefficient). The 2-D models were calibrated by comparing the results with observed data, without applying automatic adjustment algorithms. Due to the difficulties encountered in characterizing reservoir water transparency with observed data, all 1-D simulations were performed with a constant extinction coefficient value of 0.45 m -1 . This extinction coefficient is the reference value considered by Cole and Wells (2008) for the model CE-QUAL-W2 when only water temperature is simulated, 0.45. According to the eutrophication criteria defined by the OCDE (OCDE, 1982), this magnitude of water transparency is associated with eutrophic unstable systems, which are becoming common in Portugal. This value is also close to the mean value obtained from observed Secchi disk data available for four reservoirs: Bouçã (R10), Crestuma-Lever (R16), Cabril (R22) and Castelo do Bode (R23), 0.37.
Line 145: The following sentence was included in the manuscript: "After each model run, results were compared with the observed data sets and if needed the calibration parameters were retuned manually. The wind-sheltering coefficient (WSC) and the extinction coefficient for water were the only parameters modified at each model run. These parameters varied in the range from 0.1 to 1.0 and from 0.25 to 1.0, respectively. Data on the mean water-extinction coefficient was available for four reservoirs: Bouçã (μ=0.27; σ=0.05), Cabril (μ=0.27;σ=0.05) and Castelo do Bode (μ=0.26; σ=0.05), therefore they were not calibrated. All 1-D simulations were performed with a constant water-extinction coefficient value of 0.45, corresponding to the reference value suggested by Cole and Wells (2008). According to the eutrophication criteria defined by the OCDE (OCDE, 1982), this value of water transparency is associated with eutrophic unstable systems and is also close to the mean value of 0.37 obtained from the four reservoirs listed above." COMMENT: L 174: Change to "Two different temporal resolutions …" RESPONSE: Thank you for pointing this out. This sentence was changed.
COMMENT: L 212: the sentence doesn't convey what the Ultimate algorithm is used for. I suggest to reword it.

RESPONSE:
Thank you for your pointing this out.
Line 212 -The sentence was changed to: "The Ultimate algorithm was considered as the solution for the numerical transport for temperature and constituents (Cole and Wells, 2008)."

Results:
COMMENT: L 293-296: The parameter ranges appear to be quite big and the calibration resulted in especially low values of the wind factor and extinction. The average extinction value of 0.38 1/m is typical of reasonably deep oligotrophic lakes but several reservoirs are quite shallow and I would expect some considerably higher values. I am wondering if the low wind factors were compensating for the low extinction values? Were these calibrated values used for the other models, or which parameterisations were used? RESPONSE: Thank you for this question. The reviewer is right, there are some low values of the wind-sheltering coefficient (WSC). In some cases, the wind velocity had to be considerably reduced. According to Cole and Wells, 2008, the wind-sheltering coefficient [WSC] has the most effect on temperature during calibration and should be adjusted first. Then, if needed, all the other calibration parameters can be adjusted, including the extinction coefficient. We have always followed this recommendation, which in our perspective prevents the unbalancing of the calibration process. Furthermore, the modeling of such a long time period (20 years) enabled the evaluation of the model's response under different forcing conditions, namely the wind stress. This fact per se, reduces the probability of validating an unbalanced model. COMMENT: L302: change 'major' to 'high' or 'highest'.

RESPONSE:
Thank you for your comment. The sentence was changed.
Line 302: The following sentence was included in the manuscript: "The three highest RMSE" COMMENT: L319: can this result also be attributed to the fact that W2 and ANN were fitted to the data and the 1D models were not? RESPONSE: Thank you for your comment. Yes, the performance of W2 would be worse without calibration, and the ANN fully relied on tuning to the observational data by its definition. We mention this fact in the manuscript (Lines 169-170 and Figure 2) and discuss the representativeness of the uncalibrated 1-D models (Lines 319-321): "This result can be attributed to the wind-forcing treatment by 1-D models. The latter do not consider the wind-sheltering effect, which was the most relevant parameter for calibration of the 2-D model, reducing the wind velocity by around 34%." COMMENT: L326-7: better to refer to Fig 4d. RESPONSE: Thank you. The reference was changed.
COMMENT: L 437: change heart to heat RESPONSE: Thank you. This word was changed.
COMMENT: L 512: change 'average mean air temperatures' to 'mean air temperatures' RESPONSE: Thank you. This sentence was changed. Figure 4: What exactly does the trendline show? Also it is a bit difficult to compare the models with each other as they are shown in different panels. Consider plotting the individual trendlines for each model on one panel, say overlaid on the ensemble panel, if this is not too cluttered. RESPONSE: Thank you. We have tried to improve the visual representation using the suggested approach. The reason for the inclusion of the trendline was to facilitate identification of the reservoirs with higher and lower RMSEs. After several editions, we decided to return to the original figure, as we felt it provided a better insight into the results. Figure 5: caption -I don't get the first part: Evaluation of simulation bias and RMSE. 2-D baseline scenario (W2 Reservoir) simulated SWT for: a) the exclusion of inflows and outflows (W2 Lake), … Does it mean that you calculated bias and RMSE based on the comparison of the different model runs with W2 reservoir? RESPONSE: Thank you for pointing this out. The caption was not correct. In the revised version the caption was changed to: "Evaluation of simulation bias and RMSE. 2-D baseline scenario (W2 Reservoir) simulated SWT versus: a) the exclusion of inflows and outflows (W2 Lake);" COMMENT: Figure 11: suggest log scale on the x-axis.

RESPONSE:
Thank you for the suggestion. The revised version of the manuscript includes this change.