Research Article

Comparison between observed and remotely sensed attributes to include in the region-of-influence approach of extreme precipitation estimation: a case study in the Yangtze River basin, China

Pages 1777-1789 | Received 27 Feb 2021, Accepted 21 Jun 2021, Published online: 08 Sep 2021

ABSTRACT

The application of remotely sensed (RS) data at ungauged locations is well recognized in hydrological studies; however, its suitability for use as a descriptor in the region-of-influence (ROI) approach has hardly been assessed. This study compares two types of at-site attributes, observed and RS, to include in the ROI approach for the estimation of extreme precipitation, in particular at ungauged locations in China. The performance of the method against the fixed-group-based regional approach is also examined. The results, which are based on data for the Yangtze River basin, showed that the ROI scheme that combined physical proximity and elevation produced the lowest error and performed better than the schemes containing RS data. The scheme also outperformed the fixed regional models in terms of error. Overall, although the application of RS data is intuitively attractive, its inclusion is unable to outperform the observed site descriptors for the study region.

Editor A. Fiori Associate Editor A. Requena

1 Introduction

Estimation of extreme precipitation is needed in various engineering applications ranging from design and operation of drainage facilities to flood risk assessment of proposed infrastructure projects. Regionalization (Cunnane 1988, Hosking and Wallis 1997, Institute of Hydrology 1999, Wallis et al. 2007, Gado et al. 2017, Requena et al. 2019a, Srivastava et al. 2019) is considered to be a reliable approach for such estimations in situations where data are limited or unavailable. The use of annual maximum (AM) data in a regionalization framework is a standard and widely used approach (Cunnane 1988, Hosking and Wallis 1997, Institute of Hydrology 1999, Svensson and Jones 2010). An efficient regional process requires that the pooled data pass a homogeneity test (Hosking and Wallis 1997, Viglione et al. 2007, Das and Cunnane 2011). However, a region that deviates slightly from homogeneity should still be more useful and effective than an at-site analysis based on a limited dataset (Hosking and Wallis 1997, Institute of Hydrology 1999).

Different approaches have been tried to form the so-called homogeneous regions. The formation of regions on a political map is the dominant one. The regions formed in such a way are generally fixed in nature and have the same growth curve except for the scaling factor (see Equation 4). Geographical convenience is the conventional and classical approach to form regions on a map (NERC 1975, Gellens 2002, Fowler and Kilsby 2003, Norbiato et al. 2007). More sophisticated approaches have emerged in recent years, notably a multivariate approach: cluster analysis (Hosking and Wallis 1997, Satyanarayana and Srinivas 2008, Yang et al. 2010, Darwish et al. 2020). A number of hydroclimatological characteristics are used in cluster analysis to delineate groups of stations that are often found to be homogeneous from a hydrological perspective (Satyanarayana and Srinivas 2008, Yang et al. 2010, Chen et al. 2013). However, with fixed groups, when the centre of attention is shifted from one site to another, the transfer of statistics for the regional group remains unaffected (Kyselý et al. 2011). In this situation, some performance issues are encountered with sites at the borders between regions.

To avoid this inconvenience, the site-specific pooling group has come to the fore. In this approach, a specific group, for which the frequency analysis is performed, is formed for a subject site using the region-of-influence (ROI) method (Burn 1990). Although the method requires the application of expert judgement in several components, especially assigning weights (Bobée and Rasmussen 1995), it has been established as a reliable method (Burn 1990, Eslamian 1995, Institute of Hydrology 1999, Kyselý et al. 2011, Hailegeorgis et al. 2013, Requena et al. 2019b) and its application in ungauged conditions is widely accepted (Zrinji and Burn 1994, Das 2018).

The method uses a similarity distance measure, which contains site descriptors, to group stations for the subject site. The inclusion of appropriate descriptors plays a critical role in delineating homogeneous groups that ultimately lead to a superior estimation. The identification of attributes that capture the governing character of extreme precipitation is, thus, an important factor. Observed attributes, e.g. physical proximity, elevation and long-term climatic variables (e.g. annual mean precipitation) (Gaál and Kyselý 2009, Das 2017, Ball et al. 2019), are generally used. However, observed climate variables can only be used at gauged locations. To be applicable under ungauged conditions, remotely sensed climate data can be used instead. The application of remotely sensed data at ungauged locations is well recognized in hydrological studies (Wagner et al. 2012, Xue et al. 2013, Das et al. 2020, Das 2021); however, its suitability for use as a descriptor in the ROI approach has scarcely been assessed. Thus, there is an opportunity to examine remotely sensed data in the context of this methodology.

The ROI approach has been found to be superior to the region-based approach in different parts of the world, including South Asia (Das 2017), the Middle East (Dehghan et al. 2019), Europe (Gaál et al. 2008), the UK (Reed et al. 1999) and Canada (Requena et al. 2019b). Although the method has been successfully applied to different parts of the world, it has scarcely been assessed in China. Hence, there is a need to investigate its appropriateness in this region.

In China, at-site frequency analysis based on the Pearson Type III (PE3) distribution is the recommended procedure (see Fang et al. 2007), but in recent years many researchers have investigated the potential utility of a regional form of frequency analysis (mainly fixed region) to estimate return period values of extreme precipitation (Yang et al. 2010, Sun et al. 2017, Wang et al. 2017). In the present work, the Yangtze River basin, which occupies one fifth of the land area of China, was chosen as the study location. The basin, which is recognized for hosting major economic activities for China, is also subjected to frequent flooding due to extreme precipitation (Su et al. 2008, Zhang et al. 2008). The assessment of extreme precipitation is considered difficult because of the dynamic nature of the underlying mechanisms that control the overall precipitation patterns of the basin.

Several regional analyses have been carried out on the Yangtze basin. Notably, the studies by Su et al. (2009) and by Chen et al. (2013) should be mentioned. Su et al. (2009) categorized the Yangtze basin into 11 hydrological sub-basins using the rotated empirical orthogonal function (Su et al. 2008). In contrast, Chen et al. (2013) divided the basin into six homogeneous regions using the fuzzy c-means clustering algorithm based on several site characteristics, namely latitude, longitude, elevation and the observed mean annual precipitation. The frequency analysis was conducted on daily extreme precipitation using the L-moment regional algorithm based on the following extreme value distributions: the generalized extreme value (GEV), generalized normal (GNO) and Pearson Type III (PE3). A fixed regional approach was used in the above studies. Hence, the site-specific ROI approach, which has been given little attention in previous studies, should provide new insight into the behaviour of extreme precipitation in the study region. Also, it remains to be seen whether the site-specific pooling approach is superior to the fixed-region-based approach in this study area.

Overall, the study has the following objectives:

  • Identification of the effectiveness of attributes (observed vs. remotely sensed) to include in the ROI model for the estimation of extreme precipitation in the study area;

  • Performance of the selected ROI model in comparison with the successful regional models (e.g. Su et al. 2009, Chen et al. 2013) in the study area.

2 Study area and datasets

The Yangtze basin is considered an important basin for the development of the Chinese economy, society and eco-hydrological balance. The total drainage area of the basin is approximately 1 800 000 km². The length of the Yangtze River, the basin’s principal stream, is about 6300 km. The river originates from the Tanggula Mountains in the Tibetan Plateau and flows east into the East China Sea (Ju et al. 2014). The basin has a modest sloping topography that falls from above 5000 m to sea level. The direction of the slope is from the west to the east.

Two distinct types of climate predominate in the basin. The southern part is climatically close to the tropical climate and the northern part to the temperate zone. The precipitation is primarily controlled by the Asian monsoon system, with the monsoon originating from the Indian Ocean influencing the upper basin, and the monsoon originating from the Pacific governing most of the mid- and lower basins (Yihui and Chan 2005). This brings heavy precipitation during the summer for an extended period of several weeks. The basin is, thus, prone to frequent flooding due to extreme precipitation. The annual mean precipitation (AMP) of the basin is about 1100 mm, with the western side receiving about 400 mm and the value increasing to 1800 mm on the eastern side of the basin. The precipitation pattern maps reasonably well onto the decrease in altitude/elevation (see DEM and AMP values in Figure 1).

Figure 1. Location of gauged stations along with the spatial distribution of annual mean precipitation (mm) derived from TRMM datasets. The DEM values for the basin are also indicated in the inset that shows the location of Yangtze basin in China


Daily AM data covering 1948–2013 at 128 stations were used for the analysis. The locations of the stations are shown in Figure 1. The AM series were extracted from the daily datasets obtained from the National Meteorological Information Center of the China Meteorological Administration (http://cdc.cma.gov.cn/index.jsp). The associated elevation values for the gauging stations were available along with the datasets. The quality of the datasets is ensured by the same organization. In this study, stations having data lengths greater than 50 years were used. Longer records reduce the standard error of the estimates, which makes the outcome more robust. The average length of the data series is about 57 years.

The satellite value of AMP was derived from the Tropical Rainfall Measuring Mission (TRMM), a joint mission between NASA and the Japan Aerospace Exploration Agency designed to monitor and study tropical rainfall. The AMP values were obtained from the compilation by Bookhagen (2013), which is based on a 12-year time series of the TRMM 2B31 product. The values are shown in map format in Figure 1.

3 Methodology

3.1 Site-specific pooling group and the associated evaluation tool

The ultimate goal of a frequency analysis is to estimate return period values at a study location. Regional analysis in terms of site-specific pooling offers a unique way to group sites for a specific location in an area. The approach uses the ROI method (Burn 1990) to pick stations that are hydrometeorologically similar in terms of the similarity distance measure. The method is generally formulated to avoid conflicts at the boundaries of regions associated with the traditional regional methods. Thus, the primary difference from the traditional (fixed-region) method lies in the technique used to form groups. The homogeneity assessment and the frequency analysis are conducted in the same way as in the traditional method.

The formation of homogeneous groups with the ROI method depends primarily on two criteria: selection of appropriate site descriptors in the similarity distance measure and the group size. Upon formation of a group, its homogeneity is assessed by a heterogeneity check. The Euclidean distance, $d_{ij}$ (Institute of Hydrology 1999), in the site descriptor space is generally used to estimate the similarity between the target ($i$th) and pooled ($j$th) sites. The general form of the Euclidean distance measure is as follows:

$$d_{ij} = \sqrt{\sum_{k=1}^{n} W_k \left(X_{k,i} - X_{k,j}\right)^2} \tag{1}$$

where $n$ is the number of descriptors; $X_{k,i}$ and $X_{k,j}$ are the values of the $k$th descriptor at the $i$th and $j$th site, respectively; and $W_k$ is the weight allotted to the $k$th descriptor.

In this study, Wk is taken as 1 (when a set comprises more than one descriptor) following the recommendation by Hosking and Wallis (1997, p. 147): when “choosing an appropriate weight is difficult and … the problem is analogous to that of deciding appropriate weights to assign to the variables used in a cluster analysis.” A similar observation was made by Bobée and Rasmussen (1995): “the selection and weighting of variables is one of the problems where no strict mathematical solution is available, but use of common sense can lead to quite acceptable results.”

The geographical and climatological descriptors are widely used in the calculation of dij. However, an assessment is required to find a suitable combination of descriptors so that a robust ROI model can be achieved that can even be implemented in ungauged cases. In terms of pooling group size, the 5T rule (Institute of Hydrology 1999) is often employed; this refers to the total number of station-years of data to be included when estimating the T-year event. However, a group containing 500 station-years produced a good performance (in terms of minimizing error) for a range of recurrence levels up to 100 years (Kjeldsen et al. 2008).
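The group-forming step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `roi_pooling_group`, the standardisation of descriptors before computing distances, and the default target of 500 station-years are assumptions made here for the example.

```python
import numpy as np

def roi_pooling_group(descriptors, record_lengths, target, weights=None,
                      target_station_years=500):
    """Return station indices of the pooling group for `target`, nearest first.

    descriptors    : (n_sites, n_descriptors) array, e.g. longitude, latitude, elevation
    record_lengths : (n_sites,) array of AM record lengths in years
    """
    X = np.asarray(descriptors, dtype=float)
    years = np.asarray(record_lengths)
    # Assumption of this sketch: descriptors are standardised so that no single
    # unit (e.g. metres of elevation) dominates the distance. Constant
    # descriptors (zero standard deviation) are not handled.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    W = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    # Weighted Euclidean distance of Equation (1) from the target to every site.
    d = np.sqrt((W * (Z - Z[target]) ** 2).sum(axis=1))
    order = np.argsort(d, kind="stable")  # the target has d = 0 and sorts to the front
    cum_years = np.cumsum(years[order])
    # Smallest group whose cumulative record reaches the station-year target.
    size = int(np.searchsorted(cum_years, target_station_years)) + 1
    return order[:min(size, len(order))]

# Hypothetical toy data: five stations described by two descriptors.
toy_descriptors = [[0, 0], [1, 0], [0, 2], [5, 5], [10, 10]]
toy_lengths = [60, 55, 58, 52, 51]
group = roi_pooling_group(toy_descriptors, toy_lengths, target=0,
                          target_station_years=150)
```

With unit weights, as adopted in this study, the only tuning choices left are the descriptor set and the station-year target, which is why the comparison between ROI models reduces to a comparison between descriptor sets.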

The groups so formed are then assessed with a homogeneity test. The popular and powerful heterogeneity measure, H1, by Hosking and Wallis (1997), is used to test the homogeneity of a group. The test, which is based on the L-coefficient of variation (L-CV), computes the variability of the sample L-CV among the sites in the group and compares it to the variation that would be expected in a homogeneous group. The variability is defined as

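$$V_1 = \left\{ \frac{\sum_{i=1}^{M} n_i \left(t_2^{i} - t_2^{R}\right)^2}{\sum_{i=1}^{M} n_i} \right\}^{1/2} \tag{2}$$

where $t_2^{i}$ and $n_i$ are the value of L-CV and the sample size for site $i$; $M$ is the number of sites in the group and $t_2^{R}$ is the group average of L-CV.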

The expected value ($\mu_{V_1}$) and the standard deviation ($\sigma_{V_1}$) of $V_1$ for a homogeneous group are obtained by simulation. A large number of homogeneous groups were generated using the four-parameter kappa distribution, with L-moment ratio values equal to $t_2^{R}$, $t_3^{R}$ and $t_4^{R}$ and the at-site mean (L-moment 1) equal to 1. The following equation is then used to estimate $H_1$:

$$H_1 = \frac{V_1 - \mu_{V_1}}{\sigma_{V_1}} \tag{3}$$

According to the guideline set by Hosking and Wallis (1997), a region is considered to be “acceptably homogeneous” if H1 < 1, “possibly heterogeneous” if 1 ≤ H1 < 2, and “definitely heterogeneous” if H1 ≥ 2.
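As a rough illustration of Equations (2) and (3), the following Python sketch computes V1 for a group and classifies a given H1 value. It is a simplified sketch under stated assumptions: the function names are hypothetical, the group-average L-CV is taken as the record-length-weighted mean (the convention of Hosking and Wallis 1997), and the kappa-distribution simulation that supplies the expected value and standard deviation of V1 is not reproduced, so those two quantities are passed in as inputs.

```python
import numpy as np

def v1_statistic(lcv, record_lengths):
    """Record-length-weighted dispersion of at-site L-CV values, Equation (2)."""
    t2 = np.asarray(lcv, dtype=float)
    n = np.asarray(record_lengths, dtype=float)
    t2_group = np.sum(n * t2) / np.sum(n)  # assumed: weighted group-average L-CV
    return float(np.sqrt(np.sum(n * (t2 - t2_group) ** 2) / np.sum(n)))

def h1_measure(v1, mu_v1, sigma_v1):
    """Heterogeneity measure of Equation (3); mu_v1 and sigma_v1 come from simulation."""
    return (v1 - mu_v1) / sigma_v1

def classify_h1(h1):
    """Verbal classes of Hosking and Wallis (1997)."""
    if h1 < 1:
        return "acceptably homogeneous"
    if h1 < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"

# Hypothetical example: two sites with equal record lengths.
v1_example = v1_statistic([0.1, 0.3], [50, 50])
```

Because H1 standardises V1 against simulated homogeneous groups, a large observed dispersion only counts as heterogeneity when it exceeds what sampling variability alone would produce for records of the same lengths.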

3.2 Estimation of return period values

The regional frequency procedure, including the ROI framework, uses the index-flood method (Dalrymple 1960) to estimate return period values. With this method, a regional/pooled growth curve ($X_T$) in terms of return period ($T$) is calculated that is common to all scaled data recorded at all sites within a homogeneous region/group. The regional curve is then multiplied by the at-site index value ($P_{\mathrm{Index},j}$) to obtain the estimate of quantile $P_{T,j}$ for site $j$ (Hosking and Wallis 1997, Institute of Hydrology 1999):

$$P_{T,j} = P_{\mathrm{Index},j}\, X_T \tag{4}$$

The mean or median of the subject site’s AM series is generally taken as the index measurement. This study uses the median as the index measurement because it is unaffected by the presence of outliers (Institute of Hydrology 1999).

The estimation of the growth curve requires the identification of a suitable distribution. There are several ways to determine the suitability of a distribution. The L-moment ratio diagram (LMRD) (Vogel and Fennessey 1993, Hosking and Wallis 1997) and the Hosking-Wallis goodness-of-fit (GOF) measure (Hosking and Wallis 1997) are two popular methods. The latter approach is applicable to pooling groups, which are not identified in the first place in this work; in fact, producing an appropriate ROI grouping scheme is one of the aims of this study. Hence, the LMRD is applied to identify a suitable distribution for the whole study region. Later, the groups identified by the selected ROI model were assessed by the Hosking-Wallis GOF measure. The GEV (Hosking and Wallis 1997) was found to be a suitable distribution for the study region. The suitability of the GEV is explained in detail in Section 4.1.

The GEV with three parameters, location ($\xi$), scale ($\alpha$) and shape ($\kappa$), as defined by Hosking and Wallis (1997), has the following expression:

$$F(x) = \exp\{-\exp(-y)\}, \qquad y = \begin{cases} -\kappa^{-1}\log\{1 - \kappa(x - \xi)/\alpha\}, & \kappa \neq 0 \\ (x - \xi)/\alpha, & \kappa = 0 \end{cases} \tag{5}$$

The shape parameter influences the tail behaviour: for $\kappa = 0$, the distribution is the two-parameter Gumbel; for $\kappa < 0$, the distribution is lower bounded, whereas for $\kappa > 0$, the distribution is upper bounded.

The growth curve (see Equation 4) based on GEV (reduced to two parameters: $\kappa$ and $\beta$) has the following form (Das and Cunnane 2011):

$$X_T = 1 + \frac{\beta}{\kappa}\left\{ (\ln 2)^{\kappa} - \left[\ln\frac{T}{T-1}\right]^{\kappa} \right\} \tag{6}$$

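The parameters estimated based on L-moments (Hosking and Wallis 1997) have the following expressions:

$$\kappa = 7.8590\,c + 2.9554\,c^2 \quad \text{in which} \quad c = \frac{2}{3 + t_3} - \frac{\ln 2}{\ln 3} \tag{7}$$

$$\beta = \frac{\kappa\, t_2}{t_2\left\{\Gamma(1+\kappa) - (\ln 2)^{\kappa}\right\} + \Gamma(1+\kappa)\left(1 - 2^{-\kappa}\right)} \tag{8}$$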

where t2 is the L-coefficient of variation (L-CV), t3 is the L-skewness and Γ is the complete gamma function.
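To make Equations (4) and (6) to (8) concrete, the sketch below evaluates the GEV growth curve from a pair of L-moment ratios and scales it by a median index value. It is a minimal sketch rather than the authors' code: the function names and the example values of t2, t3 and the index are hypothetical, and the Gumbel limit (κ = 0) is not handled.

```python
import math

def gev_growth_parameters(t2, t3):
    """GEV shape (kappa) and scaled scale (beta) from L-CV and L-skewness, Eqs (7)-(8)."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2
    g = math.gamma(1.0 + kappa)
    beta = kappa * t2 / (t2 * (g - math.log(2.0) ** kappa)
                         + g * (1.0 - 2.0 ** (-kappa)))
    return kappa, beta

def growth_factor(T, kappa, beta):
    """Median-scaled growth factor X_T of Equation (6); requires kappa != 0 and T > 1."""
    return 1.0 + (beta / kappa) * (math.log(2.0) ** kappa
                                   - math.log(T / (T - 1.0)) ** kappa)

def quantile(T, index_value, kappa, beta):
    """Index-flood estimate of Equation (4): at-site index times growth factor."""
    return index_value * growth_factor(T, kappa, beta)

# Hypothetical pooled L-moment ratios and a hypothetical median AM value of 80 mm.
kappa, beta = gev_growth_parameters(t2=0.2, t3=0.2)
x10 = growth_factor(10, kappa, beta)
x100 = growth_factor(100, kappa, beta)
p100 = quantile(100, 80.0, kappa, beta)
```

Because the index is the median, the growth factor at T = 2 equals 1 by construction, which offers a quick sanity check on any implementation of Equation (6).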

The regional/pooled estimates of $t_2$ and $t_3$ lead to the regional/pooled growth curve (Hosking and Wallis 1997). The regional estimates of $t_2$ and $t_3$ for target site $i$ can be obtained from the following equation:

$$t_i^{R} = \frac{\sum_{j=1}^{M} w_{ij}\, t_j}{\sum_{j=1}^{M} w_{ij}} \tag{9}$$

where tj is either t2 or t3 for the jth most analogous site and wij is a weight, generally taken proportional to the record length.

This study considers wij to be 1 (unweighted average) following the observation noted by Hosking and Wallis (1997, p. 90): weighting it “proportionally to record length may give undue influence to sites that have frequency distributions markedly different from the region as a whole and that also have long records.”
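The pooling of Equation (9) can be illustrated with a short Python sketch that first computes the at-site L-moment ratios and then averages them over a group. This is a sketch under assumptions: the unbiased probability-weighted-moment estimator shown here is the standard one associated with Hosking and Wallis (1997), but the text does not spell out the estimator used, and the function names are hypothetical.

```python
import numpy as np

def sample_l_moment_ratios(series):
    """Return (L-CV, L-skewness) of a sample via unbiased probability weighted moments."""
    x = np.sort(np.asarray(series, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)  # ranks of the ordered sample
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l2 / l1, l3 / l2

def pooled_ratio(site_ratios, weights=None):
    """Pooled L-moment ratio of Equation (9); weights default to 1 (unweighted mean)."""
    t = np.asarray(site_ratios, dtype=float)
    w = np.ones_like(t) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * t) / np.sum(w))

# A symmetric toy sample has zero L-skewness.
t2_toy, t3_toy = sample_l_moment_ratios([1, 2, 3, 4, 5])
t2_pooled = pooled_ratio([0.1, 0.2, 0.3])
```

Passing record lengths as `weights` would reproduce the record-length-weighted alternative that the study chose not to use.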

3.3 ROI models

ROI models differ in their similarity distance measure (see Equation 1). A set of different descriptors (also known as pooling variables) and their combinations are generally tested to find the right one. When selecting descriptors, it should be kept in mind that the chosen set must be applicable under ungauged conditions. It should also be easily accessible. A suitable ROI model is generally used for the whole region. A range of descriptors, primarily geographical and climatological, are commonly used (Gaál and Kyselý 2009, Das 2017). However, remotely sensed data have scarcely been applied in this role.

Four site descriptors are employed in this study. Among them, three are geographical (location descriptors in the form of longitude and latitude, and elevation) and one is a climatological descriptor (AMP). The location attributes are selected since the geographical proximity of the sites has proved to deliver favourable results in finding similar regimes of extreme precipitation (Reed et al. 1999, Kyselý et al. 2011, Das 2018, Requena et al. 2019a). Elevation is an intuitive choice since it is fairly well correlated with the precipitation field (Goovaerts 2000, Lloyd 2005). The descriptor is routinely used in ROI methodology (Das 2017, Ball et al. 2019). The AMP is also used in delineating successful homogeneous pooling groups (Gaál et al. 2008). Because observed AMP values are only available at gauged locations, the descriptor in this form may not be useful for ungauged cases. This study, thus, takes the satellite-based AMP values. The TRMM-derived AMP value has shown a good relation with the observed AMP in different territories of the world including China (Shi et al. 2015). Therefore, remotely sensed data such as AMP have the potential to serve as a pooling variable for ROI methodology.

Based on the four site descriptors, several different ROI models are formed. In order to make a comparison, in particular with the remotely sensed AMP, an additional model containing the observed AMP is included. Table 1 lists the details of these models. The weight is taken as 1 (see Equation 1) when a set consists of more than one descriptor (see e.g. ROI-5 and ROI-6).

Table 1. Description of ROI models that differ in their similarity distance measure

3.4 Means of assessment

3.4.1 Assessment between ROI models

ROI models listed in Table 1 were compared using a simulation framework. The GEV distribution suitable for the study region is used by this framework to generate data. The framework is a two-step procedure. In the first step, at-site population values of t2 and t3 were estimated, which were then used to generate data. The use of observed at-site t2 and t3 as population values is considered to yield a simulated region that has much more heterogeneity than the actual data. As a consequence, Hosking and Wallis (1997, p. 93) recommended not using the observed at-site L-moment ratio values for simulation purposes. In the second step, the core simulation was conducted.

The present study uses a method outlined by Das and Cunnane (2011) to estimate at-site population values of $t_2$ and $t_3$. A similar approach can be found in other ROI studies (e.g. Castellarin et al. 2001, Gaál et al. 2008). The method uses a ROI approach with a distance measure ($\delta_{i,j}$) defined in Equation (10) to form a group for a subject site. The population values of $t_2$ and $t_3$ are then taken as the resultant pooled estimates of $t_2$ and $t_3$ using Equation (9).

$$\delta_{i,j} = \sqrt{\left(\frac{t_{2,i} - t_{2,j}}{\sigma_{t_2}}\right)^2 + \left(\frac{t_{3,i} - t_{3,j}}{\sigma_{t_3}}\right)^2} \tag{10}$$

In the core simulation, the random series for each site was generated based on GEV with L-moment ratios equal to the at-site population values of t2 and t3 (estimated in the first step). The XT value for a target site (a pooling group is formed for the target site) was then evaluated using Equation (6) following the estimation of κ and β using Equations (7) and (8) based on the pooled estimates of t2 and t3. A detailed description of the simulation procedure can be obtained from Das and Cunnane (2011). The root mean square error (RMSET), defined in Equation (11), was then employed to compare ROI models.

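$$\mathrm{RMSE}_T = \frac{1}{M}\sum_{i=1}^{M} \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\frac{\hat{X}_{i,r}^{T} - X_{i}^{T}}{X_{i}^{T}}\right)^2} \tag{11}$$

where $\hat{X}_{i,r}^{T}$ is the estimated T-year pooled growth factor at site $i$ at the $r$th repetition, $X_{i}^{T}$ is the true T-year growth factor at site $i$, $M$ is the total number of group members and $R$ is the total number of repetitions (10 000 used in this study).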
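A compact way to evaluate this error measure is sketched below in Python. It assumes that the square root is taken per site before averaging over the group members, following the regional average relative RMSE of Hosking and Wallis (1997); the function name and the array layout (sites in rows, repetitions in columns) are choices made for this illustration only.

```python
import numpy as np

def relative_rmse(estimated, true):
    """Group-average relative RMSE of growth factors, Equation (11).

    estimated : (M, R) array, pooled growth factor at site i in repetition r
    true      : (M,) array, population growth factor at each site
    """
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(true, dtype=float)[:, None]  # broadcast over repetitions
    rel_err = (est - tru) / tru
    per_site = np.sqrt(np.mean(rel_err ** 2, axis=1))  # RMSE over R repetitions
    return float(per_site.mean())                      # average over M sites

# Hypothetical example: every estimate is off by 10 % of the true value.
rmse_toy = relative_rmse([[1.1, 0.9], [2.2, 1.8]], [1.0, 2.0])
```

Expressing the error relative to the true growth factor makes the measure comparable across sites and return periods whose growth factors differ in magnitude.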

3.4.2 Assessment of ROI and regional models

The pooled uncertainty measure (PUM) was used to compare the selected ROI model with the regional models. Simulated data (such as the synthetic data generated in Section 3.4.1) were not used; rather, the actual data were used in the comparison because the models were already identified. The PUM for the return period T, defined by the Flood Estimation Handbook (Institute of Hydrology 1999), is a form of weighted average of the differences between each site growth factor and the pooled growth factor measured on a logarithmic scale:

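$$\mathrm{PUM}_T = \sqrt{\frac{\sum_{i=1}^{M_{\mathrm{long}}} n_i \left(\ln X_{T_i} - \ln X_{T}^{P}\right)^2}{\sum_{i=1}^{M_{\mathrm{long}}} n_i}} \tag{12}$$

where $M_{\mathrm{long}}$ is the number of long-record sites in a group, $n_i$ is the record length at the $i$th site, $X_{T_i}$ is the T-year site growth factor at the $i$th site and $X_{T}^{P}$ is the T-year pooled growth factor. In this study, stations that have a record length of over 50 years were used. Thus, all sites in a group are treated as long-record sites.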
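The PUM can be computed directly from at-site and pooled growth factors, as in the Python sketch below. It follows the root-mean-square form given in the Flood Estimation Handbook (Institute of Hydrology 1999) as understood here, which is an assumption about the exact form used in this study; the function name and the example numbers are hypothetical.

```python
import math
import numpy as np

def pooled_uncertainty_measure(site_growth, pooled_growth, record_lengths):
    """Record-length-weighted RMS difference of log growth factors, Equation (12).

    site_growth    : T-year growth factors estimated at each long-record site
    pooled_growth  : T-year pooled growth factor (scalar or one value per site)
    record_lengths : record length in years at each site
    """
    x_site = np.asarray(site_growth, dtype=float)
    x_pool = np.asarray(pooled_growth, dtype=float)
    n = np.asarray(record_lengths, dtype=float)
    log_diff = np.log(x_site) - np.log(x_pool)
    return float(np.sqrt(np.sum(n * log_diff ** 2) / np.sum(n)))

# Hypothetical example: two sites whose growth factors differ from the pooled
# value by a factor of e in opposite directions.
pum_toy = pooled_uncertainty_measure([2.0 * math.e, 2.0 / math.e], 2.0, [50, 50])
```

Working on a logarithmic scale treats over- and under-estimation by the same ratio symmetrically, and weighting by record length gives more influence to sites whose at-site growth factors are better determined.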

4 Results and analysis

4.1 Distribution selection

The choice of distribution plays a pivotal role in frequency analysis. In this study the distribution is identified using the LMRD. The AM series were analysed using the diagram; this analysis is reported in Figure 2. The average of the L-moment ratio values falls on the theoretical line of GEV, which indicates that GEV is the most appropriate distribution for the Yangtze basin. Thus, GEV is considered to be the characteristic distribution for carrying out precipitation frequency analysis.

Figure 2. Distribution selection by the L-moment ratio diagram. Three-parameter candidate distributions are represented by lines: GLO, generalized logistic; GEV, generalized extreme value; PE3, Pearson Type III; GNO, generalized normal; GPA, generalized Pareto. Two-parameter distributions are represented by points: N, normal; G, Gumbel. The average value of L-moment ratio (red circle) falls on the GEV line


4.2 ROI model selection

The simulation framework (see Section 3.4.1) that generates samples under the GEV distribution was used to compare the considered ROI models. The error measure in the form of RMSET defined in Equation (11) was assessed at return periods of 10, 50 and 100 years for each station. Data from 128 stations were used in the assessment, which in turn produces 128 pooling groups with the ROI approach.

The variation in RMSE values at the target return levels is displayed in box plot form for the different schemes in Figure 3. The associated average values are tabulated in Table 2. It appears that the statistic differs only slightly between ROI schemes. The set that encompasses physical distance and elevation (ROI-4) performed best, providing the smallest RMSE100 value, while the set consisting of only remotely sensed AMP (ROI-3) came last. The set consisting of all variables (ROI-6) achieved an error comparable to that of ROI-4 but was not superior to it. The same outcome is also obtained for the small and mid-range return periods: RMSE10 and RMSE50. The observed AMP was included to compare its performance against the ROI schemes, in particular against ROI-3. That scheme (ROI-7) performed better than ROI-3 but could not outperform ROI-4, the selected model.

Table 2. Comparison of results in terms of mean RMSE corresponding to T = 10, 50 and 100 years for different ROI models listed in Table 1

Figure 3. Assessment of different ROI models. RMSE values in box plot format in three return periods (10, 50 and 100 years) are evaluated. Each box plot contains 128 values


In general, the geographical distance and elevation can be taken as being the most appropriate pooling variables for Yangtze basin, and this combination performed better than the attributes associated with remotely sensed data.

4.3 Comparison between site-specific ROI model and regional model

The ROI-4 model is identified as the most appropriate site-specific pooling model for Yangtze basin. The model is compared against the regional models that have been successfully applied to the basin.

Two regional models were taken into consideration for comparison. One is based on a study by Chen et al. (2013) and is labelled R-I. Six homogeneous regions were identified using the fuzzy c-means clustering methodology. The second one comprises 11 hydrological sub-basins divided over the whole basin. These sub-basins were used as regions in conducting a regional frequency analysis (Su et al. 2008, 2009). This model is termed R-II. The regions delineated in both cases with the current datasets are shown in Figure 4.

Figure 4. Regions by R-I and R-II models with current datasets. R-I was delineated by cluster analysis (Chen et al. 2013) while R-II was delineated based on sub-basins (Su et al. 2009)


The homogeneity criterion is an important condition for a successful regional analysis. However, several studies found that a weakly homogeneous region should still be more effective for estimating high return period values than an at-site frequency analysis (Hosking and Wallis 1997, Institute of Hydrology 1999). With the current datasets, the homogeneity test was revisited for both regional models. The heterogeneity measure quantified for both models is reported in Table 3. All the regions in R-I are judged to be homogeneous, which is not surprising considering that they were delineated as homogeneous in the previous study. Thus, the regions formed by Chen et al. (2013) are deemed robust considering that a small variation in the datasets does not affect the homogeneity of the regions. On the other hand, mixed results were identified with R-II. Out of 11 regions, three are heterogeneous and eight are homogeneous. All of the heterogeneous regions belong to the Upper Yangtze basin. In the case of the ROI model, over 91% of the groups are homogeneous.

Table 3. Heterogeneity measure, H1 and PUM value in three return periods for regions delineated by the regional models with current datasets

The homogeneity measure is like a significance test (Hosking and Wallis 1997); thus, being homogeneous does not guarantee a superior model. Hence, an error measure is needed. The PUM described in Section 3.4.2 was used to compare the two categories of models. The PUM values measured at T = 10, 50 and 100 years are displayed as box plots in Figure 5. Six data points, 11 data points and 128 data points were used to construct box plots for R-I, R-II and ROI-4, respectively. The PUM value for each individual region is reported in Table 3 (only for R-I and R-II) to demonstrate how a heterogeneous region leads to a larger error. The variation in the mean PUM value between models is reported in Table 4. It appears that the statistic differs by a very small amount between the models. The ROI model has the lowest PUM value, followed by R-I and R-II, in the three representative return period contexts: lower (X10) as well as medium (X50) and higher growth factors (X100).

Table 4. Summary comparison results between site-specific, ROI-4 and regional models: R-I and R-II in terms of mean PUM value

Figure 5. Model selection based on PUM value in box plot form in three return periods: 10, 50 and 100 years. R-I contains six values, R-II contains 11 values and ROI-4 contains 128 values

The regional model consisting of 11 hydrological sub-basins (R-II) is the least suitable method for the estimation of daily extreme precipitation quantiles. The reason might be the effect of three heterogeneous regions which greatly contributed to the overall PUM value. Although the R-I model performed better than R-II, the homogeneously formed regions were unable to outperform the ROI model in terms of mean PUM value. The difference in error between them is, however, not very large. The superior performance of ROI-4 and R-I over R-II suggests that the improved methods with additional variables give rise to a significant difference in PUM value. While R-I uses a clustering technique to divide a large group of datasets into several smaller groups, ROI selects a tailor-made group of stations for a subject site. In both cases, the similarity is provided by the chosen descriptors. In general, the identified ROI model is able to reduce the error and leads to more reliable extreme precipitation estimates over the Yangtze River basin.

5 Discussion

The performance of the selected model (ROI-4) is discussed further in this section. The first point of discussion is the selection of site descriptors. Although the inclusion of remotely sensed data provides a new direction for the ROI regional analysis, it is unable to outperform the traditional (geographic) at-site observed descriptors. The location information and elevation that were identified as suitable in the distance measure are easily accessible and could be used to delineate groups for ungauged conditions. The identification of close proximity as an effective descriptor is no surprise, as the regional analysis itself is based on the idea that nearby stations in a region possess similar hydroclimatological characteristics. Close proximity was found to deliver successful results in several past ROI studies (Das 2018, Requena et al. 2019a).

The addition of elevation further improves the accuracy of the model. Elevation is relatively high in the western part of the basin (upper basin) but drops dramatically towards the east. This systematic drop helps to group the stations appropriately in the study area. As a result, the inclusion of physical distance and elevation combined in the similarity distance measure plays a major role in characterizing the behaviour of extreme precipitation. An analogous distance measure defined in three dimensions using latitude, longitude and elevation was also found suitable in the Australian context (Johnson and Green 2018, Ball et al. 2019). The finding that geographic descriptors perform better than climatic descriptors is consistent with the findings of Johnson and Green (2018).
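To make the idea of a three-dimensional similarity distance concrete, the following is a minimal Python sketch. It is not the authors' implementation: the exact form of the ROI-4 distance measure is defined earlier in the paper and is not reproduced here, so the standardized Euclidean form, the function name `similarity_distance` and the station values are all assumptions chosen for illustration.

```python
import numpy as np

def similarity_distance(target, stations):
    """Euclidean distance in standardized (lat, lon, elevation) space.

    target: array of shape (3,); stations: array of shape (n, 3).
    Dividing by each attribute's standard deviation is an assumption made
    here so that degrees and metres become comparable.
    """
    stations = np.asarray(stations, dtype=float)
    target = np.asarray(target, dtype=float)
    scale = stations.std(axis=0)
    scale[scale == 0] = 1.0  # guard against a constant attribute
    diff = (stations - target) / scale
    return np.sqrt((diff ** 2).sum(axis=1))

# Hypothetical stations: latitude (deg), longitude (deg), elevation (m)
stations = np.array([
    [30.0, 104.0, 500.0],
    [30.5, 104.5, 520.0],
    [31.0, 120.0, 10.0],
    [29.0, 100.0, 3000.0],
])

# Distances from the first station to all stations, nearest first
d = similarity_distance(stations[0], stations)
order = np.argsort(d)
print(order, np.round(d[order], 3))
```

In this sketch a station that is close on the map but sits at a very different elevation is pushed further down the ranking, which is the behaviour the combined descriptor is meant to capture.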

Although the incorporation of remotely sensed data did not outperform the traditional attributes in terms of error, it opens a new avenue for the ROI method. With advances in space technology, remotely sensed data are now available, yet they are rarely used in regional frequency analysis. Future studies should explore their application in other basins around the world. There is an intuitive appeal to using this approach in countries where the density of gauged stations is low or where there are difficulties in installing meteorological stations.

Considering the importance of selecting a suitable distribution in the frequency analysis, we revisit our distribution selection with the selected model. The ROI-4 model is applied to each site and a pooling group is formed with a minimum of 500 years of data. The Hosking-Wallis GOF is applied to each pooling group. The GEV was the best distribution for 63% of cases and was acceptable in 100% of cases. The second-best distribution, GNO, was acceptable for 90% of sites but was the best distribution in only 26% of cases. The performance was sub-par for the remaining three distributions: GLO, PE3 and GPA. Thus, the GEV provided the best overall fit to the AM data of the Yangtze basin. This supported the initial selection by the LMRD. The selection is also consistent with the findings of Chen et al. (2013) for the same basin.

Further insights were explored for groups formed based on the selected model. Two statistics that are critical to frequency analysis were examined: the heterogeneity measure and the shape parameter of the GEV distribution. They provide deeper insight into how the model behaves in evaluating extreme precipitation.

The variation of the shape parameter (κ) and the corresponding pooled average κ value, indicated by a circle, is displayed in box plot form in Figure 6. The H1 value evaluated for each group is also displayed in the same figure. Most of the pooling groups (over 91%) passed the homogeneity test, with H1 values of less than 2. This justifies the use of the combined descriptors, physical distance and elevation, as pooling variables. The shape parameter, which plays a pivotal role in describing the tail behaviour of the distribution, is effectively scaled down by the ROI methodology. Similar characteristics were also noticed by Johnson and Green (2018) in their study. The pooled (group average) value of the shape parameter of most pooling groups (about 90%) is below zero, which suggests that the model is unbounded (see the parametrization of GEV in Section 3.2). The use of this unbounded condition is often advisable in engineering practice (Papalexiou and Koutsoyiannis 2013) as it yields an increasing difference in design precipitation between high recurrence intervals.
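The link between the sign of κ and the bounded or unbounded behaviour can be illustrated with a short Python sketch. Section 3.2 is not reproduced here, so the sketch assumes the Hosking and Wallis form of the GEV quantile function, x(F) = ξ + (α/κ)[1 − (−ln F)^κ] with F = 1 − 1/T, which is consistent with the statement that κ < 0 gives an unbounded model. The function name `gev_quantile` and the parameter values are illustrative only and are not estimates from the study.

```python
import math

def gev_quantile(T, xi, alpha, kappa):
    """GEV quantile for return period T (years), Hosking-Wallis form.

    kappa < 0: no upper bound; kappa > 0: upper bound at xi + alpha / kappa.
    """
    F = 1.0 - 1.0 / T          # non-exceedance probability
    y = -math.log(F)
    if abs(kappa) < 1e-9:      # Gumbel limit as kappa -> 0
        return xi - alpha * math.log(y)
    return xi + alpha / kappa * (1.0 - y ** kappa)

# Illustrative growth-curve parameters (assumed, not taken from the paper)
xi, alpha = 0.85, 0.25
for kappa in (-0.1, 0.1):
    q = [gev_quantile(T, xi, alpha, kappa) for T in (10, 100, 1000)]
    steps = [q[1] - q[0], q[2] - q[1]]
    print(f"kappa={kappa:+.1f}", [round(v, 3) for v in q],
          "increments per decade:", [round(s, 3) for s in steps])
```

With a negative κ the increment in the growth factor per tenfold increase in return period keeps growing, whereas with a positive κ the increments shrink as the curve approaches its upper bound ξ + α/κ.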

Figure 6. Shape parameter (k) value and heterogeneity measure of pooling groups delineated by ROI-4 model for each station. Group members’ shape parameter values are shown in box plot form with the pooled average indicated with blue circles

It is interesting to note that in several cases the upper bounded condition is correlated with the non-homogeneity of the group, as can be seen from Figure 6. Another aspect to point out is that all the heterogeneous groups belong to the upper basin (see Figure 7). A possible reason is that these stations are located in the upper part of the basin, a distinct climatic area, where fewer meteorological stations are available. This leads the model to draw on a number of stations from other regions to complete the 500 station-year dataset (about 10 stations per group), which may bring heterogeneity into the groups. In addition, the extremes and the yearly variation of AM data are not particularly high, which may induce an upper bounded condition. It is suggested to carefully check ROI groups formed in this area and, if necessary, to refine the group members manually as per the suggestion by Hosking and Wallis (1997).
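The way a sparse neighbourhood forces distant stations into a group can be seen in a minimal Python sketch of pooling-group formation. It assumes that stations are simply added in order of increasing similarity distance until the pooled record reaches 500 station-years, and that the target site's own record counts towards the total; the function name `form_pooling_group` and the input values are hypothetical.

```python
def form_pooling_group(distances, record_lengths, target_years=500):
    """Add stations in order of increasing distance until the pooled
    record reaches at least target_years station-years.

    Returns the selected station indices and the pooled station-years.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    group, total = [], 0
    for i in order:
        group.append(i)
        total += record_lengths[i]
        if total >= target_years:
            break
    return group, total

# Hypothetical example: 12 candidate stations, each with 52 years of record.
# Index 0 is the target site itself (distance 0).
distances = [0.1 * i for i in range(12)]
record_lengths = [52] * 12

group, total = form_pooling_group(distances, record_lengths)
print(len(group), "stations,", total, "station-years")
```

With records of roughly 50 years, about 10 stations are needed per group, which matches the group size quoted above. Where nearby stations are scarce, the last few members necessarily come from further away.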

Figure 7. Spatial distribution of the heterogeneity measure (H1) for groups delineated based on the ROI-4 model at the gauged sites

Finally, the growth factor estimated by the selected model is also assessed. The growth factors estimated for each pooling group are presented in box plots with respect to return period in Figure 8. The box plot for a particular return period includes the corresponding values for all stations. The mean value increases with increasing return period, which is understandable considering that the model, in most cases, is unbounded. The range in each case also provides a sense of the uncertainty in how the growth factor (and hence the design estimate) can vary for a specified return period.

Figure 8. Box plots of growth factors with respect to return period. Each box plot contains values for all stations (pooling group formed for each station)

6 Conclusion

This study compares two types of at-site attributes, observed and remotely sensed, to include in the site-specific pooling approach for the estimation of extreme precipitation. The study also examines whether the selected site-specific regional model is superior to the traditional regional group in conducting regional precipitation frequency analysis in Chinese climatic conditions. The present work is carried out in the Yangtze basin, which is deemed appropriate for such an examination given its very large size and its varied climatic and topographic regions.

Regarding the first objective, a combination of descriptors – location variables (latitude and longitude), elevation and remotely sensed annual mean precipitation – was used. Satellite data were employed so that, if selected, the attribute could be applied at ungauged locations, like the other descriptors considered. Several ROI models that differ in their similarity distance measure were investigated. With respect to the second objective, two regional models (one based on a geographical approach and the other based on a clustering approach) were examined relative to the selected ROI model.

Annual maximum daily precipitation data from 128 stations were analysed in the study. Only stations with long records (N > 50) were considered for the analysis. The first aim was evaluated using a simulation technique in which data were generated based on the representative GEV distribution. The second aim was assessed using an uncertainty measure, namely PUM, based on observed data.

From the study, we draw the following conclusions:

  • The GEV provided the best overall fit to the annual maximum precipitation data of the Yangtze basin.

  • Although the inclusion of remotely sensed data provides a new direction for the ROI regional analysis, its incorporation was unable to outperform the traditional observed site descriptors. The ROI scheme based on physical distance and elevation combined attained the best results among the considered schemes. The groups delineated by the model in most cases successfully passed the homogeneity test. This indicates that the identified similarity measure has a strong link with the governing character of extreme precipitation in the study area. Barring a few instances, the frequency behaviour in most cases is unbounded, which is attractive in practical applications.

  • The ROI, which allows having a tailor-made pooling group for a target site, was found to be superior to the traditional regional models examined in this study. The regional model that divides the whole basin into 11 sub-basins was identified as the least appropriate for estimating extreme precipitation, while the model that used a clustering technique to divide the basin into six homogeneous regions was found to be very close to the ROI model in terms of error measure.

The study is expected to help engineers and water resource managers to assess flood risk in the study region. Although the scheme that included remotely sensed data was outperformed in this case study, future studies should explore its application in other basins across the world. Further research is also required to refine the group selection criteria for the assessment of the ROI approach in the upper part of the Yangtze basin.

Acknowledgements

The authors thank Ana Requena (Associate Editor), Saeid Eslamian and one anonymous reviewer for their critical comments, which helped improve the quality of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The study was funded by the NUIST faculty start-up grant [grant number 2243141501015] to the first author. Daily precipitation data series were obtained from the National Meteorological Information Center of China Meteorological Administration. The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
