Research Article

Investigation of landslide dam life span using prediction models based on multiple machine learning algorithms

Article: 2273213 | Received 07 Dec 2022, Accepted 16 Oct 2023, Published online: 26 Oct 2023

Abstract

A rapid and accurate prediction of a landslide dam’s life span is of great importance for emergency geological treatment. However, current prediction models for the state of a landslide dam are based solely on geomorphological indexes and do not consider attribute properties such as landslide type, trigger factor, and dam type. This study investigates the relationship between landslide dam geometry and barrier lake capacity and proposes fitting models that supplement the current landslide dam database. Six predictive models for landslide dam life span are then established using machine learning algorithms, namely logistic regression, k-nearest neighbors, support vector machine, Naïve Bayes, decision tree, and random forest, which consider five factors comprising geometry parameters and attribute properties. The performances of these six models are analyzed and compared with a typical prediction model, the dimensionless blockage index (DBI). The results suggest that the models established in this study not only achieve absolute accuracy comparable to that of the DBI model but also overcome its disadvantage of leaving a large number of cases undetermined. Among the machine learning models, the random forest model exhibits the highest absolute accuracy (89%), the lowest error rate (7%), the lowest false alarm rate (15%), and a zero uncertainty rate. Additionally, three well-known landslide dams, the Costantino, Hsiaolin, and Baige landslide dams, are analyzed to illustrate the applicability of the established machine learning models. The results provide guidance for the prediction and emergency geological treatment of landslide dam disasters.

1. Introduction

Landslide dams, formed by rockfalls, landslides, and debris flows, are common geohazards in alpine canyon regions across the globe (Costa and Schuster 1988; Cui et al. 2009; Havenith et al. 2015a; Nian et al. 2021). Their failure can cause catastrophic flooding and poses a significant threat to human life and property downstream (Nibigira et al. 2018; Fan et al. 2021; Wu et al. 2023). Notably, the 2008 Tangjiashan landslide dam in China threatened almost one million people living downstream (Chen et al. 2015), and in 2018 China’s Baige landslide dam caused a direct economic loss of 15 billion yuan (Cui et al. 2020; Fan et al. 2020). The life span of a landslide dam varies greatly, from a few hours to several years, and some long-lasting dams still exist today, offering opportunities for power generation and tourism (Havenith et al. 2015b; Fan et al. 2020; Nian et al. 2020; Shen et al. 2020; Fang et al. 2023). For instance, the Waikaremoana barrier lake in New Zealand serves as a holiday destination and a water source for a power station (Schuster and Alford 2004; Risley et al. 2006), while China’s Hongshiyan landslide dam was reconstructed as a hydropower station in 2014 (Liu 2014). Thus, rapid assessment of the state of a landslide dam is crucial.

Because the engineering investigation data needed for detailed analysis are usually unavailable in the early stages of landslide dam formation, rapid prediction methods have been proposed to determine the stability of such dams from geomorphic indices (Figure 1), including the geometry of the landslide dam and the characteristics of the barrier lake (Casagli and Ermini 1999; Ermini and Casagli 2003; Korup 2004; Dong et al. 2011; Tacconi Stefanelli et al. 2016; Nian et al. 2018; Tacconi Stefanelli et al. 2018; Li et al. 2020; Shan et al. 2020; Wu et al. 2020). Fan et al. (2020) summarized the popular models for assessing landslide dam stability, among which the DBI model proposed by Ermini and Casagli (2003) has gained wide acceptance. This model predicts the stability of a landslide dam from the dam height, dam volume, and catchment area of the barrier lake. However, these methods tend to overlook how the formation process of a landslide dam and the materials involved influence the danger the dam poses. In reality, the formation process and materials of the landslide have a significant impact on the state of a landslide dam. As noted by Ermini and Casagli (2003) and Oppikofer et al. (2020), engineering geological factors such as landslide type and landslide dam materials and structure may be more critical than geomorphic indices in determining dam stability. Therefore, prediction methods for landslide dam state should consider engineering geological factors in addition to geomorphic indices, so that the risks the dam poses to downstream populations and infrastructure can be assessed accurately.

Figure 1. Geometric parameters of a landslide dam and its catchment. (a) Cross-section showing the formation process of a landslide dam; (b) longitudinal section of a landslide dam; (c) catchment area of the barrier lake created by a landslide dam.

As landslide dam databases have expanded, various influencing factors have been incorporated into rapid prediction methods for landslide dam state. Zheng et al. (2021) compiled an extensive database of 1737 landslide dam cases and extended the DBI model to account for dam materials. Shan et al. (2020) developed a rapid prediction method for landslide dam stability using logistic regression that considers dam morphology, particle composition, and barrier lake hydrodynamics. Liao et al. (2022) developed a geotechnical index that considers the volume, height, width, and grain size of the landslide dam and the reservoir capacity of the barrier lake to estimate the landslide dam state. However, the stability of a landslide dam is not constant and may change with external conditions. Moreover, even dams initially assessed as unstable may fail only after a considerable time, for example a month or more (Peng and Zhang 2012; Shen et al. 2020).

Although progress has been made in rapid prediction methods for the stability of landslide dams, several aspects still require improvement. Firstly, many records in the existing landslide dam database are incomplete, so relatively few cases can be used. To address this issue, the intrinsic correlations between database parameters need to be explored so that the database can be expanded. Secondly, the stable state of a landslide dam can change with external factors such as climatic conditions or seismicity. Therefore, predicting the life span of a landslide dam may be more meaningful for emergency geological treatment than a rapid stability assessment. Thirdly, landslide types and trigger factors can significantly affect the materials of the landslide dam (Hungr et al. 2014; Wu et al. 2020), and they can be investigated quickly in the early stage of landslide dam formation. It is therefore necessary to explore the relationship between the type and trigger factor of a landslide and the life span of the resulting dam. Fourthly, Costa and Schuster (1988) classified landslide dams into six categories, and the dam type may affect the dam’s life span. Therefore, the landslide dam type should be considered in prediction models of landslide dam life span.

To address these aspects, this study establishes six prediction models of landslide dam life span using machine learning (ML) algorithms. Section 2 briefly describes the principles of the ML algorithms and their implementation. Section 3 introduces the open-sourced landslide dam database, the expansion of the training sample data, and the ML procedures for life span prediction in detail. Section 4 presents and compares the results of the multiple ML models. Section 5 applies the established ML models to three well-known landslide dams, the Costantino, Hsiaolin, and Baige landslide dams, to demonstrate their applicability. Finally, Section 6 discusses the limitations of the current work and future work.

2. Machine learning algorithms and implementations

2.1. Brief descriptions of machine learning algorithms

Six widely used ML algorithms were employed in this study: logistic regression, k-nearest neighbor, support vector machine, Naïve Bayes, decision tree, and random forest. Each is briefly described below.

Logistic regression (LR) is a simple, efficient, and widely used ML method for binary classification problems with a linear decision boundary. A linear combination of the input factors produces an unbounded score, which the logistic (sigmoid) function maps to a value between 0 and 1; the research objects are then divided into two classes using 0.5 as the threshold (Lombardo and Mai 2018).
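The decision rule described above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation used in the study: the weights are fitted by batch gradient descent on the mean log-loss, and the toy one-factor data, learning rate, and epoch count are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Class 1 if the predicted probability reaches the 0.5 threshold, else class 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Hypothetical toy data: one factor, class 1 for positive values
X_toy = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X_toy, y_toy)
print(predict(w, b, [2.5]), predict(w, b, [-2.5]))
```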

K-nearest neighbor (KNN) is a simple, easy-to-implement supervised ML algorithm suitable for both classification and regression problems. To classify a new point, the algorithm finds the K training samples closest to it and assigns the class held by the majority of these neighbors (Abu El-Magd et al. 2021).
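A minimal sketch of the KNN classification step follows, assuming Euclidean distance and an unweighted majority vote (common defaults, not settings confirmed by the study). The function name and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query point
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-factor training data
train_X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [5.0, 5.0], [5.5, 4.8], [4.7, 5.3]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [1.0, 1.0]))  # nearest three are all class 0
```

Because KNN relies on distances, factors with very different magnitudes (such as a volume index and an encoded category) would normally be scaled first; this is a general caveat rather than a step reported in the text.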

Support vector machine (SVM) is a supervised learning algorithm used for classification and regression (Ullah et al. 2022). SVM has several advantages: it is effective in high-dimensional spaces, including cases where the number of dimensions exceeds the number of samples; it is memory-efficient because its decision function uses only a subset of the training points (the support vectors); and different kernel functions can be specified for the decision function (Huang and Zhao 2018).
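The idea of a maximum-margin linear classifier can be sketched with Pegasos-style subgradient descent on the regularized hinge loss. This is a swapped-in, simplified technique for illustration only. Scikit-learn’s `SVC` solves the dual problem with libsvm and supports kernels, whereas this sketch has no kernel and no bias term, and it visits samples in order rather than at random so that it stays deterministic. Labels use +1/-1, and the regularization strength `lam` and the toy data are hypothetical. Only samples inside the margin (margin < 1) add to `w`, which mirrors the role of support vectors.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 / -1. No bias term; samples are visited in order."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regularisation term
            if margin < 1.0:                           # inside the margin: hinge loss active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def svm_predict(w, x):
    """Sign of the linear decision function w . x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Hypothetical linearly separable toy data
X_toy = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]]
y_toy = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X_toy, y_toy)
print(w, svm_predict(w, [1.0, 1.0]), svm_predict(w, [-1.0, -1.0]))
```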

The Naïve Bayes (NB) algorithm is a family of simple probabilistic classifiers based on Bayes’ theorem with strong (naïve) independence assumptions between features. It can be constructed without complicated iterative parameter estimation, and it is relatively robust to noise and irrelevant attributes (Das et al. 2012). A variant that assumes each continuous feature follows a Gaussian (normal) distribution within each class is known as Gaussian Naïve Bayes.
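A minimal Gaussian Naïve Bayes sketch follows. Class priors and the per-class, per-feature means and variances are estimated directly, with no iterative optimization, and the class with the highest log posterior is chosen. The small constant `var_eps` added to each variance is an assumption that avoids division by zero; scikit-learn’s `GaussianNB` uses a relative `var_smoothing` instead. The toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_eps=1e-9):
    """Estimate log prior, per-feature means and variances for each class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for cls, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + var_eps
                     for col, m in zip(columns, means)]
        model[cls] = (math.log(n / len(X)), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the largest log prior + sum of Gaussian log-likelihoods."""
    def log_posterior(params):
        log_prior, means, variances = params
        # Independence assumption: per-feature log-likelihoods simply add up
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, means, variances))
    return max(model, key=lambda cls: log_posterior(model[cls]))

# Hypothetical two-factor toy data
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, [1.0, 1.0]), predict_gaussian_nb(model, [5.0, 5.0]))
```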

The decision tree (DT) algorithm is a non-parametric method with a hierarchical tree structure. It can identify non-linear and non-additive relationships between input factors and target classes. In the DT algorithm, a factor may be binary, nominal, ordinal, or quantitative; however, the classes must be qualitative (categorical, binary, or ordinal). When DT is applied to a dataset containing factors and classes, it produces a set of rules that are used to classify records that have not yet been observed (Saito et al. 2009; Tsangaratos and Ilia 2016).
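The core operation of a decision tree, choosing a threshold on a quantitative factor, can be sketched as follows. The sketch assumes the Gini impurity criterion (the default of scikit-learn’s `DecisionTreeClassifier`) and evaluates midpoints between consecutive distinct values. A full tree would apply this search recursively to each child node. The toy values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Best threshold on one quantitative factor by weighted Gini impurity.
    Returns (threshold, impurity); threshold is None if no split improves on the parent."""
    distinct = sorted(set(values))
    best_t, best_score = None, gini(labels)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2                               # candidate: midpoint between values
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical factor values (arbitrary units) and binary life span classes
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
print(best_split(values, labels))
```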

The random forest (RF) algorithm, proposed by Breiman (2001), is a powerful and intuitive method for classification and regression. It is an ensemble of decision trees, and for classification the predicted class is the unweighted majority vote of the individual trees. Each tree is trained on a bootstrap sample of the training data (the bagging technique), with a random subset of variables considered at each split. The importance of each variable can be assessed by measuring how much the prediction error increases when the values of that variable are permuted across the out-of-bag observations (Chen et al. 2017).
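The bagging and majority-vote steps can be illustrated with a deliberately small forest of one-split trees (decision stumps). This is a simplified sketch, not Breiman’s full algorithm or scikit-learn’s `RandomForestClassifier`. Each stump sees one bootstrap sample and one randomly chosen feature, which, for a single split, is the same as drawing a random feature subset of size one at that split. Out-of-bag permutation importance is omitted. The tree count, seed, and toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, feature):
    """One-split tree on a single feature, chosen by weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    best = (float("inf"), None, majority, majority)   # (impurity, threshold, left, right)
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best[0]:
            best = (imp, t, Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    _, t, left_cls, right_cls = best
    if t is None:                                     # no split possible: constant leaf
        return lambda row: majority
    return lambda row: left_cls if row[feature] <= t else right_cls

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample, with replacement
        feature = rng.randrange(n_features)           # random feature for this stump's split
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees

def predict_forest(trees, row):
    """Unweighted majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

X_toy = [[v, 2 * v] for v in range(10)]              # two hypothetical factors
y_toy = [1 if v >= 5 else 0 for v in range(10)]
forest = fit_forest(X_toy, y_toy)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 18]))
```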

2.2. Implementation

To implement these ML algorithms, the scikit-learn package, a comprehensive open-source ML library for Python, was adopted. Scikit-learn provides a collection of efficiently implemented ML algorithms and is well documented and maintained by the community.

3. Methods

3.1. Open-sourced database of landslide dams

In the 1990s, Costa and Schuster (1991) established the first worldwide database of landslide dams through field investigations and literature retrieval. Since then, numerous scholars (Chai et al. 1995; Ermini and Casagli 2003; Korup 2004; Cui et al. 2009; Safran et al. 2015; Tacconi Stefanelli et al. 2015; Zhang et al. 2015; Nian et al. 2018; Fan et al. 2020; Shan et al. 2020; Wu et al. 2022) have continued to enrich the database. The open-sourced database contains more than 1800 landslide dam cases and provides comprehensive information such as the landslide type, location, and trigger factor; the landslide dam type, volume, height, width, and length; the barrier lake volume; the dam failure time; and other relevant details. In this study, the open-sourced database published by Fan et al. (2020) was used for ML-based life span prediction of landslide dams.

3.2. Characteristics of landslide dams

According to the open-sourced database, landslides can be divided into nine types. Different landslide types affect the life spans of landslide dams: Figure 2 suggests that landslide dams formed by debris flows are more likely to breach within a short time than those formed by avalanches. Landslide type is therefore considered in the current ML study.

Figure 2. Life spans of landslide dams triggered by different landslide types.

To analyze the impact of landslide dam type, trigger factor, and landslide type on the life span of landslide dams, Table 1 summarizes the characteristics of these factors in the open-sourced database. Costa and Schuster (1988) divided landslide dams into six types based on their shape and size relative to the valley floor, as depicted in Figure 3. Type I dams are small and do not span the entire valley floor, whereas type II dams span the entire valley floor. Type III dams fill the valley from side to side and extend considerable distances up and down the valley. Type IV dams form when both sides of the valley fail simultaneously. Type V dams form when a landslide has multiple lobes that enter the valley. Finally, type VI dams involve one or more failure surfaces that extend under the valley and emerge on the opposite valley side.

Figure 3. Classification of landslide dams (modified from Costa and Schuster 1988).

Table 1. Summary of the characteristics of landslide dams.

In the open-sourced database, type II and type III landslide dams are the most prevalent, accounting for 51.5% and 25.3% of the cases, respectively, when unknown types are excluded. Earthquakes and rainfall are the primary trigger factors, accounting for 51.5% and 44.7% of the cases, respectively, when unknown triggers are excluded. Therefore, the current study focuses on type II and type III landslide dams triggered by earthquakes or rainfall and disregards other dam types and trigger factors.

3.3. Data augmentation

Although the database contains more than 1800 cases, only 79 of them simultaneously record the length, width, and height of the landslide dam, the capacity of the barrier lake, the landslide dam type, the landslide type, and the trigger factor. However, data analysis shows that the logarithm of the product of the length, width, and height of a landslide dam is linearly related to the logarithm of the barrier lake capacity. Figure 4 shows these relationships for different countries and regions, which are expressed by Equations (1a) to (1e). Based on these equations, the number of landslide dam cases in the current study is expanded from 79 to 153, an increase of more than 90%. The expanded dataset is used in the subsequent ML study, and Table S1 in the supplementary document lists the landslide dam cases used.

$$\log V_l = 0.018 + 1.412\,\log(H_d L_d W_d), \qquad R^2 = 0.69 \tag{1a}$$

$$\log V_l = 0.019 + 1.000\,\log(H_d L_d W_d), \qquad R^2 = 0.66 \tag{1b}$$

$$\log V_l = 0.048 + 1.068\,\log(H_d L_d W_d), \qquad R^2 = 0.90 \tag{1c}$$

$$\log V_l = 0.173 + 0.974\,\log(H_d L_d W_d), \qquad R^2 = 0.61 \tag{1d}$$

$$\log V_l = 0.717 + 0.867\,\log(H_d L_d W_d), \qquad R^2 = 0.68 \tag{1e}$$

in which $V_l$ is the capacity of the barrier lake (m³), $H_d$ is the landslide dam height (m), $L_d$ is the landslide dam length (m), and $W_d$ is the landslide dam width (m). Equations (1a), (1b), (1c), (1d), and (1e) represent the relationships for the Chinese mainland, Japan, the United States, Italy, and other countries and regions, respectively.
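Equations (1a) to (1e) can be applied directly to estimate a barrier lake capacity from dam dimensions, which is one way such relations could fill gaps in incomplete records. The sketch below assumes that "log" denotes the base-10 logarithm, which the text does not state. The region keys and the example dam are hypothetical.

```python
import math

# (intercept a, slope b) in log V_l = a + b * log(H_d * L_d * W_d), Eqs. (1a)-(1e)
REGION_COEFFS = {
    "chinese_mainland": (0.018, 1.412),   # Eq. (1a)
    "japan": (0.019, 1.000),              # Eq. (1b)
    "united_states": (0.048, 1.068),      # Eq. (1c)
    "italy": (0.173, 0.974),              # Eq. (1d)
    "other": (0.717, 0.867),              # Eq. (1e)
}

def estimate_lake_capacity(height_m, length_m, width_m, region):
    """Estimate barrier lake capacity V_l (m^3), assuming 'log' means log10."""
    a, b = REGION_COEFFS[region]
    volume_index = height_m * length_m * width_m   # H_d * L_d * W_d
    return 10 ** (a + b * math.log10(volume_index))

# Hypothetical dam: 100 m high, 100 m long, 100 m wide in the Chinese mainland
print(f"{estimate_lake_capacity(100, 100, 100, 'chinese_mainland'):.3e} m^3")
```

The fits have R² values between 0.61 and 0.90, so capacities filled in this way are regression estimates that carry noticeable uncertainty.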

Figure 4. The relationship between landslide dam size and barrier lake capacity (linear in logarithmic coordinates) in different countries and regions. (a) Chinese mainland, (b) Japan, (c) the United States, (d) Italy, and (e) other countries and regions.

3.4. Data pre-processing and modeling procedure

The current study follows a modeling flow with three main steps, as illustrated in Figure 5. First, the condition factors of the landslide dam cases are collected and pre-processed. Second, multiple ML algorithms are used to build models that predict the life span of landslide dams. Finally, the models are evaluated.

Figure 5. Modeling flow chart for the current study.

Five condition factors are used in the current ML algorithms: three category factors and two quantitative factors. The category factors (trigger factor, landslide dam type, and landslide type) must be assigned encoded values, which are presented in Table 2. Table 2 also lists the two quantitative factors: the volume index of the landslide dam (the product of its length, width, and height) and the capacity of the barrier lake. The target value for a landslide dam’s life span is 1 if the dam lasted 30 days or less and 0 if it lasted more than 30 days. Of the 153 landslide dams in the current database (Table S1), 85 existed for more than 30 days and 68 for 30 days or less. The database is randomly divided into a training database containing 80% of the cases (123 in total) and a test database containing the remaining 20% (30 in total). The training database contains 58 landslide dams that existed for 30 days or less, and the test database contains 10. The training database is used to train the models, while the test database, which is not used in training, is used to verify them. Four-fold cross-validation is used to evaluate models obtained with different hyperparameter combinations and optimization algorithms. In each fold, three quarters of the training cases are used to train the model and the remaining quarter is used for validation. This is repeated four times so that each quarter serves once as the validation set; the average accuracy of the four predictions is taken as the model’s accuracy, and the model with the hyperparameter combination giving the highest average accuracy is selected.
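The labeling rule and the four-fold cross-validation procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors’ code. Folds are formed by shuffling with a fixed seed, each candidate model is passed in as a hypothetical `fit_predict` function, and the toy data and the threshold "hyperparameter" are invented for demonstration only.

```python
import random

def label_life_span(days, threshold_days=30):
    """Target used in the text: 1 if the dam lasted <= 30 days, otherwise 0."""
    return 1 if days <= threshold_days else 0

def kfold_indices(n, k=4, seed=0):
    """Shuffle case indices and deal them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit_predict, X, y, k=4, seed=0):
    """Mean validation accuracy over k folds; each fold is held out exactly once."""
    folds = kfold_indices(len(X), k, seed)
    scores = []
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict([X[j] for j in train], [y[j] for j in train],
                            [X[j] for j in val])
        scores.append(sum(p == y[j] for p, j in zip(preds, val)) / len(val))
    return sum(scores) / len(scores)

def select_hyperparameter(candidates, make_fit_predict, X, y, k=4):
    """Pick the candidate whose model has the highest mean cross-validated accuracy."""
    return max(candidates, key=lambda c: cross_val_accuracy(make_fit_predict(c), X, y, k))

# Hypothetical example: a one-factor threshold rule stands in for a real classifier,
# and its threshold plays the role of the hyperparameter being tuned.
X = [[v] for v in range(1, 13)]
y = [1 if v <= 6 else 0 for v in range(1, 13)]

def make_threshold_rule(threshold):
    def fit_predict(X_train, y_train, X_val):
        return [1 if x[0] <= threshold else 0 for x in X_val]
    return fit_predict

best = select_hyperparameter([3, 6, 9], make_threshold_rule, X, y)
print([label_life_span(d) for d in (3, 30, 31, 365)], best)
```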

Table 2. The condition factors and their encoded values.

4. Comparison analysis of the multiple models

4.1. Model training results

Figure 6 illustrates the life span predictions of the multiple models on the training database. The x-axis shows the predicted class (0 or 1), the black squares and red circles indicate the actual life span class of each landslide dam, and the y-axis shows the serial number of each landslide dam. As Figure 6 shows, the accuracies of the LR, KNN, SVM, NB, DT, and RF algorithms on the training database are 79%, 81%, 85%, 76%, 92%, and 93%, respectively.

Figure 6. Life span prediction results of landslide dams with multiple algorithms in the training database. (a) results of LR algorithm, (b) results of KNN algorithm, (c) results of SVM algorithm, (d) results of NB algorithm, (e) results of DT algorithm, (f) results of RF algorithm.

To evaluate the established models, they are used to predict the life span of the 30 landslide dams in the test database. Figure 7 shows the results: the accuracies of the LR, KNN, SVM, NB, DT, and RF algorithms on the test database are 73%, 67%, 70%, 73%, 70%, and 73%, respectively. As Figures 6 and 7 show, the established models perform better on the training dataset than on the test dataset. Nevertheless, these accuracies are considered adequate for emergency rescue engineering.

Figure 7. Life span prediction results of landslide dams with multiple algorithms in the test database. (a) results of LR algorithm, (b) results of KNN algorithm, (c) results of SVM algorithm, (d) results of NB algorithm, (e) results of DT algorithm, (f) results of RF algorithm.

4.2. Analysis on the effectiveness of the models

Because the proportions of positive and negative samples in the database are imbalanced, this study uses receiver operating characteristic (ROC) curves to evaluate the overall performance of the algorithms more comprehensively. The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. For convenient quantitative comparison, the area under the curve (AUC) is commonly used to summarize the overall performance of an ML model; the closer the AUC is to 1, the better the model (He et al. 2019). Figure 8 shows the ROC curves and AUC values of the ML models on the training dataset. The RF model has the highest AUC value, indicating that it performs best on the training dataset. Figure 9 shows the ROC curves and AUC values on the test dataset. The AUC value of the RF model on the test dataset is 0.78, lower than on the training dataset, indicating that the RF model performs less well on the test dataset. The LR model performs best on the test dataset, with an AUC value of 0.90.
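A minimal sketch of how ROC points and the AUC can be computed from a model’s predicted probabilities follows. The threshold is lowered through the sorted scores, tied scores are grouped into a single point, and the area is integrated with the trapezoidal rule. The four-sample example is hypothetical; in the study, the scores would come from each trained model.

```python
def roc_curve(labels, scores):
    """ROC points (FPR, TPR) obtained by lowering the threshold through the scores.
    labels: 1 = positive (life span <= 30 days), 0 = negative."""
    P = sum(labels)
    N = len(labels) - P
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Consume every sample tied at this score so ties produce one ROC point
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / N)
        tpr.append(tp / P)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve(labels, scores)
print(auc(fpr, tpr))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, which is why values such as 0.78 and 0.90 indicate useful discrimination.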

Figure 8. ROC curves and AUC values of different ML models with the training dataset.

Figure 9. ROC curves and AUC values of different ML models with the test dataset.

4.3. Comparison with the dimensionless blockage index

In this subsection, we compare the performance of the developed ML models with that of the dimensionless blockage index (DBI) proposed by Ermini and Casagli (2003) using the same database of landslide dams. The DBI model is widely used to assess landslide dam stability and is expressed as Equation (2):

$$\mathrm{DBI} = \lg\left(\frac{A_b H_d}{V_d}\right) \tag{2}$$

where $A_b$ is the catchment area (km²), $H_d$ is the landslide dam height (m), and $V_d$ is the landslide dam volume (10⁶ m³).
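Equation (2) and the classification thresholds given in the Figure 10 caption (2.75 and 3.08) can be combined into a small helper. The sketch assumes that lg denotes the base-10 logarithm. The function names are hypothetical, and the input values in the usage lines are invented only so that they fall in each of the three zones.

```python
import math

def dbi(catchment_km2, dam_height_m, dam_volume_1e6_m3):
    """Dimensionless blockage index, Eq. (2): lg(A_b * H_d / V_d)."""
    return math.log10(catchment_km2 * dam_height_m / dam_volume_1e6_m3)

def classify_dbi(value, stable_below=2.75, unstable_above=3.08):
    """Three-zone reading of the DBI given in the Figure 10 caption."""
    if value < stable_below:
        return "stable"
    if value > unstable_above:
        return "unstable"
    return "uncertain"   # 2.75 <= DBI <= 3.08: stability cannot be judged

# Hypothetical inputs: A_b in km^2, H_d in m, V_d in 10^6 m^3
for a_b, h_d, v_d in [(10, 10, 1), (100, 10, 1), (1000, 10, 1)]:
    value = dbi(a_b, h_d, v_d)
    print(f"DBI = {value:.2f} -> {classify_dbi(value)}")
```

The "uncertain" branch is the source of the DBI model’s uncertainty rate, whereas the ML classifiers always return one of the two classes.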

We evaluate the performance of the developed ML models and the DBI model using four metrics: absolute accuracy, false alarm rate, error rate, and uncertainty rate. Absolute accuracy is the proportion of cases whose predicted state matches the actual state. The false alarm rate measures how often a prediction indicates that a landslide dam is unstable, or that its life span is 30 days or less, while the actual situation is the opposite. The error rate measures how often a landslide dam is actually unstable, or its life span is 30 days or less, but the prediction suggests otherwise. The uncertainty rate is the proportion of cases whose state cannot be determined by the model. The predictions of the developed ML models are shown in Figures 6 and 7, and those of the DBI model in Figure 10; the four metrics are computed from these results.

Figure 10. Predicted results of the DBI model. A landslide dam may be stable if DBI < 2.75 and unstable if DBI > 3.08; if 2.75 ≤ DBI ≤ 3.08, its stability cannot be determined.

Figure 11 compares the absolute accuracies, false alarm rates, error rates, and uncertainty rates of the developed ML models and the DBI model. The RF, DT, and SVM models have higher absolute accuracies than the DBI model, lower false alarm and error rates than the DBI model, and zero uncertainty rates. Among the six ML models, the RF model has the highest absolute accuracy (89%), the lowest error rate (13%), the lowest false alarm rate (7%), and no uncertainty rate. These results suggest that the RF model can perform well in practical applications.

Figure 11. Comparison results of multiple models.

5. Case studies

This paper presents three typical landslide dams, namely the Costantino landslide dam, the Hsiaolin village landslide dam, and the first Baige landslide dam. These dams are introduced to demonstrate the applicability of the ML models developed in this study. The basic information about the three dams is as follows.

The Costantino landslide dam in Italy was triggered by heavy rainfall on the night of January 3-4, 1973. The dam had a maximum height of approximately 90 m, a width of approximately 700 m, and a length of approximately 400 m, and the storage capacity of the Costantino barrier lake was approximately 7.5 million m³. The landslide was classified as a rock avalanche and rock fall, and the landslide dam as type II (Fan et al. 2020). The landslide dam breached within 31 days: seepage and erosion cut a deep canyon downstream of the barrier lake, lowering the lake to a depth of about 20 m (Cencetti et al. 2017).

The Hsiaolin village landslide dam in Taiwan was triggered by rainfall on August 9, 2009. Its height, width, and length were approximately 44, 370, and 1500 m, respectively, and the storage capacity of the barrier lake was approximately 9.9 million m³. The landslide dam was composed of an unconsolidated, heterogeneous mixture of earth and rock debris in a naturally unstable state; the landslide was classified as a slide and flow, and the landslide dam as type III. The dam lasted less than 1 h, and the peak discharge of the breach was 70,649 m³/s. More than 400 people died in the flash floods caused by the dam failure (Dong et al. 2011; Li et al. 2011).

The first Baige landslide, in the Sichuan-Tibet region, occurred on October 10, 2018, and has attracted widespread attention in the landslide research community (Nian et al. 2020). It blocked the Jinsha River for approximately two days. The landslide was related to rainfall and is classified as a slide and flow, and the landslide dam is classified as type II. The dam height ranged from 81 to 130 m, its width from 960 to 1330 m, and its length from 500 to 700 m; the median values of these ranges are adopted for calculation. The storage capacity of the barrier lake was approximately 290 million m³ (Nian et al. 2020).
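The volume index used as an input factor (the product of dam height, width, and length) can be computed for the three case-study dams from the dimensions quoted above. For the first Baige dam, the sketch assumes that the adopted median values are the midpoints of the quoted ranges, so the values listed in Table 3 may differ. The dictionary layout and function name are hypothetical.

```python
# Dam dimensions (m) and lake capacities (m^3) as quoted in the case descriptions.
# Baige: midpoints of the quoted ranges (assumed reading of the adopted values).
CASES = {
    "Costantino": {"height": 90, "width": 700, "length": 400, "lake_m3": 7.5e6},
    "Hsiaolin": {"height": 44, "width": 370, "length": 1500, "lake_m3": 9.9e6},
    "Baige (first)": {"height": (81 + 130) / 2, "width": (960 + 1330) / 2,
                      "length": (500 + 700) / 2, "lake_m3": 290e6},
}

def volume_index(case):
    """Volume index factor: product of dam height, width, and length (m^3)."""
    return case["height"] * case["width"] * case["length"]

for name, case in CASES.items():
    print(f"{name}: volume index = {volume_index(case):.3e} m^3, "
          f"lake capacity = {case['lake_m3']:.3e} m^3")
```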

Table 3 summarizes the input parameters of the three landslide dams used by the ML models. All six ML models predicted that the Costantino landslide dam would exist for more than 30 days and that the Hsiaolin village landslide dam and the first Baige landslide dam would exist for 30 days or less. These predictions are consistent with the actual outcomes, indicating that the ML models proposed in this study are applicable to real cases.

Table 3. Input parameters of the three landslide dams.

6. Discussions

This study compares the performances of multiple ML models with that of a traditional empirical model (DBI). Although traditional statistical methods are usually more interpretable than ML methods (Liao et al. 2022), ML methods can offer higher predictive power. As Figure 11 shows, the nonlinear RF model outperforms the linear ML models and the traditional empirical model. This suggests that predicting the life spans of landslide dams is a complex task involving nonlinear relationships, for which traditional statistical methods and linear ML models such as LR may struggle, while nonlinear ML models can offer a more effective solution.

For classification problems, a good model outcome can be achieved with a training database of about 100 cases (Domingos 2012). Expanding the data volume can help alleviate overfitting and data uncertainty (Arabameri et al. 2021; Pham et al. 2021). The current database comprises 153 landslide dam cases, some of which are historic and contain uncertain records of dam characteristics. Moreover, some life span records are qualitative, such as "several days", "several months", or "no failure", and lack corresponding quantitative information. Such limitations can be mitigated by increasing the volume of data, and supplementing or blending the database with numerical simulations or physical model experiments based on mechanism models is a viable approach (Reichstein et al. 2019). In this way, qualitative life span descriptions of landslide dams could be transformed into quantitative data. Future research could focus on developing a hybrid model that combines mechanism and machine learning models to predict the numerical life spans of landslide dams.

A single machine learning model may have limited problem-solving ability and generalize poorly. Ensemble learning, which has been shown to effectively enhance the performance of individual machine learning models (Pham et al. 2021; Karir et al. 2022), is therefore suggested for future investigation, as novel ensemble learning models could yield a more accurate prediction of landslide dam life span.
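Ensemble learning is already partly present in this study, since RF is itself a bootstrap-aggregated ensemble of decision trees (Breiman 2001). A simple ensemble across the six fitted models is hard majority voting, sketched below in Python under assumptions that are not taken from the study: each model returns a binary label (True for a dam persisting more than 30 days, False otherwise), the model names are placeholders, and ties, which can occur with an even number of models, are broken toward the short-lived class as a safety-oriented choice for emergency planning.

```python
from collections import Counter
from typing import Dict


def majority_vote(predictions: Dict[str, bool]) -> bool:
    """Hard-voting ensemble over binary life span predictions.

    predictions maps a model name to its label:
    True = long-lived (> 30 days), False = short-lived (< 30 days).
    A tie returns False (short-lived), an assumed safety-oriented rule.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions.values())
    return counts[True] > counts[False]
```

For the votes `{"LR": True, "KNN": True, "SVM": False, "NB": True, "DT": False, "RF": True}`, `majority_vote` returns `True` (four votes to two). Weighted voting or averaging predicted probabilities (soft voting) are common alternatives when the member models differ markedly in accuracy.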

7. Conclusions

This study proposed models for predicting the life span of landslide dams using multiple machine learning algorithms based on an expanded landslide dam database. The performances of six ML algorithms, namely LR, KNN, SVM, NB, DT, and RF, were analyzed and compared with a typical empirical model (DBI). Three typical landslide dams were used to demonstrate the applicability of the established ML models. The main conclusions are as follows.

  1. The product of a landslide dam’s length, width, and height is linearly related to the capacity of the barrier lake. Six fitting models are proposed to supplement the landslide dam database; using these models, the number of usable landslide dam cases increases from 79 to 153, an increase of more than 90%.

  2. Five factors are considered in the ML algorithms for predicting landslide dam life span: landslide type, landslide trigger, landslide dam type, the product of the dam’s length, width, and height, and the capacity of the barrier lake.

  3. Compared with the DBI model, the ML models established in this study not only achieve an absolute accuracy consistent with that of the DBI model but also overcome its inability to judge a large number of cases. Among the established ML models, the RF model has the highest absolute accuracy (89%), the lowest error rate (7%), the lowest false alarm rate (15%), and an uncertainty rate of zero.
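The database expansion in the first conclusion can be illustrated with a minimal ordinary least-squares sketch in Python. It assumes a plain linear form V = a·P + b, where P is the product of dam length, width, and height and V is the barrier lake capacity; the study's six fitting models may use other forms, other predictor combinations, or transformed variables. The function names and the record layout (`length`, `width`, `height`, `capacity` keys) are hypothetical, and any coefficients must be fitted to the actual database.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def impute_capacity(records: List[Record], a: float, b: float) -> List[Record]:
    """Fill missing barrier lake capacities from P = L * W * H."""
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input list is left unchanged
        if rec.get("capacity") is None:
            p = rec["length"] * rec["width"] * rec["height"]
            rec["capacity"] = a * p + b
        filled.append(rec)
    return filled
```

For instance, points lying exactly on V = 2P + 1 give a = 2 and b = 1, and a record with P = 4 and no recorded capacity is then filled with V = 9. Imputed values inherit the scatter of the fit, so flagging them separately from measured capacities keeps the expanded database traceable.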

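The five factors listed in the second conclusion mix categorical attributes with continuous geometry parameters, so they must be combined into one numeric vector before algorithms such as LR, KNN, or SVM can use them. The Python sketch below shows one common way to do this; it is not the study's documented encoding. The category vocabularies are hypothetical placeholders (the dam types echo the six-type scheme of Costa and Schuster 1988), one-hot encoding is an assumed choice, and so is the log10 scaling of the two volumetric factors.

```python
import math
from typing import Any, List, Mapping, Sequence

# Hypothetical category vocabularies; the study's coding may differ.
LANDSLIDE_TYPES = ("rock slide", "debris avalanche", "earth flow")
TRIGGERS = ("earthquake", "rainfall", "other")
DAM_TYPES = ("I", "II", "III", "IV", "V", "VI")


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a category as a 0/1 indicator vector."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_case(case: Mapping[str, Any]) -> List[float]:
    """Build one feature vector from the five factors of a dam case."""
    lwh = case["length"] * case["width"] * case["height"]
    return (
        one_hot(case["landslide_type"], LANDSLIDE_TYPES)
        + one_hot(case["trigger"], TRIGGERS)
        + one_hot(case["dam_type"], DAM_TYPES)
        # Assumed log scaling, since dam volumes and lake capacities
        # can span several orders of magnitude.
        + [math.log10(lwh), math.log10(case["capacity"])]
    )
```

With three landslide types, three triggers, and six dam types, each case becomes a 14-element vector. Tree-based models such as DT and RF can often work with simpler integer codes, but distance-based KNN and margin-based SVM are sensitive to how categories and scales are encoded.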
In future work, more accurate prediction models for landslide dam life span could be investigated using novel ensemble learning methods. In addition, numerical simulations or physical model experiments based on the mechanism model are recommended to obtain the numerical life spans of landslide dams, because the current dam database contains many qualitative descriptions, such as "several days" or "several months", that lack quantitative information. A hybrid model that combines mechanism-based and machine learning models could then be used to predict the numerical life spans of landslide dams.

Author contributions

HW: methodology, investigation, and writing – original draft. TKN and ZGS: conceptualization, supervision, funding acquisition, and writing – review and editing. All authors contributed to the article and approved the submitted version.

Supplemental material

The supplemental material for this article is provided as an MS Word file (352 KB).

Acknowledgments

Critical comments by anonymous reviewers greatly improved the initial manuscript. The authors thank Associate Professor Yihuai Lou of Zhejiang University for his valuable suggestions on this paper.

Data availability statement

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure statement

The authors declare that they have no known competing financial interests with POWERCHINA Huadong Engineering Corporation Limited, and no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Funding

This research was supported by the National Natural Science Foundation of China (42207228 and 51579032), the Sichuan Science and Technology Program (2022NSFSC1060), and the Zhejiang Provincial Natural Science Foundation of China (LQ22E090008). The authors gratefully acknowledge this support.

References

  • Abu El-Magd SA, Ali SA, Pham QB. 2021. Spatial modeling and susceptibility zonation of landslides using random forest, naïve bayes and K-nearest neighbor in a complicated terrain. Earth Sci Inform. 14(3):1227–1243. doi: 10.1007/s12145-021-00653-y.
  • Arabameri A, Chandra Pal S, Costache R, Saha A, Rezaie F, Seyed Danesh A, Pradhan B, Lee S, Hoang ND. 2021. Prediction of gully erosion susceptibility mapping using novel ensemble machine learning algorithms. Geomat Nat Haz Risk. 12(1):469–498. doi: 10.1080/19475705.2021.1880977.
  • Breiman L. 2001. Random forests. Mach Learn. 45(1):5–32. doi: 10.1023/A:1010933404324.
  • Casagli N, Ermini L. 1999. Geomorphic analysis of landslide dams in the Northern Apennine. Chikei. 20:219–249.
  • Cencetti C, Di Matteo L, Romeo S. 2017. Analysis of Costantino landslide dam evolution (Southern Italy) by means of satellite images, aerial photos, and climate data. Geosciences. 7(2):30. doi: 10.3390/geosciences7020030.
  • Chai HJ, Liu HC, Zhang ZY. 1995. The catalog of Chinese landslide dam events. J Geol Hazards and Environ Preserv. 6(4):1–9. (in Chinese)
  • Chen Z, Ma L, Yu S, Chen S, Zhou X, Sun P, Li X. 2015. Back analysis of the draining process of the Tangjiashan Barrier Lake. J Hydraul Eng. 141(4):05014011. doi: 10.1061/(ASCE)HY.1943-7900.0000965.
  • Chen W, Xie X, Wang J, Pradhan B, Hong H, Bui DT, Duan Z, Ma J. 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. CATENA. 151:147–160. doi: 10.1016/j.catena.2016.11.032.
  • Costa JE, Schuster RL. 1988. The formation and failure of natural dams. Geol Soc Am Bull. 100(7):1054–1068. doi: 10.1130/0016-7606(1988)100<1054:TFAFON>2.3.CO;2.
  • Costa JE, Schuster RL. 1991. Documented historical landslide dams from around the world. U.S. Geological Survey Open-File Report 91-239. 486 pp. Vancouver, Washington. doi: 10.3133/ofr91239.
  • Cui Y, Bao P, Xu C, Fu G, Jiao Q, Luo Y, Shen L, Xu X, Liu F, Lyu Y, et al. 2020. A big landslide on the Jinsha River, Tibet, China: geometric characteristics, causes, and future stability. Nat Hazards. 104(3):2051–2070. doi: 10.1007/s11069-020-04261-9.
  • Cui P, Zhu YY, Han YS, Chen XQ, Zhuang JQ. 2009. The 12 May Wenchuan earthquake-induced landslide lakes: distribution and preliminary risk evaluation. Landslides. 6(3):209–223. doi: 10.1007/s10346-009-0160-9.
  • Das I, Stein A, Kerle N, Dadhwal VK. 2012. Landslide susceptibility mapping along road corridors in the Indian Himalayas using Bayesian logistic regression models. Geomorphology. 179:116–125. doi: 10.1016/j.geomorph.2012.08.004.
  • Domingos P. 2012. A few useful things to know about machine learning. Commun ACM. 55(10):78–87. doi: 10.1145/2347736.2347755.
  • Dong JJ, Li YS, Kuo CY, Sung RT, Li MH, Lee CT, Chen CC, Lee WR. 2011. The formation and breach of a short-lived landslide dam at Hsiaolin village, Taiwan—part I: post-event reconstruction of dam geometry. Eng Geol. 123(1–2):40–59. doi: 10.1016/j.enggeo.2011.04.001.
  • Dong JJ, Tung YH, Chen CC, Liao JJ, Pan YW. 2011. Logistic regression model for predicting the failure probability of a landslide dam. Eng Geol. 117(1–2):52–61. doi: 10.1016/j.enggeo.2010.10.004.
  • Ermini L, Casagli N. 2003. Prediction of the behaviour of landslide dams using a geomorphological dimensionless index. Earth Surf Process Landforms. 28(1):31–47. doi: 10.1002/esp.424.
  • Fan X, Dufresne A, Siva Subramanian S, Strom A, Hermanns R, Tacconi Stefanelli C, Hewitt K, Yunus AP, Dunning S, Capra L, et al. 2020. The formation and impact of landslide dams – State of the art. Earth-Sci Rev. 203:103116. doi: 10.1016/j.earscirev.2020.103116.
  • Fan X, Dufresne A, Whiteley J, Yunus AP, Subramanian SS, Okeke CA, Pánek T, Hermanns RL, Ming P, Strom A, et al. 2021. Recent technological and methodological advances for the investigation of landslide dams. Earth-Sci Rev. 218:103646. doi: 10.1016/j.earscirev.2021.103646.
  • Fang K, Tang H, Li C, Su X, An P, Sun S. 2023. Centrifuge modelling of landslides and landslide hazard mitigation: a review. Geosci Front. 14(1):101493. doi: 10.1016/j.gsf.2022.101493.
  • Fan X, Yang F, Siva Subramanian S, Xu Q, Feng Z, Mavrouli O, Peng M, Ouyang C, Jansen JD, Huang R. 2020. Prediction of a multi-hazard chain by an integrated numerical simulation approach: the Baige landslide, Jinsha River, China. Landslides. 17(1):147–164. doi: 10.1007/s10346-019-01313-5.
  • Havenith HB, Fan X, Torgoev A. 2015b. Hazard and risk related to earthquake-triggered landslides. In: Lollino G, Giordan D, Crosta GB, Corominas J, Azzam R, Wasowski J, Sciarra N, editors. Engineering Geology for Society and Territory – Volume 2. Cham: Springer.
  • Havenith HB, Strom A, Torgoev I, Torgoev A, Lamair L, Ischuk A, Abdrakhmatov K. 2015a. Tien Shan geohazards database: earthquakes and landslides. Geomorphology. 249:16–31. doi: 10.1016/j.geomorph.2015.01.037.
  • He Q, Shahabi H, Shirzadi A, Li S, Chen W, Wang N, Chai H, Bian H, Ma J, Chen Y, et al. 2019. Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci Total Environ. 663:1–15. doi: 10.1016/j.scitotenv.2019.01.329.
  • Huang Y, Zhao L. 2018. Review on landslide susceptibility mapping using support vector machines. CATENA. 165:520–529. doi: 10.1016/j.catena.2018.03.003.
  • Hungr O, Leroueil S, Picarelli L. 2014. The Varnes classification of landslide types, an update. Landslides. 11(2):167–194. doi: 10.1007/s10346-013-0436-y.
  • Karir D, Ray A, Kumar Bharati A, Chaturvedi U, Rai R, Khandelwal M. 2022. Stability prediction of a natural and man-made slope using various machine learning algorithms. Transp Geotech. 34:100745. doi: 10.1016/j.trgeo.2022.100745.
  • Korup O. 2004. Geomorphometric characteristics of New Zealand landslide dams. Eng Geol. 73(1–2):13–35. doi: 10.1016/j.enggeo.2003.11.003.
  • Liao HM, Yang XG, Lu GD, Tao J, Zhou JW. 2022. A geotechnical index for landslide dam stability assessment. Geomat Nat Haz Risk. 13(1):854–876. doi: 10.1080/19475705.2022.2048906.
  • Li D, Nian T, Wu H, Wang F, Zheng L. 2020. A predictive model for the geometry of landslide dams in V-shaped valleys. Bull Eng Geol Environ. 79(9):4595–4608. doi: 10.1007/s10064-020-01828-5.
  • Li MH, Sung RT, Dong JJ, Lee CT, Chen CC. 2011. The formation and breaching of a short-lived landslide dam at Hsiaolin Village, Taiwan—Part II: simulation of debris flow with landslide dam breach. Eng Geol. 123(1–2):60–71. doi: 10.1016/j.enggeo.2011.05.002.
  • Liu N. 2014. Hongshiyan landslide dam danger removal and coordinated management. Front Eng. 1(3):308–317. doi: 10.15302/J-FEM-2014041.
  • Lombardo L, Mai PM. 2018. Presenting logistic regression-based landslide susceptibility results. Eng Geol. 244:14–24. doi: 10.1016/j.enggeo.2018.07.019.
  • Nian TK, Wu H, Chen GQ, Zheng DF, Zhang YJ, Li DY. 2018. Research progress on stability evaluation method and disaster chain effect of landslide dam. Chinese J Rock Mech Engin. 37(8):1796–1812. (in Chinese)
  • Nian TK, Wu H, Li DY, Zhao W, Takara K, Zheng DF. 2020. Experimental investigation on the formation process of landslide dams and a criterion of river blockage. Landslides. 17(11):2547–2562. doi: 10.1007/s10346-020-01494-4.
  • Nian TK, Wu H, Takara K, Li DY, Zhang YJ. 2021. Numerical investigation on the evolution of landslide-induced river blocking using coupled DEM-CFD. Comput Geotech. 134:104101. doi: 10.1016/j.compgeo.2021.104101.
  • Nibigira L, Havenith HB, Archambeau P, Dewals B. 2018. Formation, breaching and flood consequences of a landslide dam near Bujumbura, Burundi. Nat Hazards Earth Syst Sci. 18(7):1867–1890. doi: 10.5194/nhess-18-1867-2018.
  • Oppikofer T, Hermanns RL, Jakobsen VU, Böhme M, Nicolet P, Penna I. 2020. Semi-empirical prediction of dam height and stability of dams formed by rock slope failures in Norway. Nat Hazards Earth Syst Sci. 20(11):3179–3196. doi: 10.5194/nhess-20-3179-2020.
  • Peng M, Zhang LM. 2012. Breaching parameters of landslide dams. Landslides. 9(1):13–31. doi: 10.1007/s10346-011-0271-y.
  • Pham K, Kim D, Park S, Choi H. 2021. Ensemble learning-based classification models for slope stability analysis. CATENA. 196:104886. doi: 10.1016/j.catena.2020.104886.
  • Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, Prabhat. 2019. Deep learning and process understanding for data-driven Earth system science. Nature. 566(7743):195–204. doi: 10.1038/s41586-019-0912-1.
  • Risley JC, Walder JS, Denlinger RP. 2006. Usoi dam wave overtopping and flood routing in the Bartang and Panj Rivers, Tajikistan. Nat Hazards. 38(3):375–390. doi: 10.1007/s11069-005-1923-9.
  • Safran EB, O'Connor JE, Ely LL, House PK, Grant G, Harrity K, Croall K, Jones E. 2015. Plugs or flood-makers? The unstable landslide dams of eastern Oregon. Geomorphology. 248:237–251. doi: 10.1016/j.geomorph.2015.06.040.
  • Saito H, Nakayama D, Matsuyama H. 2009. Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan. Geomorphology. 109(3–4):108–121. doi: 10.1016/j.geomorph.2009.02.026.
  • Schuster RL, Alford D. 2004. Usoi landslide dam and lake Sarez, Pamir Mountains, Tajikistan. Environ Eng Geosci. 10(2):151–168. doi: 10.2113/10.2.151.
  • Shan Y, Chen S, Zhong Q. 2020. Rapid prediction of landslide dam stability using the logistic regression method. Landslides. 17(12):2931–2956. doi: 10.1007/s10346-020-01414-6.
  • Shen D, Shi Z, Peng M, Zhang L, Jiang M. 2020. Longevity analysis of landslide dams. Landslides. 17(8):1797–1821. doi: 10.1007/s10346-020-01386-7.
  • Tacconi Stefanelli C, Catani F, Casagli N. 2015. Geomorphological investigations on landslide dams. Geoenviron Disasters. 2(1):21. doi: 10.1186/s40677-015-0030-9.
  • Tacconi Stefanelli C, Segoni S, Casagli N, Catani F. 2016. Geomorphic indexing of landslide dams evolution. Eng Geol. 208:1–10. doi: 10.1016/j.enggeo.2016.04.024.
  • Tacconi Stefanelli C, Vilímek V, Emmer A, Catani F. 2018. Morphological analysis and features of the landslide dams in the Cordillera Blanca, Peru. Landslides. 15(3):507–521. doi: 10.1007/s10346-017-0888-6.
  • Tsangaratos P, Ilia I. 2016. Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Prefecture, Greece. Landslides. 13(2):305–320. doi: 10.1007/s10346-015-0565-6.
  • Ullah I, Aslam B, Shah SHIA, Tariq A, Qin S, Majeed M, Havenith HB. 2022. An integrated approach of machine learning, remote sensing, and GIS data for the landslide susceptibility mapping. Land. 11(8):1265. doi: 10.3390/land11081265.
  • Wu H, Nian TK, Chen GQ, Zhao W, Li DY. 2020. Laboratory-scale investigation of the 3-D geometry of landslide dams in a U-shaped valley. Eng Geol. 265:105428. doi: 10.1016/j.enggeo.2019.105428.
  • Wu H, Nian TK, Shan Z, Li DY, Guo XS, Jiang XG. 2023. Rapid prediction models for 3D geometry of landslide dam considering the damming process. J Mt Sci. 20(4):928–942. doi: 10.1007/s11629-022-7906-z.
  • Wu H, Trigg MA, Murphy W, Fuentes R. 2022. A new global landslide dam database (RAGLAD) and analysis utilizing auxiliary global fluvial datasets. Landslides. 19(3):555–572. doi: 10.1007/s10346-021-01817-z.
  • Wu H, Zheng DF, Zhang YJ, Li DY, Nian TK. 2020. A photogrammetric method for laboratory-scale investigation on 3D landslide dam topography. Bull Eng Geol Environ. 79(9):4717–4732. doi: 10.1007/s10064-020-01870-3.
  • Zhang L, Peng M, Chang D, Xu Y. 2015. Dam failure mechanisms and risk assessment. Singapore: John Wiley & Sons. doi: 10.1002/9781118558522.
  • Zheng H, Shi Z, Shen D, Peng M, Hanley KJ, Ma C, Zhang L. 2021. Recent advances in stability and failure mechanisms of landslide dams. Front Earth Sci. 9:201. doi: 10.3389/feart.2021.659935.