A spatial evaluation method for earthquake disaster using optimized BP neural network model


Abstract

Rapid spatial evaluation of seismic disaster after an earthquake is required in emergency rescue management because it helps reduce casualties and property losses. Among the many categories of seismic disaster, evaluating the earthquake-affected population is of great significance for clarifying the severity of an earthquake disaster. Simple classic regression models have difficulty describing the strongly nonlinear relationship between multiple influencing factors and earthquake disasters. In the present study, an optimized BP neural network model that considers the spatial characteristics of the influencing factors is proposed to evaluate the distribution of the population affected by an earthquake. The correlation between the earthquake-affected population and the influencing factors is analysed using data from the 2013 Ms7.0 Lushan earthquake. Ten influencing factors, namely elevation, slope angle, population density, per capita GDP, distance to fault, distance to river, NDVI, PGA, PGV, and distance to the epicentre, were classified into environmental and seismic factors. Correlation analysis revealed that per capita GDP and PGA had stronger correlations with the earthquake-affected population than the other factors. The earthquake-affected population was evaluated using a BP neural network whose training samples were optimized according to the spatial characteristics of the per capita GDP and PGA factors. Instead of a random distribution of sample points, different numbers of sample points were generated in areas with different value intervals of the influencing factors. The optimized samples improved the convergence speed and generalization capability of the neural network compared with random samples. The trained network was applied to the 2017 Ms7.0 Jiuzhaigou earthquake to verify its prediction accuracy. The MAE of the estimated earthquake-affected populations of different counties for the Jiuzhaigou earthquake was 1.276 people/km² using the network trained on optimized samples, smaller than the results of the network trained on random samples and of the linear regression model. The results indicate that a BP neural network that considers the correlation characteristics of the factors is capable of evaluating earthquake disaster spatially.


1. Introduction

Earthquake disasters have profound impacts on humans and the environment owing to their instantaneous and destructive nature. Intense seismic ground motion can cause severe casualties, house collapse, and economic losses. Strong earthquakes have continued to occur worldwide over the past two decades (Rossetto et al. 2007; Lara et al. 2017; Shimada 2016). An earthquake disaster can injure or kill from a few to tens of thousands of people, distributed across different spatial locations (Zhao et al. 2018). Although concern about seismic problems continues to deepen and public seismic awareness is steadily improving, active geological structures have continued to affect the anthroposphere in recent decades (Sun et al. 2016; Wu et al. 2020; Santos-Reyes and Gouzeva 2020; Luo et al. 2022). Because earthquakes are unpredictable, it is difficult to prepare before one occurs; thus, countries are committed to improving their emergency rescue capability after an earthquake (Huang and Li 2014). Evaluating earthquake disaster information is particularly important as a reference for emergency rescue and decision making. Among the various types of seismic disaster, the number of people affected by an earthquake is the most representative indicator for assessing the severity of the disaster. Therefore, casualty evaluation after an earthquake is rapidly becoming an important issue for more effective economic, social, and environmental risk management (Huang and Huang 2018).

The evaluation method for seismic disaster should be rapid, reliable, and computationally simple. Rapid Visual Screening (RVS) has been extensively applied to investigate building damage. However, it is time-consuming and requires subjective judgement. Besides RVS, vulnerability analysis and probability analysis have also been widely applied in disaster assessment (Harirchian et al. 2021). Satish et al. (2021) proposed a rapid seismic vulnerability assessment method for buildings based on the HAZUS methodology. Büyüksaraç et al. (2021) determined probabilistic seismic hazard curves under different earthquake scenarios. Seismic vulnerability analysis is practical and effective for disaster assessment. However, the categories of influencing factors it considers are limited, whereas disaster under intense seismic motion is a complex result of various influencing factors. Seismic intensity, topography, population, and economic level are all related, to a certain extent, to the casualties caused by earthquakes. It is therefore important for an evaluation model to consider the multidimensional complexity of the influencing factors (Erdik et al. 2011).

With the continuous improvement in computing speed in recent years, machine learning methods have become more widely used. An increasing number of scholars have applied them to disaster mapping for earthquakes, because machine learning methods can learn from historical data to produce insight into extreme events (Yang et al. 2015; Choubin et al. 2019; Pourghasemi et al. 2019; Jena et al. 2020; Hou et al. 2020; Si and Du 2020; Luo et al. 2020). Aghamohammadi et al. (2013) used an artificial neural network (ANN) to estimate the human loss due to building damage in the 2003 Bam earthquake. Harirchian and Lahmer (2020) used an ANN to predict the seismic damage state of reinforced concrete buildings considering various building properties. Huang et al. (2015) proposed a robust wavelet (RW) v-SVM (support vector machine) earthquake casualty prediction model. The factors considered included earthquake magnitude, intensity, population density, pre-warning level, in-building probability, location of occurrence, supply support, and building collapse ratio. It was concluded that the RW v-SVM model had higher prediction accuracy and faster learning than the standard SVM and neural network. Gul and Guneri (2016) built an ANN model for casualty prediction, taking occurrence time, magnitude, and population density as factors. Data from 21 earthquakes in Turkey were collected as samples for network training. Huang et al. (2020) established an extreme learning machine (ELM) network to predict earthquake casualties based on data from 84 groups of earthquake victims in China. It was found that the ELM algorithm had better robustness and generalization capability than the BP neural network and SVM. The existing studies thus focus on predicting the number of people affected by earthquakes and consider factors of multiple dimensions. Moreover, the accuracy and performance of different machine learning methods have been compared based on the evaluation results for earthquake casualties. However, the input layers of these machine learning methods used numerical data without spatial information, and the spatial characteristics of the disaster information in the output layer were not evaluated effectively. The spatial properties of a seismic disaster assessment deserve more attention, rather than only the numerical value of the result. For earthquake emergency management, the spatial distribution of disaster information within the earthquake-affected areas is crucial to the formulation of detailed rescue plans.

The generalization capability of a network refers to its ability to produce accurate outputs for new input data other than the training samples. It is the most important index for measuring the performance of a network. The complexity of the network structure and the training samples are the main factors affecting the generalization capability of the model. Research by Partridge on a three-layer neural network found that the influence of the training set on generalization capability is greater than that of the number of neurons (Partridge 1996). Many researchers have combined principal component analysis (PCA), clustering analysis, and other methods with machine learning to optimize the training set and thereby improve the generalization capability of the network (Basharat et al. 2016; Li et al. 2020; Lythgoe et al. 2021). Azar et al. (2021) used an adaptive neuro-fuzzy inference system optimized by Harris hawk optimization to increase the performance of an SVM model. Lou et al. (2012) used PCA to reduce the dimension of assessment factors, disaster-formative environments, and disaster-affected bodies, and established a BP neural network to assess the economic loss caused by tropical cyclones in Zhejiang Province. Gao et al. (2020) combined PCA and ANN to evaluate personal exposure to PM2.5 and found that the combined method produced more accurate results than the simple ANN method. These studies show that optimizing the input samples of a network can improve its generalization capability. Most existing sample optimization methods are based on statistical analyses of the numerical dimensions. However, the distributions of the influencing factors and of the training results are also related in the spatial dimension. Sample optimization based on spatial correlation characteristics may therefore provide a novel way to improve the generalization capability.

The study presented herein aimed to effectively evaluate the spatial distribution of earthquake disaster information in each county. The earthquake-affected population is used as the index to classify the degree of earthquake disaster. The evaluation model is constructed from the correlation characteristics of the influencing factors and a BP neural network, using data from the 2013 Ms7.0 Lushan earthquake. The selection of samples was optimized based on the spatial characteristics obtained from the correlation analysis to improve the generalization capability of the network and the accuracy of the evaluation results.

2. Influencing factors of the 2013 Ms7.0 Lushan earthquake disaster

2.1. Earthquake-affected population

The Lushan Ms7.0 earthquake occurred on April 20, 2013, and the epicentre was located at 30°18’N, 103°56’E, in Lushan County, Sichuan Province, China. The focal depth of the earthquake was 13 km. The affected area lies at the junction of the Qinghai-Tibet Plateau and the Sichuan Basin. The Lushan earthquake was caused by tectonic activity in the Longmenshan fault zone, similar to the 2008 Ms8.0 Wenchuan earthquake. The distance between the epicentres of the Lushan and Wenchuan earthquakes was approximately 85 km. A total of 196 people were killed, 21 were missing, and 11470 were injured in the Lushan earthquake. The Lushan earthquake affected an area of 12500 km² and caused a direct economic loss of approximately 185.4 billion yuan. After the earthquake, Sichuan Province immediately started first-level emergency procedures and dispatched troops to carry out emergency rescue work.

The earthquake-affected population in the Lushan earthquake reached 3.7 million. The earthquake-affected population refers to the people who suffer property or life losses due to an earthquake. It not only reflects the severity of the natural disaster, but also reveals the impact of the earthquake on people’s lives. It also informs the formulation of emergency rescue plans; thus, it has become an important index for evaluating the damage caused by earthquakes. Referring to the National Earthquake Emergency Plan of China, we classify the earthquake disaster based on the earthquake-affected population, as listed in Table 1. Earthquake disaster is classified into four categories, namely particularly significant, significant, large, and general, based on the earthquake-affected population density (number of earthquake-affected people per square kilometre). Figure 1 illustrates the earthquake disaster degree of the Lushan earthquake in each county-level administrative region. The data were collected and released by the Sichuan provincial government on the Internet after the earthquake (Wang and Li 2014). In Figure 1, the color represents the earthquake disaster degree. The figure suggests that the earthquake disaster in Yucheng District and Mingshan County was relatively more severe, reaching the “particularly significant” degree. The earthquake-affected population density in Mingshan County was the highest, reaching approximately 432 people/km². The region with the most serious earthquake disaster was not the region where the epicentre was located. This indicates that the impact of earthquakes on the population is spatially complicated, and the epicentre is not necessarily the most severely affected area under seismic motion. A similar phenomenon was observed in the 2008 Ms8.0 Wenchuan earthquake (Yang et al. 2014). The casualties caused by earthquakes are related to many categories of influencing factors. In this study, the factors influencing the earthquake-affected population are divided into environmental and seismic factors.

Figure 1. Earthquake disaster degree in the 2013 Ms7.0 Lushan earthquake.

Table 1. Classification criteria of earthquake disaster degree.


2.2. Environmental factors

The environmental influencing factors refer to the environmental conditions in the study area; they have no direct relationship with earthquake occurrence. The environmental factors considered in this research included elevation, slope angle, population density, per capita GDP, distance to fault, distance to river, and normalized difference vegetation index (NDVI). The details of the environmental influencing factors are shown in Table 2. All data were last updated before the Lushan earthquake occurred. The maps of elevation, slope angle, distance to fault, distance to river, and NDVI contain data that vary continuously with spatial location. Population density and per capita GDP, however, had a different spatial granularity from the other environmental factors: each county-level administrative region had a single attribute value for these two factors, because counties were the basis of the statistical data.

Table 2. Details of the environmental factors.


2.2.1. Elevation

Elevation is considered the most important factor in the analysis of natural disaster susceptibility (Peng et al. 2014; Tehrany et al. 2015; Saha et al. 2021). There is also a correlation between elevation and the distribution of earthquake-affected populations. On the one hand, the population is concentrated on plains with lower elevations; on the other hand, slopes amplify seismic ground motion, resulting in more severe geological disasters in high-elevation areas (Zhang et al. 2018). The digital elevation model, with a resolution of 30 × 30 m and updated in 2009 (Figure 2), was obtained from the Geospatial Data Cloud site, Computer Network Information Centre, Chinese Academy of Sciences. The elevation of the study area spans a wide range, from 291 m to 5631 m. The epicentre of the Lushan earthquake was located at the edge of the basin.

Figure 2. Spatial distribution of environmental influencing factors.

2.2.2. Slope angle

The slope angle is a geomorphic parameter that has an important impact on seismic geological disasters, such as landslides, debris flows, and barrier lakes. Investigations of historical strong earthquakes have found that large numbers of earthquake casualties were caused by geological disasters triggered by seismic motion (Xu et al. 2015). The slope angle map (Figure 2) was derived from the digital elevation model using ArcMap software. The maximum topographic gradient of the study area reaches 87.93°.
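The derivation of slope from a digital elevation model can be illustrated with a minimal sketch. It uses plain central differences on a regular grid, which is an assumption made here for brevity and not necessarily the estimator that ArcMap applies; the function name `slope_angle_deg` and the 3 × 3 test grid are hypothetical.

```python
import math

def slope_angle_deg(dem, cell_size):
    """Slope angle (degrees) at interior cells of a DEM grid.

    Uses central differences; border cells are left as None.
    This is a simplified estimator assumed for illustration.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            # Slope is the arctangent of the gradient magnitude.
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope

# Hypothetical DEM (metres) on a 30 m grid: a plane rising 30 m per cell eastward.
dem = [[0, 30, 60],
       [0, 30, 60],
       [0, 30, 60]]
centre_slope = slope_angle_deg(dem, 30.0)[1][1]
```

A rise equal to the cell size gives a gradient of 1, so the centre cell of this test grid should come out at about 45°.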

2.2.3. Population density

Population density is a key factor in the risk assessment of natural disasters. It is calculated as the ratio of the population to bare land area. Some strong earthquakes occur in mountainous areas with low population density, thus posing a relatively small threat to people’s lives and property (Ara 2014). Owing to the strong mobility of the population, it is difficult to obtain the spatial distribution of the population at the moment before an earthquake occurs. Therefore, the resident population of each county in the census was used to approximate the population distribution (Figure 2), with population density taken as the ratio of the population to the area of each county. Figure 2 shows that population density was lower in the west and higher in the east. People were mainly concentrated in the low-elevation basin. The population data were updated in 2011 and provided by the China Earthquake Network Center.

2.2.4. Per capita GDP

The per capita GDP is another crucial factor in earthquake disaster evaluation. It reflects the economic status of the local people, which influences the seismic resistance of engineering constructions. Generally, the higher the economic level, the stronger the seismic resistance of the constructions. As with population density, the per capita GDP of each county is presented as a distribution map (Figure 2). The per capita GDP data were updated in 2011 and provided by the China Earthquake Network Center.

2.2.5. Distance to fault

The distance to the fault is another significant factor affecting seismic geological disasters. Generally, fractured or weak zones are located near fault bedding planes, which are susceptible to weathering and sliding (Conforti et al. 2014). The fault data were provided by the China Earthquake Network Center, and the distance to the fault was calculated using buffers in ArcMap software (Figure 2).

2.2.6. Distance to river

The distance to the river can also influence the degree of earthquake disaster, as river erosion and soil saturation can decrease the seismic stability of slopes (Yalcin 2008). The river data were provided by the China Earthquake Network Center, and the distance to the river was calculated using buffers in ArcMap software (Figure 2).

2.2.7. NDVI

The NDVI played an important role in the earthquake-affected population evaluation for the Lushan earthquake. NDVI quantifies vegetation by measuring the difference between near-infrared light (which vegetation strongly reflects) and red light (which vegetation absorbs). The closer the NDVI is to +1, the better the vegetation coverage in the area and the lower the degree of urbanization. The NDVI map was obtained from Landsat 7 ETM+ satellite images acquired in 2012 from the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (Figure 2).
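For reference, the index is conventionally computed per pixel from the two band reflectances. The symbols $\rho_{NIR}$ (near-infrared reflectance) and $\rho_{Red}$ (red reflectance) are notation assumed here and are not taken from the article:

$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}}$

For non-negative reflectances the ratio is bounded between −1 and +1, which is why values near +1 indicate dense vegetation.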

2.3. Seismic factors

The seismic influencing factors refer to the elements and characteristics that are directly related to earthquake occurrence and can be rapidly measured using seismic motion monitoring instruments after an earthquake. The seismic influencing factors considered in this research included peak ground acceleration (PGA), peak ground velocity (PGV), and distance to the epicentre. The details of the seismic influencing factors are shown in Table 3. The resolution of the PGA and PGV data is 30 × 30 m.

Table 3. Details of the seismic factors.


2.3.1. PGA

PGA is the most commonly used parameter for describing the seismic ground motion intensity of an earthquake (Boatwright et al. 2003; Yuan et al. 2013). It represents the peak value of the acceleration time-history waveform recorded on the ground surface during an earthquake. It can be regarded as the maximum instantaneous force exerted by an earthquake and can effectively evaluate the intensity of seismic ground motion at different positions in space. The PGA data are recorded by a strong-motion seismograph network shortly after an earthquake occurs. The data used in this research were provided by the China Earthquake Network Center (Figure 3).

Figure 3. Spatial distribution of seismic influencing factors.

2.3.2. PGV

The PGV is also an important index for evaluating the intensity of seismic ground motion. The acceleration time-history waveform can miss some information, such as low-frequency components, which the velocity time-history waveform records better. Therefore, PGV data are used alongside PGA to evaluate the seismic ground motion intensity comprehensively. The PGV data used in this study were provided by the China Earthquake Network Center (Figure 3).

2.3.3. Distance to epicentre

The distance to the epicentre measures the relative distance between a location in the study region and the epicentre. Previous studies have found that the impacts of earthquake disasters gradually decrease as the distance to the epicentre increases. The distance to the epicentre was calculated using buffers in ArcMap software (Figure 3).
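Outside a GIS package, a comparable distance can be approximated with the haversine great-circle formula. The sketch below is a swapped-in technique, not the ArcMap buffer procedure used in the study; the spherical Earth radius, the function name `haversine_km`, and the sample site are assumptions, while the epicentre coordinates are those stated in Section 2.1.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Epicentre as given in the text: 30 deg 18 min N, 103 deg 56 min E.
EPICENTRE = (30 + 18 / 60, 103 + 56 / 60)

# Hypothetical raster-cell centre, used only to show the call.
site = (30.0, 103.0)
distance_to_epicentre = haversine_km(site[0], site[1], *EPICENTRE)
```

Applying the function to every raster-cell centre would yield a distance layer analogous to the buffered map, up to the spherical approximation.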

3. Spatial correlation characteristics of the influencing factors

The earthquake-affected population is related to environmental and seismic factors. The spatial variability of the influencing factors leads to differences in earthquake-affected populations among counties. Spearman rank correlation coefficients were calculated to analyse the relationship between the earthquake-affected population and the influencing factors, which can be expressed as follows:

$r_{s} = 1 - \frac{6 \sum_{i = 1}^{n} D_{i}^{2}}{n (n^{2} - 1)}$ (1)

where $D_{i}$ denotes the difference between the rank of the earthquake-affected population and the rank of the factor for sample $i$, and $n$ is the number of samples. The coefficient measures the degree of consistency and describes the strength of the monotonic relationship between the earthquake-affected population and the influencing factors (Peng 2015; Shahaki Kenari and Celikag 2019). The coefficient ranges from −1 to 1. The data are positively correlated when the coefficient is positive. The closer the absolute value of the coefficient is to 1, the stronger the correlation between the data.
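Equation (1) can be evaluated directly from the ranks of two variables. The following is a minimal sketch that assumes no tied values, as Eq. (1) itself does; the function names and the five sample values are hypothetical and are not data from the study.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rs(x, y):
    """Spearman rank correlation coefficient following Eq. (1)."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    # D_i is the rank difference of sample i between the two variables.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical factor values and affected-population densities at five sample points.
factor = [0.10, 0.25, 0.40, 0.55, 0.70]
density = [12, 30, 25, 80, 95]
r_s = spearman_rs(factor, density)
```

Here only the second and third samples swap ranks, so the sum of squared rank differences is 2 and the coefficient is 1 − 12/120 = 0.9. With tied values, which are likely for county-level attributes, a tie-aware implementation would be needed instead.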

In the study region, 1000 sampling points were randomly generated and distributed using ArcMap software. The values of earthquake-affected population density and of the influencing factors at the sampling points were extracted to construct a database for correlation analysis. The Spearman correlation coefficients and the results of the significance tests between the earthquake-affected population and the environmental and seismic influencing factors are listed in Table 4.

Table 4. Spearman correlation coefficients.


3.1. Correlation analysis between earthquake-affected population and environmental factors

The significance test results between the earthquake-affected population and the environmental factors were all less than 0.05, showing that the number of samples was reasonable and the values of the correlation coefficients were acceptable. The strongest correlation, with a coefficient of −0.322, was between the earthquake-affected population density and per capita GDP, indicating that earthquake-affected populations were generally low in regions with high per capita GDP. The weakest correlation had a coefficient of −0.073. The values of the correlation coefficients indicate that none of the environmental factors had a direct linear relationship with earthquake-affected population density; rather, the distribution of earthquake-affected population density was the result of multiple environmental factors. The strongest correlation among the environmental factors themselves was −0.693, between population density and elevation, indicating that the variation trends of these two factors are closely related. When data are insufficient, the elevation factor can therefore be omitted to reduce the number of input variables.

The spatial correlations between the earthquake-affected population density and the environmental factors are shown in Figure 4. The histograms summarize the average earthquake-affected population density within different intervals of each factor, reflecting how the spatial distribution of earthquake-affected population density relates to the influencing factors. Figure 4 shows that, for the various factors, the average earthquake-affected population density varied irregularly between intervals. However, relatively high earthquake-affected population densities were concentrated in specific ranges of certain factors. For example, in the 2013 Lushan earthquake, the area between 500 and 1200 m elevation had a relatively high earthquake-affected population density (Figure 4). For distance to river, the maximum earthquake-affected population density occurred in the interval closest to the river (Figure 4).
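The interval averages behind such histograms can be computed with a simple binning step. This is a minimal sketch under stated assumptions: the function name `mean_by_interval`, the half-open intervals, and all values are hypothetical, chosen only so that the 500 to 1200 m band stands out as in the text.

```python
def mean_by_interval(factor_values, densities, edges):
    """Average density of samples in each half-open interval [edges[k], edges[k+1])."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for f, d in zip(factor_values, densities):
        for k in range(n_bins):
            if edges[k] <= f < edges[k + 1]:
                sums[k] += d
                counts[k] += 1
                break
    # Intervals without samples are reported as None rather than zero.
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical sample points: elevation (m) and affected-population density (people/km^2).
elevation = [350, 600, 900, 1100, 2500, 4000]
density = [40, 120, 200, 160, 10, 2]
edges = [0, 500, 1200, 3000, 6000]
means = mean_by_interval(elevation, density, edges)
```

For these made-up values the second interval averages (120 + 200 + 160) / 3 = 160, well above its neighbours, which is the kind of concentration the histograms are meant to reveal.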

Figure 4. Correlation between earthquake-affected population density and environmental influencing factors.

3.2. Correlation analysis between earthquake-affected population and seismic factors

In Table 4, the significance test results between the earthquake-affected population and the seismic factors were all less than 0.05. The correlation coefficient between the earthquake-affected population density and PGA had the highest value, 0.433. The results show that the earthquake-affected population density had a marked positive correlation with PGA: the higher the PGA, the greater the earthquake-affected population. The strongest correlation among the seismic factors themselves was −0.878, between distance to the epicentre and PGV, indicating that the PGV factor can be replaced to some extent by the distance to the epicentre when seismic data are insufficient. The correlation coefficients between environmental and seismic factors were all less than 0.35, indicating that the two categories of factors differ significantly and that the classification is reasonable.

The spatial correlations between the earthquake-affected population density and the seismic factors are illustrated in Figure 5. PGA had the stronger positive correlation with the earthquake-affected population density, and the earthquake-affected population density of each interval generally increased with the PGA value. However, when the PGA value ranged from 750 m/s² to 850 m/s², the earthquake-affected population density was relatively low (Figure 5). This was because the area with high seismic motion intensity had a low population density, so the density of the population affected by the earthquake was also relatively low.

Figure 5. Correlation between earthquake-affected population density and seismic influencing factors.

The correlation coefficient is a statistical index that reflects the degree of linear correlation between the variables. The results of the correlation analysis indicate that there was a nonlinear relationship between the earthquake-affected population and the various influencing factors in the spatial distribution. The correlation coefficients imply the existence of a spatial correlation between the factors and disaster results. For example, the earthquake-affected population density was higher in the area with greater seismic motion. Nevertheless, the values of the correlation coefficients indicate that the earthquake-affected population was a result of the complex interaction of multiple factors. It was difficult to evaluate the spatial earthquake-affected population with a linear relationship.

4. Spatial evaluation of earthquake disaster using BP neural network

The flow chart in Figure 6 shows the overall model framework for earthquake disaster evaluation. The study process comprises five phases: data collection, correlation analysis, sample optimization, network training, and model verification.

Figure 6. Model framework for earthquake disaster evaluation.

4.1. BP neural network

Artificial neural networks have a powerful data processing ability, which they achieve by imitating the information transmission principle of biological neurons (Wang et al. 2016). The back propagation (BP) neural network is a common artificial neural network model for weight training and is widely applied in forecasting. It can capture the mapping relationship between neurons and approximate nonlinear functions arbitrarily closely (Wang et al. 2018). The BP neural network consists of an input layer, a hidden layer, and an output layer, and each layer comprises several neurons. The mapping relationship between the input and output layers is jointly determined by the activation function and the threshold in the hidden layer. In this research, the input layer contained the environmental and seismic influencing factors of the earthquake-affected population, and the output layer calculated the earthquake-affected population density (Figure 7). The hidden layer was constructed through model training based on sample data from the 2013 Ms7.0 Lushan earthquake. The network building process was divided into three steps: forward calculation, error back propagation, and weight update (Wen and Yuan 2020). The sigmoid function was selected as the excitation (activation) function, and the gradient descent algorithm was used to obtain the best solution of the network; the learning rate was 0.02.
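The three training steps can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the network used in the study: it has two inputs instead of ten, four hidden neurons, a squared-error loss, per-sample updates, and a synthetic target, all of which are hypothetical choices. Only the sigmoid activation, gradient descent, and the learning rate of 0.02 come from the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPNetwork:
    """Three-layer network: inputs -> one sigmoid hidden layer -> one sigmoid output."""

    def __init__(self, n_in, n_hidden, lr=0.02, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # The last entry of each weight row is the bias (threshold) term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target):
        h, y = self.forward(x)                          # 1. forward calculation
        delta_out = (y - target) * y * (1.0 - y)        # 2. error back propagation
        delta_hid = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                     for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):               # 3. weight update
            self.w2[j] -= self.lr * delta_out * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= self.lr * delta_hid[j] * v

def mean_loss(net, samples):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

# Synthetic samples: two normalized inputs and a smooth nonlinear target in (0, 1).
samples = [([a / 4, b / 4], 0.1 + 0.8 * (a / 4) * (1 - b / 4))
           for a in range(5) for b in range(5)]

net = BPNetwork(n_in=2, n_hidden=4)
loss_before = mean_loss(net, samples)
for _ in range(500):
    for x, t in samples:
        net.train_step(x, t)
loss_after = mean_loss(net, samples)
```

Because the output neuron is a sigmoid, targets must be scaled into the interval (0, 1); a real application would normalize the earthquake-affected population density accordingly and rescale the predictions afterwards.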

Figure 7. BP neural network structure diagram.

4.2. Sample optimization selection based on correlation characteristics

Generalization capability is an important indicator of how accurately a neural network predicts data outside the training samples. To effectively evaluate the spatial distribution of the earthquake-affected population in a newly occurring earthquake, the network must have an acceptable generalization capability. Training samples have a significant impact on generalization capability (Partridge 1996); thus, the selection of samples is optimized based on the results of the correlation analysis.

In network training, the vector images were transformed into raster layer data to standardize the format of the influencing factor and earthquake-affected population data. There were more raster-based samples than influencing factors, which resulted in over-fitting of the neural network. In practical applications, a part of all samples is therefore randomly selected as the training set. However, random selection could lose part of the spatial characteristics of the full sample data during training, thus reducing the generalization capability and evaluation accuracy of the network. The correlation analysis showed that per capita GDP and PGA were more strongly correlated with the earthquake-affected population density than the other factors. This implies that the areas with lower per capita GDP and higher PGA had higher earthquake-affected populations and were the key study areas for evaluating the earthquake disaster. The per capita GDP and PGA indicators indirectly reflected, to some extent, the spatial distribution characteristics of the earthquake-affected population. Therefore, to account for the spatial variability of the earthquake disaster, more samples were selected in areas with lower per capita GDP and higher PGA, and fewer in areas with higher per capita GDP and lower PGA.

Figure 8 shows the frequency histograms of the per capita GDP and PGA factors. The raster frequency was approximately normally distributed with respect to per capita GDP, whereas it decreased as the PGA value increased. This indicates that the areas with a poor economy and intense seismic motion were smaller than those with a strong economy and weak motion. Although these areas were small, they were the most important for disaster assessment and emergency rescue during earthquakes. The per capita GDP and PGA data were each classified into five clusters by value using the natural breaks classification method, an extensively applied clustering method that maximizes the similarity within each cluster and the difference between clusters. The proportion of samples in each cluster was determined from the average value of the cluster, as listed in Table 5; that is, the number of samples was proportional to the cluster average. A total of 1000 sample points were generated in the study area to extract attribute values from the raster layers. The distributions of the random and optimized sample points are compared in Figure 9. Consistent with the clustering analysis, the sample points were denser in the areas with low per capita GDP and high PGA values. The samples were thus optimized based on the numerical and spatial characteristics of the per capita GDP and PGA indicators.
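The allocation step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the cluster averages, the break values, and the function names `allocate_samples` and `stratified_draw` are hypothetical, and the breaks are assumed to have been computed beforehand (for example by a natural breaks routine). Weights are taken as proportional to the cluster average, as stated for PGA; for a factor where lower values should receive more points, such as per capita GDP, the weights would presumably have to be reversed.

```python
import bisect
import random

def allocate_samples(cluster_means, total):
    """Split `total` sample points across clusters in proportion to each
    cluster's mean value, using the largest-remainder rule so that the
    counts sum exactly to `total`."""
    weight_sum = sum(cluster_means)
    exact = [total * m / weight_sum for m in cluster_means]
    counts = [int(x) for x in exact]
    # Hand out the points lost to rounding, largest fractional part first.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[: total - sum(counts)]:
        counts[i] += 1
    return counts

def stratified_draw(cell_values, upper_breaks, counts, seed=0):
    """Draw raster cell indices cluster by cluster.
    `upper_breaks` holds the ascending upper bound of each cluster."""
    rng = random.Random(seed)
    clusters = [[] for _ in upper_breaks]
    for idx, value in enumerate(cell_values):
        k = min(bisect.bisect_left(upper_breaks, value), len(upper_breaks) - 1)
        clusters[k].append(idx)
    chosen = []
    for members, n in zip(clusters, counts):
        # Never ask for more cells than the cluster contains.
        chosen.extend(rng.sample(members, min(n, len(members))))
    return chosen

# Hypothetical cluster averages for five PGA clusters.
print(allocate_samples([1, 2, 3, 4, 5], 1000))
```

Drawing a fixed number of cells per cluster, instead of drawing uniformly over the whole raster, is what makes the small high-PGA areas better represented in the training set.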

Figure 8. Frequency histogram of influencing factor.

Figure 9. Comparison between random sample points and optimizing sample points.

Table 5. Range of PGA clusters and number of sample points.


4.3. Earthquake-affected population evaluation results

One hidden layer is sufficient for most applications (Aghamohammadi et al. Citation2013). Therefore, a three-layer network containing one input layer, one hidden layer, and one output layer was adopted. Two networks were trained, one using random samples and one using optimized samples. To ensure scientific evaluation, 70% of the sample data were used as the training dataset and 30% as the testing dataset. The testing dataset was used to test the predictive ability of the trained network. The goal error for the training dataset was 0.0003: training stopped once the error of the training set fell below this value during iteration.
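The training setup described above (a 70/30 split, one hidden layer, and a goal-error stopping rule) can be illustrated with a small pure-Python sketch. It is not the network used in the study: the tanh hidden activation, the linear output, the learning rate, the use of mean squared error as the training error, and the names `split_70_30` and `train_bp` are all assumptions made for illustration.

```python
import math
import random

def split_70_30(samples, seed=0):
    """Shuffle the samples and return (training 70%, testing 30%)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def train_bp(samples, n_hidden, goal=0.0003, lr=0.05, max_epochs=5000, seed=0):
    """Train a one-hidden-layer network by online gradient descent.
    Stops when the mean squared training error accumulated over an epoch
    falls below `goal`, or after `max_epochs`.
    Returns (predict_function, epochs_used)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each hidden row holds n_in weights followed by one bias term.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_h]
        y = sum(w * v for w, v in zip(w_o, h)) + w_o[-1]
        return h, y

    for epoch in range(1, max_epochs + 1):
        sq_err = 0.0
        for x, t in samples:
            h, y = forward(x)
            err = y - t
            sq_err += err * err
            # Backpropagate through the linear output and tanh hidden layer.
            for j in range(n_hidden):
                grad_h = err * w_o[j] * (1.0 - h[j] ** 2)
                w_o[j] -= lr * err * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * grad_h * x[i]
                w_h[j][-1] -= lr * grad_h
            w_o[-1] -= lr * err
        if sq_err / len(samples) < goal:
            break
    return (lambda x: forward(x)[1]), epoch

# Hypothetical two-factor samples: (inputs, target), all scaled to small values.
data = [((i / 10, j / 10), 0.5 * i / 10 - 0.3 * j / 10) for i in range(5) for j in range(4)]
train_set, test_set = split_70_30(data)
predict, epochs = train_bp(train_set, n_hidden=4, max_epochs=200)
print(epochs, predict(test_set[0][0]))
```

The returned epoch count corresponds to the "number of iterations" that the comparison between random and optimized samples relies on; whether the goal error is actually reached depends on the data and the chosen settings.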

Figure 10 shows the number of iterations and the root mean square error (RMSE) of the two networks based on random and optimized samples. The RMSE is given by the following equation: $RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(t_{i} - y_{i})}^{2}}$ (2) where $t_{i}$ is the evaluation result of the earthquake-affected population, $y_{i}$ is the actual data of the earthquake-affected population, and $n$ is the number of samples. Networks with different numbers of neurons in the hidden layer were trained. In Figure 10, the horizontal axis represents the number of neurons, the left vertical axis represents the number of iterations required for the error of the training dataset to reach the goal error, and the right vertical axis represents the RMSE of the testing dataset. The blue curve represents the results from the random samples, and the red curve represents the results from the optimized samples. For every number of neurons tested, the network trained on the optimized samples required fewer iterations than the network trained on the random samples. This indicates that the optimized samples accelerated convergence. Moreover, the RMSE of the optimized samples was smaller than that of the random samples, except when the number of neurons in the hidden layer was 13, where the RMSEs of the two networks were nearly equal, as shown in Figure 10. The RMSE measures the difference between the estimated values and the testing data and reflects the generalization capability and evaluation accuracy of the network for new data. This implies that the evaluation of the earthquake-affected population based on optimized samples not only converged faster, but also had better generalization capability and prediction accuracy than that based on random samples.
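As a worked illustration of Eq. (2), the RMSE can be computed as in the following sketch. The function name `rmse` and the three density values are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(actual, estimated):
    """Root mean square error between two equal-length sequences, Eq. (2)."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - y) ** 2 for t, y in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical densities (people/km^2) for three testing cells:
# the squared differences are 4, 4 and 9, so the result is sqrt(17 / 3).
print(rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Because the differences are squared before averaging, a single badly estimated cell raises the RMSE more than several small errors of the same total size.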

Figure 10. Results of iteration number and RMSE of network models.

4.4. Verification of network generalization capability

The data used in this paper were obtained from open online databases and government disaster reports. Since the training data of the network model all come from the Lushan earthquake disaster, the model should be applied to scenarios whose conditions resemble those of the Lushan earthquake, that is, similar environmental and seismic factors. The 2017 Jiuzhaigou earthquake, like the Lushan earthquake, occurred in Sichuan Province, and the earthquake-affected areas had similar geographical characteristics. Moreover, both earthquakes had a magnitude of Ms7.0. Therefore, the evaluation accuracy of the trained network, which was obtained using the samples of the 2013 Ms7.0 Lushan earthquake, was verified by evaluating the earthquake-affected population of the 2017 Ms7.0 Jiuzhaigou earthquake.

The Jiuzhaigou Ms7.0 earthquake occurred on 8 August 2017, and its epicentre was located at 33°12'N, 103°49'E, in Jiuzhaigou County, Sichuan Province, China. The epicentre of the Jiuzhaigou earthquake was approximately 330 km from that of the Lushan earthquake, and its focal depth was 20 km. The Jiuzhaigou earthquake killed 25 people, injured 525 people, and damaged approximately 70,000 houses. The earthquake-affected population was approximately 220,000. The areas severely affected by the earthquake mainly included Hongyuan County, Jiuzhaigou County, Pingwu County, Songpan County, and Zoige County. The earthquake disaster degree in each county-level administrative region, as collected by the Sichuan provincial government, is shown in Figure 11.

Figure 11. Earthquake disaster degree in the 2017 Ms7.0 Jiuzhaigou earthquake.

The networks, which were based on the optimized and random samples of the Lushan earthquake, were used to evaluate the earthquake-affected population of the Jiuzhaigou earthquake. The RMSE of the trained network reached its minimum when the hidden layer contained 15 neurons. Therefore, a hidden layer containing 15 neurons was adopted in the verification test of network generalization capability. In addition to the neural network, a multiple linear regression model was used to calculate the earthquake-affected population of the Jiuzhaigou earthquake, in order to compare the performance of the network model with that of a simple classic model. The output of the network was a raster layer of earthquake-affected population density, which varied with spatial position. The average value of the earthquake-affected population density raster within each county was taken as the earthquake-affected population density of that county.
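The county-level averaging described above amounts to a zonal mean over the output raster. A minimal sketch follows, assuming the density raster and a county-identifier raster are aligned grids of the same shape and that `None` marks cells without data; the function name `county_means` and the toy grids are hypothetical.

```python
from collections import defaultdict

def county_means(density, county_ids):
    """Average the raster density values falling inside each county.
    Cells whose density or county identifier is None are skipped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dens_row, id_row in zip(density, county_ids):
        for value, county in zip(dens_row, id_row):
            if value is None or county is None:
                continue
            sums[county] += value
            counts[county] += 1
    return {county: sums[county] / counts[county] for county in sums}

# Toy 2 x 2 rasters with two hypothetical counties "A" and "B".
density = [[1.0, 3.0],
           [5.0, None]]
county_ids = [["A", "A"],
              ["B", "B"]]
print(county_means(density, county_ids))
```

In a GIS workflow the same result would normally come from a zonal statistics tool; the loop above only makes the arithmetic explicit.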

The actual data and evaluation results of the earthquake-affected population are listed in Table 6. The mean absolute error (MAE), given in Eq. (3), was calculated to assess the evaluation results: $MAE = \frac{1}{n} \sum_{i = 1}^{n} | t_{i} - y_{i} |$ (3)

Table 6. Actual data and evaluation results of earthquake-affected population of the Jiuzhaigou earthquake.


The MAEs of the earthquake-affected population density obtained with the optimized samples, the random samples, and the regression model were 1.276, 3.829, and 19.385 people/km², respectively. This suggests that, with the optimized samples, the average error of the earthquake-affected population density for each county was 1.276 people/km², and that the neural network model could provide effective results for the spatial assessment of earthquake disaster. The accuracy of the network model was higher than that of the simple regression model, and the accuracy of the network model based on optimized samples was higher than that based on random samples. A comparison between the actual and estimated earthquake-affected populations for each county is shown in Figure 12. The histogram represents the actual data and evaluation results, and the curve represents the error rate of the evaluation results. The error rate is expressed as follows: $\text{Error rate} = \frac{| t - y |}{t}$ (4)
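Eqs. (3) and (4) can be illustrated with the short sketch below. The function names and the numbers are hypothetical; in `error_rate` the denominator is the actual value, following the definition of the error rate as the difference divided by the actual value.

```python
def mae(actual, estimated):
    """Mean absolute error between two equal-length sequences, Eq. (3)."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - y) for t, y in zip(actual, estimated)) / len(actual)

def error_rate(actual, estimated):
    """Relative error of one estimate with respect to the actual value, Eq. (4)."""
    if actual == 0:
        raise ValueError("error rate is undefined for an actual value of zero")
    return abs(actual - estimated) / actual

# Hypothetical county densities (people/km^2), not the values of Table 6.
actual = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
print(mae(actual, estimated))
print([error_rate(t, y) for t, y in zip(actual, estimated)])
```

Because the error rate divides by the actual value, the same absolute error produces a much larger error rate in a county with a small actual density than in a densely affected one.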

Figure 12. Comparison of actual data and evaluation results of earthquake-affected population.

The error rate is defined as the ratio of the absolute difference between the actual value and the evaluation result to the actual value. It can be seen in Figure 12 that the error rate was relatively large for Hongyuan County. Hongyuan County had the smallest population density among the areas affected by the Jiuzhaigou earthquake, at 5.45 people/km². In the training data from the Lushan earthquake, however, the smallest population density was 11.20 people/km². The training set of the network model therefore lacked data with such a small population density, which led to relatively inaccurate evaluation results in sparsely populated areas. For Jiuzhaigou County, Pingwu County, and Songpan County, where the earthquake-affected populations were relatively large, the error rate of the evaluation results based on the BP neural network was less than 0.2. For Hongyuan County and Zoige County, the error rates were relatively high. The maximum error rates based on the optimized and random samples were 4.897 and 10.491, respectively, and that based on the linear regression model was 25.884. It can be seen that in each area, the error rate of the optimized samples was smaller than those of the random samples and the regression model. The earthquake-affected population evaluation based on the optimized samples was therefore more accurate for new data, which indicates that optimized samples can effectively improve the evaluation of the earthquake-affected population.
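The Hongyuan County case suggests a simple safeguard before applying a trained network to a new area: checking whether each input factor lies within the range covered by the training data. The sketch below is only an illustration of that idea and is not part of the study's method; the function name, the dictionary layout, and the upper bound of 1000.0 are hypothetical, while 11.20 and 5.45 people/km² are the values quoted above.

```python
def outside_training_range(inputs, training_ranges):
    """Return the names of factors whose value lies outside the
    [minimum, maximum] interval seen in the training data."""
    flagged = []
    for name, value in inputs.items():
        low, high = training_ranges[name]
        if value < low or value > high:
            flagged.append(name)
    return flagged

# 11.20 is the training minimum quoted in the text; 1000.0 is a placeholder maximum.
training_ranges = {"population_density": (11.20, 1000.0)}
hongyuan = {"population_density": 5.45}
print(outside_training_range(hongyuan, training_ranges))
```

Cells flagged in this way require the network to extrapolate, so their estimates deserve more caution than those for cells inside the training range.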

5. Conclusions

In the present study, a spatial evaluation model of earthquake-affected population was proposed based on the correlation characteristics of influencing factors and the BP neural network. The main conclusions are as follows:

The influencing factors of the earthquake-affected population are classified into environmental and seismic factors. Elevation, slope angle, population density, per capita GDP, distance to fault, distance to river, and NDVI were considered as environmental factors, and PGA, PGV, and distance to the epicentre were considered as seismic factors. The correlation analysis between the earthquake-affected population and influencing factors indicates that per capita GDP and PGA had a stronger correlation with the earthquake-affected population of the Lushan earthquake. There was a substantial nonlinear relationship between the earthquake-affected population and various influencing factors.
The samples had a significant impact on the generalization capability and evaluation accuracy of the neural network. The samples were optimized according to the spatial distribution of per capita GDP and PGA based on the correlation characteristics. In the area with lower per capita GDP and higher PGA, more sample points were generated and distributed based on the correlation between per capita GDP, PGA, and earthquake-affected population. Compared to the random samples, the optimized samples effectively improved the convergence speed and generalization capability of the trained network. In networks with different numbers of neurons, the number of iterations based on the optimized samples was less than that of the random samples. The network trained using the optimized samples, considering the spatial characteristics, had a more accurate prediction ability.
A BP neural network was established using the influencing factors as input indicators based on the data from the Lushan earthquake. The trained network was applied to the Jiuzhaigou earthquake to test its generalization capability and prediction accuracy. The results show that the neural network had good prediction accuracy for the spatial evaluation in the study area. The MAE of the earthquake-affected population density of the five counties affected by the Jiuzhaigou earthquake, based on the optimized samples, was 1.276 people/km². The BP neural network could represent complex nonlinear relations to evaluate earthquake-affected populations. The trained network can offer a spatial evaluation of earthquake-affected populations, as well as other earthquake disaster information, immediately after the occurrence of an earthquake, providing significant information for emergency rescue.

Disclosure statement

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work. We do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Data availability statement

The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Additional information

Funding

This work is supported by the Scientific Research Fund of Institute of Engineering Mechanics, China Earthquake Administration (Grant No. 2021EEEVL0209) and the fund of Research on key technologies of emergency response in Sichuan Seismic Risk Area (201902).

References

Aghamohammadi H, Mesgari MS, Mansourian A, Molaei D. 2013. Seismic human loss estimation for an earthquake disaster using neural network. Int J Environ Sci Technol. 10(5):931–939.
Ara S. 2014. Impact of temporal population distribution on earthquake loss estimation: a case study on Sylhet, Bangladesh. Int J Disaster Risk Sci. 5(4):296–312.
Azar NA, Milan SG, Kayhomayoon Z. 2021. The prediction of longitudinal dispersion coefficient in natural streams using LS-SVM and ANFIS optimized by Harris hawk optimization algorithm. J Contam Hydrol. 240:103781.
Basharat M, Ali A, Jadoon IAK, Rohn J. 2016. Using PCA in evaluating event-controlling attributes of landsliding in the 2005 Kashmir earthquake region, NW Himalayas, Pakistan. Nat Hazards. 81(3):1999–2017.
Boatwright J, Bundock H, Luetgert J, Seekins L, Gee L, Lombard P. 2003. The dependence of PGA and PGV on distance and magnitude inferred from Northern California ShakeMap Data. Bull Seismol Soc Am. 93(5):2043–2055.
Büyüksaraç A, Işık E, Harirchian E. 2021. A case study for determination of seismic risk priorities in Van (Eastern Turkey). Earthq Struct. 20(4):445–455.
Choubin B, Mosavi A, Alamdarloo EH, Hosseini FS, Shamshirband S, Dashtekian K, Ghamisi P. 2019. Earth fissure hazard prediction using machine learning models. Environ Res. 179(Pt A):108770.
Conforti M, Pascale S, Robustelli G, Sdao F. 2014. Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo River catchment (northern Calabria, Italy). Catena. 113:236–250.
Erdik M, Şeşetyan K, Demircioğlu MB, Hancılar U, Zülfikar C. 2011. Rapid earthquake loss assessment after damaging earthquakes. Soil Dyn Earthquake Eng. 31(2):247–266.
Gao S, Zhao H, Bai Z, Han B, Xu J, Zhao R, Zhang N, Chen L, Lei X, Shi W, et al. 2020. Combined use of principal component analysis and artificial neural network approach to improve estimates of PM2.5 personal exposure: a case study on older adults. Sci Total Environ. 726:138533.
Gul M, Guneri AF. 2016. An artificial neural network-based earthquake casualty estimation model for Istanbul city. Nat Hazards. 84(3):2163–2178.
Harirchian E, Lahmer T. 2020. Improved rapid assessment of earthquake hazard safety of structures via artificial neural networks. IOP Conf Ser: Mater Sci Eng. 897(1):012014.
Harirchian E, Aghakouchaki Hosseini SE, Jadhav K, Kumari V, Rasulzade S, Işık E, Wasif M, Lahmer T. 2021. A review on application of soft computing techniques for the rapid visual safety evaluation and damage classification of existing buildings. J Build Eng. 43:102536.
Hou P, Jolliet O, Zhu J, Xu M. 2020. Estimate ecotoxicity characterization factors for chemicals in life cycle assessment using machine learning models. Environ Int. 135:105393.
Huang C, Huang Y. 2018. An information diffusion technique to assess integrated hazard risks. Environ Res. 161:104–113.
Huang RQ, Li WL. 2014. Post-earthquake landsliding and long-term impacts in the Wenchuan earthquake area, China. Eng Geol. 182:111–120.
Huang X, Zhou Z, Wang S. 2015. The prediction model of earthquake casuailty based on robust wavelet v-SVM. Nat Hazards. 77:717–732.
Huang X, Song J, Jin H. 2020. The casualty prediction of earthquake disaster based on extreme learning machine method. Nat Hazards. 102(3):873–886.
Jena R, Pradhan B, Beydoun G, Alamri AM, Sofyan H, Ardiansyah Nizamuddin. 2020. Earthquake hazard and risk assessment using machine learning approaches at Palu, Indonesia. Sci Total Environ. 749:141582.
Lara A, Garcia X, Bucci F, Ribas A. 2017. What do people think about the flood risk? An experience with the residents of Talcahuano city, Chile. Nat Hazards. 85(3):1557–1575.
Li X, Cheng X, Wu W, Wang Q, Tong Z, Zhang X, Deng D, Li Y. 2020. Forecasting of bioaerosol concentration by a back propagation neural network model. Sci Total Environ. 698:134315.
Lou WP, Chen HY, Qiu XF, Tang QY, Zheng F. 2012. Assessment of economic losses from tropical cyclone disasters based on PCA-BP. Nat Hazards. 60(3):819–829.
Luo L, Lombardo L, van Westen C, Pei X, Huang R. 2022. From scenario-based seismic hazard to scenario-based landslide hazard: rewinding to the past via statistical simulations. Stoch Environ Res Risk Assess. 36(8):2243–2264.
Luo Z, Huang F, Liu H. 2020. PM2.5 concentration estimation using convolutional neural network and gradient boosting machine. J Environ Sci (China). 98:85–93.
Lythgoe K, Loasby A, Hidayat D, Wei S. 2021. Seismic event detection in urban Singapore using a nodal array and frequency domain array detector: earthquakes, blasts and thunderquakes. Geophys J Int. 226(3):1542–1557.
Partridge D. 1996. Network generalization differences quantified. Neural Netw. 9(2):263–271.
Peng L, Niu R, Huang B, Wu X, Zhao Y, Ye R. 2014. Landslide susceptibility mapping based on rough set theory and support vector machines: a case of the Three Gorges area, China. Geomorphology. 204:287–301.
Peng Y. 2015. Regional earthquake vulnerability assessment using a combination of MCDM methods. Ann Oper Res. 234(1):95–110.
Pourghasemi HR, Gayen A, Panahi M, Rezaie F, Blaschke T. 2019. Multi-hazard probability assessment and mapping in Iran. Sci Total Environ. 692:556–571.
Rossetto T, Peiris N, Pomonis A, Wilkinson SM, Del Re D, Koo R, Gallocher S. 2007. The Indian Ocean tsunami of December 26, 2004: observations in Sri Lanka and Thailand. Nat Hazards. 42(1):105–124.
Saha S, Arabameri A, Saha A, Blaschke T, Ngo PTT, Nhu VH, Band SS. 2021. Prediction of landslide susceptibility in Rudraprayag, India using novel ensemble of conditional probability and boosted regression tree-based on cross-validation method. Sci Total Environ. 764:142928.
Santos-Reyes J, Gouzeva T. 2020. Mexico city’s residents emotional and behavioural reactions to the 19 September 2017 earthquake. Environ Res. 186:109482.
Satish D, Prakash EL, Anand KB. 2021. Earthquake vulnerability of city regions based on building typology: rapid assessment survey. Asian J Civ Eng. 22(4):677–687.
Shahaki Kenari M, Celikag M. 2019. Correlation of ground motion intensity measures and seismic damage indices of masonry-infilled steel frames. Arab J Sci Eng. 44(5):5131–5150.
Shimada N. 2016. Outline of the Great East Japan Earthquake. In Urabe J., Nakashizuka T, editors. Ecological Impacts of Tsunamis on Coastal Ecosystems. Ecological Research Monographs. Springer, Tokyo.
Si M, Du K. 2020. Development of a predictive emissions model using a gradient boosting machine learning method. Environmental Technology & Innovation. 20:101028.
Sun L, Chen J, Li T. 2016. A MODIS-based method for detecting large-scale vegetation disturbance due to natural hazards: a case study of Wenchuan earthquake stricken regions in China. Stoch Environ Res Risk Assess. 30(8):2243–2254.
Tehrany MS, Pradhan B, Jebur MN. 2015. Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch Environ Res Risk Assess. 29(4):1149–1165.
Wang S, Li D. 2014. ArcGIS-based system analysis of building damage from the Ms7.0 Lushan earthquake. Earthq Research in Sichuan. 2(151):1–5. (in Chinese)
Wang S, Zhang N, Wu L, Wang Y. 2016. Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method. Renew Energy. 94:629–636.
Wang XP, Zhang F, Ding JL, Kung HT, Latif A, Johnson VC. 2018. Estimation of soil salt content (SSC) in the Ebinur Lake Wetland National Nature Reserve (ELWNNR), Northwest China, based on a Bootstrap-BP neural network model and optimal spectral indices. Sci Total Environ. 615:918–930.
Wen L, Yuan X. 2020. Forecasting CO2 emissions in Chinas commercial department, through BP neural network based on random forest and PSO. Sci Total Environ. 718:137194.
Wu Q, Wu J, Gao M. 2020. Correlation analysis of earthquake impacts on a nuclear power plant cluster in Fujian province, China. Environ Res. 187:109689.
Xu C, Xu X, Shyu JBH, Gao M, Tan X, Ran Y, Zheng W. 2015. Landslides triggered by the 20 April 2013 Lushan, China, Mw 6.6 earthquake from field investigations and preliminary analyses. Landslides. 12(2):365–385.
Yalcin A. 2008. GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): comparisons of results and confirmations. Catena. 72(1):1–12.
Yang J, Chen J, Liu H, Zheng J. 2014. Comparison of two large earthquakes in China: the 2008 Sichuan Wenchuan Earthquake and the 2013 Sichuan Lushan Earthquake. Nat Hazards. 73(2):1127–1136.
Yang ZH, Lan HX, Gao X, Li LP, Meng YS, Wu YM. 2015. Urgent landslide susceptibility assessment in the 2013 Lushan earthquake-impacted area, Sichuan Province, China. Nat Hazards. 75(3):2467–2487.
Yuan RM, Deng QH, Cunningham D, Xu C, Xu XW, Chang CP. 2013. Density Distribution of landslides triggered by the 2008 Wenchuan earthquake and their relationships to peak ground acceleration. Bull Seismol Soc Am. 103(4):2344–2355.
Zhang Z, Fleurisson JA, Pellet F. 2018. The effects of slope topography on acceleration amplification and interaction between slope topography and seismic input motion. Soil Dyn Earthquake Eng. 113:420–431.
Zhao J, Ding F, Wang Z, Ren J, Zhao J, Wang Y, Tang X, Wang Y, Yao J, Li Q. 2018. A rapid public health needs assessment framework for after major earthquakes using high-resolution satellite imagery. IJERPH. 15(6):1111.
