Machine learning remote sensing using the random forest classifier to detect the building damage caused by the Anak Krakatau Volcano tsunami

Pages 28-51 | Received 24 May 2022, Accepted 09 Nov 2022, Published online: 07 Dec 2022

Abstract

In Indonesia, tsunamis are frequent events. In 2000–2016, there were 44 tsunami events in Indonesia, with financial losses reaching 43.38 trillion. In 2018, a tsunami occurred in the Sunda Strait due to the eruption of the Anak Krakatau Volcano, which caused many fatalities and much building damage. This study aimed to detect the building damage in the Labuan District, Banten Province. Machine learning methods were used to detect building damage using random forest with object-based techniques. No previous research has combined selected predictors into scenarios; hence, the novelty of this study is combining various random forest predictors to identify the extent of building damage using 14 predictor scenarios. In addition, field surveys were conducted two years and nine months after the tsunami to observe the changes and efforts made. The results of the random forest classification were validated and compared with three datasets, namely xBD, Copernicus, and field survey data. The results of this study can help classify the level of building damage using satellite imagery to improve mitigation in tsunami-prone areas.

1. Introduction

Tsunamis are series of waves that cause a large vertical transfer of water mass in a short period because of seafloor movements caused by underwater earthquakes, volcanic eruptions, and landslides in the sea (NOAA, n.d.; Cartwright and Nakamura 2008; Pattiaratchi 2014). Two-thirds of Indonesia’s area is occupied by water and the country also has the second-longest coastline in the world (Ministry of Marine Affairs and Fisheries 2019). This makes tsunamis one of the main disaster threats that occur off the coast of Indonesia. Of the several large tsunamis and earthquakes Indonesia has experienced, 90% have been caused by incidents at sea, 9% by volcanic eruptions, and 1% by underwater landslides (Hamzah et al. 2000).

Indonesia’s many historical tsunami records are collated by the National Agency for Disaster Countermeasure (The National Agency for Disaster Countermeasure 2016). For example, the Indian Ocean tsunami that occurred on December 26, 2004 around Aceh caused many fatalities and physical damage such as damage to buildings. An enormous tsunami also occurred in the Sunda Strait, resulting from the eruption of the Krakatau Volcano on August 27, 1883, which resulted in 36,000 deaths and other infrastructure damage (The National Agency for Disaster Countermeasure 2016). A more recent tsunami, caused by the eruption of the Anak Krakatau Volcano, occurred on December 22, 2018 and resulted in 222 deaths, 843 injuries, and 28 missing people (Ministry of Energy and Mineral Resources Republic of Indonesia 2018). Damage to infrastructure and buildings also occurred in Pandeglang Regency and South Lampung Regency, where 556 houses were damaged, nine hotels were heavily affected, and 60 stalls were destroyed (Ministry of Energy and Mineral Resources Republic of Indonesia 2018).

The foregoing shows that tsunamis are dangerous disasters in Indonesia that can cause many losses, especially to buildings (Westen et al. 2009). One way to assess the potential losses from a disaster is by analyzing damage to buildings. Detecting such damage can be a first step toward recovery or reconstruction from a tsunami through tsunami mitigation efforts (Meilano et al. 2020). To support these activities, spatial data are needed to represent objects in the real world by referring to specific positions in Earth’s space (Burrough et al. 2015; Saha and Frøyen 2021). Spatial data provide the coordinate references of a certain location. The location affected by the tsunami can then be analyzed to minimize losses and speed the reconstruction of affected buildings. The scale and level of detail of the spatial data must also match those of the tsunami hazard, which depends on the data acquisition technology used (Kerle and Damen 2009).

Various spatial data acquisition techniques can be used to detect post-tsunami building damage, such as field surveys, aerial photography (photogrammetry), and satellite remote sensing (Kerle et al. 2019; Monfort et al. 2019; Koshimura et al. 2020). First, field surveys provide spatial data by making observations and taking measurements directly in the field (Suppasri et al. 2012; Hancilar et al. 2013). However, mapping the damage to each building using the field surveying method is time consuming. Second, aerial photography is a data acquisition technique that records Earth’s surface using aircraft such as drones (Aber et al. 2010; Ebert 2015). Several factors must be considered when acquiring aerial photo data, including the weather conditions during recording, because these can affect the accuracy of the data generated to classify building damage (Kedzierski et al. 2019; Naito et al. 2020; Wang et al. 2022). Nevertheless, aerial photo acquisition techniques can quickly produce spatial data with a sufficiently high resolution (DeFries 2013; Zhang et al. 2020). Third, satellite remote sensing records Earth’s surface using sensors carried on satellites (Aggarwal 2004). While satellite imagery can cover a large area in a short time, its spatial resolution is much lower (Dong and Shan 2013). Hence, the use of satellite imagery must consider the spatial, spectral, and temporal resolution to obtain spatial data (Kerle and Damen 2009). Another advantage of satellite imagery is that building damage can be analyzed in more detail by combining imagery of different resolutions to improve both the spatial and spectral detail (Afida et al. 2020).

With the development of remote sensing technology, remote sensing can produce high-resolution imagery containing information that is more detailed and accurate. This allows us to detect building damage using high-resolution imagery (Meslem et al. 2011; Adriano et al. 2019; Moya et al. 2020). In line with these developments, classification methods using machine learning and deep learning can be employed to classify building damage, such as support vector machine, convolutional neural networks, artificial neural networks, and random forest (Radhika et al. 2015; Cooner et al. 2016; Khodaverdizahraee et al. 2020; Yeum et al. 2018; Gao et al. 2018; Cha et al. 2017). Machine learning and deep learning can be used to automatically analyze a certain amount of data (The Parliamentary Office of Science and Technology 2020). With machine learning, image classification can be automated to save time and avoid human error (Corbett-Davies and Goel 2018).

In this study, the machine learning method used to detect damage to buildings is random forest. Random forest is a classification method developed from bagging techniques that use the ensemble learning approach of Leo Breiman (Cutler et al. 2012; Liu et al. 2012). It consists of a group of decision trees that are grown independently of one another to improve the accuracy of the classification of an object (Breiman 2001). The random forest classification process provides more accurate models than other methods (Fernández-Delgado et al. 2014). Random forest can also handle data with many predictor variables. However, to increase process efficiency, only the essential predictor variables are needed to construct the classification models (Speiser et al. 2019). Random forest has other advantages, as it resists overfitting, handles non-linear relationships in the data, provides stability against outliers, and offers opportunities for more efficient parallel processing. These advantages make it suitable for modeling using high-dimensional data (i.e. large databases), as random forest can handle thousands of variable inputs without omitting variables and is thus less computationally burdensome (Qi 2012; Rodriguez-Galiano et al. 2012; Sarica et al. 2017).

Therefore, this study compares random forest with other machine learning and deep learning methods to integrate each predictor into high-resolution images to detect building damage. In this process, random forest draws random samples of the training data, while a random subset of the predictor variables is considered when choosing the best split in each decision tree (Belgiu and Drăguţ 2016). Hence, in random forest, each decision tree is trained on a dataset that contains a random sample of the data and randomly selected predictor variables.
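The two sources of randomness described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name, the predictor names, and the square-root rule for the number of candidate predictors (a common default) are assumptions made for illustration.

```python
import math
import random


def draw_tree_inputs(n_samples, predictor_names, rng):
    """Return (bootstrap row indices, candidate predictors) for one tree split."""
    # Bootstrap: sample rows with replacement, same size as the training set.
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Random subspace: assumed default of sqrt(number of predictors) candidates.
    k = max(1, int(math.sqrt(len(predictor_names))))
    candidates = rng.sample(predictor_names, k)
    return rows, candidates


rng = random.Random(0)
# Hypothetical predictor names, one per predictor type.
predictors = ["compactness", "mean", "entropy", "ndvi"]
rows, candidates = draw_tree_inputs(10, predictors, rng)
```

Because every tree sees a different bootstrap sample and a different subset of candidate predictors, the trees are decorrelated, which is what allows their majority vote to be more stable than a single decision tree.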

Several previous studies have developed classifications of post-disaster building damage. First, Naito et al. (2020) used two models (i.e. the bag-of-visual-words model and a CNN model) to classify building damage into four levels based on the interpretation of orthorectified aerial photos. However, the reliance on aerial photos reduces the accuracy of that approach because data acquisition is affected by the weather conditions and building characteristics.

Second, Widyatmanti (2020) compared the effect of imagery using feature-level fusion with satellite imagery data (Sentinel-1 and Sentinel-2) when classifying building damage caused by earthquakes. This study showed that the fusion method is more accurate overall than using Sentinel-1 or Sentinel-2 imagery alone. However, the fusion results are only suitable for severe building damage, and using imagery with a higher spatial resolution is recommended for other damage.

Third, Mangalathu et al. (2020) classified building damage due to the 2014 South Napa earthquake based on ATC-20 tags (red, yellow, and green) using machine learning with the random forest, K-nearest neighbor, and decision tree methods. They used several predictors as input parameters in the machine learning model, including spectral acceleration, fault distance, time-averaged shear-wave velocity to a depth of 30 m, and building characteristics such as lifespan. They found that random forest provides an accurate classification of yellow tags of building damage. The limitation of this study is that the classification of buildings applies only to low-rise infrastructure and certain damage types (i.e. chimney damage, cripple wall, and porch damage); hence, it does not apply to other infrastructure and classes of damage.

However, in such studies, the use of imagery capabilities for the classification was suboptimal for detecting building damage. Therefore, this study uses two satellite imagery data sensors: Sentinel-2A, which has a sufficient spectral resolution to identify objects, and Worldview-2, which has a high spatial resolution. Moreover, as no previous research has thus far combined several predictors from random forest into 14 scenarios, we classify building damage using a random forest algorithm with 14 predictor scenarios. The novelty of this study is the use of four types of predictors, namely, geometry, statistics, texture, and vegetation index, with each predictor simulated into 14 predictor scenarios. The use of such scenarios is expected to improve the accuracy of the classification of building damage. Hence, this research is expected to streamline the detection of building damage as a first step toward recovery or reconstruction from a tsunami, thereby supporting the tsunami mitigation process.

2. Data and method

2.1. Study area

This research was carried out in Labuan District, located in Pandeglang Regency, Banten Province, Indonesia. Banten Province is the westernmost province on the island of Java, as shown in Figure 1. Labuan means port or ship berth. This area is located at 6° 22′ 37.1064'' S and 105° 49′ 42.6216'' E (Department of Communications, Informatics 2022). Labuan was one of the areas most affected by the tsunami caused by the Anak Krakatau Volcano that hit the coast of the Sunda Strait in 2018.

Figure 1. Study area.

2.2. Data

The data used in the study consisted of five spatial datasets: the Sentinel-2A, Worldview-2, xBD, Digital Elevation Model (DEM), and Copernicus datasets. The data are explained in Table 1.

Table 1. Data used in this study.

2.2.1. Sentinel-2A

The MultiSpectral Instrument (MSI) on Sentinel-2A provides medium-resolution imagery with 12 spectral bands: Aerosol, Blue, Green, Red, Red Edge 1, Red Edge 2, Red Edge 3, NIR, Red Edge 4, Water Vapor, SWIR 1, and SWIR 2 (European Space Agency 2018). This number of spectral bands allows the imagery to distinguish objects well spectrally. However, the finest spatial resolution is only 10 m, making it difficult to interpret the geometry of smaller features such as buildings, as shown in Figure 2. Sentinel-2A imagery was used to record conditions before and after the tsunami, as shown in Figure 2(a) and (b), respectively.

Figure 2. Sentinel-2A: (a) Before the disaster; (b) After the disaster.

2.2.2. Worldview-2

Worldview-2, operated by Maxar, provides satellite imagery with a high spatial resolution (about 50 cm). Hence, this imagery can resolve objects as small as about 50 cm in the field (Maxar 2018), allowing researchers to interpret small features more clearly, as shown in Figure 3. Therefore, these images can provide information about building damage. In this study, Worldview-2 satellite imagery was used to record conditions before and after the tsunami, as shown in Figure 3(a) and (b), respectively.

Figure 3. Worldview-2: (a) Before the disaster; (b) After the disaster.

2.2.3. xBD

xBD data present satellite imagery before and after the disaster using building polygons containing satellite metadata and ordinal labels of the extent of the damage. The damage scale is divided into four levels: no damage, damage, destroyed, and unclassified (Gupta et al. 2019). In this study, the resulting polygon was blue, indicating that damage to buildings was included in the first level of classification, which means damage to small buildings. The resulting building polygons are shown in Figure 4.

Figure 4. xBD dataset.

2.2.4. DEM

The DEM is a 3D representation of Earth’s topographic surface that does not include objects such as buildings and vegetation (USGS). It is obtained from Shuttle Radar Topography Mission (SRTM) imagery. These DEM data have an altitude value that refers to WGS 84 in each cell (Farr et al. 2007). In this study, DEM data were used to obtain altitude information on the study area to enrich the predictor data, especially for the statistics and texture predictors, as shown in Figure 5.

Figure 5. DEM of the Labuan region.

2.2.5. Copernicus data

This study also classified building damage by interpreting Copernicus imagery. Damage in the Copernicus database is classified into three classes: yellow indicates that buildings may have been damaged, orange indicates that buildings have been damaged, and red indicates that buildings have been destroyed (Copernicus 2019). We classified these as the damaged class, no damage class, and destroyed class. The building damage data from the Copernicus classification are shown in Figure 6.

Figure 6. Copernicus data.

2.3. Method

As shown in Figure 7, the workflow used in this study was divided into two stages, namely, the fusion and machine learning stages, as follows.

Figure 7. General flowchart of the research.

2.3.1. Remote sensing pre-processing

The first step was to fuse the Sentinel-2A and Worldview-2 imagery. Image fusion combines information from two or more sensor images into one composite image that is more informative and appropriate for visual perception or for further processing on a computer (Goshtasby and Nikolov 2007; Zhang et al. 2020). The parameters that image fusion can combine are aperture ejection, dynamic range, spectral response, position (geometric), and polarization (Kaur et al. 2021). However, before fusion is performed, the images must first be pre-processed by making radiometric corrections. The image fusion conducted in this study combined spatial (geometric) information with spectral information. Spatial/geometric information was obtained from Worldview-2 because of its high spatial resolution (Maxar 2018). Spectral information was obtained from Sentinel-2A, which provides 12 spectral bands (European Space Agency 2018). Therefore, the fusion of both images is expected to produce high-resolution spatial information and considerable spectral information.
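The text does not state which fusion algorithm was applied. As an illustration only, the following sketch uses a Brovey-style ratio fusion, a generic technique swapped in here to show how high-resolution spatial detail can be injected into lower-resolution spectral bands. The function names, the nearest-neighbour resampling, and the toy reflectance values are all assumptions.

```python
def upsample_nearest(band, factor):
    """Repeat each pixel factor x factor times (nearest-neighbour resampling)."""
    out = []
    for row in band:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def brovey_fuse(ms_bands, pan, factor):
    """Scale each upsampled spectral band by (high-res value / mean band intensity)."""
    up = [upsample_nearest(b, factor) for b in ms_bands]
    rows, cols = len(pan), len(pan[0])
    fused = [[[0.0] * cols for _ in range(rows)] for _ in up]
    for r in range(rows):
        for c in range(cols):
            intensity = sum(b[r][c] for b in up) / len(up)
            ratio = pan[r][c] / intensity if intensity else 0.0
            for i, b in enumerate(up):
                fused[i][r][c] = b[r][c] * ratio
    return fused


# Toy data: two 1x1 low-resolution spectral bands and one 2x2 high-resolution band.
ms = [[[0.2]], [[0.4]]]
pan = [[0.3, 0.3], [0.6, 0.6]]
fused = brovey_fuse(ms, pan, 2)
```

In this toy case the fused bands keep the 1:2 ratio between the two spectral bands (the spectral information) while following the brightness pattern of the high-resolution band (the spatial information).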

2.3.2. Machine learning with random forest

Machine learning was applied to the remote sensing imagery to detect and classify the post-tsunami building damage. Random forest was specifically chosen as the classification method because of its ability to classify objects accurately. Reference data were split for the training and testing. For each class of damage, 70% of the reference data were randomly selected for the training stage and the other 30% were used for the testing stage (Al-Abadi 2018; Zamani Joharestani et al. 2019).
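A per-class 70/30 split of this kind can be sketched as follows. The function name, the fixed seed, and the class counts are hypothetical; only the 70/30 proportion and the per-class random selection come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical reference data: 10 destroyed, 20 damaged, 30 no damage.
labels = ["destroyed"] * 10 + ["damaged"] * 20 + ["no damage"] * 30
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps rare classes, such as destroyed buildings, represented in both the training and testing sets.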

The classification process using random forest includes four types of predictors: the geometry, statistics, vegetation index, and texture predictors. These predictors were combined into 14 predictor scenarios, as shown in Table 2. The geometry predictor is a calculation based on the analysis of polygon boundaries during the segmentation process. This calculation compares the results of multiscale segmentation with reference polygons and determines the polygon that best matches the reference by calculating geometry factors such as circularity, rectangularity, and elongation (Hu et al. 2021). The parameters of the geometry predictor that should be considered are compactness, elongation, circularity, rectangularity, convexity, solidity, form factor, major axis length, and minor axis length.
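As an illustration of three of these shape factors, the sketch below computes circularity, rectangularity, and elongation for a single segment polygon. Definitions of these factors vary between software packages; the formulas here (including the axis-aligned bounding box) are common simplifications and are assumptions, not necessarily the definitions used in this study.

```python
import math


def shoelace_area(points):
    """Polygon area from vertex coordinates (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def perimeter(points):
    """Sum of edge lengths around the closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def geometry_predictors(points):
    area = shoelace_area(points)
    perim = perimeter(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    long_side, short_side = max(width, height), min(width, height)
    return {
        "circularity": 4 * math.pi * area / perim ** 2,  # 1.0 for a circle
        "rectangularity": area / (width * height),       # 1.0 for an axis-aligned rectangle
        "elongation": 1 - short_side / long_side,        # 0.0 for a square
    }


# Hypothetical 4 m x 2 m building footprint.
segment = [(0, 0), (4, 0), (4, 2), (0, 2)]
features = geometry_predictors(segment)
```

An intact rectangular roof tends to give a rectangularity close to 1, whereas debris or a partially collapsed structure tends to produce irregular segments with lower values.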

Table 2. Combination of predictors in the random forest classification.
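The scenario list can be thought of as combinations of the four predictor types. The sketch below enumerates every non-empty combination, which yields 15; the study uses 14, and which combination is left out is not stated in this section. The ordering of the list is an assumption, chosen because it reproduces the scenario numbers mentioned in the text (scenario 1 = geometry, 2 = statistics, 3 = texture, 6 = geometry and texture).

```python
from itertools import combinations

# Assumed ordering of the four predictor types.
predictor_types = ["geometry", "statistics", "texture", "vegetation index"]

# All non-empty combinations, from single predictors up to all four together.
scenarios = [
    combo
    for size in range(1, len(predictor_types) + 1)
    for combo in combinations(predictor_types, size)
]

for number, combo in enumerate(scenarios, start=1):
    print(number, " + ".join(combo))
```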

The statistics predictor is a calculation based on the value of the pixels in an image that combines various attributes based on statistical analyses, such as minimum, maximum, average, and standard deviation (Tang et al. 2021).
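A minimal sketch of such per-segment (zonal) statistics is given below. The function name and the sample values are hypothetical, and the population standard deviation is an assumed choice.

```python
import statistics
from collections import defaultdict


def zonal_statistics(values, segment_ids):
    """Summarize pixel values per segment: min, max, mean, standard deviation."""
    grouped = defaultdict(list)
    for value, seg in zip(values, segment_ids):
        grouped[seg].append(value)
    return {
        seg: {
            "min": min(v),
            "max": max(v),
            "mean": statistics.fmean(v),
            "std": statistics.pstdev(v),  # population standard deviation (assumed)
        }
        for seg, v in grouped.items()
    }


# Hypothetical per-pixel values and the segment each pixel belongs to.
pixel_values = [0.1, 0.3, 0.2, 0.8, 0.6]
segments = [1, 1, 1, 2, 2]
stats = zonal_statistics(pixel_values, segments)
```

Each segment is thereby reduced to a handful of numbers, which is what allows the classifier to work on objects rather than individual pixels.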

The vegetation index predictor uses various mathematical combinations such as ratios, differences, and normalized differences obtained through hyperspectral and multispectral data related to vegetation (Fletcher 2016). It is a function of certain vegetation indices (i.e. GRVI, GI, VDI, RVI, NDVI, TDVI, SAVI, MSAVI2, GEMI, and LAI). The segmentation process produces several classes in one segmentation polygon. For example, there are two classes in one polygon, namely, buildings and vegetation. The vegetation index predictor is needed to differentiate vegetation objects around buildings in one segmentation polygon because the predictor can identify vegetation or non-vegetation (e.g. buildings and water) based on the spectral value (Sado & Islam 1996; Akbar et al. 2019; Sonawane & Bhagat 2017).
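Two of the listed indices, NDVI and SAVI, can be written directly from their standard definitions. In the sketch below, the reflectance values and the 0.3 threshold used to separate vegetation from non-vegetation are illustrative assumptions, not values from this study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with soil brightness factor L (0.5 is typical)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def is_vegetation(nir, red, threshold=0.3):
    """Flag a segment as vegetation when NDVI exceeds an assumed threshold."""
    return ndvi(nir, red) > threshold


# Hypothetical mean reflectances for two segments.
tree_segment = is_vegetation(nir=0.5, red=0.1)   # high NIR, low red
roof_segment = is_vegetation(nir=0.2, red=0.25)  # similar NIR and red
```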

The texture predictor considers the texture size for a specific direction using all the pixels on the image based on the degree of grayness (Labombang 2011; Feng et al. 2015). To calculate the texture predictor, the parameters considered are the mean, standard deviation, entropy, angular second moment, and contrast value.
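Direction-dependent gray-level texture measures of this kind are commonly derived from a gray-level co-occurrence matrix (GLCM). The text does not name the method, so the GLCM, the horizontal offset, and the function names in the sketch below are assumptions for illustration.

```python
import math
from collections import Counter


def glcm(image, dr=0, dc=1):
    """Normalized co-occurrence probabilities of gray-level pairs at offset (dr, dc)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Contrast, angular second moment (ASM), and entropy from GLCM probabilities."""
    return {
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "asm": sum(v * v for v in p.values()),
        "entropy": -sum(v * math.log(v) for v in p.values()),
    }


# Hypothetical 2x3 image quantized to two gray levels.
gray = [[0, 0, 1], [0, 0, 1]]
texture = texture_features(glcm(gray))
```

Smooth surfaces such as intact roofs give low contrast and high ASM, while rubble tends to give high contrast and high entropy.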

Using a combination of these predictors for the classification, we categorized damage into three types, namely, destroyed, damaged, and no damage, following several studies (Suppasri et al. 2013; Adriano et al. 2014; Le and Hsiung 2014). Buildings in the no damage class have no damage or cracks in their structure. Buildings in the damaged class have a missing roof, visible cracks, or a partial wall or roof collapse, or are surrounded by water or mud. Buildings in the destroyed class have collapsed completely or partially, are wholly covered with water and mud, or no longer exist. The prediction results were then validated using the building classification from the xBD data, Copernicus data, and field survey data by calculating overall accuracy and the F1 score.

2.3.3. Field survey after the tsunami

A field survey of the tsunami-affected area in the Labuan District was conducted in September 2021, two years and nine months after the disaster. The long interval since the tsunami hampered the field survey because the condition of a number of buildings had changed from the time of the incident. Therefore, the survey was conducted by integrating data from interviews with community members on reconstructing the damaged buildings after the tsunami.

Our field survey method was divided into three stages, as shown in Figure 8. In the preparatory stage, we analyzed high-resolution imagery of the random forest classification results to determine the location of the survey. In addition, we collected statistical data and held interviews with local governments to support the field survey. The second stage was the acquisition of field survey data using paper-based and mobile GIS-based acquisition techniques. Paper-based methods were used as a backup in case the mobile GIS-based methods could not be used. The data acquisition process also engaged with affected communities to obtain information about the situation and conditions after the tsunami occurred. Finally, data processing and survey visualization were carried out at the time of the survey using a temporary web application built before the field survey was conducted. The application is directly connected to the database as a cloud data store. The application interface is shown in Figure 9.

Figure 8. Flowchart of the field survey process.

Figure 9. User interface of the mobile GIS-based web app.

The categorization of the building damage in the survey was generally the same as in the random forest classification, namely, no damage, damaged, and destroyed. The damaged category indicated that buildings had many cracks, while the destroyed category indicated that buildings were damaged to their foundations (Suppasri et al. 2013; Adriano et al. 2014; Le and Hsiung 2014).

3. Results

3.1. Fusion of Sentinel-2A and Worldview-2

The fusion of the Sentinel-2A and Worldview-2 imagery before and after the tsunami disaster is shown in Figure 10(a) and (b), respectively. We compare the fused imagery with the initial Sentinel-2A imagery in Figure 2 and the Worldview-2 imagery in Figure 3, showing that the fusion results have a better geometric resolution than the Sentinel-2A imagery, which is derived from the geometric resolution of the Worldview-2 imagery. In addition, the data generated by the fusion method have more accurate spectral attributes than the Worldview-2 images.

Figure 10. The fusion of the Sentinel-2A and Worldview-2 imagery: (a) Before the tsunami; (b) After the tsunami.

Figure 10 shows the significant changes before and after the tsunami. For example, in areas (1) and (2), the area directly facing the beach has suffered severe damage. Before the tsunami, as shown in Figure 10(a), some buildings are still standing along the coast. Conversely, after the tsunami, as shown in Figure 10(b), the buildings in areas (a1) and (b1) are no longer there and have been razed to the ground. Another example is shown in area (a2) along the river estuary. In Figure 10(a), neatly arranged ships are still docked and the buildings along the river are standing. However, in Figure 10(b), after the tsunami, area (b2) shows that many ships have been dragged up the river and damaged. Figure 10(b) also shows damaged buildings on the riverbank. The area of river flooding also seems to be wider after the tsunami than before.

3.2. Image segmentation results

The segmentation results for the fusion imagery are shown in Figure 11(a). The results of this segmentation are based on objects grouped by considering their pixel characteristics (pixel-based grouping). Figure 11(a) shows that pixels grouped into the same class are represented by one polygon. Training samples were then selected from the segmentation results as the input for the random forest classification. The training samples are shown in Figure 11(b), with yellow showing the samples of other objects and red showing the samples of destroyed buildings. Green represents the samples of no damage and orange represents the samples of damaged buildings.

Figure 11. Image segmentation results: (a) Segmentation results in the fusion imagery; (b) Selection of the sample training data.

3.3. Image classification using random forest

Figure 12 shows the results of the damage classification based on the segmented buildings and Figure 13 shows the classification results for the 14 scenarios, extracted with the building polygons. The sequence of scenarios is given in Table 2, which lists the combinations of the four predictors in the random forest classification. For the post-tsunami imagery, Figure 13 shows that in scenario 2 (statistics) and scenario 6 (geometry and texture), the area around the coastline and riverbanks has greater damage to buildings than other areas. Scenario 1 (geometry) results in a classification dominated by the unclassified class to a greater extent than the other scenarios. Scenario 3 (texture) is dominated by the destroyed class to a greater extent than the other scenarios.

Figure 12. Segmented image by classification results: destroyed, damaged, and no damage.

Figure 13. Classification of buildings: red = destroyed building, orange = damaged building, and green = no damage.

4. Discussion

4.1. Validation of the classification results using the xBD, Copernicus, and field survey data

The classification of building damage using the 14 scenarios was compared with the xBD data, Copernicus data, and field survey data to assess its accuracy. The accuracy level in this study was expressed using overall accuracy and the F1 score. Overall accuracy represents the percentage of correctly classified pixels compared with the number of pixels used and the F1 score represents the harmonic mean of the precision and recall of the classification result. The overall accuracy and F1 score results from the 14 predictor scenarios compared with the xBD, Copernicus, and field survey data are given in Table 3.
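Both metrics follow directly from their definitions, as in the sketch below. How the per-class F1 scores are aggregated into a single value is not stated in this section; the unweighted (macro) average used here is an assumption, as are the function names and the toy labels.

```python
def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred):
    """F1 = harmonic mean of precision and recall, computed for each class."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[cls] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed aggregation)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical reference and predicted classes for six buildings.
reference = ["destroyed", "destroyed", "damaged", "damaged", "no damage", "no damage"]
predicted = ["destroyed", "damaged", "damaged", "damaged", "no damage", "damaged"]
accuracy = overall_accuracy(reference, predicted)
f1 = macro_f1(reference, predicted)
```

Overall accuracy can be dominated by the most frequent class, whereas the F1 score penalizes a classifier that misses or over-predicts individual classes, which is why the best scenario can differ between the two metrics.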

Table 3. Validation using the xBD, Copernicus, and field survey data.

First, for the comparison with the xBD data, the model with the highest overall accuracy is scenario 2, where the statistics predictor is used, with an overall accuracy of 0.631. The random forest classification with the statistics predictor results from the segmentation that considers the pixel characteristics (pixel-based grouping). The values of objects from similar pixels are included in the same class and represented by the shape of one polygon. For the statistics predictor, the pixel values indicate damage based on changes in spectral values before and after the event, which are calculated by segment (zonal statistics). Each segment includes the maximum, minimum, mean, and standard deviation of all the pixels. In the process, the values contained in each pixel of one polygon are generalized to one damage value. This process can result in a classification error in the range of building damage values, which could affect overall accuracy. The scenario with the highest F1 score in the classification of buildings is based on the geometry predictor (0.586). This shows that, judged by the harmonic mean of the precision and recall, the geometry predictor classifies well when compared with the xBD data. The geometry predictor uses information on area, perimeter, circularity, rectangularity, and elongation to segment buildings.

Second, for the comparison with the Copernicus data, the highest overall accuracy and F1 score come from the combination of the vegetation index and texture predictors, which is 0.959 for overall accuracy and 0.783 for the F1 score. This shows that the vegetation index can detect building damage. The strong predictive nature of the vegetation index can separate buildings from non-buildings such as water bodies, vegetation, and open spaces. Texture information can enhance the accuracy at which the vegetation index can detect building damage.

Third, for the comparison with the field survey data, the highest overall accuracy comes from a combination of the geometry, statistics, and texture predictors (0.588). This shows that geometry, statistical, and texture information combined can enhance the accuracy of building detection. The highest F1 score is for the geometry predictor (0.578). This shows that according to the harmonic mean of the precision and recall, the geometry predictor is superior to the others compared with the field survey data. In the field survey, errors in defining building classes could have occurred because these results were obtained from interviews with local communities two years and nine months after the tsunami.

4.2. Building damage based on field surveys

Figure 14 shows that area (1) is around the coastline. The survey results show that area (1) experienced tsunami heights of 2–5 meters, which meant that the buildings around the beach suffered heavy damage. The buildings in area (2), which is around the riverbank, also suffered heavy damage. The damage to the buildings in area (2) was not caused by the tsunami directly but rather by flooding due to the tsunami. In addition, area (3) was a relocation area built by the government for the permanent settlement of the victims of the tsunami.

Figure 14. Building area affected by the tsunami.

Figure 15 shows photos of the building damage by class. The buildings in the no damage class in Figure 15(a) and (b) have no damage due to the tsunami. In the damaged class, shown in Figure 15(c)–(h), some buildings have small cracks that are not visible in the photos, some have a large number of cracks on their sides, and some have damage heavy enough to affect both the sides and the roofs. Buildings that are destroyed are shown in Figure 15(i) and (j); these are classified as the destroyed class. Damage to destroyed class buildings forced the community to repair or rebuild these buildings, as shown in Figure 15(k) and (l).

Figure 15. Building area affected by tsunami based on the field survey: (a) no damage; (b) no damage; (c) damage; (d) damage; (e) damage; (f) damage; (g) damage; (h) damage; (i) destroyed; (j) destroyed; (k) rebuilding; and (l) rebuilding (Image taken: September 2021).

4.3. Reconstruction of building damage based on field surveys

In September 2021, a field survey was conducted to determine the handling of damaged buildings after the tsunami. The survey results are shown in Figure 16. The construction activities are divided into four durations: buildings built/repaired in less than two weeks, between two weeks and three months, between three months and one year, and more than one year. The results of the field survey show that over half (51.7%) of the buildings affected by the tsunami were repaired within three months to one year, followed by 32.9% taking more than one year (with residents living in shelters during that time). The length of time taken to handle this disaster is one consideration when planning post-disaster recovery, especially after a tsunami. Therefore, better handling of damage to post-tsunami buildings is needed so that the recovery process of the affected location is quicker.

Figure 16. Reconstruction of buildings based on (a) time and (b) location.

Figure 16(b) shows the status of the buildings affected by the tsunami after two years and nine months. The results show that 10% of buildings do not need repairs, 25% have been rebuilt in the same location, 23% have been reconstructed in different locations, 23% are still being repaired, and 19% remain damaged.

4.4. Limitations and future study

This study has several limitations. First, we classified buildings affected by the tsunami using all the predictor attributes, so attributes poorly suited to classifying disaster-affected buildings, and the errors they introduce, were included in the processing. In the segmentation process, we also used all the objects, not only those related to buildings. Future research should restrict segmentation to buildings to improve the accuracy of the object-based classification. Second, the field survey was conducted two years and nine months after the event; hence, the tsunami’s impact on some buildings could only be explained by residents. In addition, building positions recorded with smartphone GPS during the field survey may contain errors. This research could be improved by selecting the best predictor attributes for classifying buildings affected by tsunamis and by using high-resolution DEMs (e.g. digital terrain models and digital surface models) to detect building damage. The approach could also be adopted to classify buildings affected by other hazards such as earthquakes, landslides, and floods.
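One possible way to select predictor attributes, as suggested above, is to rank them by the impurity-based importance of a fitted random forest. The sketch below uses scikit-learn's RandomForestClassifier on synthetic data; the predictor names, sample size, and number of trees are hypothetical choices for illustration and do not reproduce the configuration used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_predictors(X, y, names, n_trees=200, seed=0):
    """Fit a random forest and return (name, importance) pairs, highest first."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic stand-in data: only the first column carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
informative = y + rng.normal(0.0, 0.3, size=300)
noise = rng.normal(0.0, 1.0, size=(300, 2))
X = np.column_stack([informative, noise])

# Hypothetical predictor names, one per column of X.
names = ["mean_brightness", "glcm_contrast", "shape_index"]

ranking = rank_predictors(X, y, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Predictors with consistently low importance would be candidates for removal before retraining, although impurity-based importance can be biased when predictors are correlated, so a check on held-out data would still be advisable.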

5. Conclusion

Machine learning applied to remote sensing data can detect building damage automatically. Validation against the xBD, Copernicus, and field survey data yielded different overall accuracies and F1 scores. The highest overall accuracies for the xBD, Copernicus, and field survey data were 63.1%, 95.9%, and 58.8%, obtained with the statistics predictor, texture/vegetation index predictors, and geometry/statistics/texture predictors, respectively. The highest F1 scores were 58.6%, 78.3%, and 57.8%, obtained with the geometry, texture/vegetation index, and geometry predictors, respectively.
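Overall accuracy and F1 score can differ for the same classification because overall accuracy counts all correct objects together, whereas F1 combines precision and recall for each class. The sketch below computes both from a confusion matrix. The matrix values are invented for illustration, and macro-averaging the per-class F1 is an assumption, since the averaging scheme is not stated in this section.

```python
def overall_accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def f1_per_class(cm):
    """Per-class F1, with rows as reference classes and columns as predictions."""
    scores = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                   # reference k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp    # predicted k, reference otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(cm)
    return sum(scores) / len(scores)


# Hypothetical confusion matrix for no damage, damage, and destroyed classes.
cm = [
    [50, 5, 0],
    [10, 30, 5],
    [0, 5, 20],
]

print(f"Overall accuracy: {overall_accuracy(cm):.3f}")
print(f"Macro F1: {macro_f1(cm):.3f}")
```

When one class dominates the reference data, overall accuracy can remain high even if minority classes are poorly detected, which is one reason to report both measures.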

The field survey results are difficult to use as validation material for the classification results because of the time interval between the tsunami and the field survey. However, the characteristics of some of the buildings damaged by the tsunami could still be identified during the study because 10% of buildings were left damaged until the survey took place. The buildings most severely affected by the tsunami were around the coastline and riverbanks. In addition to the direct impact, the buildings around the riverbanks were indirectly affected by the floods caused by the tsunami. Based on the field survey results, the community most commonly took three months to one year to reconstruct the buildings damaged by the tsunami. The results of this study are expected to help classify building damage from remote sensing data and thereby speed up disaster mitigation in tsunami-prone areas.

Data availability statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Disclosure statement

The authors declare no conflict of interest.

Additional information

Funding

This research was funded by Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) under the Riset Unggulan ITB 2021 scheme and by the LPDP grant scheme PRJ-102/LPDP/2021.

References

  • Aber JS, Marzolff I, Ries JB. 2010. Chapter 3 - photogrammetry. In: Aber JS, Marzolff I, Ries JB, editors. Small-format aerial photography. Amsterdam: Elsevier; p. 23–39.
  • Adriano B, Yokoya N, Xia J, Miura H, Liu W, Matsuoka M, Koshimura S. 2021. Learning from multimodal and multitemporal earth observation data for building damage mapping. ISPRS J Photogramm Remote Sens. 175:132–143.
  • Adriano B, Xia J, Baier G, Yokoya N, Koshimura S. 2019. Multi-source data fusion based on ensemble learning for rapid building damage mapping during the 2018 Sulawesi earthquake and tsunami in Palu, Indonesia. Remote Sens. 11(7):886.
  • Adriano B, Mas E, Koshimura S, Estrada M, Jimenez C. 2014. Scenarios of earthquake and tsunami damage probability in Callao region, Peru using tsunami fragility functions. J. Disaster Res. 9(6):968–975.
  • Afida BA, Kamal M, Hadmoko DS. 2020. Identifikasi Kerusakan Bangunan Pasca Gempa Bumi Menggunakan Citra Satelit Worldview-2. JPK. 8(1):67–77.
  • Aggarwal S. 2004. Principles of remote sensing. Satell Remote Sens GIS Appl Agric Meteorol. 23(2):23–28.
  • Al-Abadi AM. 2018. Mapping flood susceptibility in an arid region of southern Iraq using ensemble machine learning classifiers: a comparative study. Arab J Geosci. 11(9):1–19.
  • Belgiu M, Drăguţ L. 2016. Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens. 114:24–31.
  • Breiman L. 2001. Random forests. Mach Learn. 45(1):5–32.
  • Burrough PA, McDonnell RA, Lloyd CD. 2015. Principles of geographical information systems. United Kingdom: Oxford University Press.
  • Cartwright JHE, Nakamura H. 2008. Tsunami: a history of the term and of scientific understanding of the phenomenon in Japanese and Western culture. Notes Rec R Soc Lond. 62(2):151–166.
  • Cooner AJ, Shao Y, Campbell JB. 2016. Detection of urban damage using remote sensing and machine learning algorithms: revisiting the 2010 Haiti earthquake. Remote Sens. 8(10):868.
  • Copernicus 2019. Copernicus Emergency Management Service [Internet]. [accessed 2022 Apr 1]. https://emergency.copernicus.eu/mapping/ems/damage-assessment.
  • Corbett-Davies S, Goel S. 2018. The measure and mismeasure of fairness: a critical review of fair machine learning. arXiv Prepr arXiv:1808.00023; New York: Cornell University; p. 1–25.
  • Cutler A, Cutler DR, Stevens JR. 2012. Random forests. In: Ensemble Mach Learn. Boston, MA: Springer; p. 157–175.
  • DeFries R. 2013. Remote sensing and image processing. In: Levin SA, editor. Encyclopedia of biodiversity (second edition). Waltham: Academic Press; p. 389–399.
  • Department of Communications, Informatics P and SP 2022. Pandeglang One Data [Internet]. [accessed 2022 Apr 2]. https://satudata.pandeglangkab.go.id/index.php/kecamatan/detail/labuan.
  • Dong L, Shan J. 2013. A comprehensive review of earthquake-induced building damage detection with remote sensing techniques. ISPRS J Photogramm Remote Sens. 84:85–99.
  • Ebert JI. 2015. Photogrammetry, Photointerpretation, and Digital Imaging and Mapping in Environmental Forensics. Introd to Environ Forensics Third Ed. 39–64.
  • European Space Agency 2018. User Guides - Sentinel-2 MSI - Level-2A Product - Sentinel Online - Sentinel Online [Internet]. [accessed 2022 Apr 1]. https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a.
  • Farr TG, Rosen PA, Caro E, Crippen R, Duren R, Hensley S, Kobrick M, Paller M, Rodriguez E, Roth L. 2007. The shuttle radar topography mission. Rev Geophys. 45(2):1–33.
  • Feng Q, Liu J, Gong J. 2015. UAV remote sensing for urban vegetation mapping using random forest and texture analysis. Remote Sens. 7(1):1074–1094.
  • Fernández-Delgado M, Cernadas E, Barro S, Amorim D. 2014. Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res. 15(1):3133–3181.
  • Goshtasby AA, Nikolov SG. 2007. Guest editorial: image fusion: advances in the state of the art. Inf Fusion Spec Issue Image Fusion Adv State Art. 8:114–118.
  • Gupta R, Goodman B, Patel N, Hosfelt R, Sajeev S, Heim E, Doshi J, Lucas K, Choset H, Gaston M. 2019. Creating xBD: a dataset for assessing building damage from satellite imagery. IEEE Comput Soc Conf Comput Vis Pattern Recognit Work. 2019-June:10–17.
  • Hamzah L, Puspito NT, Imamura F. 2000. Tsunami Catalog and Zones in Indonesia. J Nat Disaster Sci. 22(1):25–43.
  • Hancilar U, Taucer F, Corbane C. 2013. Empirical fragility functions based on remote sensing and field data after the 12 January 2010 Haiti earthquake. Earthq Spectra. 29(4):1275–1310.
  • Hu Z, Shi T, Wang C, Li Q, Wu G. 2021. Scale-sets image classification with hierarchical sample enriching and automatic scale selection. Int J Appl Earth Obs Geoinf. 105:102605.
  • Kaur H, Koundal D, Kadyan V. 2021. Image fusion techniques: a survey. Arch Comput Methods Eng. 28(7):4425–4447.
  • Kedzierski M, Wierzbicki D, Sekrecka A, Fryskowska A, Walczykowski P, Siewert J. 2019. Influence of lower atmosphere on the radiometric quality of unmanned aerial vehicle imagery. Remote Sens. 11(10):1214.
  • Kerle N, Damen M. 2009. Guidance notes Session 2 : Obtaining spatial data for risk assessment. Guid B. 1–31.
  • Kerle N, Nex F, Gerke M, Duarte D, Vetrivel A. 2019. UAV-based structural damage mapping: a review. IJGI. 9(1):14–23.
  • Khodaverdizahraee N, Rastiveis H, Jouybari A. 2020. Segment-by-segment comparison technique for earthquake-induced building damage map generation using satellite imagery. Int J Disaster Risk Reduct. 46:101505.
  • Koshimura S, Moya L, Mas E, Bai Y. 2020. Tsunami damage detection with remote sensing: a review. Geosci. 10(5):177.
  • Labombang M. 2011. Manajemen Risiko Dalam Proyek Konstruksi. Bangunan. 9(1):39–46.
  • Le HQ, Hsiung B-CB. 2014. A novel mobile information system for risk management of adjacent buildings in urban underground construction. Geotech Eng J SEAGS AGSSEA. 45(3):52–63.
  • Liu Y, Wang Y, Zhang J. 2012. New machine learning algorithm: random forest. In: Liu B, Ma M, Chang J, editors. Information computing and applications. Berlin, Heidelberg: Springer; p. 246–252.
  • Mangalathu S, Sun H, Nweke CC, Yi Z, Burton HV. 2020. Classifying earthquake damage to buildings using machine learning. Earthq Spectra. 36(1):183–208.
  • Maxar 2018. Satellite Imagery [Internet]. [accessed 2022 Apr 1]. https://www.maxar.com/imagery-leadership.
  • Maxar 2019. Open Data Program Disaster Response Geospatial Analytics [Internet]. [accessed 2022 Apr 1]. https://www.maxar.com/open-data.
  • Meilano I, Rahadian A, Suwardhi D, Suminar W, Atmaja F, Pratama C, Sunarti E, Haksama S. 2020. Analysis of damage to buildings affected by the tsunami in the Palu coastal area using deep learning.
  • Meslem A, Yamazaki F, Maruyama Y. 2011. Accurate evaluation of building damage in the 2003 Boumerdes, Algeria Earthquake from Quickbird satellite images. J Earthquake Tsunami. 05(01):1–18.
  • Ministry of Energy and Mineral Resources Republic of Indonesia 2018. Response to the Tsunami in the Sunda Strait, 22 December 2018 [Internet]. [accessed 2022 Apr 8]. https://www.esdm.go.id/id/media-center/arsip-berita/tanggapan-kejadian-tsunami-di-selat-sunda-tanggal-22-desember-2018.
  • Ministry of Marine Affairs and Fisheries 2019. The Nation’s Future Sea, Let’s Keep Up Together [Internet]. [accessed 2022 Apr 2]. https://kkp.go.id/artikel/12993-laut-masa-depan-bangsa-mari-jaga-bersama.
  • Monfort D, Negulescu C, Belvaux M. 2019. Remote sensing vs. field survey data in a post-earthquake context: potentialities and limits of damaged building assessment datasets. Remote Sens Appl Soc Environ. 14:46–59.
  • Moya L, Muhari A, Adriano B, Koshimura S, Mas E, Marval-Perez LR, Yokoya N. 2020. Detecting urban changes using phase correlation and ℓ1-based sparse model for early disaster response: a case study of the 2018 Sulawesi Indonesia earthquake-tsunami. Remote Sens Environ. 242:111743.
  • Naito S, Tomozawa H, Mori Y, Nagata T, Monma N, Nakamura H, Fujiwara H, Shoji G. 2020. Building-damage detection method based on machine learning utilizing aerial photographs of the Kumamoto earthquake. Earthq Spectra. 36(3):1166–1187.
  • NOAA. What is a tsunami? [Internet]. [accessed 2022 Mar 23]. https://oceanservice.noaa.gov/facts/tsunami.html.
  • Pattiaratchi C. 2014. Tsunamis — their causes and effects. (January 2005).
  • Radhika S, Tamura Y, Matsui M. 2015. Cyclone damage detection on building structures from pre- and post-satellite images using wavelet based pattern recognition. J Wind Eng Ind Aerodyn. 136:23–33.
  • Saha K, Frøyen YK. 2021. Learning GIS Using Open Source Software: an Applied Guide for Geo-spatial Analysis. London: Routledge India.
  • Speiser JL, Miller ME, Tooze J, Ip E. 2019. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl. 134(336):93–101.
  • Suppasri A, Koshimura S, Imai K, Mas E, Gokon H, Muhari A, Imamura F. 2012. Damage characteristic and field survey of the 2011 Great East Japan tsunami in Miyagi Prefecture. Coast Eng J. 54(1):1250005–1250030.
  • Suppasri A, Mas E, Charvet I, Gunasekera R, Imai K, Fukutani Y, Abe Y, Imamura F. 2013. Building damage characteristics based on surveyed data and fragility curves of the 2011 Great East Japan tsunami. Nat Hazards. 66(2):319–341.
  • Tang Y, Qiu F, Jing L, Shi F, Li X. 2021. A recurrent curve matching classification method integrating within-object spectral variability and between-object spatial association. Int J Appl Earth Obs Geoinf. 101:102367.
  • The National Agency for Disaster Countermeasure 2016. Disasters Risk of Indonesia. Int J Disaster Risk Sci [Internet].:22.
  • The Parliamentary Office of Science and Technology 2020. Remote sensing and machine learning. UK Parliam POSTNOTE No 628. p. 1–7.
  • USGS. What is a digital elevation model (DEM) [Internet]. [accessed 2022 Apr 1]. https://www.usgs.gov/faqs/what-digital-elevation-model-dem.
  • Wang Y, Li S, Teng F, Lin Y, Wang M, Cai H. 2022. Improved mask R-CNN for rural building roof type recognition from UAV high-resolution images: a case study in Hunan Province, China. Remote Sens. 14(2):265.
  • Westen CV, Kingma N, Montoya L. 2009. Guide book Session 4: elements at Risk. Guid B. 1–43.
  • Widyatmanti A. 2020. Sentinel-1 and Sentinel-2 Data Fusion Based on Random Forest Algorithm for Mapping Types of Damage to Buildings in Pemenang-Lombok Earthquake in 2018. Yogyakarta, Indonesia: Gajahmada University.
  • Zamani Joharestani M, Cao C, Ni X, Bashir B, Talebiesfandarani S. 2019. PM2.5 prediction based on random forest, XGBoost, and deep learning using multisource remote sensing data. Atmosphere (Basel). 10(7):373.
  • Zhang R, Li H, Duan K, You S, Liu K, Wang F, Hu Y. 2020. Automatic detection of earthquake-damaged buildings by integrating UAV oblique photography and infrared thermal imaging. Remote Sens. 12(16):2621.