International Journal of Architectural Heritage
Conservation, Analysis, and Restoration
Volume 16, 2022 - Issue 7
Research Article

Crowdsource Drone Imagery – A Powerful Source for the 3D Documentation of Cultural Heritage at Risk

Pages 977-987 | Received 06 Aug 2020, Accepted 14 Nov 2020, Published online: 28 Dec 2020

ABSTRACT

Heritage at risk is a term used to describe sites that are at high risk of being lost as a result of intentional demolition, deterioration, or negligence, or that are subject to improper preservation or mistreatment. Iraq is one of the countries that has suffered in the last decade from the intentional demolition of highly valuable heritage sites and objects. As Iraq gradually recovers from wars and violence with limited resources and budgets, historical and heritage places are still at risk because of neglect, community ignorance, insufficient planning, and military actions. Therefore, in this paper, we propose using crowdsource drone images and videos captured by amateurs for the documentation of heritage sites. Such crowdsource images represent a great source of data that does not require significant financial or labor resources. It should be noted that there is no guarantee that the captured data will be sufficient for 3D documentation; it is therefore proposed to integrate, when possible, multiple crowdsource datasets to ensure complete documentation.

In this paper, three Iraqi historical sites are reconstructed in 3D using crowdsource drone videos, namely Rabban Hormizd Monastery (AD 640), Taq Kasra (AD 242 to 272), and the Great Mosque of Samarra (AD 849–851). The experiments showed successful 3D modeling of the three heritage objects using crowdsource drone video images, even though the videos were captured for non-3D purposes, whereas 3D capture normally requires considerable expertise and planning. In the absence of highly accurate reference data, the overall relative accuracy of the objects' dimensions is found to be better than 1 m.

1. Introduction

Heritage at risk is a term used to describe sites that are at high risk of being lost as a result of intentional demolition, deterioration, or negligence, or that are subject to improper preservation or mistreatment (Heritage at Risk World Report 2011–2013 on monuments and sites in danger 2014). Iraq has suffered in the last decade from the intentional destruction of highly valuable heritage sites and objects, like the famous Al-Hadbaa leaning minaret in the city of Mosul in northern Iraq (UAE, UNESCO and Iraq conclude historic 50m partnership to reconstruct Mosul’s iconic al-Nouri Mosque and al-Hadba Minaret 2020).

Nevertheless, considering the 30,000 archaeological sites and monuments in Iraq (Iraq’s Most Significant Ancient Sites and Monuments 2020), very little 3D documentation has been carried out, such as the work presented in (Sameer and Abed 2020; Shamkhi and Abed 2020) to model the statue of the Al-Hatra lady (312–139 BC). This limitation has several causes, such as inadequate resources and budgets, the huge number of heritage places to be documented, site accessibility difficulties, security considerations, the lack of expertise and modern technologies, community ignorance, etc.

Accordingly, where recent data are lacking or heritage sites have already been lost, technicians resort to other data sources. Many researchers have investigated the possibility of using archived image data for the reconstruction of lost heritage, like the famous work of Grün, Remondino, and Zhang (2004) to reconstruct the statue of Buddha in Afghanistan and the work of Grussenmeyer and Al Khalil (2017) to reconstruct the Great Mosque of Aleppo in Syria.

Technically, the standard image-based 3D reconstruction of objects comprises the following successive steps:

1) Image matching: the overlapping images are searched for common invariant corresponding keypoints using state-of-the-art operators like SIFT (Lowe 2004) or SURF (Bay et al. 2008).

2) Structure from Motion (SfM): after matching, the internal and external geometry of the images and the object are reconstructed using projective geometry techniques that rely on the epipolar geometry to estimate the relative camera motion (Hartley and Zisserman 2003). A bundle adjustment technique (Förstner and Wrobel 2016) is then used to refine the computed image orientations and the sparse tie points.

3) Dense reconstruction: a pixel-based method is mostly used to compute the depth (disparity) with high precision for every pixel of the oriented images, for example the Semi-Global Matching (SGM) technique (Hirschmuller 2008).
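To make the image matching step more concrete, the following is a minimal sketch of nearest-neighbor descriptor matching with the distance ratio test described by Lowe (2004). It is not the matcher used by the software in this paper; the function name `match_descriptors`, the ratio value of 0.8, and the two-dimensional toy descriptors are assumptions chosen only for illustration (real SIFT descriptors have 128 dimensions).

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs where desc_a[i] matches desc_b[j].

    A match is accepted only if the nearest neighbor is clearly closer
    than the second nearest one (distance ratio test).
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every candidate in the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Hypothetical toy descriptors for two overlapping frames.
desc_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
desc_b = [(0.1, 0.0), (5.0, 5.2), (2.5, 2.5), (2.6, 2.4)]

print(match_descriptors(desc_a, desc_b))
```

In this toy example the third descriptor of the first frame has no clearly best candidate, so the ratio test rejects it. Ambiguous matches of this kind are common between drone and ground views, which is one reason why unplanned footage is harder to orient.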

Recently, recording videos has become a common activity among amateurs, and the videos are shared either through social media sites or through other media websites like YouTube or Vimeo. These crowdsource video images are an emerging source of data that can be exploited in 3D documentation and mapping applications. Notably, crowdsource images and videos taken from the ground have already been used for the 3D reconstruction of heritage objects, as proposed by (Agarwal et al. 2011; Alsadik 2016; Dhonju et al. 2017; Snavely, Seitz, and Szeliski 2006).

Two types of crowdsource images can be distinguished: 1) images taken by tourists and hobbyists and 2) images taken intentionally by volunteers.

The first type of crowdsource data is more challenging to process because the images are unorganized and published on different internet sites, they are captured with various camera sensors, a large share of them are occluded by the people themselves, etc. On the other hand, crowdsource data collected by volunteers are more suitable for processing because they are organized and uploaded to a specific repository on the internet. Furthermore, they are captured for the purpose of site documentation and with more care, despite being taken by nonprofessionals.

Nowadays, it is very common to capture still images and videos from drones. A drone is essentially an aircraft without a human pilot on board and is widely known as an Unmanned Aerial Vehicle (UAV). Different types of drones are available on the market and can be classified according to the type of application, such as photography, aerial mapping, inspection, surveillance, etc. (The Different Types of Drones You Should Know About 2020; Types of Drones — Explore the Different Models of UAV’s 2020).

On the other hand, there are different types of platforms, such as multi-rotor, fixed-wing, and single-rotor helicopter drones. Moreover, drones can be classified as high-end grade, equipped with professional sensors and used mainly for mapping (Figure 1a), like the Sensefly eBee (2020), the Trimble ZX5 (2020), and the Shenzhen Eagle Brother (2020), which is developed for agricultural applications. Other types are considered consumer-grade drones, which are widely used for general filming, sport, hobbies, tourism, and cinematic videos. DJI drones like the Phantom 4 and Mavic Air (DJI 2020) are well-known examples of consumer-grade drones (Figure 1b).

Figure 1. (a) Professional mapping drones. (b) Consumer grade drones

Recently, recording drone videos has become a common activity among amateurs in countries like Iraq, and the videos are shared through social media sites and other media websites like YouTube. Some of these publicly shared drone videos cover historical and heritage sites in Iraq.

Three main difficulties are faced when using crowdsource drone videos compared to conventional drone photogrammetry projects. Firstly, the images are captured without a flight plan that satisfies the overlap percentages and the required final accuracy of the reconstruction. Therefore, it is quite possible that the image matching and reconstruction pipeline fails even when using state-of-the-art tools. Secondly, the video frames have a relatively low resolution, typically high definition (HD) at 1920 × 1080 pixels or 1280 × 720 pixels. However, manufacturers have started to use camera sensors that are able to capture 4K video, which should have a positive impact on video image-based modeling. This higher resolution is expected to significantly improve the final reconstructed 3D models in terms of accuracy, interpretation, and detail. Thirdly, drone videos are captured by consumer-grade camera sensors, which have significant lens distortion and also produce some blurry images. This can be mitigated by running camera self-calibration to model the distortion and by discarding the blurry images, since video frames are highly redundant and are normally captured at a rate of up to 60 frames per second (DJI 2020).
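Because consecutive video frames are highly redundant, only a fraction of them needs to be passed on to the reconstruction. The following minimal sketch shows one simple way to thin a frame sequence to a lower sampling rate. The function name `subsample_frame_indices` and the target rate of 2 frames per second are hypothetical and are not values reported in this paper.

```python
def subsample_frame_indices(total_frames, video_fps, target_fps):
    """Return the indices of the frames to keep when thinning a video.

    The step between kept frames is the rounded ratio of the two rates,
    and never less than one frame.
    """
    if video_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))


# Hypothetical example: a 10-second clip recorded at 60 frames per second,
# thinned to roughly 2 frames per second.
kept = subsample_frame_indices(total_frames=600, video_fps=60, target_fps=2)
print(len(kept), kept[:3])
```

A suitable target rate depends on the flight speed and the distance to the object, since both determine the overlap between the kept frames.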

Accordingly, the major aim of this paper is to use crowdsource drone videos for the 3D documentation of heritage sites using state-of-the-art image-based modeling techniques (Quan 2010; Remondino and El-Hakim 2006). A further aim is to urge the interested community in countries like Iraq to continue amateur drone filming of historical sites and objects and to share the videos through internet cloud services, either for free or at low cost. This would provide a great source of data that does not require significant financial and labor resources. It is worth mentioning that this effort should comply with the regulations set by the authorities. The proposed integration of amateur data collection and professional data processing can be administered by an academic institution, non-governmental organizations (NGOs), or governmental associations. The proposed concept is illustrated in Figure 2.

Figure 2. Proposed concept

2. Method

Image-based modeling techniques have reached an advanced stage in terms of automation and computational processing speed. Several tools are currently available, either open source or commercial, such as Metashape (Agisoft 2020), Pix4D (2020), nFrames (2020), and Photomodeler (2020). In this paper, the Metashape tool is used for 3D reconstruction and modeling. As mentioned, image-based modeling proceeds through successive steps: image matching, image orientation using Structure from Motion (SfM) (Remondino and El-Hakim 2006; Wang and Wu 2011), and then dense matching using advanced techniques like Semi-Global Matching (SGM) (Hirschmuller 2008). The depth maps created in the dense matching step are used to create point clouds, which are then converted into textured 3D models. The proposed workflow is described in Figure 3, where videos are first converted into image sequences. Then, blurry images are discarded, relying on the high redundancy offered by the video. It is also preferable to have multiple videos of the same site captured by different people, since these can be combined to increase the chances of successful reconstruction and to improve coverage. However, this may introduce other challenges for image orientation, such as differences in illumination, color, resolution, and distortion. Furthermore, shared high-resolution still images taken by photographers or tourists can be integrated to compensate for the limited video resolution and, in some cases, to complete the coverage angles.
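The paper does not state which criterion is used to detect blurry frames. As an illustration only, the sketch below uses the variance of the Laplacian, a common sharpness measure: a sharp frame has strong local intensity changes and therefore a high variance, whereas a blurred frame has a low one. The function names, the tiny 4 × 4 test frames, and the threshold of 10,000 are assumptions made for this example.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbor Laplacian over the interior pixels.

    `img` is a grayscale image given as a list of rows of intensities.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                   + img[y][x + 1] - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def keep_sharp_frames(frames, threshold):
    """Return the indices of frames whose sharpness reaches the threshold."""
    return [i for i, f in enumerate(frames)
            if laplacian_variance(f) >= threshold]


# Hypothetical frames: a high-contrast checkerboard and a washed-out one.
sharp = [[0, 255, 0, 255], [255, 0, 255, 0],
         [0, 255, 0, 255], [255, 0, 255, 0]]
blurry = [[120, 128, 120, 128], [128, 120, 128, 120],
          [120, 128, 120, 128], [128, 120, 128, 120]]

print(keep_sharp_frames([sharp, blurry], threshold=10000.0))
```

In practice the threshold would have to be tuned per video, because the measure also depends on scene texture and resolution, not only on blur.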

Figure 3. Proposed methodology

As an advanced step, when the drone video frames cannot be oriented together to compose a 3D model, the image data can be divided according to the imaged features, for example by modeling some walls separately from buildings or roads. Then, all the sub-models or point clouds are registered together, either manually or automatically, to compose a single final object model. Ultimately, the 3D model is scaled to real-world dimensions using either reference points or dimensions acquired from archived maps and literature, when available.
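The scaling step can be illustrated with a minimal sketch, assuming that a single known dimension is available: the scale factor is the ratio between the real distance and the distance between the same two points in the unscaled model. The function names and the model coordinates are hypothetical; the 25.5 m value is used here only as an example of a dimension taken from the literature.

```python
import math


def scale_factor(model_p, model_q, real_distance_m):
    """Scale that maps the model distance |pq| onto a known real distance."""
    model_distance = math.dist(model_p, model_q)
    if model_distance == 0:
        raise ValueError("the two model points must be distinct")
    return real_distance_m / model_distance


def scale_points(points, factor):
    """Apply a uniform scale about the model origin to every point."""
    return [tuple(factor * c for c in p) for p in points]


# Hypothetical model coordinates of two points that are 5 model units apart,
# with a known real separation of 25.5 m.
p, q = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
s = scale_factor(p, q, real_distance_m=25.5)
scaled = scale_points([p, q], s)

print(s, math.dist(scaled[0], scaled[1]))
```

With several reference dimensions, averaging the individual factors, or solving a least-squares similarity transformation when reference points are available, would reduce the influence of a single inaccurate measurement.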

3. Results and discussion

Three experiments are presented in this paper for the reconstruction of 3D cultural heritage models using the Metashape software. The experiments are applied to crowdsource drone videos of valuable world heritage sites in Iraq, namely the Arch of Ctesiphon (Taq Kasra), Rabban Hormizd Monastery, and the Great Mosque of Samarra.

It is worth mentioning that the first experiment was run on a workstation with a Xeon E5-2643 v3 CPU at 3.40 GHz, 128 GB RAM, and a Quadro K2200 GPU. The other two experiments were processed on a Dell laptop with a Core i7-9750H CPU at 2.60 GHz and 16 GB RAM.

3.1. Taq Kasra — Arch of Ctesiphon

Taq Kasra, also known as the Arch of Ctesiphon, is located near the city of Salman Pak in central Iraq, about 35 km south of Baghdad. It is the world’s largest remaining single-span brick arch and the symbol of the Persian Empire in the Sasanian era (AD 224–651) (Taq Kasra 2020; Taq Kasra: 3rd-century Persian Monument in Iraq 2020). The palace was captured by the Muslims during the conquest of Persia in AD 637, and they used it as a mosque for a while. In the early 10th century, the Abbasid caliph al-Muktafi dug up the ruins of the palace to reuse its bricks in the construction of the Taj Palace in Baghdad (Taq Kasra 2020). In 1888, a severe flood damaged a large part of the structure.

Architecturally, the monument consists of a large arch of 43.50 m depth by 25.50 m width, in the middle of a facade that extends 46 m in either direction and stands 35 m above ground level. The facade is formed by a series of six layers of brickwork, consisting of columns, entablatures, and arched corners (AYVĀN-E KESRĀ 2020).

The Iraqi Ministry of Culture contracted a Czech company, Avers, to restore the site. This restoration was completed in 2017. However, in March 2019, a limited collapse added more damage to the structure, just 2 years after its latest restoration was completed (Taq Kasra 2020). Accordingly, this important historical monument is still at risk and needs urgent attention from the authorities to preserve it from further damage. Additionally, this object is at risk because of the community’s ignorance of its value and of its historical and architectural importance. Site visitors therefore need to be educated not to harm the structure and to keep a safe distance from it. Placing fences and arranging entrance and exit passages for visitors should also be considered by the authorities. Figure 4 shows local visitors climbing the walls of the structure to take souvenir photos without realizing the danger to themselves or to the structure.

Figure 4. Left) White paint of a personal memory on the structure bricks. Right) Visitors climbing the structure

Hence, detailed documentation and 3D modeling play an important role in future rehabilitation and renovation of this site. Crowdsource videos of this object, especially those taken by drones, are rare; however, two were found (MBC 2020; Tak Kasra 2020), besides videos taken from the ground by tourists and hobbyists (N. A. f. Iraq 2020; ا. ا. _vlogs 2020). These videos (1280 × 720 pixels) were converted into image sequences, and the redundant and blurry images were filtered out as described in Section 2. Because of the difficulty of orienting all the images together, they were divided into chunks, which were then registered either manually using the CloudCompare tool or using reference markers in the Metashape tool (Figure 5). The 3D model reconstructed from the drone video images is sufficient to give overall, low-detail information. To add more detail, the model is integrated with the ground-based model derived from videos taken by tourists.

Figure 5. The 3D image-based model of the Taq Kasra site derived from crowdsourced drone videos. a) Image matching samples. (left) Matching between the drone video frame and ground frame. (right) Image matching between two challenging views. b) the oriented images taken from the crowdsource drone (red) and ground-based videos (blue). c) Final reality-scaled 3D model

Although integrating multiple video datasets may degrade the overall accuracy of the composed model, it remains within the relative accuracy limit of 1 m in the worst-case scenario. The resulting RMSE at the reference points in this case is 27 cm. The final 3D model is composed from 1353 successfully oriented images with an average ground sample distance (GSD) of 6 cm/pixel and an average tie point multiplicity of 5 images per tie point. The final 3D point cloud of 5 million points has an average density of 25 pts/m², and the derived 3D model is published on the Sketchfab website at https://skfb.ly/6TYwW
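For readers unfamiliar with the measure, the sketch below shows how an RMSE at reference points can be computed from the 3D residuals between model and reference coordinates. The paper does not describe its exact computation, so this is only one common definition; the function name `rmse` and the coordinates are hypothetical and are not the reference points used in this experiment.

```python
import math


def rmse(model_pts, reference_pts):
    """Root mean square of the 3D distances between paired points."""
    if not model_pts or len(model_pts) != len(reference_pts):
        raise ValueError("need two non-empty point lists of equal length")
    squared = [math.dist(m, r) ** 2 for m, r in zip(model_pts, reference_pts)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates in meters.
model_pts = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
reference_pts = [(0.3, 0.0, 0.0), (10.0, 0.4, 0.0)]

print(round(rmse(model_pts, reference_pts), 3))
```

Because the errors are squared before averaging, a single poorly matched reference point can dominate the result, which matters when only a few reference dimensions are available.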

3.2. Rabban Hormizd Monastery

Rabban Hormizd Monastery is an important monastery of the Chaldean Catholic Church, established about AD 640 and located about 45 km north of Mosul in Iraq. In the 18th century, the monastery became the official residence of the patriarchs of the Church of the East. After the Ottoman-Persian War (1743–1746), the monastery was abandoned. In 1808, the abandoned monastery was revived and rebuilt. The monastery was built upon an enormous rock halfway up the range of mountains that encloses the plain of Mosul on the north, and it can be reached by a rocky path that has been paved by the monks over many years. The church is built of reddish stone and is located in the hills, surrounded by rows of caves inside the solid rock (Budge 2003; Rabban Hormizd Monastery 2020; Rabban Hormizd Monastery.Tel Kaif, Iraq 2020).

Shared videos (Burns 2020; Studios 2020) taken in 2016 and 2017 by a drone, together with a few tourist images, are used to successfully create the 3D model of the monastery (Figure 6). A total of 404 images were extracted from the videos, taken at a flying altitude of around 150 m with an average GSD of 15 cm/pixel. The reprojection error of the oriented images is 0.6 pixels with an average tie point multiplicity of 5 images. A dense point cloud of about 18 million points is created from the images with a density of 36 pts/m², and the reconstructed 3D model is shared on Sketchfab at https://skfb.ly/6TXFy.

Figure 6. 3D reconstructed model of Rabban Hormizd monastery. a) Image matching samples. Matching between the drone video frame and ground frame (left). Image matching between two drone views (right). b) the oriented images taken from the crowdsource images. c) Final reality-scaled 3D model

3.3. The Great Mosque of Samarra

The Great Mosque of Samarra is an important surviving Islamic monument from the Abbasid caliphate era, established in the ninth century in the city of Samarra, about 127 km north of Baghdad in Iraq (Great Mosque of Samarra 2020). Samarra was the second Abbasid capital after Baghdad, and the Great Mosque with its famous surviving minaret represents the only physical trace of that era in its glorious time (Samarra Archaeological City 2020).

Structurally, the Great Mosque covered a rectangular area of nearly 38,000 m², with walls extending 239 m in length by 156 m in width. The walls are 2.65 m thick and include 44 semi-circular towers. The minaret is 52 m high with a square base of 33 m side length. This base strengthens the ramp, which winds up the tower anticlockwise in five laps with a stair width of 2 m until reaching the minaret top, where a cylindrical room of 12 m diameter existed (Hussein 2020). Unfortunately, the minaret top was partially demolished during a bombing attack in 2005. It should be mentioned that in 2003 this historical site was occupied by the multi-national forces, who used it as a military base. However, no damage to the site was recorded during that period.

Conservation work at the site has been suspended since 2003, and the site is considered at risk. The authorities were unable to implement control over the management and preservation of the site. The site was added to the UNESCO World Heritage list in 2007 (Samarra Archaeological City 2020; UNESCO 2020). Shared videos (3addn 2020; Samarra minaret 2020) taken in 2019 by a drone and from the ground, together with a few tourist images, are used to successfully create the 3D model of the minaret (Figure 7). A total of 107 drone images were extracted, taken at a flying altitude of around 100 m with an average GSD of 10 cm/pixel. The reprojection error of the oriented images is 5 pixels with an average tie point multiplicity of 4 images. A dense point cloud of 2 million points is created from the images with a density of 60 pts/m², and the reconstructed 3D model is shared on Sketchfab at https://skfb.ly/6UpnE
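The relation between flying altitude and GSD can be sketched with the usual pinhole approximation for a camera looking straight down: the GSD equals the altitude multiplied by the pixel pitch and divided by the focal length. The camera parameters of the crowdsource videos are not reported, so the focal length and effective pixel pitch below are hypothetical values, chosen only so that their ratio is 1:1000.

```python
def gsd_cm_per_pixel(altitude_m, focal_length_mm, pixel_pitch_um):
    """Ground sample distance of a nadir-looking camera, in cm per pixel."""
    if focal_length_mm <= 0 or pixel_pitch_um <= 0:
        raise ValueError("focal length and pixel pitch must be positive")
    pixel_pitch_m = pixel_pitch_um * 1e-6
    focal_length_m = focal_length_mm * 1e-3
    return altitude_m * pixel_pitch_m / focal_length_m * 100.0


# Hypothetical camera: 4 mm focal length and 4 micrometer effective pixel
# pitch of the video frame (assumed values, not taken from the paper).
FOCAL_MM = 4.0
PITCH_UM = 4.0

for altitude in (100.0, 150.0):
    gsd = gsd_cm_per_pixel(altitude, FOCAL_MM, PITCH_UM)
    print(f"{altitude:.0f} m -> {gsd:.1f} cm/pixel")
```

With these assumed values the sketch gives about 10 cm/pixel at 100 m and 15 cm/pixel at 150 m, the same linear scaling as the altitude and GSD pairs reported for the Samarra and Rabban Hormizd experiments. Oblique views and varying terrain height would make the real GSD vary across each frame.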

Figure 7. 3D reconstructed model of Samarra great mosque minaret using drone and ground-based crowdsource video images

The point cloud densities of the three reconstructed models are shown in Figure 8, where the density is color-coded for visualization. The higher density achieved in the third experiment, the Samarra mosque, is caused by the addition of the high-resolution images taken by tourists from the ground.

Figure 8. The density of the reconstructed point clouds of the three heritage sites. a) The Great Mosque of Samarra. b) Rabban Hormizd Monastery. c) Taq Kasra

Table 1 summarizes the results of the three experiments in terms of the number of videos used, number of extracted images, GSD, reprojection error, number of points, point density, processing time, and tie point multiplicity.

Table 1. The summary of the results of the three experiments

As expected, a powerful computer with a graphics processing unit (GPU) is more suitable for 3D reconstruction in terms of processing time, as shown in the first test on the Taq Kasra site. Nevertheless, a larger number of images does not necessarily yield a higher point density.

The achieved point densities (20–60 pts/m²) and positional accuracies (5–25 cm) in the three case studies are suitable for general mapping in geographic information systems (GIS) and for virtual tours, according to the American Society of Civil Engineers (ASCE) guidelines (38-02) (Olsen et al. 2013; Standard Guideline for the Collection and Depiction of Existing Subsurface Utility Data (38-02) 2002).

4. Conclusions

Crowdsource and web-shared drone videos of historical sites and valuable heritage objects represent a potential source for the 3D documentation of these objects. Such data sources are highly valuable in places where heritage is at risk, like Iraq and Syria. In this paper, the concept of using these crowdsource videos for the 3D documentation of three historical sites in Iraq was presented. A 3D model was created for the famous Taq Kasra site (Figure 5) using crowdsource videos. A total of 1353 images were successfully oriented using the Metashape software tool, and the RMSE at the local reference points was 27 cm. A 3D point cloud was created with an average density of 25 pts/m².

The second experiment was applied to the monastery of Rabban Hormizd in northern Iraq (Figure 6). Crowdsource videos taken from a drone at an altitude of about 150 m were used, and only 404 images were processed. The images were successfully oriented with a reprojection error of 0.6 pixels, and a dense point cloud was created with an average density of 36 pts/m². The third experiment was applied to the Great Mosque of Samarra (Figure 7). Crowdsource video taken from a drone at an altitude of about 100 m was used in combination with tourist images taken from the ground. The images were successfully oriented with a reprojection error of 5 pixels, and a dense point cloud was created with an average density of 60 pts/m². All the reconstructed 3D models are shared on the Sketchfab site. A mid-level of detail was achieved in the three experiments, with point cloud densities in the range of 25–60 pts/m² and positional accuracies of 5–25 cm. According to the ASCE guidelines for utility and as-built surveying, the achieved accuracies and point densities are suitable for general mapping applications and virtual tours.

It is worth mentioning how difficult it is to search for and gather all the published videos and images of a specific cultural heritage site, especially when they are described in different languages. Accordingly, managing a repository where volunteers can upload their drone videos and images is of high importance to guarantee an organized uploading and downloading process.

Another important issue is the role of professionals in the suggested workflow. Image-based modeling tools are available, but the concept of using crowdsource images for 3D documentation should be applied by professionals because of the difficulty of 1) extracting a sufficient number of images, 2) filtering out redundant and blurry images, 3) applying the image orientation successfully, 4) integrating the different parts of the 3D object model, 5) scaling the object properly to reality, etc. Accordingly, the proposed workflow integrates the community, which collects the data, with the specialists, who process these data, and this enables further detailed applications such as historical preservation. This integration can be administered by an academic institution, non-governmental organizations (NGOs), or governmental associations.

Future work is invited to continue the 3D documentation of valuable heritage objects at risk using crowdsource and volunteer videos taken from drones and from the ground. The interested community in countries like Iraq is encouraged to continue amateur drone filming of historical sites and objects and to share the videos through internet cloud services or social media.

Technically, consumer-grade drones equipped with 4K video cameras (3840 × 2160 pixels) are increasingly available on the market. This is expected to significantly improve the final reconstructed 3D models, with point densities of more than 100 pts/m², enabling more detailed representation and extending the applications from virtual tours to historical preservation.

Disclosure statement

No potential conflict of interest was reported by the author.

References