1,560
Views
3
CrossRef citations to date
0
Altmetric
Research Article

Dimensionality reduction method for hyperspectral image analysis based on rough set theory

, ORCID Icon, , , &
Pages 192-200 | Received 26 Apr 2019, Accepted 18 Jun 2020, Published online: 02 Jul 2020

ABSTRACT

High-dimensional features often cause computational complexity and dimensionality curse. Feature selection and feature extraction are the two mainstream methods for dimensionality reduction. Feature selection but not feature extraction can preserve the critical information and maintain the physical meaning simultaneously. Herein, we proposed a dimensionality reduction method based on rough set theory (DRM-RST) for feature selection. We defined the hyperspectral image as a decision system, extracted the features as decision attributes, and selected the effective features based on information entropy. We used the Washington D.C. Mall dataset and New York dataset to evaluate the performance of DRM-RST on dimensionality reduction. Compared with full band classification, 184 or 185 redundant bands were removed in DRM-RST, respectively. DRM-RST achieved similar accuracy (overall accuracy >94%) by SVM classifier and reduced computing time by about 85%. We further compared the dimensionality reduction efficiency of DRM-RST against other popular methods, including ReliefF, Sequential Backward Elimination (SBE) and Information Gain (IG). The Producer’s accuracy (PA) and User’s accuracy (UA) of DRM-RST was greater than that of ReliefF and IG. DRM-RST showed greater stability of accuracy than SBE in dimensionality reduction when using for different datasets. Collectively, this study provides a new method for dimensionality reduction that can reduce computational complexity and alleviate dimensionality curse.

Introduction

Hyperspectral imaging is an emerging technology that integrates conventional imaging and spectroscopy to obtain a wealth of spectral information. It can provide the hyperspectral images with a large number of spectral bands. Hyperspectral imaging has advanced the application of remote sensing in many fields, such as estuarine and coastal water quality estimation (Brando & Dekker, 2003), wetland vegetation investigation (Adam et al., Citation2010) and coral reef feature identification (Mohanty et al., Citation2016). Hyperspectral imaging can detect and classify the spatial information at high accuracy. However, the high-dimensional feature often causes computational complexity and dimensionality curse (Ferrari et al., Citation2013; Gu et al., Citation2017; Sakarya, Citation2014). In many cases, it is not required to process the information of all spectral bands since many bands are highly correlated. Thus, it is required to remove redundant bands and decrease computational complexity.

Dimensionality reduction is an important pre-processing step in hyperspectral image analysis. It decreases computational complexity and ameliorates statistical ill conditioning by discarding redundant features. Two mainstream methods are available for dimension reduction, including feature extraction and feature selection (Bioucas-Dias et al., Citation2013; Yin et al., Citation2012). Feature extraction transforms the original features of hyperspectral images to construct lower-dimension feature spaces, which can be achieved by independent component analysis (ICA) (Wang, Citation2006), fisher’s linear discriminant analysis (FLDA) (Wu et al., Citation2017), graph embedding analysis (J.P. Guo et al., Citation2020) and principle component analysis (PCA) (Jolliffe, Citation1986). Feature extraction preserves the critical information of hyperspectral image. However, it alters the physical meaning of each spectral band (Dash & Liu, 2000).

Feature selection not only preserves the critical information of hyperspectral image, but also maintains the physical meaning of each spectral band. Several methods have been used for feature selection, including ReliefF (Peker et al., Citation2015), sequential forward search (SFS) (Lee et al., Citation2017) and sequential backward elimination (SBE) (Vergara & Estévez, Citation2014). ReliefF method estimates the optimal attributes relying on how well their values distinguish among the instances that are near each other (Kira & Rendell, Citation1992). SFS method selects the best attribute at first, then the second best, and so forth, trying to add the possible attribute to the one previously selected, until no improvement is achieved when adding a new attribute. SBE method aims to select a subset of size k among n variables (kn), which maximizes the performance of the predictor, removing one feature at a time until k features remain (Maldonado & Weber, Citation2009). ReliefF is highly dependent on data samples. Many small data samples are sampled from the data. By contrast, SFS and SBE choose the best band combination of hyperspectral images through the iterative process. The reduction performances of these methods, such as accuracy and computing time, display highly variable for different datasets (Huang & He, Citation2005).

As for feature selection, the selection of possible band combinations is a critical issue. Band combination selection is a vague and quantitative problem. Rough set theory is a mathematical tool for dealing with the vague, imprecise and uncertain data (Liu et al., Citation2010). It has been applied successfully in the fields characterized by vagueness and uncertainty (Cervante et al., Citation2013; Yellasiri et al., Citation2007; Z.H. Guo et al., Citation2016). However, the application of rough set theory in dimensionality reduction is still unavailable. In this study, we proposed a dimensionality reduction method for hyperspectral image processing based on rough set theory (DRM-RST). DRM-RST can reduce computational complexity and alleviate dimensionality curse during hyperspectral image processing.

Material and method

Material

The first hyperspectral image dataset was acquired using the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor and collected over Washington D.C. Mall on 23 August 1995. This scene consists of 1280 × 307 pixels and covers the 401.2881 ~ 2473.16 nm range with 191 spectral bands at approximately 10-nm spectral resolution and 3-m spatial resolution. shows the image of the data cube at the spectral band #60, #27 and #17.

Figure 1. Hyperspectral image of Washington D.C. Mall

(Spectral band #60, 27 and 17). The original image is available at http://lesun.weebly.com/hyperspectral-data-set.html.
Figure 1. Hyperspectral image of Washington D.C. Mall

The second hyperspectral image dataset was acquired using the EO-1 Hyperion scene and collected over New York on 25 August 2001. This scene consists of 613 × 1335 pixels and covers the 400 ~ 2500 nm range with 198 spectral bands at approximately 10-nm spectral resolution and 30-m spatial resolution. shows the image of the data cube at the spectral band #29, #20 and #12.

Figure 2. Hyperspectral image of New York

(Spectral band #29, 20 and 12). The original image is available at https://earthexplorer.usgs.gov/.
Figure 2. Hyperspectral image of New York

Method

is the flow chart of DRM-RST method. The hyperspectral image is defined as a decision system, S*. The bands of hyperspectral image are recognized as the condition attributes, B. Then, the redundant bands are removed based on the discernibility matrix, which is deduced by the decision system, S*. Finally, the retained bands are ranked by calculating the information entropy of each band, and the required bands are selected according to the sequence of ranked bands.

Figure 3. Flow chart of DRM-RST method

Figure 3. Flow chart of DRM-RST method

Step 1: decision system definition

Hyperspectral image is defined as a decision system. Let S = ( U , A , V , F ) be an information system, where U = { x 1 , x 2 , , x n } is a finite set of objects (pixels in the image), which is called sample space. A = { a 1 , a 2 , , a m } is a set of attributes; a A determines a function f = U × A V . V = a B V a i , where V a i is the set of values of a i . A decision system is a particular case of information system. B is a condition attribute, D is a decision attribute, where A = B D and B D = .

The decision system is denoted as S = ( U , A = B D , V , f ) . A is a set of attributes, which the element of B = { B 1 , B 2 , , B m } is a condition attribute and D = { d i } is a decision attribute. B and D represent the set of spectral bands and the type of a sample pixel, respectively. is the structure of decision system.

Table 1. Decision system of hyperspectral image

Step 2: removing redundant bands

Redundant bands are removed based on the discernibility matrix. The discernibility matrix represents the sets of attributes that discerns the pairs of objects (Skowron & Rauszer, Citation1992). It calculates the attribute reduction in rough set theory. Attribute reduction is an essential operation of decision system (Shu & Qian, Citation2015). In the decision system S = ( U , A , V , f ) , B and D are the sets of attributes including equivalence relations over U. B-positive region can be defined as the Equationequation (1). It refers to that all objects of U can be classified to classes of U/D using the information contained within attribute B.

(1) P O S B ( D ) = x U / D B _ ( x ) (1)

If P O S B ( D ) = P O S ( B { a } ) ( D ) , the attribute a ( a i B ) is relatively dispensable, where B is a condition attribute. Otherwise, a is relatively indispensable in B.

Given B as a subset of B ( B B ), if every attributes a ( a B ) is indispensable in B. B is relatively independent in S . If B is relatively independent, and P O S B ( D ) = P O S B ( D ) , the subset B ( B B ) is called relative reduction in S . The collection of all relatively indispensable attributes in B is called the relative core of B, denoted as C O R E D ( B ) . C O R E D ( B ) is the union of matrix element defined as

(2) C O R E D ( B ) = { m i j B , 1 i , j n } (2)

The discernibility matrix is focused on the ability of attribute subsets to distinguish the objects belonging to different classes. Here, we use the discernibility matrix to remove the redundant information. Discernibility matrix has the dimension n × n , where n denotes the number of objects. An element m i j for an object pair ( x i , x j ) is defined as

(3) m i j = a | a B : V a ( x i ) V a ( x j ) , D ( x i ) D ( x j ) } , x i , x j U , o t h e r s (3)

Because of symmetric, only half elements of discernibility matrix is computed, and the retained elements of this discernibility matrix elements are . The meaning of the matrix element m i j is that objects x i and x j can be distinguished by any attributes in m i j . The pair ( x i , x j ) can be discerned if m i j . shows the part result of discernibility matrix.

Table 2. Part result of discernibility matrix of decision attribute

Discernibility function corresponding to discernibility matrix above is defined as follow:

(4) ρ = { m i j } (4)

Where “ ” (disjunction), “ ” (conjunction) are the propositional connectives in mathematical logic. They are read as “or” and “and”. Assume m = { B 1 , B 3 , B K } ,and m denotes B 1 B 3 B k , This function ρ is disjunctive normal form of conjunction m i j , which presents the removed bands (Yao & Zhao, Citation2009).

Step 3: hyperspectral band selection

Hyperspectral bands are selected based on the information entropy. The weight of retained bands is calculated based on the information entropy (Zhang & Jiang, Citation2008). According to the decision attribute (D), the sample space (U) is divided into different type of classifications. Given events D 1 , D 2 , , D m occurring with the probabilities P ( D 1 ) , P ( D 2 ) , , P ( D m ) , the average uncertainty associates with each event is defined by the information entropy

(5) H ( D ) = H ( D 1 , D 2 , , D m ) = i = 1 m P D i | U | log 2 P D i | U | (5)

The condition attribute (B) has n dimension. a is the value domain of B. For each a j ( j = 1 , 2 , n ), the information entropy is defined by

(6) H ( a ) = i = 1 m u i j | U | j = 1 n H ( u 1 j , , u m j ) = j = 1 n u 1 j + u 2 j + + u m j | U | H ( u 1 j , , u m j ) (6)

u j is the subset of U which contains all objects with the same value of a j . u i j is the number of the objects with the classification of D i type.

Based on the information entropy of each band, the retained bands are ranked in a descending sequence. The required bands can be selected according to the sequence of retained bands. Finally, a new spectral image with the lower dimension is created.

Experimental results

In this section, DRM-RST method is used for selecting the effective band combinations in the hyperspectral image. The result of dimensionality reduction is used for the classification of hyperspectral image. Two experiments were conducted to evaluate the performance of DRM-RST method on dimensionality reduction based on rough set theory.

In the experiment 1, DRM-RST method was compared against full band classification to estimate whether DRM-RST method preserved the critical information of hyperspectral images. In the experiment 2, DRM-RST method was compared against other dimensionality methods, including ReliefF, sequential backward elimination (SBE), and information gain (IG), to evaluate the performance of dimensionality reduction.

Dimensionality reduction for Washington D.C. Mall dataset by DRM-RST method

By visual interpretation, the hyperspectral image of Washington D.C. Mall sense was classified into 7 different land-covers, including water, artificiality, road, grass, shadow, forest and trail, which was considered as the value of decision attribute (D) of S*. And then, 10 training sampled pixels for each land-cover was selected as the training sampled pixels. The total number of training sampled pixels was 70, which accounted for 2/1000 of total pixels. shows the part result of decision system from hyperspectral image of Washington D.C. Mall dataset.

Table 3. Decision system of Washington D.C. Mall dataset

Based on the Equationequation (3), the discernibility matrix with the size of 70 × 70 was generated. The result of discernibility function ρ* with the Equationequation (4) was the removed bands. 107 redundant bands were removed, while 84 bands were retained as below.

ρ = ( m 1 , 2 ) ( m 1 , 3 ) ( m 190 , 191 )

={B1,B5,B8,B14,B17,B18,B20,B22,B27,B31~B35,B38,B40~B46,B48,B49,B50,B52~B58,B60,B62~B64,B66~B86,B88~B90,B92~B97,B101,B108,B110,B111,B113,B114~B117,B119,B120~B125,B128,B130,B150}

Based on the Equationequation 5 and Equation6, the information entropy for each retained band was calculated. The bands were ranked according to the value of information entropy. In this study, the Top 7 bands were selected (). The reason why we selected seven bands was that the selected bands could be sufficient to accommodate the specific problem.

Table 4. Top seven selected bands of Washington D.C. Mall dataset

Dimensionality reduction for New York scene dataset by DRM-RST method

The hyperspectral image of New York sense was classified into 6 different land-covers as the domain U and the land-covers, including water, artificiality, bare, beach and forest, form the decision attribute D. And then, 10 training sampled pixels for each land-cover was selected as the training sampled pixels. The total number of the training sampled pixels was 60, which accounted for 1/1000 of total pixels.

Based on the Equationequation (3), the discernibility matrix with the size of 60 × 60 was generated. The result of discernibility function ρ* with the Equationequation (4) was the removed bands. 163 redundant bands were removed, while 35 bands were retained as below.

ρ = ( m 8 , 9 ) ( m 8 , 10 ) ( m 223 , 224 )

={B17~B21,B23,B24,B26,B28,B29,B31,B32,B34~B39,B43,B45,B46,B49,B51,B57,B77~B79,B80,B82~B84,B88,B93,B115,B116}

And then, the 13 top bands were selected as shown in .

Table 5. Top 13 selected bands of New York dataset

Result of experiment 1: comparing against full band classification

The classification result of hyperspectral image by the Support Vector Machine classification (SVM) classifier was created through full band classification as the true value. We calculated the overall accuracy (OA) to estimate whether DRM-RST method preserved the critical information of hyperspectral images. Compared with full band classification, DRM-RST method had a comparable overall accuracy in dimensionality reduction of hyperspectral image. However, the computing time of DRM-RST method was significantly shorter than that of full band classification. When using DRM-RST method, the computing time for Washington D.C. Mall dataset or New York dataset was 16.46 s or 38.96 s, respectively. Compared with full band classification, DRM-RST method decreased the computational complexity, showing >85% reduction of computing time ().

Table 6. Comparison of computing time and overall accuracy between full band selection and DRM-RST method

Moreover, we used the length (the perimeter length of a land cover, L), area (the total area of a land cover, A), and shape index ( S I = L / A ) to further estimate whether DRM-RST method preserved the most critical information of hyperspectral images. Shape index was mathematically scale independent, based on the ratio of the perimeter to the square root of area (Jorge & Garcia, Citation1997). The result showed that the classification result of DRM-RST method was nearly similar as the result of full band classification (, 5, , and 8).

Figure 4. Classification result of Washington D.C. Mall dataset by full band classification and DRM-RST method. (a) Full band classification; (b) DRM-RST method

Figure 4. Classification result of Washington D.C. Mall dataset by full band classification and DRM-RST method. (a) Full band classification; (b) DRM-RST method

Figure 5. Classification results of New York dataset by full band classification and DRM-RST method. (a) Full band classification; (b) DRM-RST method

Figure 5. Classification results of New York dataset by full band classification and DRM-RST method. (a) Full band classification; (b) DRM-RST method

Table 7. Comparison of accuracy parameters of Washington D.C. Mall dataset by full band selection and DRM-RST method

Result of experiment 2: Comparing against other popular dimensionality reduction methods

We then compared the DRM-RST method against other popular dimensionality methods, including ReliefF, sequential backward elimination (SBE), information gain (IG) to estimate the performance of dimensionality reduction. We used two different parameters, Producer’s accuracy (PA) and User’s accuracy (UA), to estimate the performance of dimensionality reduction. shows the selected bands by different dimensionality reduction methods for the Washington D.C. Mall dataset and New York scene dataset. and 10 shows the result of Producer’s accuracy (PA) and User’s accuracy (UA) for the two datasets respectively, which were classified by SVM classifier.

Table 8. Comparison of accuracy parameters of New York dataset by full band selection and DRM-RST method

Table 9. Selected bands by different dimensionality reduction methods

Table 10. Comparison of PA and UA of Washington D.C. Mall dataset by different hyperspectral image reduction methods

From and 11, the PA and UA of DRM-RST method was greater than that of ReliefF and IG. DRM-RST method showed greater stability of reduction accuracy than SBE method. As for Washington D.C. Mall dataset, the PA and UA of DRM-RST method was approximate to the result of SBE method. As for New York dataset, the PA and UA of DRM-RST method was much greater than that of SBE method. These results suggest that DRM-RST is not dependent on the types of datasets and shows good stability of reduction accuracy.

Table 11. Comparison of PA and UA of New York dataset by different hyperspectral image reduction methods

Conclusion

The high-dimensional data poses many problems, such as computational complexity, dimensionality curse, etc. (Ferrari et al., Citation2013; Gu et al., Citation2017). Due to the dimensionality curse and the diminishment of specificity in similarities between points in the hyperspectral image, the complexity of existing methods is exponential with respect to the number of dimensions (Sakarya, Citation2014). With increasing dimensionality, the existing methods become computationally intractable and inapplicable in the real application. Therefore, dimension reduction becomes a critical issue for hyperspectral image processing.

Feature selection is one of the mainstream methods for dimension reduction. It not only preserves the critical information but also maintains the physical meaning of hyperspectral image. The critical issue for feature selection is the selection of possible band combinations. However, band selection is a vague problem.

The major innovation of this study is that rough set theory is employed for dimension reduction of hyperspectral data. Rough set theory is an important method for vague, imprecise and uncertain data processing. In this study, we proposed a feature selection method for hyperspectral data analysis based on rough set theory (DRM-RST). Compared with full band classification, DRM-RST can remove the redundant bands and reduce computational burden. Compared with other popular dimensionality reduction methods, DRM-RST shows better reduction accuracy than IG and ReliefF and has greater stability of accuracy than SBE method in hyperspectral data analysis. Collectively, rough set theory-based feature selection has great potentials for dimensionality reduction of hyperspectral data.

Acknowledgments

This work was generously supported by the grants from the Capacity Development for Local College Project (Grant No. 19050502100 to W.Z.H.). The authors declare no conflicts of interest.

Disclosure statement

The authors declare no conflict of interest.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [41501419,41671431]; Capacity Development for Local College Project [17050501900, 19050502100].

References