GIScience & Remote Sensing Volume 56, 2019 - Issue 8

A probabilistic fusion of a support vector machine and a joint sparsity model for hyperspectral imagery classification

Qishuo Gao, School of Civil and Environmental Engineering, University of New South Wales, Sydney, NSW, Australia (corresponding author)

https://orcid.org/0000-0002-9249-4065

Samsung Lim, School of Civil and Environmental Engineering, University of New South Wales, Sydney, NSW, Australia

Pages 1129-1147 | Received 17 Oct 2018, Accepted 21 May 2019, Published online: 30 May 2019

https://doi.org/10.1080/15481603.2019.1623003
Abstract

Hyperspectral imagery (HSI) is now used for a wide range of applications such as land cover classification, climate change studies and environmental monitoring. However, the acquisition of HSI is still costly, and the curse of dimensionality, i.e. the phenomenon that the number of required training samples increases exponentially as the dimensionality increases linearly, makes it difficult to exploit the full potential of machine learning in HSI classification. To address this problem, we propose a novel framework for spatial-spectral HSI classification. A soft support vector machine (SVM) and a probabilistic joint sparsity model (JSM) each compute a posteriori probabilities of the test pixels, and the two sets of probability scores are then fused by a linear opinion pool. Furthermore, a Markov random field (MRF) model is used as a maximum a posteriori (MAP) segmentation method to regularize the neighborhood information and derive the pixel labels. Extensive experiments on three commonly used benchmark data sets show that the proposed probabilistic fusion method outperforms a number of well-known spatial-spectral HSI classification techniques.

Keywords:

Hyperspectral imagery (HSI)
classification
support vector machine (SVM)
joint sparsity model (JSM)
Markov random field (MRF)
decision fusion

1. Introduction

Hyperspectral imagery (HSI) is a well-established source of information for interpreting remotely sensed scenes. Many methods have been introduced to perform HSI classification using the rich spectral information available in the images. For example, random forest (RF) (Ham et al. 2005), support vector machines (SVMs) (Mountrakis, Im, and Ogole 2011; Melgani and Bruzzone 2004) and neural networks (NNs) (Ratle, Camps-Valls, and Weston 2010) have achieved competitive results. However, the scarcity of training samples relative to the high dimensionality of the data is the main obstacle to high-accuracy HSI classification (Plaza et al. 2009). The high dimensionality of HSI leads to the curse of dimensionality, also known as the Hughes phenomenon: the number of required training samples increases exponentially as the dimensionality increases linearly (Landgrebe 2005).

Figure 4. Classification maps obtained by different classifiers for Data II with 9 classes: (a) SVM-EMAP; (b) SVM-MRF; (c) SR-MRF; (d) JSM; (e) JSM-MRF; (f) Proposed.

Figure 7. Classification maps of Data II after SVM, JSM and fusion.

On the other hand, SVMs with kernel functions can overcome the dimensionality problem by projecting the pixels into a high-dimensional feature space, where the decision boundary can be expressed as a function of a subset of the training samples (Gualtieri and Cromp 1999; Huang, Davis, and Townshend 2002; Gao et al. 2015). SVMs have therefore been widely used in HSI applications. For example, Xun and Wang (2015) adopted an object-based SVM method that incorporates an optimal segmentation scale estimation strategy for mapping salt cedar with QuickBird imagery. Lin and Yan (2016) combined a polynomial kernel function and a radial basis kernel function to form a new kernel for an SVM model for HSI classification. Chu et al. (2016) and Zhang, Smith, and Fang (2018) used SVMs to classify fused lidar and HSI images for tea classification and urban land-cover mapping, respectively. Tesfamichael et al. (2018) used an SVM to identify plant invasions by classifying raw hyperspectral and simulated multispectral images. Moreover, because of the presence of mixed pixels in the data, probabilistic SVMs have been used to estimate class-wise probabilities for each pixel in HSI classification (Villa et al. 2011; Jia et al. 2010). This kind of classification is also known as soft classification. A soft classification method yields the probability that each class is present in a pixel, and these results can be further processed by other probabilistic methods.

Another category of strategies for dealing with the high dimensionality of HSI is based on sparse representation (SR) (Srinivas et al. 2013; Chen, Nasrabadi, and Tran 2013; Fu et al. 2017). SR assumes that HSI pixels belonging to the same class lie in the same subspace spanned by the training samples. In SR, a so-called “dictionary” is constructed from the training samples, and a test pixel is labeled by recovering the coefficients, i.e. the weights of the elements selected from the dictionary. To better exploit the merits of SR in HSI classification, a joint sparsity model (JSM) (Chen, Nasrabadi, and Tran 2011) was proposed to process the spatial contextual information of HSI. A JSM assumes that a test pixel has similar properties to its neighboring pixels. Based on SR theory, the neighboring pixels can be simultaneously expressed by a few elements selected from the dictionary, which is how the spatial information is incorporated. JSMs have achieved very competitive results for HSI classification in recent years (Chen, Nasrabadi, and Tran 2011; Fang et al. 2014). Gao, Lim, and Jia (2018a) adopted a JSM as a probabilistic classifier and then successfully applied a probabilistic relaxation method to improve the HSI classification results. Some variations (Tu et al. 2018b, 2018a) have also been successfully applied to this classification task.

It is well known that Markov random fields (MRFs) can help smooth hyperspectral data and reduce noise. After a soft classification, an MRF is usually applied to exploit the continuity of the labels of neighboring pixels. Spatial contextual information is incorporated by assigning an appropriate weight to the spatial contribution of the pixels surrounding the central pixel in the minimization of an energy function. It is important to choose an appropriate MRF representation for the labeling process, and the maximum a posteriori (MAP) strategy has been widely used to formulate the MRF energy function (Cao et al. 2017; Li, Bioucas-Dias, and Plaza 2012). In Tarabalka et al. (2010) and Moser and Serpico (2013), the results obtained by a pixel-wise probabilistic SVM were refined by variations of MRFs. Cao et al. (2017) used an MRF to model the local spatial correlation of neighboring pixels and to refine the probabilistic results that a soft SVM obtained from 3-dimensional discrete wavelet transform features. In Zhang et al. (2018), semantic representation-based multiple features were extracted by a probabilistic SVM, and an extended MRF was proposed to process the information in a semantic space. Probabilistic SR has also recently been integrated with an MRF (Xu and Li 2014) for high-accuracy HSI classification. More information about recent advances in this technique can be found in Ghamisi et al. (2018).

SVMs and JSMs are both successful machine learning techniques for HSI classification. However, SVMs are prone to errors in HSI classification because the classes are likely to lie in a space of lower dimension than the original data, where SVMs do not have sufficient training samples. Moreover, the dictionary constructed for a JSM may not be complete, which can compromise the classification accuracy, and the over-smoothing effect limits the performance of JSMs in non-homogeneous areas. To alleviate these problems and to balance the high dimensionality against the limited training samples in HSI classification, we propose a novel framework that integrates the JSM and SVM methods in a probabilistic way. Specifically, an SVM processes the spectral information and yields a global probability distribution for all test pixels. A JSM then integrates the spatial information and yields a probability distribution for the test pixels that incorporates local information. That is, we use the SVM and JSM methods to obtain a posteriori probabilities and fuse them with a linear opinion pool. Finally, an MRF, used as a maximum a posteriori (MAP) segmentation method, regularizes the final probabilities and models the spatial contextual information, which can further reduce the negative effect of insufficient training samples. The main contributions of this work are therefore twofold: 1) the probabilistic information at both the spectral and spatial levels is taken into account, and 2) the fusion strategy is combined with a characterization of spatial information to maximize the classification accuracy.

The rest of this paper is composed of three sections: Section 2 describes the proposed framework, Section 3 shows the experimental results as well as discussion of some relevant parameters, and Section 4 draws the conclusion.

2. Fusion of support vector machine and joint sparsity model

The workflow of the proposed framework is shown in detail in Figure 1. It comprises four steps: 1) generation of a probability distribution of the test pixels via a probabilistic SVM; 2) generation of an a posteriori probability distribution of the test pixels via a probabilistic JSM; 3) fusion of the probability distributions obtained in the previous steps; 4) regularization via an MRF-MAP. The details of each step are presented in the following subsections.

Figure 1. The illustration of the proposed fusion algorithm that integrates JSM and SVM in a probabilistic sense for HSI classification.

2.1. Probabilistic joint sparsity model

According to the principle that HSI pixels tend to have a common sparsity pattern with their neighbors, the spatial correlation among pixels can be exploited by a JSM (Chen, Nasrabadi, and Tran 2011).

Let $I$ be an HSI cube, and $x_{i} \in \mathbb{R}^{L \times 1}$ be a pixel in the image cube with $L$ representing the spectral dimensionality. Based on SR, $x_{i}$ can be expressed as:

$$x_{i} = D r_{i} \tag{1}$$

where $D = [D_{1}, \ldots, D_{c}, \ldots, D_{C}] \in \mathbb{R}^{L \times N}$ is the structural dictionary constructed from the training samples, $D_{c} \in \mathbb{R}^{L \times N_{c}}$ is the c-th sub-dictionary, and $c = 1, \ldots, C$ denotes the predefined class labels. $N = \sum_{c = 1}^{C} N_{c}$ denotes the total number of elements in the whole dictionary, $N_{c}$ corresponds to the number of elements in the c-th sub-dictionary, and $r_{i} \in \mathbb{R}^{N \times 1}$ denotes the sparse coefficient vector for the pixel $x_{i}$.

The sparse coefficient vector $r_{i}$ can be computed as:

$$\hat{r}_{i} = \underset{r_{i}}{\arg\min} \, \|x_{i} - D r_{i}\|_{2} \quad \text{subject to} \quad \|r_{i}\|_{0} \leq K \tag{2}$$

where $\|r_{i}\|_{0}$ is the number of nonzero entries in $r_{i}$ and $K$ is the upper bound on that number, i.e. the sparsity level.

A JSM (Chen, Nasrabadi, and Tran 2011) that extends Equation (1) to a neighborhood region was proposed on the assumption that the pixel $x_{i}$ shares a common sparsity pattern with its neighboring pixels. Let $X = \{x_{i}\}, i = 1, 2, \ldots, W \times W$ be the pixels within a predefined neighborhood region centered at $x_{i}$, where $W \times W$ is the predefined neighborhood region size. Equation (1) can be redefined as:

$$X = [D r_{1}, D r_{2}, \ldots, D r_{W \times W}] = D R \tag{3}$$

where $R \in \mathbb{R}^{N \times W^{2}}$ denotes the sparse coefficient matrix, and it is recovered in this paper as:

$$\hat{R} = \underset{R}{\arg\min} \, \|X - D R\|_{F} \quad \text{subject to} \quad \|R\|_{row, 0} \leq K \tag{4}$$

where $\|\cdot\|_{F}$ denotes the Frobenius norm, while $\|R\|_{row, 0}$ is the number of non-zero rows in $R$. After $\hat{R}$ is obtained, the pixel $x_{i}$ can be labeled as the class that has the minimal reconstruction error:

$$\begin{aligned} e_{i, c} &= \|X - D_{c} \hat{R}_{c}\|_{F} \\ y_{i} &= \underset{c}{\arg\min} \, e_{i, c} \end{aligned} \tag{5}$$

where $e_{i, c}$ denotes the reconstruction error for $x_{i}$ associated with the c-th class, $y_{i}$ refers to the label of $x_{i}$, and $\hat{R}_{c}$ is the part of the sparse coefficient matrix corresponding to the c-th class. The optimization problem of Equation (4) is solved in this paper by a simultaneous orthogonal matching pursuit (SOMP) (Chen, Nasrabadi, and Tran 2011) algorithm.
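To make Equations (3)–(5) concrete, the following is a minimal sketch of a greedy SOMP-style recovery followed by the class-wise reconstruction errors. It is an illustration under stated assumptions, not the authors' implementation: the function names `somp` and `class_errors`, the use of the row-wise 2-norm for atom selection, and the toy dictionary are all hypothetical.

```python
import numpy as np

def somp(D, X, K):
    """Greedy SOMP sketch: select K atoms shared by all columns of X."""
    residual = X.copy()
    support = []
    coef = None
    for _ in range(K):
        # Score each atom by its correlation with the residual of every pixel.
        scores = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            scores[support] = -np.inf  # never select the same atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares fit of all pixels on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        residual = X - D[:, support] @ coef
    R = np.zeros((D.shape[1], X.shape[1]))
    R[support, :] = coef
    return R

def class_errors(D, X, R, atom_labels, n_classes):
    """Equation (5): Frobenius reconstruction error for each class."""
    errors = np.empty(n_classes)
    for c in range(n_classes):
        mask = atom_labels == c
        errors[c] = np.linalg.norm(X - D[:, mask] @ R[mask, :])
    return errors

# Toy data (hypothetical): 4 bands, 2 atoms per class, 2 neighboring pixels.
D = np.eye(4)
atom_labels = np.array([0, 0, 1, 1])
X = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [0.5, 0.1]])
R_hat = somp(D, X, K=2)
errors = class_errors(D, X, R_hat, atom_labels, n_classes=2)
label = int(np.argmin(errors))  # index of the class with the smallest error
```

In this toy case both pixels are built from the atoms of the second class, so the second class reconstructs the neighborhood with near-zero error.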

The main aim of using the JSM in this paper is to obtain a class-wise probability distribution for the pixel $x_{i}$. Because the pixel $x_{i}$ is assigned to the label with the minimum reconstruction error, the posterior probability of each class is modeled as inversely proportional to the reconstruction error (Li, Zhang, and Zhang 2014; Gao, Lim, and Jia 2018a):

$$p_{i, c} = \frac{1}{σ e_{i, c}} \tag{6}$$

where $p_{i, c}$ denotes the c-th class-specific posterior probability of the pixel $x_{i}$, and $σ$ denotes the normalization constant. In this way, the class-wise posterior probability distribution for all test pixels can be computed by the proposed probabilistic JSM. In this paper, $σ$ is set to 1.
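As a small worked example of Equation (6), the sketch below converts a list of class-wise reconstruction errors into scores. Two details are assumptions added for illustration and are not stated in the article: the scores are rescaled so that they sum to one, and a small `eps` guards against division by zero. The function name `jsm_probabilities` is hypothetical.

```python
def jsm_probabilities(errors, sigma=1.0, eps=1e-12):
    """Equation (6): score each class by 1 / (sigma * e_c)."""
    scores = [1.0 / (sigma * max(e, eps)) for e in errors]
    total = sum(scores)
    # Assumption: rescale so the class scores sum to one.
    return [s / total for s in scores]

# Hypothetical reconstruction errors for three classes.
probs = jsm_probabilities([0.5, 1.0, 2.0])
```

The class with the smallest reconstruction error receives the largest probability, which matches the labeling rule of Equation (5).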

2.2. Probabilistic SVM

A typical kernel-based SVM classifier can be defined by:

$$f(x_{i}) = \sum_{j} α_{j} y_{j} K(x_{i}, x_{j}) + b \tag{7}$$

where $x_{i}$ and $x_{j}$ represent HSI pixels, $b$ indicates the bias, and $α_{j}$ denotes the Lagrange multiplier of the j-th training sample, with $n$ being the number of training samples. $K(x_{i}, x_{j})$ is a kernel function, which may be linear or nonlinear. In this paper, a Gaussian radial basis function kernel is applied:

$$K(x_{i}, x_{j}) = \exp(-γ \|x_{i} - x_{j}\|^{2}) \tag{8}$$

where $γ$ denotes the width control parameter. A probabilistic SVM implemented by the libsvm library (Chang and Lin 2011) is used in this work.
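Equations (7) and (8) can be evaluated directly once the support vectors and multipliers are known. The sketch below is a plain-Python illustration for a two-class case with hand-picked, hypothetical values of `alphas`, `labels`, `b` and `gamma`; it does not train an SVM and does not reproduce the probability estimates that libsvm produces.

```python
import math

def rbf_kernel(x_i, x_j, gamma):
    """Equation (8): Gaussian radial basis function kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, alphas, labels, b, gamma):
    """Equation (7): weighted sum of kernel evaluations plus the bias."""
    return sum(a * y * rbf_kernel(x, sv, gamma)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# Hypothetical two-class toy model with one support vector per class.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
gamma = 1.0
near_positive = decision_value([0.9, 0.9], support_vectors, alphas, labels, 0.0, gamma)
near_negative = decision_value([0.1, 0.1], support_vectors, alphas, labels, 0.0, gamma)
```

A pixel close to the positive support vector yields a positive decision value, and a pixel close to the negative one yields a negative value.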

2.3. Decision fusion

The probability distributions obtained by the probabilistic JSM and SVM are combined to form the final probability distribution. For this purpose, a linear opinion pool (Carvalho and Larson 2013) is applied:

$$p(y_{ic} | x_{i}) = λ p_{ic}^{JSM} + (1 - λ) p_{ic}^{SVM} \tag{9}$$

where $p_{ic}^{JSM}$ and $p_{ic}^{SVM}$ are the probabilities for the c-th class obtained by the JSM and SVM for the test pixel, respectively, and $λ$ is a weight parameter that controls the influence of the two terms in Equation (9). The probability distribution of the test pixel can then be obtained as $p(y_{i} | x_{i}) = [p(y_{i1} | x_{i}), \ldots, p(y_{ic} | x_{i}), \ldots, p(y_{iC} | x_{i})]$. It should be noted that $0 \leq λ \leq 1$. If $λ = 1$, only the probabilistic JSM is considered. If $λ = 0$, the results are those obtained by the probabilistic SVM. The impact of the factor $λ$ will be discussed in Section 3.
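The fusion rule of Equation (9) is a per-class convex combination, which the following sketch illustrates for a single pixel. The function name `linear_opinion_pool` and the example probability vectors are hypothetical.

```python
def linear_opinion_pool(p_jsm, p_svm, lam):
    """Equation (9): convex combination of two class-probability vectors."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * pj + (1.0 - lam) * ps for pj, ps in zip(p_jsm, p_svm)]

# Hypothetical class probabilities for one pixel and three classes.
p_jsm = [0.7, 0.2, 0.1]
p_svm = [0.3, 0.6, 0.1]
fused = linear_opinion_pool(p_jsm, p_svm, lam=0.2)
```

With `lam=0.2` the SVM dominates, so the fused vector is approximately `[0.38, 0.52, 0.10]` and the second class is preferred. Because both inputs sum to one, the fused vector does as well.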

2.4. Maximum a posteriori segmentation

In this section, we apply an MRF to further model the spatial contextual information (Li, Bioucas-Dias, and Plaza 2012) and derive the final labels for the test pixels. The MRF is expressed as a MAP segmentation method in this paper, and it is defined as the following maximization, which is equivalent to the minimization of a typical energy function:

$$\hat{y}_{i} = \underset{y_{i}}{\arg\max} \left( \sum_{j \in N(i)} \log p(y_{i} | x_{i}) + \log p(y) \right) \tag{10}$$

where $N(i)$ is a defined neighborhood region for the test pixel $x_{i}$, and $p(y_{i} | x_{i})$ denotes the energy function observed from the data. For the proposed framework, $p(y_{i} | x_{i})$ is the posterior probability distribution obtained by the decision fusion, and $p(y)$ is the spatial energy. In this paper, the spatial energy term is modeled by a multi-level logistic (MLL) (Li, Bioucas-Dias, and Plaza 2013) approach:

$$p(y) = e^{μ \sum_{i \sim j} δ(y_{i} - y_{j})} \tag{11}$$

where $μ$ controls the balance between the two terms in Equation (10), and $δ(y_{i} - y_{j})$ is a unit pulse function that is equal to 1 when $y_{i}$ and $y_{j}$ have the same value and 0 otherwise. According to Equation (11), Equation (10) can be rewritten as:

$$\hat{y}_{i} = \underset{y_{i}}{\arg\max} \left( \sum_{j \in N(i)} \log p(y_{i} | x_{i}) + μ \sum_{i \sim j} δ(y_{i} - y_{j}) \right) \tag{12}$$

Based on the aforementioned MRF-MAP approach, the label of the test pixel $x_{i}$ can be finalized by maximizing the probability distribution. As shown in Figure 1, this MRF-MAP step is applied to the result obtained by the decision fusion step.
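To illustrate how the two terms of Equation (12) interact, the sketch below performs one sweep of iterated conditional modes (ICM), a simple local optimizer used here only as a stand-in; it is not the solver used in the article. Further assumptions made for illustration: the data term is the log of the fused probability of the pixel itself, the neighborhood is 4-connected, and the names `icm_sweep`, `probs` and `eps` are hypothetical.

```python
import math

def icm_sweep(probs, labels, mu, eps=1e-12):
    """One ICM pass; probs[r][c][k] is the fused probability of class k."""
    rows, cols = len(labels), len(labels[0])
    n_classes = len(probs[0][0])
    new = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            # Labels of the 4-connected neighbors that lie inside the grid.
            neigh = [new[rr][cc]
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < rows and 0 <= cc < cols]
            # Data term plus mu times the number of agreeing neighbors.
            scores = [math.log(probs[r][c][k] + eps) + mu * neigh.count(k)
                      for k in range(n_classes)]
            new[r][c] = scores.index(max(scores))
    return new

# Toy 3x3 map: every pixel favors class 0 except a weakly class-1 center.
probs = [[[0.9, 0.1] for _ in range(3)] for _ in range(3)]
probs[1][1] = [0.4, 0.6]
init = [[max(range(2), key=lambda k: probs[r][c][k]) for c in range(3)]
        for r in range(3)]
smoothed = icm_sweep(probs, init, mu=1.0)
unsmoothed = icm_sweep(probs, init, mu=0.0)
```

With `mu=1.0` the isolated center pixel is relabeled to agree with its neighbors, whereas with `mu=0.0` the spatial term vanishes and the pixel-wise labels are kept.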

2.5. Graph cut model

Equation (12) can be treated as a multi-label problem in a graph cut model, which has the form:

$$U = U_{data} + U_{smooth} \tag{13}$$

where $U_{data}$ describes the data cost energy denoting the disagreement between the test pixel $x_{i}$ and its label $y_{i}$, and $U_{smooth}$ refers to the spatial energy, which measures the spatial coherence between the test pixel and its neighbors in the predefined neighborhood region. In our case, $U_{data}$ and $U_{smooth}$ are equivalent to the two terms in Equation (12), respectively. The $α$-expansion (Kolmogorov and Zabin 2004) is employed to solve this optimization problem.
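The decomposition in Equation (13) can be checked numerically for any candidate labeling. The sketch below evaluates the two energies on a small grid; it only scores a labeling and does not perform the $α$-expansion. It assumes, for illustration, that both terms of Equation (12) are negated so that a lower energy is better, that the data term uses each pixel's own fused probability, and that the neighborhood is 4-connected. The name `total_energy` is hypothetical.

```python
import math

def total_energy(probs, labels, mu, eps=1e-12):
    """Equation (13): U = U_data + U_smooth for a given labeling."""
    rows, cols = len(labels), len(labels[0])
    u_data = 0.0
    u_smooth = 0.0
    for r in range(rows):
        for c in range(cols):
            # Data cost: negative log-probability of the assigned label.
            u_data -= math.log(probs[r][c][labels[r][c]] + eps)
            # Visit each 4-connected pair once (right and down neighbors).
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols and labels[rr][cc] == labels[r][c]:
                    u_smooth -= mu  # reward agreeing neighbors
    return u_data + u_smooth

# Toy 2x2 map with uninformative probabilities (hypothetical values).
probs = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
e_uniform = total_energy(probs, [[0, 0], [0, 0]], mu=1.0)
e_checker = total_energy(probs, [[0, 1], [1, 0]], mu=1.0)
```

When the data term cannot separate the classes, the uniform labeling has the lower energy because all four neighboring pairs agree.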

The computational complexity of the proposed framework is the sum of the complexities of the three individual steps. The SVM and JSM have a polynomial complexity with respect to the number of training samples, and the adopted $α$-expansion has a time complexity that is polynomial in the number of nodes and edges in the graph (Juan and Boykov 2007; Boykov and Kolmogorov 2004). The worst-case running time complexity is $O(m n^{2})$, where $n$ is the number of nodes, i.e. random variables, and $m$ is the number of edges, i.e. the connections between the variables in the graph.

3. Experimental results and impact of parameters

Data sets, experimental settings and results are described in this section. For evaluation purposes, we tested the proposed framework on three widely used HSI data sets (Campbell and Wynne 2011) in different analysis scenarios.

3.1. Data sets

Three data sets were tested in this paper: 1) the airborne visible/infrared imaging spectrometer (AVIRIS) Indian Pines data set (Data I); 2) the reflective optics system imaging spectrometer (ROSIS) University of Pavia data set (Data II); and 3) the AVIRIS Salinas data set (Data III). The information about the data sets is summarized below:

Data I has a spatial resolution of 20 m. It consists of 145 $\times$ 145 pixels with 220 spectral bands in the range of 0.4–2.5 μm. Twenty water absorption bands were removed prior to the experiments. Data I contains 16 labeled classes.

Data II consists of 610 $\times$ 340 pixels and nine labeled classes with a spatial resolution of 1.3 m. A total of 103 spectral bands from 0.43 to 0.86 μm are used in the experiments.

Data III consists of 512 $\times$ 217 pixels. Each pixel has 204 data channels after 20 water absorption bands were removed prior to the experiments. The spatial resolution of this data set is 3.7 m. Sixteen labeled classes are used as reference in this paper.

In the experiments, the training samples for each class were randomly selected, and the remaining samples were used as the test set. The class information for the three data sets and the numbers of training and test samples are listed in Tables 1–3, respectively. Figure 2 shows the ground truth and false color images of the data sets.
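The per-class random selection described above can be sketched as follows. This is an illustrative helper under stated assumptions, not the authors' code: the function name `split_per_class`, the fixed `seed`, and the use of the same number of training samples for every class are hypothetical choices.

```python
import random

def split_per_class(labels, n_train, seed=0):
    """Randomly pick n_train samples per class; the rest form the test set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = min(n_train, len(idxs))  # small classes contribute all samples
        train.extend(idxs[:k])
        test.extend(idxs[k:])
    return sorted(train), sorted(test)

# Hypothetical labels: 10 samples of class 0 and 6 samples of class 1.
labels = [0] * 10 + [1] * 6
train_idx, test_idx = split_per_class(labels, n_train=3)
```

Repeating the split with different seeds gives the kind of repeated random-sampling experiment used to average out sampling bias.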

Table 1. Sixteen ground truth classes and the numbers of training and test samples for Data I.

Table 2. Nine ground truth classes and the numbers of training and test samples for Data II.

Table 3. Sixteen ground truth classes and the numbers of training and test samples for Data III.

Figure 2. False color composite image with bands 50-27-17 (left) and ground truth (right) for three data sets: (a) Data I; (b) Data II; (c) Data III.

3.2. Experimental setting

The proposed framework is compared with several spectral-spatial methods to evaluate its performance. SVM with extended multi-attribute profiles (EMAP) (referred to as SVM-EMAP) (Mura et al. 2011), probabilistic SVM with MRF (referred to as SVM-MRF), JSM with SOMP (referred to as JSM) (Chen, Nasrabadi, and Tran 2011), probabilistic SR with MRF (referred to as SR-MRF), and probabilistic JSM with MRF (referred to as JSM-MRF) are used as the benchmarks. The SVM was implemented using the libsvm library with the Gaussian kernel (Chang and Lin 2011). The spatial information for the SVM-EMAP features was extracted by the technique presented in (Dalla Mura et al. 2011). The sparsity level $K$ was set to 3 for all three data sets, as suggested in (Chen, Nasrabadi, and Tran 2011; Ghamisi et al. 2018). The neighborhood region size was set differently for the three data sets (7 $\times$ 7 for Data I, 11 $\times$ 11 for Data II, and 15 $\times$ 15 for Data III), as demonstrated in (Gao, Lim, and Jia 2018b). The weight parameter $λ$ was set to 0.2, 0.3 and 0.6 for Data I, Data II and Data III, respectively, based on leave-one-out cross-validation. Three well-known metrics, namely overall accuracy (OA), average accuracy (AA) and the kappa coefficient (k), are used for the quantitative validation. Ten repeated experiments with random sampling were conducted for each data set.
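For readers unfamiliar with the three metrics, the following sketch computes OA, AA and the kappa coefficient from reference and predicted labels using their standard confusion-matrix definitions. The function name `accuracy_metrics` and the toy labels are hypothetical, and the sketch assumes the chance agreement is below one so that kappa is defined.

```python
def accuracy_metrics(y_true, y_pred):
    """Return overall accuracy, average accuracy and Cohen's kappa."""
    classes = sorted(set(y_true) | set(y_pred))
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    cm = [[0] * n for _ in range(n)]  # rows: reference, columns: prediction
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    total = len(y_true)
    oa = sum(cm[i][i] for i in range(n)) / total
    # AA: mean of the per-class accuracies over classes present in y_true.
    per_class = [cm[i][i] / sum(cm[i]) for i in range(n) if sum(cm[i]) > 0]
    aa = sum(per_class) / len(per_class)
    # Chance agreement from the row and column marginals.
    pe = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Hypothetical reference and predicted labels for six test pixels.
oa, aa, kappa = accuracy_metrics([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1])
```

For this toy case five of six pixels are correct, so OA is 5/6, the per-class accuracies are 0.75 and 1.0 (AA of 0.875), and kappa is 2/3.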

3.3. Experimental results

The experimental results for Data I are presented in Table 4 and Figure 3. As can be observed from the classification maps in Figure 3, the spectral-spatial classifiers produced relatively smooth and accurate classification results. However, SVM-EMAP and SR-MRF produced comparatively noisy results and failed to detect some meaningful areas, such as the near-boundary regions. Although JSM, SVM-MRF and JSM-MRF performed the classification task reasonably well, noise is still evident on their classification maps. In contrast, the proposed method reduced the noise and preserved the near-boundary regions. The quantitative results in Table 4 clearly show that the proposed method outperformed the other classifiers with respect to the three metrics. The accuracy of our method is 1.54% and 4.49% higher than that of JSM-MRF and SVM-MRF, respectively, which confirms the effectiveness of the proposed fusion strategy.

Table 4. Classification results obtained by different classifiers on Data I for each class.

Figure 3. Classification maps obtained by different classifiers for Data I with 16 classes: (a) SVM-EMAP; (b) SVM-MRF; (c) SR-MRF; (d) JSM; (e) JSM-MRF; (f) Proposed.

The classification results on the other two data sets are shown in Tables 5–6 and Figures 4–5, respectively. Compared with SVM-MRF and JSM-MRF, our method improved the overall accuracy by up to 12.30% and 2.17% for Data II, and 1.92% and 0.4% for Data III, respectively. Visual inspection of the classification maps is consistent with the tables. The fusion strategy smoothed the classification maps; however, it tends to over-smooth some of the boundary lines within each classified region, which can also be observed from the classification maps. It should also be noted that some small misclassified patches remain because of the limited number of training samples.

Table 5. Classification results obtained by different classifiers on Data II for each class.

Table 6. Classification results obtained by different classifiers on Data III for each class.

Figure 5. Classification maps obtained by different classifiers for Data III with 16 classes: (a) SVM-EMAP; (b) SVM-MRF; (c) SR-MRF; (d) JSM; (e) JSM-MRF.

3.4. Discussion

It is more sensible to compare the results obtained by SVM-MRF, SR-MRF, JSM-MRF and the proposed method, because they are all MRF-based probabilistic methods. The tables show that SVM-MRF performed better than SR-MRF. Overall, the proposed method achieved the highest accuracy on all three benchmark data sets with respect to the widely used quantitative metrics. Tables 4–6 show that the results obtained by the other classifiers vary from classifier to classifier and from data set to data set, whereas the proposed method is reliable and stable regardless of the data set.

To help explain the scheme of the proposed framework, Figures 6–8 illustrate the maps after each step, i.e. SVM, JSM, decision fusion and the MRF-MAP step, for the three data sets. The maps obtained by SVM are noisier because spatial information is not considered, while the ones obtained by JSM are smoother. However, because of the over-smoothing effect, the JSM classification maps are blurred in near-edge areas. Although the classification maps after decision fusion have more scattered points than those of JSM in homogeneous areas, the near-edge areas are clearly refined. Therefore, the MRF-MAP step can be effectively used to refine the results in homogeneous areas and obtain a more accurate classification map for the whole area.

Figure 6. Classification maps of Data I after SVM, JSM and fusion.

Figure 8. Classification maps of Data III after SVM, JSM and fusion.

The proposed method reduces the noise and enhances the results from SVM and JSM via the fusion strategy. These aspects, together with the joint characterization of spatial-contextual information by MRF-MAP, make the framework effective at integrating local and global probabilities. In summary, the proposed method delivers more uniform results. When compared to the results in (Li et al. 2015; Chen, Nasrabadi, and Tran 2011), the proposed method shows consistently better performance for the three data sets. The fusion method takes advantage of both SVM and JSM and can compensate when one of the two methods does not perform well. This is also the reason why the proposed method performs better than the comparative classifiers.

3.5. Impact of parameters

This section explores the impact of the number of training samples on the experimental results. To minimize possible sampling bias, the experiments were run ten times with randomly chosen training samples each time. The OA values for different numbers of training samples are evaluated. For Data I, 5% to 40% of the total samples were used as training samples. For Data II and Data III, the number of training samples varies from 50 to 400. The remaining samples are used as test sets. Figure 9 depicts the results for the different classification approaches with different numbers of training samples. As can be observed from the figure, the performances of SVM-EMAP, SVM-MRF, SR-MRF, JSM, JSM-MRF and the proposed method generally improve as the number of training samples increases. SVM-EMAP shows the least stability, with some small fluctuations in the experiments. It can also be seen that SVM-MRF performs better on Indian Pines than on the other two data sets, which may be because the mixed pixels present in this data set are more helpful for the class discrimination of SVM-MRF. JSM-MRF can achieve a very high accuracy for all three data sets with an adequate number of training samples, and it benefits from the spatial information being considered in both the probabilistic classification and the post-processing. In addition, the proposed classifier consistently achieves the best results for all numbers of training samples. The trends observed in Figure 9 also confirm that our method works effectively even if only a small number of training samples is available, which is very promising for practical applications.

Figure 9. The impact of training samples on OA results of the test methods for three test data sets.

4. Conclusion

We proposed a novel framework that integrates the SVM and JSM methods in a probabilistic sense for HSI classification. The proposed framework exploits the different contributions of the two methods. SVM is used to obtain the global probability distribution for all pixels, and JSM is used to obtain a better representation of the test pixels than the common SR-based methods by exploiting the spatial correlations among neighboring pixels. In this paper, JSM is applied to compute the reconstruction errors of the different classes, and the posterior probability distributions of the classes are then obtained as the inverse of the reconstruction errors. Under the assumption that the classes are independent, the posterior probabilities are used as prior label probability distributions in the MRF, and the final labels are derived by solving a MAP problem. The proposed framework retains the complementary information of the different techniques and preserves the spatial smoothness of the images. The validation on three benchmark data sets shows that the proposed method yields a very high accuracy, even when a small number of training samples is used.

Highlights

We developed and applied a new framework for HSI classification.
The new method consists of an SVM, a JSM and an MRF.
The results of SVM and JSM are fused by a linear opinion pool.
An MRF is applied to regularize the final results.
The proposed method performs very well with limited training samples.

Acknowledgements

The authors would like to thank Prof. D. Landgrebe from Purdue University for providing the free downloads of the hyperspectral AVIRIS data set, Prof. Paolo Gamba from the Telecommunications and Remote Sensing Laboratory for providing the Pavia University data set, and the California Institute of Technology for providing the Salinas data set. The authors would also like to thank the Associate Editor and anonymous reviewers for their careful reading and valuable comments, which were very helpful in improving this paper.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Boykov, Y., and V. Kolmogorov. 2004. “An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision.” IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (9): 1124–1137. doi:10.1109/TPAMI.2004.60.
Campbell, J. B., and R. H. Wynne. 2011. Introduction to Remote Sensing. New York, NY: Guilford Press.
Cao, X., X. Lin, D. Meng, Q. Zhao, and X. Zongben. 2017. “Integration of 3-Dimensional Discrete Wavelet Transform and Markov Random Field for Hyperspectral Image Classification.” Neurocomputing 226: 90–100. doi:10.1016/j.neucom.2016.11.034.
Carvalho, A., and K. Larson. 2013. “A Consensual Linear Opinion Pool.” Paper presented at the IJCAI, Beijing, China.
Chang, C.-C., and C.-J. Lin. 2011. “LIBSVM: A Library for Support Vector Machines.” ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3): 27.
Chen, Y., N. M. Nasrabadi, and T. D. Tran. 2011. “Hyperspectral Image Classification Using Dictionary-Based Sparse Representation.” IEEE Transactions on Geoscience and Remote Sensing 49 (10): 3973–3985. doi:10.1109/TGRS.2011.2129595.
Chen, Y., N. M. Nasrabadi, and T. D. Tran. 2013. “Hyperspectral Image Classification via Kernel Sparse Representation.” IEEE Transactions on Geoscience and Remote Sensing 51 (1): 217–231. doi:10.1109/TGRS.2012.2201730.
Chu, H.-J., C.-K. Wang, S.-J. Kong, and K.-C. Chen. 2016. “Integration of Full-Waveform LiDAR and Hyperspectral Data to Enhance Tea and Areca Classification.” GIScience & Remote Sensing 53 (4): 542–559. doi:10.1080/15481603.2016.1177249.
Fang, L., L. Shutao, X. Kang, and J. A. Benediktsson. 2014. “Spectral–Spatial Hyperspectral Image Classification via Multiscale Adaptive Sparse Representation.” IEEE Transactions on Geoscience and Remote Sensing 52 (12): 7738–7749. doi:10.1109/TGRS.2014.2318058.
Fu, W., L. Shutao, L. Fang, and J. A. Benediktsson. 2017. “Adaptive Spectral–Spatial Compression of Hyperspectral Image with Sparse Representation.” IEEE Transactions on Geoscience and Remote Sensing 55 (2): 671–682. doi:10.1109/TGRS.2016.2613848.
Gao, L., L. Jun, M. Khodadadzadeh, A. Plaza, B. Zhang, H. Zhijian, and H. Yan. 2015. “Subspace-Based Support Vector Machines for Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 12 (2): 349–353. doi:10.1109/LGRS.2014.2341044.
Gao, Q., S. Lim, and X. Jia. 2018a. “Hyperspectral Image Classification Using Joint Sparse Model and Discontinuity Preserving Relaxation.” IEEE Geoscience and Remote Sensing Letters 15 (1): 78–82. doi:10.1109/LGRS.2017.2774253.
Gao, Q., S. Lim, and X. Jia. 2018b. “Improved Joint Sparse Models for Hyperspectral Image Classification Based on a Novel Neighbour Selection Strategy.” Remote Sensing 10 (6): 905. doi:10.3390/rs10060905.
Ghamisi, P., E. Maggiori, L. Shutao, R. Souza, Y. Tarabalka, G. Moser, A. De Giorgi, L. Fang, Y. Chen, and M. Chi. 2018. “New Frontiers in Spectral-Spatial Hyperspectral Image Classification: The Latest Advances Based on Mathematical Morphology, Markov Random Fields, Segmentation, Sparse Representation, and Deep Learning.” IEEE Geoscience and Remote Sensing Magazine 6 (3): 10–43. doi:10.1109/MGRS.2018.2854840.
Gualtieri, J. A., and R. F. Cromp. 1999. “Support Vector Machines for Hyperspectral Remote Sensing Classification.” Paper presented at the 27th AIPR Workshop: Advances in Computer-Assisted Recognition, Washington, DC.
Ham, J., Y. Chen, M. M. Crawford, and J. Ghosh. 2005. “Investigation of the Random Forest Framework for Classification of Hyperspectral Data.” IEEE Transactions on Geoscience and Remote Sensing 43 (3): 492–501. doi:10.1109/TGRS.2004.842481.
Huang, C., L. S. Davis, and J. R. G. Townshend. 2002. “An Assessment of Support Vector Machines for Land Cover Classification.” International Journal of Remote Sensing 23 (4): 725–749. doi:10.1080/01431160110040323.
Jia, X., C. Dey, D. Fraser, L. Lymburner, and A. Lewis. 2010. “Controlled Spectral Unmixing Using Extended Support Vector Machines.” Paper presented at the Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Reykjavik, Iceland, 2010 2nd Workshop.
Juan, O., and Y. Boykov. 2007. “Capacity Scaling for Graph Cuts in Vision.”
Kolmogorov, V., and R. Zabih. 2004. “What Energy Functions Can Be Minimized via Graph Cuts?” IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2): 147–159. doi:10.1109/TPAMI.2004.1262177.
Landgrebe, D. A. 2005. Signal Theory Methods in Multispectral Remote Sensing. Vol. 29. Milton QLD, Australia: John Wiley & Sons.
Li, J., H. Zhang, and L. Zhang. 2014. “Supervised Segmentation of Very High Resolution Images by the Use of Extended Morphological Attribute Profiles and a Sparse Transform.” IEEE Geoscience and Remote Sensing Letters 11 (8): 1409–1413. doi:10.1109/LGRS.2013.2294241.
Li, J., J. M. Bioucas-Dias, and A. Plaza. 2012. “Spectral–Spatial Hyperspectral Image Segmentation Using Subspace Multinomial Logistic Regression and Markov Random Fields.” IEEE Transactions on Geoscience and Remote Sensing 50 (3): 809–823. doi:10.1109/TGRS.2011.2162649.
Li, J., J. M. Bioucas-Dias, and A. Plaza. 2013. “Semisupervised Hyperspectral Image Classification Using Soft Sparse Multinomial Logistic Regression.” IEEE Geoscience and Remote Sensing Letters 10 (2): 318–322. doi:10.1109/LGRS.2012.2205216.
Li, J., X. Huang, P. Gamba, J. M. Bioucas-Dias, L. Zhang, J. A. Benediktsson, and A. Plaza. 2015. “Multiple Feature Learning for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 53 (3): 1592–1606. doi:10.1109/TGRS.2014.2345739.
Lin, Z., and L. Yan. 2016. “A Support Vector Machine Classifier Based on A New Kernel Function Model for Hyperspectral Data.” GIScience & Remote Sensing 53 (1): 85–101. doi:10.1080/15481603.2015.1114199.
Melgani, F., and L. Bruzzone. 2004. “Classification of Hyperspectral Remote Sensing Images with Support Vector Machines.” IEEE Transactions on Geoscience and Remote Sensing 42 (8): 1778–1790. doi:10.1109/TGRS.2004.831865.
Moser, G., and S. B. Serpico. 2013. “Combining Support Vector Machines and Markov Random Fields in an Integrated Framework for Contextual Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 51 (5): 2734–2752. doi:10.1109/TGRS.2012.2211882.
Mountrakis, G., J. Im, and C. Ogole. 2011. “Support Vector Machines in Remote Sensing: A Review.” ISPRS Journal of Photogrammetry and Remote Sensing 66 (3): 247–259. doi:10.1016/j.isprsjprs.2010.11.001.
Mura, D., A. V. Mauro, J. A. Benediktsson, J. Chanussot, and L. Bruzzone. 2011. “Classification of Hyperspectral Images by Using Extended Morphological Attribute Profiles and Independent Component Analysis.” IEEE Geoscience and Remote Sensing Letters 8 (3): 542–546. doi:10.1109/LGRS.2010.2091253.
Plaza, A., J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, and A. Gualtieri. 2009. “Recent Advances in Techniques for Hyperspectral Image Processing.” Remote Sensing of Environment 113: S110–S22. doi:10.1016/j.rse.2007.07.028.
Ratle, F., G. Camps-Valls, and J. Weston. 2010. “Semisupervised Neural Networks for Efficient Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 48 (5): 2271–2282. doi:10.1109/TGRS.2009.2037898.
Srinivas, U., Y. Chen, V. Monga, N. M. Nasrabadi, and T. D. Tran. 2013. “Exploiting Sparsity in Hyperspectral Image Classification via Graphical Models.” IEEE Geoscience and Remote Sensing Letters 10 (3): 505–509. doi:10.1109/LGRS.2012.2211858.
Tarabalka, Y., M. Fauvel, J. Chanussot, and J. A. Benediktsson. 2010. “SVM-and MRF-based Method for Accurate Classification of Hyperspectral Images.” IEEE Geoscience and Remote Sensing Letters 7 (4): 736–740. doi:10.1109/LGRS.2010.2047711.
Tesfamichael, S. G., S. W. Newete, E. Adam, and B. Dubula. 2018. “Field Spectroradiometer and Simulated Multispectral Bands for Discriminating Invasive Species from Morphologically Similar Cohabitant Plants.” GIScience & Remote Sensing 55 (3): 417–436. doi:10.1080/15481603.2017.1396658.
Tu, B., S. Huang, L. Fang, G. Zhang, J. Wang, and B. Zheng. 2018a. “Hyperspectral Image Classification via Weighted Joint Nearest Neighbor and Sparse Representation.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (11): 4063–4075.
Tu, B., X. Zhang, X. Kang, G. Zhang, J. Wang, and W. Jianhui. 2018b. “Hyperspectral Image Classification via Fusing Correlation Coefficient and Joint Sparse Representation.” IEEE Geoscience and Remote Sensing Letters 15 (3): 340–344. doi:10.1109/LGRS.2017.2787338.
Villa, A., J. Chanussot, J. A. Benediktsson, and C. Jutten. 2011. “Spectral Unmixing for the Classification of Hyperspectral Images at a Finer Spatial Resolution.” IEEE Journal of Selected Topics in Signal Processing 5 (3): 521–533. doi:10.1109/JSTSP.2010.2096798.
Xu, L., and J. Li. 2014. “Bayesian Classification of Hyperspectral Imagery Based on Probabilistic Sparse Representation and Markov Random Field.” IEEE Geoscience and Remote Sensing Letters 11 (4): 823–827. doi:10.1109/LGRS.2013.2279395.
Xun, L., and L. Wang. 2015. “An Object-Based SVM Method Incorporating Optimal Segmentation Scale Estimation Using Bhattacharyya Distance for Mapping Salt Cedar (Tamarisk Spp.) With QuickBird Imagery.” GIScience & Remote Sensing 52 (3): 257–273. doi:10.1080/15481603.2015.1026049.
Zhang, C., M. Smith, and C. Fang. 2018. “Evaluation of Goddard’s LiDAR, Hyperspectral, and Thermal Data Products for Mapping Urban Land-Cover Types.” GIScience & Remote Sensing 55 (1): 90–109. doi:10.1080/15481603.2017.1364837.
Zhang, X., Z. Gao, L. Jiao, and H. Zhou. 2018. “Multifeature Hyperspectral Image Classification with Local and Nonlocal Spatial Information via Markov Random Field in Semantic Space.” IEEE Transactions on Geoscience and Remote Sensing 56 (3): 1409–1424. doi:10.1109/TGRS.2017.2762593.
