ABSTRACT
This study introduces the Coupled Transformer Network (CTNet), an architecture designed to improve the robustness and effectiveness of hyperspectral unmixing (HSU) by addressing key limitations of traditional autoencoder (AE) frameworks. Traditional AEs, consisting of an encoder and a decoder, effectively learn and reconstruct low-dimensional abundance relationships from high-dimensional hyperspectral data, but they often struggle with spectral variability (SV) and spatial correlations, which can introduce uncertainty into the resulting abundance estimates. CTNet overcomes these limitations with a two-stream half-Siamese network in which an additional encoder is trained on pseudo-pure pixels, and it further integrates a cross-attention module to exploit global information. This configuration not only guides the AE toward more accurate abundance estimates by directly addressing SV, but also strengthens the network's ability to capture complex spectral information. To reduce the reconstruction errors typical of AEs, a transcription loss constraint is applied, preserving essential details and material-related information that are often lost during pixel-level reconstruction. Experimental validation on a synthetic dataset and three widely used real datasets confirms that CTNet outperforms several state-of-the-art methods, providing a more robust and effective solution for HSU.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under grant 41730422, and in part by the Education Department of Jilin Province under grants JJKH20240741KJ and JJKH20220595CY. The authors would like to thank the developers of the unmixing methods used in our comparative experiments, who kindly provided their code.
Disclosure statement
No potential conflict of interest was reported by the author(s).