Scale-wised feature enhancement network for change captioning of remote sensing images

Fengwei Zhang, College of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China

Wenjing Zhang, College of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China

Kai Xia, College of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China. Correspondence: [email protected]

Hailin Feng, College of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China. Correspondence: [email protected]

ABSTRACT

Remote Sensing Image Change Captioning (RSICC) has recently emerged in the field of remote sensing image interpretation; it aims to automatically generate natural language captions that describe significant semantic changes in bi-temporal remote sensing images. Recent RSICC studies have substantially improved the accuracy of change captions for bi-temporal remote sensing images. Nevertheless, challenges remain in the multi-scale perception of ground objects and in the feature enhancement of bi-temporal remote sensing images. To address these challenges and further improve the accuracy of RSICC, this paper proposes a novel deep learning-based, end-to-end scale-wised feature enhancement network (SFEN). SFEN integrates four efficient blocks: 1) the siamese backbone network (SBN), which extracts initial features of the bi-temporal remote sensing images; 2) the siamese receptive field fusion (SRFF) block, which explicitly captures multi-scale semantic information of ground objects in the bi-temporal feature maps; 3) the siamese global feature enhancement (SGFE) block, which adaptively enhances key information and filters redundant features of the bi-temporal feature maps in both the channel and spatial dimensions; and 4) the change caption decoder (CCD), which maps the bi-temporal feature maps into natural language. SFEN aims to precisely capture significant semantic information of ground objects in bi-temporal remote sensing images and to predict accurate change captions. Experimental results on the LEVIR-CC dataset demonstrate that SFEN outperforms the recent state-of-the-art (SOTA) RSICC approach by 5.2% on CIDEr-D, achieving a new SOTA.
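The abstract names the SRFF and SGFE blocks but does not describe their internals. As a rough illustration only, the NumPy sketch below shows the general idea of a siamese (weight-shared) multi-scale fusion followed by channel and spatial enhancement. It is not the authors' implementation: it assumes that multi-scale fusion can be approximated by averaging box filters of several window sizes, and that enhancement behaves like a CBAM-style channel gate followed by a simplified spatial gate. All function names, weights, and the reduction ratio `r` are hypothetical, and the backbone (SBN) and decoder (CCD) are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def box_filter(feat, k):
    """Hypothetical stand-in for one receptive field: mean over a k x k window (k odd)."""
    p = k // 2
    padded = np.pad(feat, ((0, 0), (p, p), (p, p)), mode="edge")
    _, h, w = feat.shape
    out = np.zeros(feat.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def multi_scale_fusion(feat, scales=(1, 3, 5)):
    """Assumed SRFF-like step: fuse several receptive field sizes by averaging."""
    return np.mean([box_filter(feat, k) for k in scales], axis=0)

def channel_attention(feat, w1, w2):
    """CBAM-style channel gate. feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared two-layer MLP with ReLU
    gate = sigmoid(mlp(feat.mean(axis=(1, 2))) + mlp(feat.max(axis=(1, 2))))
    return feat * gate[:, None, None]        # reweight each channel by a value in (0, 1)

def spatial_attention(feat, w_sp, b_sp=0.0):
    """Simplified spatial gate: mixes channel-wise mean and max maps (CBAM uses a 7x7 conv)."""
    gate = sigmoid(w_sp[0] * feat.mean(axis=0) + w_sp[1] * feat.max(axis=0) + b_sp)
    return feat * gate[None, :, :]           # reweight each pixel by a value in (0, 1)

def siamese_enhance(feat_t1, feat_t2, params):
    """Apply the SAME weights to both temporal feature maps (weight sharing)."""
    outs = []
    for f in (feat_t1, feat_t2):
        f = multi_scale_fusion(f)
        f = channel_attention(f, params["w1"], params["w2"])
        f = spatial_attention(f, params["w_sp"])
        outs.append(f)
    return outs

# Usage with random placeholder features for the two acquisition dates.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2
params = {"w1": rng.standard_normal((C // r, C)) * 0.1,
          "w2": rng.standard_normal((C, C // r)) * 0.1,
          "w_sp": np.array([1.0, 1.0])}
f1 = rng.standard_normal((C, H, W))
f2 = rng.standard_normal((C, H, W))
e1, e2 = siamese_enhance(f1, f2, params)
print(e1.shape, e2.shape)  # (8, 16, 16) (8, 16, 16)
```

Sharing one set of weights across both dates is what makes the branches "siamese": identical inputs yield identical enhanced features, so any difference between the outputs reflects a difference between the images rather than between branches.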

Disclosure statement

No potential conflict of interest was reported by the author(s).
