Research Article

MHAMD-MST-CNN: multiscale head attention guided multiscale density maps fusion for video crowd counting via multi-attention spatial-temporal CNN

Pages 1777-1790 | Received 30 Apr 2021, Accepted 04 Mar 2023, Published online: 23 Mar 2023

ABSTRACT

Video-based crowd counting and density estimation (CCDE) is vital for crowd monitoring. Existing solutions fail to address issues such as cluttered backgrounds and scale variation in crowd videos. To this end, a multiscale head attention-guided multiscale density maps fusion for video-based CCDE via a multi-attention spatial-temporal CNN (MHAMD-MST-CNN) is proposed. The MHAMD-MST-CNN has three modules: a multi-attention spatial stream (MASS), a multi-attention temporal stream (MATS), and a final density map generation (FDMG) module. The spatial head attention modules (SHAMs) and temporal head attention modules (THAMs) eliminate background influence from the MASS and the MATS, respectively, by mapping multiscale spatial or temporal features to head maps. The multiscale de-backgrounded features are used by the density map generation (DMG) modules to produce multiscale density maps that cope with the scale variation caused by perspective distortion. These multiscale density maps are fused and fed into the FDMG module to obtain the final crowd density map. The MHAMD-MST-CNN has been trained and validated on three publicly available benchmark datasets: Venice, Mall, and UCSD. It achieves competitive results compared with the state of the art in terms of mean absolute error (MAE) and root mean squared error (RMSE).
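The core ideas in the abstract (attention maps gating out background, per-scale density maps, fusion into a final map whose sum gives the count) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 1x1 projection, the channel-sum "DMG" stand-in, and the averaging fusion are all simplifying assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def head_attention_mask(features, w):
    # Map features of shape (C, H, W) to a single-channel head map
    # via a 1x1 projection w of shape (C,), squashed to [0, 1].
    logits = np.tensordot(w, features, axes=([0], [0]))  # -> (H, W)
    return sigmoid(logits)

def de_background(features, mask):
    # Suppress background responses by gating every feature channel
    # with the head-attention map (broadcast over channels).
    return features * mask[None, :, :]

def fuse_density_maps(maps):
    # Average the per-scale density maps (all same H, W here); the
    # paper instead feeds the fused maps into an FDMG module.
    return np.mean(np.stack(maps, axis=0), axis=0)

# Toy example: two "scales" of features for an 8x8 frame.
rng = np.random.default_rng(0)
scales = [rng.standard_normal((4, 8, 8)) for _ in range(2)]
weights = [rng.standard_normal(4) for _ in range(2)]

density_maps = []
for feats, w in zip(scales, weights):
    mask = head_attention_mask(feats, w)
    gated = de_background(feats, mask)
    # Stand-in for a DMG module: collapse channels, clamp to >= 0
    # so the map can be read as a non-negative density.
    density_maps.append(np.maximum(gated.sum(axis=0), 0.0))

final_map = fuse_density_maps(density_maps)
count = final_map.sum()  # estimated crowd count for the frame
```

In a trained network the projection weights and the DMG/FDMG modules would be learned convolutions, and the multiscale maps would come from feature pyramids at genuinely different resolutions; this sketch only shows the mask-gate-fuse data flow.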

Acknowledgements

The support and the resources provided by ‘PARAM Shivay Facility’ under the National Supercomputing Mission, Government of India at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.

Disclosure statement

No potential conflict of interest was reported by the authors.
