Research Article

Deep Learning Technique Based Surveillance Video Analysis for the Store


