Research Article

A Review of Deep Learning-based Human Activity Recognition on Benchmark Video Datasets

Article: 2093705 | Received 11 Mar 2022, Accepted 21 Jun 2022, Published online: 11 Jul 2022

References

  • Abdelbaky, A., and S. Aly. 2020. “Human Action Recognition Based on Simple Deep Convolution Network PCANet.” Proceedings of 2020 International Conference on Innovative Trends in Communication and Computer Engineering, ITCE 2020, Aswan, Egypt, 257–2901. doi:10.1109/ITCE48509.2020.9047769.
  • Baccouche, M., F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt. 2011. Sequential Deep Learning for Human Action Recognition. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7065 LNCS:29–39. doi:10.1007/978-3-642-25446-8_4.
  • Bay, H., T. Tuytelaars, and L. Van Gool. 2006. SURF: Speeded up Robust Features. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3951 LNCS:404–17. doi:10.1007/11744023_32.
  • Beddiar, D. R., B. Nini, M. Sabokrou, and A. Hadid. 2020. Vision-Based Human Activity Recognition: A Survey. Multimedia Tools and Applications 79 (41–42):30509–55. doi:10.1007/s11042-020-09004-3.
  • Bojanowski, P., R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. 2014. Weakly Supervised Action Labeling in Videos under Ordering Constraints. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8693 LNCS:628–43. doi:10.1007/978-3-319-10602-1_41.
  • Carmona, J. M., and J. Climent. 2018. Human Action Recognition by Means of Subtensor Projections and Dense Trajectories. Pattern Recognition 81:443–55. doi:10.1016/j.patcog.2018.04.015.
  • Carreira, J., and A. Zisserman. 2017. “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset.” Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 4724–33. doi:10.1109/CVPR.2017.502.
  • Carreira, J., E. Noland, A. Banki-Horvath, C. Hillier, and A. Zisserman. 2018. A Short Note about Kinetics-600. arXiv preprint arXiv:1808.01340. http://activity-net.org/challenges/2018/evaluation.html.
  • Carreira, J., E. Noland, C. Hillier, and A. Zisserman. 2019. A Short Note on the Kinetics-700 Human Action Dataset. arXiv preprint arXiv:1907.06987.
  • Chakraborty, S., R. Mondal, P. Kumar Singh, R. Sarkar, and D. Bhattacharjee. 2021. Transfer Learning with Fine Tuning for Human Action Recognition from Still Images. Multimedia Tools and Applications 80 (13):20547–78. doi:10.1007/s11042-021-10753-y.
  • Chen, D., P. Wang, L. Yue, Y. Zhang, and T. Jia. 2020. Anomaly Detection in Surveillance Video Based on Bidirectional Prediction. Image and Vision Computing 98:103915. doi:10.1016/j.imavis.2020.103915.
  • Feichtenhofer, C., A. Pinz, and R. P. Wildes. 2017. “Spatiotemporal Multiplier Networks for Video Action Recognition.” Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 7445–54. doi:10.1109/CVPR.2017.787.
  • Reining, C., F. Niemann, F. Moya Rueda, G. A. Fink, and M. ten Hompel. 2019. Human Activity Recognition for Production and Logistics: A Systematic Literature Review. Information (Switzerland) 10 (8):1–28. doi:10.3390/info10080245.
  • Gu, C., C. Sun, D. A. Ross, C. Vondrick, C. Pantofaru, Y. Li, S. Vijayanarasimhan, et al. 2018. “AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions.” In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 6047–56. doi:10.1109/CVPR.2018.00633.
  • Coppola, C., S. Cosar, D. R. Faria, and N. Bellotto. 2020. Social Activity Recognition on Continuous RGB-D Video Sequences. International Journal of Social Robotics 12 (1):201–15. doi:10.1007/s12369-019-00541-y.
  • Crasto, N., P. Weinzaepfel, K. Alahari, and C. Schmid. 2019. “MARS: Motion-Augmented RGB Stream for Action Recognition.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 7874–83. doi:10.1109/CVPR.2019.00807.
  • Dai, C., X. Liu, and J. Lai. 2020. Human Action Recognition Using Two-Stream Attention Based LSTM Networks. Applied Soft Computing Journal 86:105820. doi:10.1016/j.asoc.2019.105820.
  • Dai, X., X. Yuan, and X. Wei. 2021. Data Augmentation for Thermal Infrared Object Detection with Cascade Pyramid Generative Adversarial Network. Applied Intelligence 52:967–81. doi:10.1007/s10489-021-02445-9.
  • Dalal, N., and B. Triggs. 2005. “Histograms of Oriented Gradients for Human Detection.” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 886–93. http://lear.inrialpes.fr.
  • Damen, D., H. Doughty, G. Maria Farinella, S. Fidler, A. Furnari, E. Kazakos, D. Moltisanti, J. Munro, T. Perrett, W. Price, and M. Wray. 2018. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11208 LNCS:753–71. doi:10.1007/978-3-030-01225-0_44.
  • Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. “ImageNet: A Large-Scale Hierarchical Image Database.” 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 248–55. doi:10.1109/cvprw.2009.5206848.
  • Donahue, J., L. Anne Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. 2017. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (4):677–91. doi:10.1109/TPAMI.2016.2599174.
  • Duta, I. C., B. Ionescu, K. Aizawa, and N. Sebe. 2017. Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos. MultiMedia Modeling 1 (November):226–37. doi:10.1007/978-3-319-51811-4.
  • Feichtenhofer, C., A. Pinz, and A. Zisserman. 2016. “Convolutional Two-Stream Network Fusion for Video Action Recognition.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 1933–41. doi:10.1109/CVPR.2016.213.
  • Huang, G., Z. Liu, L. van der Maaten, and K. Q. Weinberger. 2017. “Densely Connected Convolutional Networks.” Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 2261–69. doi:10.1109/CVPR.2017.243.
  • Goyal, R., S. Ebrahimi Kahou, V. Michalski, J. Materzynska, S. Westphal, H. Kim, V. Haenel, I. Fruend, P. Yianilos, M. Mueller-Freitag, F. Hoppe, C. Thurau, I. Bax, and R. Memisevic. 2017. “The ‘Something Something’ Video Database for Learning and Evaluating Visual Common Sense.” In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 5843–51. doi:10.1109/ICCV.2017.622.
  • Yang, H., C. Yuan, B. Li, Y. Du, J. Xing, W. Hu, and S. J. Maybank. 2018. Asymmetric 3D Convolutional Neural Networks for Action Recognition. Pattern Recognition 85:1–12. doi:10.1016/j.patcog.2018.07.028.
  • Hao, W., and Z. Zhang. 2019. Spatiotemporal Distilled Dense-Connectivity Network for Video Action Recognition. Pattern Recognition 92:13–24. doi:10.1016/j.patcog.2019.03.005.
  • Heilbron, F. C., and J. Carlos Niebles. 2014. “Collecting and Annotating Human Activities in Web Videos.” ICMR 2014 - Proceedings of the ACM International Conference on Multimedia Retrieval 2014, Glasgow, Scotland, 377–84. doi:10.1145/2578726.2578775.
  • Heilbron, F. C., V. Escorcia, B. Ghanem, and J. Carlos Niebles. 2015. “ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 961–70. doi:10.1109/CVPR.2015.7298698.
  • Herath, S., M. Harandi, and F. Porikli. 2017. Going Deeper into Action Recognition: A Survey. Image and Vision Computing 60:4–21. doi:10.1016/j.imavis.2017.01.010.
  • Htike, K. K., O. O. Khalifa, H. Adibah Mohd Ramli, and M. A. M. Abushariah. 2014. “Human Activity Recognition for Video Surveillance Using Sequences of Postures.” 2014 3rd International Conference on E-Technologies and Networks for Development, ICeND 2014, Beirut, Lebanon, 79–82. doi:10.1109/ICeND.2014.6991357.
  • Huang, C.-D., C.-Y. Wang, and J.-C. Wang. 2016. “Human Action Recognition System for Elderly and Children Care Using Three Stream ConvNet.” Proceedings of 2015 International Conference on Orange Technologies, ICOT 2015, Hong Kong, 5–9. doi:10.1109/ICOT.2015.7498476.
  • Huerta, E. A., A. Khan, E. Davis, C. Bushell, W. D. Gropp, D. S. Katz, V. Kindratenko, S. Koric, W. T. C. Kramer, B. McGinty, K. McHenry, and A. Saxton. 2020. Convergence of Artificial Intelligence and High Performance Computing on NSF-Supported Cyberinfrastructure. Journal of Big Data 7 (1):1. doi:10.1186/s40537-020-00361-2.
  • Sutskever, I., O. Vinyals, and Q. V. Le. 2014. Sequence to Sequence Learning with Neural Networks. Advances in Neural Information Processing Systems 4:3104–12.
  • Jegham, I., A. Ben Khalifa, I. Alouani, and M. Ali Mahjoub. 2020. Vision-Based Human Action Recognition: An Overview and Real World Challenges. Forensic Science International: Digital Investigation 32:200901. doi:10.1016/j.fsidi.2019.200901.
  • Jégou, H., F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid. 2012. Aggregating Local Image Descriptors into Compact Codes. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (9):1704–16. doi:10.1109/TPAMI.2011.235.
  • Jhuang, H., J. Gall, S. Zuffi, C. Schmid, and M. J. Black. 2013. “Towards Understanding Action Recognition.” In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 3192–99. doi:10.1109/ICCV.2013.396.
  • Karpathy, A., G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei Fei 2014a. “Large-Scale Video Classification with Convolutional Neural Networks.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA. doi:10.1109/CVPR.2014.223.
  • Karpathy, A., G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. 2014b. “Large-Scale Video Classification with Convolutional Neural Networks.” In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Columbus, OH, USA. doi:10.1109/CVPR.2014.223.
  • Kay, W., J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, M. Suleyman, and A. Zisserman. 2017. The Kinetics Human Action Video Dataset. arXiv preprint arXiv:1705.06950.
  • Kläser, A., M. Marszałek, and C. Schmid. 2008. “A Spatio-Temporal Descriptor Based on 3D-Gradients.” In BMVC 2008 - Proceedings of the British Machine Vision Conference 2008, Leeds, UK. doi:10.5244/C.22.99.
  • Koohzadi, M., and N. Moghadam Charkari. 2017. Survey on Deep Learning Methods in Human Action Recognition. IET Computer Vision 11 (8):623–32. doi:10.1049/iet-cvi.2016.0355.
  • Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM 60 (6):84–90. doi:10.1145/3065386.
  • Kuehne, H., H. Jhuang, E. Garrote, T. Poggio, and T. Serre. 2011. “HMDB: A Large Video Database for Human Motion Recognition.” 2011 International Conference on Computer Vision, Barcelona, Spain, 2556–63. doi:10.1109/ICCV.2011.6126543.
  • Kuehne, H., A. Arslan, and T. Serre. 2014. “The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities.” In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 780–87. IEEE Computer Society. doi:10.1109/CVPR.2014.105.
  • Laptev, I., M. Marszałek, C. Schmid, and B. Rozenfeld. 2008. “Learning Realistic Human Actions from Movies.” 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Anchorage, AK, USA, 1–8. doi:10.1109/CVPR.2008.4587756.
  • Wang, L., D. Q. Huynh, and P. Koniusz. 2020. A Comparative Review of Recent Kinect-Based Action Recognition Algorithms. IEEE Transactions on Image Processing 29:15–28. doi:10.1109/TIP.2019.2925285.
  • Sun, L., K. Jia, D.-Y. Yeung, and B. E. Shi. 2015. “Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks.” Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 4597–605. doi:10.1109/ICCV.2015.522.
  • Sun, L., K. Jia, K. Chen, D.-Y. Yeung, B. E. Shi, and S. Savarese. 2017. “Lattice Long Short-Term Memory for Human Action Recognition.” Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2166–75. doi:10.1109/ICCV.2017.236.
  • Liu, B., H. Cai, Z. Ju, and H. Liu. 2019. RGB-D Sensing Based Human Action and Interaction Analysis: A Survey. Pattern Recognition 94:1–12. doi:10.1016/j.patcog.2019.05.020.
  • Liu, Z., and H. Hu. 2019. Spatiotemporal Relation Networks for Video Action Recognition. IEEE Access 7:14969–76. doi:10.1109/ACCESS.2019.2894025.
  • Lowe, D. G. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60 (2):91–110. doi:10.1023/B:VISI.0000029664.99615.94.
  • Majumder, S., and N. Kehtarnavaz. 2021. Vision and Inertial Sensing Fusion for Human Action Recognition: A Review. IEEE Sensors Journal 21 (3):2454–67. doi:10.1109/JSEN.2020.3022326.
  • Vrigkas, M., C. Nikou, and I. A. Kakadiaris. 2015. A Review of Human Activity Recognition Methods. Frontiers in Robotics and AI 2:1–28. doi:10.3389/frobt.2015.00028.
  • Minh Dang, L., K. Min, H. Wang, M. Jalil Piran, C. Hee Lee, and H. Moon. 2020. Sensor-Based and Vision-Based Human Activity Recognition: A Comprehensive Survey. Pattern Recognition 108:107561. doi:10.1016/j.patcog.2020.107561.
  • Monfort, M., A. Andonian, B. Zhou, K. Ramakrishnan, S. Adel Bargal, T. Yan, L. Brown, Q. Fan, D. Gutfreund, C. Vondrick, and A. Oliva. 2020. Moments in Time Dataset: One Million Videos for Event Understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2):502–08. doi:10.1109/TPAMI.2019.2901464.
  • Naeem, H. B., F. Murtaza, M. Haroon Yousaf, and S. A. Velastin. 2021. T-VLAD: Temporal Vector of Locally Aggregated Descriptor for Multiview Human Action Recognition. Pattern Recognition Letters. doi:10.1016/j.patrec.2021.04.023.
  • Najeera, P. M., P. D. Anu, and M. Sadiq. 2018. “An Intelligent Action Predictor from Video Using Deep Learning.” 2018 International Conference on Emerging Trends and Innovations In Engineering And Technological Research, ICETIETR 2018, Ernakulam, India, 2018–21. doi:10.1109/ICETIETR.2018.8529076.
  • Ng, J. Y. H., M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. 2015. “Beyond Short Snippets: Deep Networks for Video Classification.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 4694–702. doi:10.1109/CVPR.2015.7299101.
  • Nweke, H. F., Y. Wah Teh, M. Ali Al-garadi, and U. Rita Alo. 2018. Deep Learning Algorithms for Human Activity Recognition Using Mobile and Wearable Sensor Networks: State of the Art and Research Challenges. Expert Systems with Applications 105:233–61. doi:10.1016/j.eswa.2018.03.056.
  • Özyer, T., D. S. Ak, and R. Alhajj. 2021. Human Action Recognition Approaches with Video Datasets—A Survey. Knowledge-Based Systems 222:106995. doi:10.1016/j.knosys.2021.106995.
  • Peng, X., L. Wang, X. Wang, and Y. Qiao. 2016. Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice. Computer Vision and Image Understanding 150 (September):109–25. doi:10.1016/j.cviu.2016.03.013.
  • Scovanner, P., S. Ali, and M. Shah. 2007. “A 3-Dimensional SIFT Descriptor and Its Application to Action Recognition.” In Proceedings of the ACM International Multimedia Conference and Exhibition, Augsburg, Germany, 357–60. doi:10.1145/1291233.1291311.
  • Serpush, F., and M. Rezaei. 2021. Complex Human Action Recognition Using a Hierarchical Feature Reduction and Deep Learning-Based Method. SN Computer Science 2 (2):1–15. doi:10.1007/s42979-021-00484-0.
  • Yu, S., L. Xie, L. Liu, and D. Xia. 2020. Learning Long-Term Temporal Features with Deep Neural Networks for Human Action Recognition. IEEE Access 8:1840–50. doi:10.1109/ACCESS.2019.2962284.
  • Ji, S., W. Xu, M. Yang, and K. Yu. 2013. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1):221–31. doi:10.1109/TPAMI.2012.59.
  • Sigurdsson, G. A., G. Varol, X. Wang, A. Farhadi, I. Laptev, and A. Gupta. 2016. “Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding.” In European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands. http://allenai.org/plato/charades/.
  • Simonyan, K., and A. Zisserman. 2014. Two-Stream Convolutional Networks for Action Recognition in Videos. Advances in Neural Information Processing Systems 1 (January):568–76.
  • Simonyan, K., and A. Zisserman. 2015a. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, San Diego, CA, 1–14.
  • Simonyan, K., and A. Zisserman. 2015b. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” http://www.robots.ox.ac.uk/.
  • Singh, T., and D. Kumar Vishwakarma. 2019. Video Benchmarks of Human Action Datasets: A Review. Artificial Intelligence Review 52 (2):1107–54. doi:10.1007/s10462-018-9651-1.
  • Soomro, K., A. Roshan Zamir, and M. Shah. 2012. “UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild.” arXiv preprint arXiv:1212.0402. http://arxiv.org/abs/1212.0402.
  • Srivastava, N., E. Mansimov, and R. Salakhutdinov. 2015. “Unsupervised Learning of Video Representations Using LSTMs.” 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 843–52.
  • Sun, B., W. Yong, K. Zhao, H. Jun, Y. Lejun, H. Yan, and A. Luo. 2021. Student Class Behavior Dataset: A Video Dataset for Recognizing, Detecting, and Captioning Students’ Behaviors in Classroom Scenes. Neural Computing & Applications. doi:10.1007/s00521-020-05587-y.
  • Szegedy, C., W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. “Going Deeper with Convolutions.” In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 1–9. doi:10.1109/CVPR.2015.7298594.
  • Taylor, G. W., R. Fergus, Y. LeCun, and C. Bregler. 2010. Convolutional Learning of Spatio-Temporal Features. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6316 LNCS (Part 6):140–53. doi:10.1007/978-3-642-15567-3_11.
  • Thameri, M., A. Kammoun, K. Abed-Meraim, and A. Belouchrani. 2011. “Fast Principal Component Analysis and Data Whitening Algorithms.” 7th International Workshop on Systems, Signal Processing and Their Applications, WoSSPA 2011, Tipaza, Algeria, 139–42. doi:10.1109/WOSSPA.2011.5931434.
  • Tran, D., L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. 2015. “Learning Spatiotemporal Features with 3D Convolutional Networks.” Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 4489–97. doi:10.1109/ICCV.2015.510.
  • Varol, G., I. Laptev, and C. Schmid. 2018. Long-Term Temporal Convolutions for Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (6):1510–17. doi:10.1109/TPAMI.2017.2712608.
  • Verma, K. K., B. M. Singh, H. L. Mandoria, and P. Chauhan. 2020. Two-Stage Human Activity Recognition Using 2D-ConvNet. International Journal of Interactive Multimedia and Artificial Intelligence 6 (2):11. doi:10.9781/ijimai.2020.04.002.
  • Verma, K. K., and B. M. Singh. 2021. Deep Multi-Model Fusion for Human Activity Recognition Using Evolutionary Algorithms. International Journal of Interactive Multimedia and Artificial Intelligence 7 (2):44–58. doi:10.9781/ijimai.2021.08.008.
  • Verma, K. K., B. M. Singh, and A. Dixit. 2022. A Review of Supervised and Unsupervised Machine Learning Techniques for Suspicious Behavior Recognition in Intelligent Surveillance System. International Journal of Information Technology (Singapore) 14 (1):397–410. doi:10.1007/s41870-019-00364-0.
  • Wan, S., L. Qi, X. Xu, C. Tong, and Z. Gu. 2020a. Deep Learning Models for Real-Time Human Activity Recognition with Smartphones. Mobile Networks and Applications 25 (2):743–55. doi:10.1007/s11036-019-01445-x.
  • Wan, Y., Z. Yu, Y. Wang, and X. Li. 2020b. Action Recognition Based on Two-Stream Convolutional Networks with Long-Short-Term Spatiotemporal Features. IEEE Access 8:85284–93. doi:10.1109/ACCESS.2020.2993227.
  • Wang, H., A. Kläser, C. Schmid, and C.-L. Liu. 2011. “Action Recognition by Dense Trajectories.” CVPR 2011 - IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 3169–76. doi:10.1109/CVPR.2011.5995407.
  • Wang, H., A. Kläser, C. Schmid, and C. Lin Liu. 2013a. Dense Trajectories and Motion Boundary Descriptors for Action Recognition. International Journal of Computer Vision 103 (1):60–79. doi:10.1007/s11263-012-0594-8.
  • Wang, H., and C. Schmid. 2013b. “Action Recognition with Improved Trajectories.” ICCV - IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 3551–58. doi:10.1109/ICCV.2013.441.
  • Wang, L., Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. van Gool. 2016. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9912 LNCS: 20–36. doi:10.1007/978-3-319-46484-8_2.
  • Wang, X., Z. Miao, R. Zhang, and S. Hao. 2019. “I3D-LSTM: A New Model for Human Action Recognition.” IOP Conference Series: Materials Science and Engineering 569 (3):032035. doi:10.1088/1757-899X/569/3/032035.
  • Wang, S., Y. Liu, J. Wang, S. Gao, and W. Yang. 2021. A Moving Track Data-Based Method for Gathering Behavior Prediction at Early Stage. Applied Intelligence 51 (11):8498–518. doi:10.1007/s10489-021-02244-2.
  • Tao, W., Z.-H. Lai, M. C. Leu, and Z. Yin. 2018. Worker Activity Recognition in Smart Manufacturing Using IMU and sEMG Signals with Convolutional Neural Networks. Procedia Manufacturing 26:1159–66. doi:10.1016/j.promfg.2018.07.152.
  • Hussain, Z., Q. Z. Sheng, and W. Emma Zhang. 2020. A Review and Categorization of Techniques on Device-Free Human Activity Recognition. Journal of Network and Computer Applications 167:102738. doi:10.1016/j.jnca.2020.102738.
  • Zhang, S., Z. Wei, J. Nie, L. Huang, S. Wang, and Z. Li. 2017. A Review on Human Activity Recognition Using Vision-Based Method. Journal of Healthcare Engineering 2017:1–31. doi:10.1155/2017/3090343.
  • Zhang, H.-B., Y.-X. Zhang, B. Zhong, Q. Lei, L. Yang, J.-X. Du, and D.-S. Chen. 2019. A Comprehensive Survey of Vision-Based Human Action Recognition Methods. Sensors 19 (5):1005. doi:10.3390/s19051005.
  • Zhao, H., and X. Jin. 2020. “Human Action Recognition Based on Improved Fusion Attention CNN and RNN.” Proceedings - 2020 5th International Conference on Computational Intelligence and Applications, ICCIA 2020, Beijing, China, 108–12. doi:10.1109/ICCIA49625.2020.00028.
  • Zheng, H., and X.-M. Zhang. 2020. “A Cross-Modal Learning Approach for Recognizing Human Actions.” IEEE Systems Journal, 1–9. doi:10.1109/jsyst.2020.3001680.
  • Li, Z., K. Gavrilyuk, E. Gavves, M. Jain, and C. G. M. Snoek. 2018. VideoLSTM Convolves, Attends and Flows for Action Recognition. Computer Vision and Image Understanding 166:41–50. doi:10.1016/j.cviu.2017.10.011.
  • Zhu, F., L. Shao, J. Xie, and Y. Fang. 2016. From Handcrafted to Learned Representations for Human Action Recognition: A Survey. Image and Vision Computing 55:42–52. doi:10.1016/j.imavis.2016.06.007.