References
- Wang L, Xiong Y, Wang Z, et al. Temporal segment networks: towards good practices for deep action recognition, in: Proc. European conference on computer vision, 2016, pp. 20–36.
- Veeriah V, Zhuang N, Qi GJ. Differential recurrent neural networks for action recognition, in: Proc. IEEE international conference on computer vision, 2015, pp. 4041–4049.
- Poppe R. A survey on vision-based human action recognition, in: Image and vision computing, 2010, 28(6): 976–990.
- Ke Q, Bennamoun M, An S, et al. Learning clip representations for skeleton-based 3d action recognition, in: IEEE Transactions on Image Processing, 2018, 27(6): 2842–2855.
- Liu H, Tu J, Liu M. Two-stream 3d convolutional neural network for skeleton-based action recognition, 2017, arXiv:1705.08106.
- Li B, Dai Y, Cheng X, et al. Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN, in: 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2017, pp. 601–604.
- Ke Q, Bennamoun M, An S, et al. A new representation of skeleton sequences for 3d action recognition, in: Proc. IEEE conference on computer vision and pattern recognition, 2017, pp. 3288–3297.
- Kim TS, Reiter A. Interpretable 3d human action analysis with temporal convolutional networks, in: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), 2017, pp. 1623–1631.
- Zheng W, Li L, Zhang Z, et al. Skeleton-based relational modeling for action recognition, 2018, arXiv:1805.02556.
- Du Y, Wang W, Wang L. Hierarchical recurrent neural network for skeleton based action recognition, in: Proc. IEEE conference on computer vision and pattern recognition, 2015, pp. 1110–1118.
- Li S, Li W, Cook C, et al. Independently recurrent neural network (indrnn): Building a longer and deeper rnn, in: Proc. IEEE conference on computer vision and pattern recognition, 2018, pp. 5457–5466.
- Monti F, Boscaini D, Masci J, et al. Geometric deep learning on graphs and manifolds using mixture model cnns, in: Proc. IEEE conference on computer vision and pattern recognition, 2017, pp. 5115–5124.
- Li C, Zhong Q, Xie D, et al. Skeleton-based action recognition with convolutional neural networks, in: 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2017, pp. 597–600.
- Li M, Chen S, Chen X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition, in: Proc. IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 3595–3603.
- Shi L, Zhang Y, Cheng J, et al. Skeleton-based action recognition with directed graph neural networks, in: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7912–7921.
- Shi L, Zhang Y, Cheng J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition, in: Proc. IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 12026–12035.
- Yan S, Xiong Y, Lin D. Spatial temporal graph convolutional networks for skeleton-based action recognition, in: Thirty-second AAAI conference on artificial intelligence, 2018, pp. 7444–7452.
- Aggarwal JK, Ryoo MS. Human activity analysis: A review, in: ACM Computing Surveys (CSUR), 2011, 43(3): 1–43.
- Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models, in: Proc. ICML, 2013, 30(1): 3.
- Song S, Lan C, Xing J, et al. An end-to-end spatio-temporal attention model for human action recognition from skeleton data, in: Proc. AAAI conference on artificial intelligence, 2017, 31(1).
- Cheng K, Zhang Y, He X, et al. Skeleton-based action recognition with shift graph convolutional network, in: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 183–192.
- Song YF, Zhang Z, Shan C, et al. Stronger, faster and more explainable: A graph convolutional baseline for skeleton-based action recognition, in: Proc. 28th ACM International Conference on Multimedia, 2020, pp. 1625–1633.
- Weinland D, Ronfard R, Boyer E. A survey of vision-based methods for action representation, segmentation and recognition, in: Computer vision and image understanding, 2011, 115(2): 224–241.
- Liu J, Shahroudy A, Xu D, et al. Spatio-temporal lstm with trust gates for 3d human action recognition, in: Proc. European conference on computer vision, 2016, pp. 816–833.
- Wen YH, Gao L, Fu H, et al. Graph CNNs with motif and variable temporal block for skeleton-based action recognition, in: Proc. AAAI Conference on Artificial Intelligence, 2019, 33(01): 8989–8996.
- Li B, Li X, Zhang Z, et al. Spatio-temporal graph routing for skeleton-based action recognition, in: Proc. AAAI Conference on Artificial Intelligence, 2019, 33(01): 8561–8568.
- Si C, Chen W, Wang W, et al. An attention enhanced graph convolutional lstm network for skeleton-based action recognition, in: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1227–1236.
- Li C, Cui Z, Zheng W, et al. Spatio-temporal graph convolution for skeleton based action recognition, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- Cheng K, Zhang Y, Cao C, et al. Decoupling gcn with dropgraph module for skeleton-based action recognition, in: Proc. European conference on computer vision, 2020, pp. 536–553.
- Veličković P, Cucurull G, Casanova A, et al. Graph attention networks, 2017, arXiv:1710.10903.
- Liu J, Wang G, Duan LY, et al. Skeleton-based human action recognition with global context-aware attention LSTM networks, in: IEEE Transactions on Image Processing, 2018, 27(4): 1586–1599.
- Liu J, Shahroudy A, Wang G, et al. Skeleton-based online action prediction using scale selection network, in: IEEE transactions on pattern analysis and machine intelligence, 2019, 42(6): 1453–1467.
- Song YF, Zhang Z, Wang L. Richly activated graph convolutional network for action recognition with incomplete skeletons, in: 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 1–5.
- Liu Z, Zhang H, Chen Z, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition, in: Proc. IEEE conference on computer vision and pattern recognition, 2020, pp. 143–152.
- Baradel F, Wolf C, Mille J. Human action recognition: pose-based attention draws focus to hands, in: Proc. IEEE international conference on computer vision workshops, 2017, pp. 604–613.
- Shi L, Zhang Y, Cheng J, et al. Skeleton-based action recognition with multi-stream adaptive graph convolutional networks, in: IEEE Transactions on Image Processing, 2020, 29: 9532–9545.
- Shahroudy A, Liu J, Ng TT, et al. NTU RGB+D: A large scale dataset for 3d human activity analysis, in: Proc. IEEE conference on computer vision and pattern recognition, 2016, pp. 1010–1019.
- Liu J, Shahroudy A, Perez M, et al. NTU RGB+D 120: A large-scale benchmark for 3d human activity understanding, in: IEEE transactions on pattern analysis and machine intelligence, 2019, 42(10): 2684–2701.
- Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift, in: International conference on machine learning, 2015, pp. 448–456.
- Ioffe S. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models, 2017, arXiv:1702.03275.
- Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution, in: European conference on computer vision, 2016, pp. 694–711.
- Huang X, Belongie S. Arbitrary style transfer in real-time with adaptive instance normalization, in: Proc. IEEE international conference on computer vision, 2017, pp. 1501–1510.
- Ba JL, Kiros JR, Hinton GE. Layer normalization, 2016, arXiv:1607.06450.
- Luo P, Ren J, Peng Z, et al. Differentiable learning-to-normalize via switchable normalization, 2018, arXiv:1806.10779.
- Zhou J, Cui G, Hu S, et al. Graph neural networks: A review of methods and applications, in: AI open, 2020, 1: 57–81.
- Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks, in: IEEE transactions on neural networks and learning systems, 2020, 32(1): 4–24.
- Li C, Zhong Q, Xie D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation, 2018, arXiv:1804.06055.
- Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks, 2016, arXiv:1609.02907.
- Niepert M, Ahmed M, Kutzkov K. Learning convolutional neural networks for graphs, in: International conference on machine learning, 2016, pp. 2014–2023.
- Plizzari C, Cannici M, Matteucci M. Skeleton-based action recognition via spatial and temporal transformer networks, in: Computer Vision and Image Understanding, 2021, 208: 103219.
- Bai R, Li M, Meng B, et al. GCsT: Graph convolutional skeleton transformer for action recognition, 2021, arXiv:2109.02860.
- Zhang P, Lan C, Zeng W, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition, in: Proc. IEEE conference on computer vision and pattern recognition, 2020, pp. 1112–1121.
- Liu W, Ma X, Zhou Y, et al. p-Laplacian regularization for scene recognition, in: IEEE Transactions on Cybernetics, 2019, 49(8): 2927–2940.
- Zhang B, Xiao J, Jiao J, et al. Affinity attention graph neural network for weakly supervised semantic segmentation, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need, in: Advances in neural information processing systems, 2017, 30.
- Wang F, Jiang M, Qian C, et al. Residual attention network for image classification, in: Proc. IEEE conference on computer vision and pattern recognition, 2017, pp. 3156–3164.
- Ying X, Wang Q, Li X, et al. Multi-attention object detection model in remote sensing images based on multi-scale, in: IEEE Access, 2019, 7: 94508–94519.
- Bertasius G, Wang H, Torresani L. Is space-time attention all you need for video understanding?, in: International conference on machine learning, 2021.
- Chu X, Yang W, Ouyang W, et al. Multi-context attention for human pose estimation, in: Proc. IEEE conference on computer vision and pattern recognition, 2017, pp. 1831–1840.