Abstract
Interaction recognition in videos using body pose is gaining considerable attention owing to its speed and robustness. Recently proposed methods based on recurrent neural networks (RNNs) and deep ConvNets perform well at learning sequential information. Despite this, RNNs lag behind in learning spatial relations between body parts, while deep ConvNets require a huge amount of training data. We propose a traversal-based three-layer neural network (TNN), followed by a pairwise interaction framework (PIF), for interaction recognition. We also propose a novel algorithm for tracking humans in successive frames. The proposed algorithm computes the collective traversal of individual body parts across frames and feeds it to the TNN to learn an effective representation of complex actions. The PIF model combines the confidence scores of a pair of action labels corresponding to an interaction to make the final interaction prediction. We evaluate the approach on two publicly available datasets, UT-Interaction and SBU Kinect Interaction. Results show that the proposed approach outperforms state-of-the-art methods.
Notes
1 There may be interaction classes with two active performers, such as Hug and Handshake. We have divided these classes into two labels, i.e. Left and Right, as shown in Figure (c).
Additional information
Notes on contributors
![](/cms/asset/cc83553a-f835-4e50-9537-666f68a41b1d/tijr_a_1802355_ilg0001.gif)
Amit Verma
Amit Verma is currently a PhD research scholar in the Department of Electronics and Communication at National Institute of Technology Raipur, India. His research interests include image processing, computer vision, and neural networks. Email: [email protected]
![](/cms/asset/56f654a8-1533-425b-b06c-000837695857/tijr_a_1802355_ilg0002.gif)
Toshanlal Meenpal
Toshanlal Meenpal is currently an assistant professor in the Department of Electronics and Communication at National Institute of Technology Raipur, India. He obtained his PhD from Bhabha Atomic Research Centre (BARC), Mumbai, under the aegis of HBNI University, Mumbai. He received his master’s degree in automation and computer vision engineering from Indian Institute of Technology, Kharagpur, in 2005. Before moving to academia, he worked for 5 years as a design engineer at ST Microelectronics and Nvidia Graphics in the multimedia playback-related R&D groups. His research interests include multimedia security techniques such as digital watermarking, steganography, and cryptography, as well as image processing and image analysis.
![](/cms/asset/ce5fc607-f230-4187-8ff3-99cf03455df8/tijr_a_1802355_ilg0003.gif)
Bibhudendra Acharya
Bibhudendra Acharya was born in India on June 30, 1978. He graduated from Dr B A Marathawada University, Aurangabad, India, in electronics and telecommunication engineering in 2002. He received his MTech in telematics and signal processing in 2004 and his PhD in 2015, both from National Institute of Technology, Rourkela, Odisha, India. He is currently serving as an assistant professor in the Department of Electronics and Communication, NIT, Raipur. He has more than 40 research publications in national and international journals and conferences. His research interests are cryptography and network security, microcontrollers and embedded systems, signal processing, and soft computing. Email: [email protected]