ABSTRACT
Recognizing surgical tool presence is one of the most essential steps in surgical workflow analysis. We propose LapFormer, a method to detect the presence of surgical tools in laparoscopic surgery videos. The novelty of LapFormer lies in using a Transformer architecture, a feed-forward neural network architecture with an attention mechanism that has grown popular in natural language processing, to model inter-frame correlation in videos instead of relying on the recurrent neural network family. To the best of our knowledge, no method using a Transformer architecture for analysing laparoscopic surgery videos has been proposed. We evaluate our method on the Cholec80 dataset, which contains 80 videos of cholecystectomy surgeries. We confirm that our proposed method outperforms conventional methods, namely single-frame analysis with convolutional neural networks and multi-frame analysis with recurrent neural networks, by 20.3 and 17.3 points in macro-F1 score, respectively. We also conduct an ablation study on how the hyper-parameters of the Transformer block in our proposed method affect detection performance.
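The core idea the abstract describes, replacing recurrent models with attention to relate frames to one another, can be illustrated with a minimal sketch of scaled dot-product self-attention over per-frame feature vectors. This is an assumption-laden illustration, not LapFormer's actual implementation: the projection matrices here are random stand-ins for learned weights, and the feature dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(frames, d_k=16, seed=0):
    """Scaled dot-product self-attention across video frames.

    frames: (T, d) array of per-frame features, e.g. CNN embeddings.
    Returns (T, d_k) attended features and the (T, T) attention map,
    whose entry (i, j) weights how much frame j informs frame i.
    The projections Wq/Wk/Wv are random here purely for illustration;
    in a trained Transformer they are learned parameters.
    """
    rng = np.random.default_rng(seed)
    d = frames.shape[1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = frames @ Wq, frames @ Wk, frames @ Wv
    scores = Q @ K.T / np.sqrt(d_k)   # pairwise inter-frame similarity
    attn = softmax(scores, axis=-1)   # each row sums to 1
    return attn @ V, attn
```

Unlike a recurrent network, which consumes frames sequentially, every frame here attends to every other frame in a single feed-forward pass, which is the property the abstract highlights.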
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Satoshi Kondo
Satoshi Kondo received his B.S., M.S. and Ph.D. degrees from Osaka Prefecture University in 1990, 1992 and 2004, respectively. He joined Matsushita Electric Industrial Co., Ltd. (now Panasonic Corporation) in 1992. While with Panasonic Corporation, he mainly developed video coding and computer vision technologies. He holds over 100 patents on the H.264/MPEG-4 AVC video coding standard. Since 2014, he has been with Konica Minolta, Inc., where he is currently the head of the AI technology development department. His current research interests are in the fields of image processing and computer vision, especially for medical images.