Research Article

Towards accurate surgical workflow recognition with convolutional networks and transformers

Pages 349-356 | Received 26 Oct 2021, Accepted 01 Nov 2021, Published online: 24 Nov 2021
ABSTRACT

Recognising workflow phases from endoscopic surgical videos is crucial to deriving indicators that convey the quality, efficiency, and outcome of a surgery, and to offering insights into surgical team skills. Additionally, workflow information is used to organise large surgical video libraries for training purposes. In this paper, we explore different deep networks that capture spatial and temporal information from surgical videos for surgical workflow recognition. The approach is based on a combination of two networks: the first extracts features from video snippets, and the second performs action segmentation, identifying the different parts of the surgical workflow by analysing the extracted features. This work focuses on proposing, comparing, and analysing different design choices, including fully convolutional, fully transformer, and hybrid models that use transformers in conjunction with convolutions. We evaluate the methods on a large dataset of endoscopic surgical videos acquired during Gastric Bypass surgery. Both our fully transformer method and our fully convolutional approach achieve state-of-the-art results. By integrating transformers and convolutions, our hybrid model achieves 93% frame-level accuracy and a segmental edit distance score of 85. This demonstrates the potential of hybrid models that employ both transformers and convolutions for accurate surgical workflow recognition.
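The abstract reports two standard action-segmentation metrics: frame-level accuracy and the segmental edit distance score. As a point of reference, the edit score is commonly computed by collapsing frame-wise phase labels into their ordered segments and taking a normalised Levenshtein distance between the predicted and ground-truth segment sequences. The sketch below illustrates this convention; the function names and the toy label sequences are illustrative, not taken from the paper.

```python
def segments(labels):
    # Collapse a frame-wise label sequence into its ordered run of segments,
    # e.g. ["A", "A", "B", "B"] -> ["A", "B"].
    segs = []
    for lab in labels:
        if not segs or segs[-1] != lab:
            segs.append(lab)
    return segs

def levenshtein(a, b):
    # Standard edit distance between two sequences (insert/delete/substitute).
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def edit_score(pred, gt):
    # Segmental edit score in [0, 100]: 100 means the predicted segment
    # ordering matches the ground truth exactly, regardless of boundaries.
    p, g = segments(pred), segments(gt)
    return (1.0 - levenshtein(p, g) / max(len(p), len(g))) * 100.0

def frame_accuracy(pred, gt):
    # Fraction of frames whose predicted phase matches the ground truth.
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)
```

Note the complementarity of the two metrics: a prediction can have a slightly shifted phase boundary (lower frame accuracy) while still recovering the correct phase ordering (perfect edit score), which is why both are reported.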

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Bokai Zhang

Bokai Zhang received his MSc degree in Electrical and Computer Engineering at Georgia Institute of Technology in 2018. He is currently working at Johnson & Johnson as a Senior Computer Vision Engineer. He focuses on solving computer vision problems in surgery, such as surgical workflow recognition and surgical instrument recognition. His research interests and expertise are in image classification, object detection, image segmentation, video action recognition, and video action segmentation.

Julian Abbing

Julian Abbing received his MSc degree in Technical Medicine at the University of Twente in 2020. He currently works as a Technical Physician and is a Ph.D. candidate at Meander Medical Centre and the University of Twente, the Netherlands. In this role, he works on gaining insight into surgical procedures using artificial intelligence. His research focuses on conventional minimally invasive surgery and the application of AI in the operating theatre to enable benchmarking and objective comparison of surgical performance.

Amer Ghanem

Amer Ghanem is Director of Machine Learning in the Digital Solutions division at Johnson & Johnson MedTech, where he leads a group of scientists and engineers who develop machine learning models and AI infrastructure. His background is in computer science, and he holds a Ph.D. in Computer Science and Engineering from the University of Cincinnati.

Danyal Fer

Danyal Fer is a general surgery resident at the University of California San Francisco East Bay and a Captain in the United States Air Force. He has worked as the Surgical Translational Research Lead in Applied Research at Johnson & Johnson Medical Devices, providing insight into the development of robotic and AI systems. He is also a visiting scholar at the University of California Berkeley Automation Laboratory, where he provides clinical guidance on the automation of surgical tasks, and he works with NASA and SpaceX on the development of systems for surgical care in extreme environments. Together, these research engagements aim to advance the transmission of surgical knowledge and action, using robotics and artificial intelligence to develop surgical systems for extreme and terrestrial environments.

Jocelyn Barker

Jocelyn Barker received her Ph.D. in BioPhysics at Stanford University. She has worked in data science for six years and is currently the AI/ML Modeling Lead at Johnson & Johnson Digital Solutions. She focuses on developing AI models to better quantify and understand surgical procedures and provide educational information to surgeons. Her work uses deep learning methods in computer vision and video processing combined with clinical factors and outcomes.

Rami Abukhalil

Rami Abukhalil received his medical degree (MD) at Misr University for Science and Technology (MUST). He completed his medical residency at the Palestinian Medical Complex (PMC). He is currently working at Johnson & Johnson as a Senior Clinical Researcher. He focuses on developing surgical guides and analysing surgical metrics that help develop AI models and provide feedback to surgeons and data scientists.

Varun Kejriwal Goel

Varun Kejriwal Goel is a physician with a background in neuroscience, currently training in general surgery at UCSF – East Bay. He is also a Senior Research Fellow with the R&D arm of Johnson & Johnson MedTech, publishing research and working as the clinical lead for several artificial intelligence models. He has particular experience in computer vision. His broader goal is to encourage cross-sector collaboration and create healthcare technologies that facilitate equitable access to outstanding clinical education and healthcare. He will transition into psychiatric residency at Cornell University in 2022, where he will continue to pursue this goal through the lens of mental health.

Fausto Milletarì

Fausto Milletarì is an applied AI lead at Johnson & Johnson in Germany. His research focuses on machine learning and deep learning methods applied to medical image analysis. He pioneered the use of volumetric neural networks for radiology image segmentation. His best-known work, V-Net, which has accumulated more than 3,500 citations, popularised the use of the Dice loss as well as 3D convolutions. He currently leads a research team focusing on the analysis of surgical videos.
