Review Articles

Eye-Tracking Indicators of Workload in Surgery: A Systematic Review

Pages 1340-1349 | Received 08 Oct 2021, Accepted 28 Dec 2021, Published online: 17 Jan 2022

Abstract

Background

Eye tracking is a powerful tool for unobtrusive, real-time assessment of workload in clinical settings. Before the complex eye-tracking-derived surrogates can be proactively used to improve surgical safety, their indications, validity, and reliability require careful evaluation.

Methods

We conducted a systematic review of the literature from 2010 to 2020 according to PRISMA guidelines. A search of the PubMed, Cochrane, Scopus, Web of Science, PsycInfo, and Google Scholar databases was conducted in July 2020. The following search query was used: ("eye tracking" OR "gaze tracking") AND (surgery OR surgical OR operative OR intraoperative) AND (workload OR stress). Short papers, non-peer-reviewed papers, and papers in which eye-tracking methodology was not used to investigate workload or stress factors in surgery were excluded.

Results

A total of 17 studies (N = 17) were eligible for this review. Most of the studies (n = 15) measured workload in a simulated setting. Task difficulty and expertise were the most studied factors. Studies consistently showed that surgeons’ eye movements, such as pupil responses, gaze patterns, and blinks, were associated with the level of perceived workload. However, differences were found between measurements in the operating room and in simulated environments.

Conclusion

Pupil responses, blink rate, and gaze indices are valid indicators of workload. However, the effect of distractions and non-technical factors on workload is an underrepresented aspect of the literature, even though these factors are recognized as underlying successful surgery.

Introduction

Eye tracking surrogates for surgical workload

In many skilled-operator domains, poorer performance is linked to variations in task workload.[Citation1,Citation2] Workload assessment remains a challenge in the contemporary art of surgery, where both technical and non-technical skills underlie the surgeon’s performance and, in turn, patient safety.[Citation3] Eye tracking is a promising approach for unobtrusive, real-time measurement of workload in various clinical environments.[Citation4–6] Before eye tracking can be proactively used to improve patient safety, its indications, validity, and reliability need to be thoroughly addressed.[Citation4]

Importance of workload in surgery

Workload is an individual, multidimensional construct describing interactions between an operator and task demands.[Citation7,Citation8] These dimensions include mental demands, physical demands, temporal demands, task complexity, situational stress, and distractions.[Citation2] When workload is perceived as excessive, the demands outweigh the surgeon’s mental capacity and optimal performance may be at risk. For example, excessive workload is commonly perceived when complications or unexpected events occur during surgery.[Citation3,Citation4] The physiological foundation of this stress reaction is grounded in cognitive workload theory, which states that working memory capacity is a finite resource.[Citation9] The more an operator has to rely on top-down processing and the more working memory is occupied, the more the operator’s performance eventually suffers.[Citation8,Citation10] Numerous factors contribute to the perception of workload, such as task difficulty, time pressure, distractions, and operator-related states such as skill level and anxiety.[Citation2,Citation11] Experienced surgeons exhibit lower levels of workload, which is associated with more efficient information processing, increased automation, and better surgical skills.[Citation3,Citation4,Citation12] Lower workload and increased focus on task-relevant details release mental resources for other important processes, such as communication and situational awareness.[Citation13] However, even though more experienced surgeons have consistently lower levels of workload, experience itself does not have a direct relationship with performance and outcomes.[Citation14,Citation15]

Evaluation of workload in surgery

One of the tools created to evaluate surgeons’ workload is the self-reported surgery-specific task load index (SURG-TLX).[Citation2] The SURG-TLX is based on NASA-TLX, a widely used multidimensional workload assessment tool.[Citation2] Since workload is a major factor underlying successful surgery, it is important for its applications that we can measure it objectively.[Citation16] Prior research has focused on quantifying mental workload with physiological monitors across several fields[Citation17,Citation18] including surgery.[Citation4] Many of the applied physiological sensors present ergonomic challenges for the surgeon and cannot be safely adopted in operating rooms. However, selected techniques such as eye tracking can be embedded in the surgical workflow (e.g. as wearable eye-tracking glasses, a monitor plug-in, or an add-on to microscope oculars) without major ergonomic limitations. Despite being a broadly investigated method, eye tracking has been cautiously evaluated because of the complexity of its analysis and indications. The slow but steadily increasing commercial interest in applications of surgical eye tracking seems rational, as the observable quantities are indeed a mixture of signals reflecting underlying individual neural processes.[Citation19–21] Nevertheless, the method’s foremost merit lies in its unobtrusiveness: unlike most other biological monitors,[Citation22] it can be introduced to the operating room with little effort from the surgical team. Because of this highly desired feature, we examine in detail what eye-tracking methodology offers for the evaluation of workload and its potential indications.

According to the database search, which is discussed in the Methods section, there have been no previous systematic reviews of eye tracking for measuring multidimensional workload in the surgical setting or in other domains. This work systematically reviews the available literature and offers concrete foundations for applications of eye tracking in surgical workload assessment.

Methods

We conducted a systematic review according to PRISMA guidelines.[Citation23]

Search strategy and data sources

A search of the PubMed, Cochrane, Scopus, Web of Science, and PsycInfo databases was conducted in June 2020. Google Scholar was also searched for relevant studies. The following search query was used: ("eye tracking" OR "gaze tracking") AND (surgery OR surgical OR operative OR intraoperative) AND (workload OR stress). An experienced university medical librarian was consulted to ensure that all relevant databases were searched. In addition, the reference lists of the selected manuscripts were searched for additional studies.

Inclusion criteria and data extraction

After elimination of duplicates, the results were screened for relevant titles and abstracts. The full-text versions of the studies were evaluated according to the inclusion and exclusion criteria given in the . In short, we included studies that assessed the workload of the participant during a surgical operation with an eye-tracking method. Books, conference papers, and reviews were excluded. Six additional studies were considered relevant and included from the reference lists of the results. Two independent authors performed the data extraction, which included type of study, surgical technique, surgical task, number of participants, experience of participants, eye-tracking hardware, eye movements, computed features, and the method of data evaluation. The data are presented in the .

Table 1. Included studies.

Quality assessment

We used the GRADE (Grading of Recommendations Assessment, Development and Evaluation) protocol[Citation24] to assess the degree of evidence. It uses a four-step scale (A-D). The evaluation starts by defining the experimental question (PICO), the outcome parameters, and the type of study (interventional or observational). The assessment factors were then the quality of the study (methodology) and the quality of the results (confidence interval, number of participants). Observational studies primarily produce grade C evidence.

Results

The initial search resulted in 116 publications, which were reduced to 80 after the elimination of duplicates. In total, 43 full-text publications were retrieved for analysis, and 17 studies were included in the final qualitative synthesis. Most of the articles were graded C (N = 16), and only one study was graded B (N = 1). With respect to surgical techniques, the reviewed sample consisted of two microsurgical studies; 11 studies of laparoscopic simulation; three studies of live surgery (two open and one laparoscopic); and one study in a simulated robotic environment ().

Figure 1. Flow diagram.

Overview of the studied eye-tracking metrics

In total, 12 studies reported pupil responses, four blink rate, three gaze entropy, and six fixations or saccades. Ten studies (n = 10) examined differences between experience levels. Eight studies (n = 8) evaluated the correlation between eye parameters and task complexity. Subjective workload evaluation scores were reported in seven studies (six using NASA-TLX and one using SURG-TLX). In addition, electromyography (EMG) and heart rate variability (HRV) were used in two studies. An overview of the studies is presented below.

Pupil responses

Over 50 years ago, Hess & Polt and Beatty & Kahneman first measured task-evoked pupillary responses (TERP).[Citation25,Citation26] A TERP is a peak in pupil dilation related to an event or to an action that the operator takes. Another fundamental measure is mean pupil dilation, which is calculated by establishing a baseline, averaging the pupil diameter over the desired number of frames, and subtracting the baseline.[Citation25] As the technology developed and the number of eye-tracking studies increased, a variety of measures have been derived from pupil size, such as percentage changes, gradients of the pupil size change slope, and the rate of change in diameter (mm/s). In addition, the index of cognitive activity (ICA) tries to minimize the effect of light by measuring only the rapid pupil dilations.[Citation27]
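
The pupil measures described above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the function names, the use of the mean absolute frame-to-frame difference as the rate of change, and the sample values are all assumptions made for illustration, and real recordings would first need blink removal and filtering.

```python
from statistics import mean


def mean_pupil_dilation(diameters_mm, baseline_mm):
    """Mean pupil diameter over the analysis window minus the baseline (mm)."""
    return mean(diameters_mm) - baseline_mm


def peak_dilation(diameters_mm, baseline_mm):
    """Largest baseline-corrected diameter in the window (mm), a simple peak measure."""
    return max(diameters_mm) - baseline_mm


def percent_change(diameters_mm, baseline_mm):
    """Mean diameter expressed as a percentage change from the baseline."""
    return 100.0 * (mean(diameters_mm) - baseline_mm) / baseline_mm


def rate_of_change(diameters_mm, sampling_rate_hz):
    """Mean absolute frame-to-frame change in diameter, scaled to mm/s.

    Assumption: other definitions (e.g. a fitted slope) are equally possible.
    """
    if len(diameters_mm) < 2:
        raise ValueError("at least two samples are needed")
    steps = [abs(b - a) for a, b in zip(diameters_mm, diameters_mm[1:])]
    return mean(steps) * sampling_rate_hz


if __name__ == "__main__":
    # Hypothetical values: a 3.0 mm resting baseline and four task samples at 10 Hz.
    baseline = mean([3.0, 3.0, 3.0])
    task = [3.2, 3.4, 3.6, 3.4]
    print(mean_pupil_dilation(task, baseline))
    print(peak_dilation(task, baseline))
    print(percent_change(task, baseline))
    print(rate_of_change(task, 10.0))
```

Keeping the baseline as an explicit argument makes the choice of baseline window, which differs between studies, visible to the caller.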

The effect of expertise and experience on pupil size

Four studies reported differences in pupil size measures in relation to experience or expertise in simulated surgery. In a simulated microsurgical setting, Bednarik et al. measured the percentage change in pupil size of experts and novices in a suturing task. They found that the pupil dilated at different segments of a suture depending on expertise.[Citation13] In a simulated laparoscopic suturing task,[Citation28] Cao et al. found that surgical experience, measured by the number of surgical cases, did not correlate with pupil size dispersion, but shorter task completion time did.

In simulated endoscopic non-surgical scenarios performed under different hand conditions, Menekse et al. found that the pupil size of novice surgeons was larger than that of intermediate surgeons.[Citation29] Similarly, during simulated laparoscopic appendectomy with operating-room noise or a silent background, more experienced operators had smaller changes in pupil size.[Citation30]

One study analyzed workload in both simulated and clinical surgery. Richstone et al. used ICA[Citation31] responses to compare experts with non-experts. ICA indicated lower workload in the more experienced group.[Citation32] Both groups had significantly higher ICA values in the clinical setting than in the simulation setting.

Two studies were conducted solely in clinical settings. Tien et al. compared the maximum pupil size, the rate of change of pupil size, and the predictability of pupil size change (entropy) between expert and novice surgeons during live open inguinal hernia repair. In this scenario, the novice surgeons had increased pupil sizes and pupil entropy during various stages of the operation. The expert surgeons had shorter task durations and lower NASA-TLX mental demand scores.[Citation32]

Erridge et al. studied laparoscopic Roux-en-Y gastric bypass and measured the differences in maximum pupil diameter and the rate of change in pupil diameter between novice and expert surgeons. They reported larger maximum pupil diameters among novice surgeons during all stages of the operation. However, in contrast to Tien et al., they reported a greater rate of change in pupil diameter among the novice surgeons.[Citation33]

The effect of task complexity on pupil size

In total, seven studies analyzed pupil size with respect to task difficulty in simulated surgical settings.

Jiang et al. examined the average adjusted pupil size and the rate of change of pupil diameter in three simulated laparoscopic subtasks of increasing difficulty. Overall, the average adjusted pupil size revealed no differences in relation to task difficulty, but the rate of change of pupil size did. Contradictory results were reported by Zheng et al., who measured adjusted pupil diameter changes in three laparoscopic tasks of different complexity in a simulated environment. They found that the change in pupil size increased with task difficulty.[Citation34]

Wu et al. conducted an exploratory study in a robotic environment (the da Vinci Surgical System) to examine the relationship between normalized pupil diameter and perceived workload (NASA-TLX) in robotic surgical tasks with variable complexity levels. They recorded 15 robotic skills simulation sessions over a four-month period, and participants performed up to 12 simulated exercises in each session. Wu et al. observed that increasing task difficulty was correlated with an increase in pupil diameter, indicating that pupil responses are sensitive to task difficulty.[Citation35] However, in contrast to other studies, a linear relationship was not observed between pupil diameter and NASA-TLX scores.[Citation35]

Menekse et al. also observed that the hand condition (using the dominant hand, the nondominant hand, or both hands) had a significant effect on the pupil size of surgeons. Under the both-hand and nondominant-hand conditions, pupil size was larger than under the dominant-hand condition, and the both-hand condition was associated with the largest pupil sizes.

In a second study, Menekse et al. focused solely on the effect of hand condition and evaluated pupil size in simulated 3D ear-nose-throat endoscopic tasks. The simulated task was executed under three conditions: dominant hand, nondominant hand, and both hands. In this case, the both-hand condition was related to the largest pupil sizes.[Citation36]

Zhang et al. assessed surgeons’ mental workload using the change in pupil size and physical workload using EMG in virtual laparoscopic cholecystectomy. They reported that pupil size increased in every stage of the operation and that physical and mental workload were not synchronous.[Citation37]

The effect of distractions on pupil size

The operating theater is a complex working environment prone to task-related distractions, which have been associated with increased workload,[Citation3,Citation4] but only one study has investigated the effect of distractions on pupil size. Gao et al. compared mental workload during laparoscopic appendectomy under exposure to genuine operating-theater noise, music, and silence. The authors observed that OR noise caused an increase in pupil size, suggesting that pupil size might be sensitive enough to measure the effect of distractions on the surgeon.

Blink rate and duration

Three studies examined surgeons’ blink rate and duration in a simulated setting. A blink event is commonly recognized when the eyelid covers the pupil.[Citation38] However, blink events are often defined with varying proportions of eyelid coverage. Blink duration is defined as the interval between the start of the upper eyelid drop and its return to baseline, or is approximated as the proportion of time when the pupil is not detected.
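
As an illustration of the pupil-not-detected approximation described above, the following Python sketch derives blink events, blink durations, and blink rate from a per-frame detection flag. It is a simplified assumption-based example, not the method of any reviewed study: the function names and the 0.05 s and 0.5 s duration limits used to separate blinks from tracking dropouts are hypothetical choices.

```python
def detect_blinks(pupil_detected, sampling_rate_hz, min_s=0.05, max_s=0.5):
    """Return the duration in seconds of each blink.

    pupil_detected: one boolean per frame, False while the pupil is not found.
    A run of missing frames counts as a blink only if its duration lies
    between min_s and max_s (assumed thresholds); shorter runs are treated
    as noise and longer ones as tracking loss.
    """
    blinks = []
    run = 0
    # The trailing True is a sentinel that closes a run ending at the last frame.
    for detected in list(pupil_detected) + [True]:
        if not detected:
            run += 1
        elif run:
            duration = run / sampling_rate_hz
            if min_s <= duration <= max_s:
                blinks.append(duration)
            run = 0
    return blinks


def blink_rate_per_min(blinks, n_frames, sampling_rate_hz):
    """Number of blinks per minute of recording."""
    minutes = n_frames / sampling_rate_hz / 60.0
    return len(blinks) / minutes


if __name__ == "__main__":
    # Hypothetical 6 s recording at 100 Hz with gaps of 20, 2, and 10 frames.
    flags = ([True] * 100 + [False] * 20 + [True] * 100 + [False] * 2
             + [True] * 100 + [False] * 10 + [True] * 268)
    blinks = detect_blinks(flags, 100.0)
    print(blinks)
    print(blink_rate_per_min(blinks, len(flags), 100.0))
```

Because the thresholds decide what counts as a blink, reporting them alongside the results would make blink rates easier to compare across studies.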

Bednarik et al. investigated whether expertise and task complexity are reflected in blink rate in a microsurgical suturing task. In this scenario, blink rate was not affected by expertise: the blink rate patterns of experts and novices closely resembled each other, even though novices reported higher SURG-TLX scores.[Citation38] However, differences in blink rate were observed between segments of the suturing task, indicating that blink rate is modulated by task difficulty.

Somewhat consistent results were reported by Zheng et al., who assessed surgeons’ mental workload in a laparoscopic procedure on a virtual reality trainer using blink frequency and blink duration. They found that NASA-TLX scores were somewhat correlated with blink frequency.[Citation39] Surgeons who blinked less frequently put more effort into performance and reported higher frustration and overall workload. However, the overall NASA-TLX score was not correlated with blink frequency or duration.

Consistent results were found in a robotic environment by Wu et al., who measured the percentage of eyelid closure (PERCLOS) with respect to NASA-TLX scores and task complexity. In their study, PERCLOS was not correlated with NASA-TLX scores or task complexity.

The only study considering blinks in a live clinical setting was conducted by Erridge et al., who found that novice surgeons had a higher blink frequency than experts in all the recorded phases of the operation. Compared with Bednarik et al., these findings suggest that the live surgical setting causes distractions that impair concentration, especially among novice surgeons. This finding is also supported by the fact that novice surgeons had an increased number of fixations on irrelevant aspects of the operation compared with experts, which we will discuss later.

Fixations and saccades

In principle, a fixation is recognized as a steady eye gaze or focus on an object or area of interest.[Citation40] Fixation duration is the time of steady gaze, often called dwell time.[Citation33, Citation41] A saccade is defined as a rapid movement from one fixation point to another.[Citation40] Quiet eye (QE) time, which was reported by Causer et al., is defined as the last fixation on a specific location before a critical movement. For example, when a surgeon initiates a certain action (suturing), QE is the last area of interest that the surgeon looks at (the point where the needle pierces tissue). Longer QE has been linked with expertise and better performance.[Citation42, Citation43]

Six papers reported on fixations and saccades during surgery. Menekse et al. measured fixation frequency, fixation duration, saccade frequency, and saccade duration and compared them between novice and intermediate groups in four different simulated endoscopic tasks. They found that novice surgeons had an increased number and duration of fixations and saccades. Both the nondominant-hand and both-hand conditions were associated with increases in the count and duration of fixations and saccades.[Citation36] Causer et al. examined the effect of a training program on knot-tying performance. They observed that complex tasks and higher anxiety were linked to an increase in fixations and a decrease in quiet eye time.[Citation44] Contradictory results were reported by Wu et al., who measured the total time spent in fixations. They did not find correlations between fixation duration and NASA-TLX scores or task complexity.

Richstone et al. found that expert surgeons had an increased proportion of fixation duration in both simulated and clinical environments, indicating increased focus during surgical performance.

In other clinical environment studies, gaze movements were analyzed with respect to areas of interest (AOIs) in the operating theater. Erridge et al. measured fixation frequency and fixation duration on numerous predefined AOIs in the operating theater. They found differences in fixation frequency and duration, suggesting that experts focused more on task-relevant areas than novices did. Similar results were reported by Tien et al. in open surgery.
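
The AOI-based analysis described above can be sketched in Python as follows. The sketch assumes that fixations have already been detected and labeled with an AOI by the eye-tracking software; the function name, the AOI labels, and the example durations are hypothetical and do not come from the reviewed studies.

```python
from collections import defaultdict


def aoi_fixation_stats(fixations):
    """Summarize fixations per area of interest (AOI).

    fixations: iterable of (aoi_label, duration_s) pairs.
    Returns, for each AOI, the fixation count, the total dwell time in
    seconds, and the share of the overall dwell time spent on that AOI.
    """
    stats = defaultdict(lambda: {"count": 0, "dwell_s": 0.0})
    for aoi, duration_s in fixations:
        stats[aoi]["count"] += 1
        stats[aoi]["dwell_s"] += duration_s
    total = sum(entry["dwell_s"] for entry in stats.values())
    return {
        aoi: {**entry, "dwell_share": entry["dwell_s"] / total if total else 0.0}
        for aoi, entry in stats.items()
    }


if __name__ == "__main__":
    # Hypothetical labeled fixations from one recording.
    recording = [
        ("surgical_field", 0.5),
        ("monitor", 0.25),
        ("surgical_field", 0.75),
        ("staff", 0.5),
    ]
    for aoi, entry in aoi_fixation_stats(recording).items():
        print(aoi, entry)
```

Comparing the dwell share on task-relevant AOIs between groups is one simple way to express the expert-novice difference reported above.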

Gaze entropy

Gaze entropy is a statistical measure of gaze dispersion calculated with Shannon’s entropy.[Citation45] It measures, in bits, the average uncertainty about the position of gaze in the eye-tracking scene at an instant in time.[Citation46]
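
A minimal Python sketch of this measure is given below. It assumes the common grid-based formulation, in which the scene is divided into cells, p_i is the proportion of gaze samples falling in cell i, and the entropy is H = -sum(p_i * log2(p_i)). The grid size, the function name, and the example coordinates are illustrative assumptions; the reviewed studies may have discretized the scene differently.

```python
import math
from collections import Counter


def gaze_entropy_bits(gaze_points, width, height, n_cols=8, n_rows=8):
    """Shannon entropy (bits) of gaze positions binned on an n_rows x n_cols grid.

    gaze_points: iterable of (x, y) coordinates with 0 <= x <= width and
    0 <= y <= height. Returns 0.0 when there are no samples.
    """
    cells = Counter()
    for x, y in gaze_points:
        # Clamp so that points on the right or bottom edge fall in the last cell.
        col = min(int(x / width * n_cols), n_cols - 1)
        row = min(int(y / height * n_rows), n_rows - 1)
        cells[(row, col)] += 1
    n = sum(cells.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in cells.values())


if __name__ == "__main__":
    # Hypothetical 1000 x 600 scene: gaze held in one spot versus spread over four cells.
    focused = [(500, 300)] * 20
    scattered = [(10, 10), (900, 10), (10, 500), (900, 500)] * 5
    print(gaze_entropy_bits(focused, 1000, 600))
    print(gaze_entropy_bits(scattered, 1000, 600))
```

The value is bounded by log2 of the number of grid cells, so entropies computed with different grid sizes are not directly comparable.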

Di Stasi et al. investigated the effect of task complexity on the gaze entropy of surgical residents in a simulated laparoscopic environment. Their findings revealed that both gaze entropy and NASA-TLX scores increased linearly with task complexity.[Citation46] In their second study, gaze entropy was compared between surgical trainees and attending surgeons in two different surgical procedures, laparo-endoscopic single-site surgery (LESS) and multiport laparoscopy surgery (MPS). In addition, two tasks of different complexity levels (Low: Pattern Cut versus High: Peg Transfer) were examined. In these scenarios, gaze entropy was higher in the LESS procedure (with which none of the participants had previous experience) than in the MPS procedure in both groups.[Citation47] In addition, higher NASA-TLX scores were recorded for the LESS procedure. However, no differences in gaze entropy or velocity were reported between tasks or groups. Wu et al. likewise found that gaze entropy was positively correlated with NASA-TLX scores and task complexity.

Discussion

The majority of the studies measured workload in simulated surgical settings. Task difficulty and expertise were the most studied factors. The eye-tracking methodology in the reviewed papers was based on the same principles of video-based measurement. The size of the pupils and the movement of the eyes were registered with machine vision technology from a camera pointed at the eyes of the subject.

Eye-tracking methodology

There were three generally different approaches to eye tracking: head-mounted,[Citation33, Citation35] display-mounted,[Citation28,Citation29] or embedded in the microscope.[Citation13] The chosen method dictates the availability and quality of the registered parameters. In addition, the set of available parameters varies by device manufacturer.[Citation48] Head-mounted eye tracking uses ergonomically designed lightweight goggles or a minimalistic, eyeglass-like instrumented frame. This design seems to give the most degrees of freedom for measurement, as the eyes can be tracked in whichever direction the subject’s head is pointing. Head-mounted systems are also generally able to record the scene the subject is looking at. Ideally, this could provide semantic information about the content the subject is observing. However, because interpreting the scene is complex, in practical applications the measured parameters may be statistical measures of gaze dynamics, such as gaze entropy. Display-mounted eye tracking seems to be the simplest to implement and is perhaps the most common outside the surgical setting. It may therefore also be the most robust technology, despite its limitations, especially during open or microscopic surgery. It has an obvious application in laparoscopies and endoscopies, where the surgical view is shown on the same screen to which the eye-tracking device is attached. Embedding the eye-tracking device in the operative microscope is perhaps the only possibility for eye tracking during microsurgery that employs microscopes. However, there were only two studies from a single group using this technology.

Eye tracking indicators of workload

There was converging support for using pupil responses in workload estimation in surgery.[Citation32,Citation33,Citation49] In support of their usefulness, pupil responses have frequently been linked to increased levels of arousal, task complexity, memory load, and mental states in other fields[Citation25,Citation26] and are considered a strong and continuous indicator of workload.[Citation50] The physiological basis for measuring workload from the pupils lies in the innervation of the muscles controlling pupil size by the autonomic nervous system.[Citation25,Citation50] However, the main function of the pupil is to control incoming light,[Citation25] and for that reason pupil changes due to light are a common source of error,[Citation25] especially in the operating theater, where lighting conditions are hard to control. The time interval of pupil responses must be carefully selected according to the phenomenon under investigation. For instance, mean pupil size was not sensitive enough to detect changes in workload with subtask difficulty, but the rate of change was.[Citation51] Despite these limitations, pupil responses can reliably be used to distinguish between novice and experienced surgeons[Citation13, Citation32, Citation33, Citation49] and to evaluate the workload of the surgical task.[Citation29, Citation34, Citation35, Citation51] Lower levels of surgical workload are correlated with smaller pupil size measured as TERPs and mean pupil dilation.

According to the current literature, increased blink rate is somewhat correlated with increased workload in surgery.[Citation33, Citation38, Citation39] Prior research outside surgery has linked changes in blink rate to mental fatigue, attention, and stress.[Citation35, Citation38, Citation39] The physiological function of blinking is to maintain good vision by creating a tear film on the eye surface; therefore, blink rate is affected by environmental factors such as temperature, humidity, and lighting conditions.[Citation52] The endogenous blink rate is modulated by dopamine levels in the central nervous system and is therefore considered an indicator of mental state.[Citation53,Citation54] Other features, such as blink duration, amplitude, tear film integrity, and eyebrow frowning, have also been examined with respect to a variety of mental states.[Citation54,Citation55] Blink rate was not correlated with overall NASA-TLX scores in simulated surgery but seems to be sensitive to surgical focus and effort.[Citation38, Citation39] Only one study measured blink rate in a clinical surgery environment: novice surgeons had a higher blink frequency than experts in all the recorded phases of the operation, which indicated better focus in the expert group.[Citation33] However, blink duration does not seem to be sensitive for workload measurement in surgery.[Citation35, Citation39] Blink duration is commonly used as an indicator in driver alertness research. However, surgery is much more mentally and physically demanding, which may explain the insensitivity of blink rate to increasing surgical workload. Blink rate seems to be a potential indicator of focus and effort in surgery.[Citation33, Citation38, Citation39]

There are promising data from both clinical and simulated surgery that fixations and saccades depend on expertise and workload in surgery.[Citation32,Citation33,Citation49] Beyond surgery, expertise has been linked to shorter fixation durations, longer saccade durations, and more fixations on the task-relevant area owing to more efficient information processing.[Citation1,Citation56,Citation57] In surgery, experts ignore unnecessary information, resulting in increased fixation durations and a greater number of fixations on task-relevant areas. Particularly in clinical settings, less experienced surgeons are more prone to impaired focus, which manifests as an increase in fixations on task-irrelevant details.[Citation32,Citation33,Citation49] Therefore, fixation rate and duration measured on areas of interest serve as a surrogate of expertise in the clinical setting. Moreover, eye movements are not only sensitive for measuring workload; more efficient gaze strategies could also reduce workload.[Citation44] Quiet eye time is a reliable indicator of performance and expertise in sports and medicine.[Citation44, Citation58] In surgery, quiet eye training resulted in better performance and reduced anxiety during a simulated surgical task.[Citation44] Learning efficient and optimal gaze strategies might be beneficial especially for surgical trainees, who experience higher workload more frequently. Therefore, fixations and saccades are applicable indicators of workload in surgery.

Three studies support the use of gaze entropy as a measure of workload in simulated surgery.[Citation35, Citation46] It was originally discovered in the field of aviation that gaze exploration patterns become more random as workload increases.[Citation59, Citation60] The advantage of measuring gaze entropy is that, unlike pupil size, it does not depend on environmental factors such as light or on emotional state.[Citation46] Therefore, it might be a more reliable measure of workload in clinical environments where lighting conditions are challenging to control. Correspondingly, gaze entropy correlated with NASA-TLX workload scores in a simulated surgical environment in three studies.[Citation35, Citation46, Citation47]

Some limitations should be considered when interpreting the results of this systematic review. First, the literature uses overlapping terms for workload. For instance, the terms mental strain, cognitive workload, and mental demand were used as synonyms. Second, there was wide variability in the application and analysis of eye-tracking data, which diminishes the generalizability of the results. Third, studies primarily employed simulated surgery, and therefore the results may not be directly applicable to clinical surgery. For instance, the effect of distractions and non-technical factors on workload was rarely considered, even though these factors were recognized as underlying successful surgery.[Citation61] However, based on data recorded during clinical surgeries, these indicators show great potential for workload measurement in the operating theatre.[Citation32,Citation33,Citation49] To improve the relevance of eye-tracking studies, future research should focus on the effects of cumulative workload from various sources, especially during clinical surgery.

Conclusion

Pupil responses, blink rate, and gaze indices seem to be valid indicators of workload in surgery, based on data originating mainly from simulated surgery. In real clinical surgery, workload is a complex entity, and triggering genuine workload in simulation is challenging. Therefore, the effect of distractions and non-technical factors on workload is most likely an underrepresented aspect of the literature.

Disclosure statement

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

References