Opinions

Response to commentaries to Luck et al. (2021). Progress toward resolving the attentional capture debate

ABSTRACT

In a review paper, Luck et al. [(2021). Progress toward resolving the attentional capture debate. Visual Cognition, 29(1), 1–21. https://doi.org/10.1080/13506285.2020.1848949] discussed multiple perspectives on the attentional capture debate. In response to this review paper, several commentaries were written. Here, I respond to these commentaries and discuss those issues that seem to be the most pressing in the discussion on whether progress is made. I particularly focus on issues that are relevant from my point of view. I have organized the reply to the commentators into separate sections and addressed the issues that were commonly raised.

In the current response to the commentaries, I discuss those issues that seem to be the most pressing in the discussion on whether there is progress in the attentional capture debate. I particularly focus on issues that are relevant from my point of view. I have organized the reply to the commentators into separate sections and addressed the issues that were commonly raised.

Multifaceted attentional capture

Attentional capture is studied by measuring the effect of an irrelevant stimulus on task performance. The typical way to determine the effect of an irrelevant stimulus is to examine RTs in conditions in which the irrelevant salient object is present and compare these to conditions in which the salient object is not present. If there is a large difference in RT between these conditions, we argue that there is strong capture as there are strong distractor costs; if there is hardly any increase in RT, then we argue that the object did not capture attention, possibly because the object is not salient enough and/or the object, its features or its location has been suppressed (Luck et al., 2021). In several commentaries it is argued that we should examine issues related to the capture of attention and to the subsequent consequences of capture. In previous papers (Theeuwes, 2010; Theeuwes & Failing, 2020), I discussed the conditions that give capture, and conditions that result in either fast or slow disengagement of attention from the item that initially captured attention. I suggested the rapid disengagement hypothesis (Theeuwes, 2010; Theeuwes et al., 2000) to describe conditions in which attention is captured but immediately disengaged.

In his commentary, Zivony (2021) echoes this, arguing that it is possible that there is attentional capture without any attentional engagement. He refers to an elegant study by Zivony and Lamy (2018) in which it was shown that both relevant and irrelevant abrupt onsets always captured attention, but that there was attentional engagement only when the abrupt onset had a colour that matched that of the target. The findings of Zivony and Lamy (2018) fit perfectly with the way I have explained contingent capture by Folk et al. (1992). I have argued that in the Folk et al. paradigm both the relevant and irrelevant cues have captured attention, but that disengagement may have been slow only when the cue matched the features of the target. I argued:

In the spatial cueing paradigm of Folk et al. (1992) disengagement of attention from the cue may have been relatively fast when the cue and target do not share the same defining properties (e.g., the cue is red and the target is an onset), while disengagement from the cue may be relatively slow in the case where the cue and target share the same defining properties (e.g., both were red). Such a mechanism could explain why there are RT costs when the cue and target have the same defining characteristics and no costs when cue and target are different. (Theeuwes, 2010, p. 93)

This is also similar to the claim raised by Al-Aidroos (2021) in his commentary, in which he points out that the discussion between Folk et al. (1992) and Theeuwes (1992) is related to multiple attentional processes that play a role when an object captures attention.

Similarly, Geng and Duarte (2021) make a plea that the difference between proactive and reactive suppression may not be a dichotomous distinction but may be graded. Indeed, eye movement studies have shown that fixation durations on irrelevant distractors may be very short (less than 100 ms), indicating that there was very little dwelling of the eye at the location of the distractor (Born et al., 2011; Geng & Diquattro, 2010; Godijn & Theeuwes, 2002). The critical point to take from these commentaries is that the RT difference between a condition in which the distractor is present and one in which it is absent does not reflect only capture but also attentional engagement. A large difference in RT between conditions indicates that there was capture and engagement; a small difference in RT may indicate capture and rapid disengagement (Theeuwes, 2010). Born et al. (2011) showed that a salient object that is completely unrelated to the task at hand gives capture with little to no engagement, while a salient object that has features similar to the target gives capture and engagement. Indeed, when the distractor looks like the target, processing is needed (i.e., engagement) to determine whether the object that captured attention is the target or the distractor. Top-down processes affect the speed with which attention can be disengaged (Theeuwes, 2010), explaining why in Zivony and Lamy (2018) abrupt onsets that did not share the target colour showed capture but no engagement.
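
The distractor cost referred to above can be made concrete with a minimal sketch. The function name and the RT values are hypothetical and chosen only for illustration; the sketch assumes correct-trial RTs in milliseconds. Note that, as argued above, the resulting number conflates capture and engagement and cannot separate them on its own.

```python
from statistics import mean

def distractor_cost(rt_present, rt_absent):
    """Mean RT on distractor-present trials minus mean RT on distractor-absent trials (ms)."""
    if not rt_present or not rt_absent:
        raise ValueError("both conditions need at least one trial")
    return mean(rt_present) - mean(rt_absent)

# Hypothetical correct-trial RTs in ms for one participant (illustration only).
present = [652, 640, 671, 659]
absent = [610, 598, 625, 615]

cost = distractor_cost(present, absent)
print(f"distractor cost: {cost:.1f} ms")
```

A large positive cost is compatible with capture plus engagement, whereas a small cost is compatible with either no capture or capture followed by rapid disengagement; additional measures such as fixation durations are needed to tell these apart.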

In his commentary, Anderson (2021) argues that we should not call it attentional capture anymore but instead embrace the term attentional priority. Even though interesting views are presented here, in my view it is absolutely crucial to examine attentional capture and the multifaceted processes underlying it. As we have outlined here, attentional capture is basically bottom-up while the speed with which attention can be disengaged is largely top-down (Theeuwes, 2019). If we all start calling it attentional priority, all these processes are thrown on one big pile and all subtleties are lost.

The need for considering all subtleties involved in attentional selection and how it develops over time (i.e., capture and attentional engagement) is also echoed by the commentaries of Donk (2021) and of Hickey and van Zoest (2021). In both commentaries it is stressed that salience drives visual selection only briefly. Specifically, Hickey and van Zoest (2021) argue that attentional control is established as soon as high-level information propagates back to lower-level representations. Recent studies involving the additional singleton paradigm in which two salient distractors were simultaneously present confirm these claims. After attention was initially captured by the most salient singleton, attention did not go to the less salient distractor, suggesting that after initial capture, the salience signal of the other singleton was no longer present (Duncan & Theeuwes, in prep; but see Schreij et al., 2014 for a different result).

Control states

In previous papers we have suggested that there is little control over attentional capture except in conditions in which the spatial priority map plays no role and participants rely on serial clump-wise search through the display (Theeuwes, 2004, 2010; see also Liesefeld et al., 2021). For example, we showed that when the attentional window is small (i.e., when engaged in clump-wise search) there is no capture outside the attentional window (Belopolsky & Theeuwes, 2010; Theeuwes, 1994). Using the measurement of eye movements, we have shown that there is little awareness that the eyes were captured (Belopolsky et al., 2008; Theeuwes et al., 1998), even though recent studies have shown that when feedback is provided, observers can be made aware that their eyes have been captured (Adams & Gaspelin, 2021). Finally, statistical learning takes place basically outside awareness (Duncan & Theeuwes, 2020). In several studies we have shown that observers learn to suppress the location that is most likely to contain a distractor even though they cannot report which location it is (Wang & Theeuwes, 2018a, 2018b). In one study, the distractor was thirteen times more likely to be presented in one particular location, and even in those conditions only about half of the participants were able to report the location (Wang & Theeuwes, 2018b).

Given these findings, the commentary of Leber (2021) is somewhat misplaced, as he suggests that in order to understand attentional selection, motivation and strategy should be taken into account. In our view, the role of strategy and motivation is very limited. The only strategy to reduce capture is to slow down. If observers choose to respond more slowly (e.g., by delaying the initial saccade after committing an error), capture will be reduced because with the passing of time salience plays less of a role (Donk, 2021; Hickey & van Zoest, 2021; Theeuwes, 2010). Strategy also plays basically no role when learning to suppress a location. We showed that learning to suppress was equally effective regardless of whether participants had to perform an additional (very loading) spatial working memory task (Gao & Theeuwes, 2020). Also, we showed that even when participants were passively exposed to displays that contained statistical regularities, they picked up on these (Duncan & Theeuwes, 2020) and used them later to improve their search. Again, because passive exposure to a display is enough to learn regularities, it is highly unlikely that strategy and motivation play a role.

In the commentary of Geng and Duarte (2021) this aspect is also stressed, as they indicate that distractor suppression does not depend on explicit goals but is merely the result of automatic sensory processes. In her commentary, Won (2021) describes passive distractor filtering (e.g., Turatto et al., 2018) as a mechanism which can reduce the interference of distractors. In an elegant study, Won and Geng (2020) showed that passive exposure, before search, to stimuli sharing the same colour as the distractors during search reduced interference. Again, for low-level habituation processes, there is little room for motivation, strategy and awareness. This also fits with the findings discussed in the commentary of Zhang et al. (2021). They describe a study in which participants had to indicate whether they were on-task or were mind-wandering, and it turned out that this had no effect on the amount of capture. This suggests again a limited role for the effect of the attentional state, strategy or motivation on capture. Zhang et al. (2021) did show, however, that pre-trial stability of gaze had an effect on oculomotor capture. When eye stability was low there was more capture, which Zhang et al. (2021) interpreted as evidence for goal-neglect (i.e., weakened signal suppression). While feasible, this effect is more likely related to reduced inhibition of superior colliculus (SC) cells that maintain eye fixation (Munoz & Wurtz, 1993). When the eyes are not stable at fixation, very fast saccadic eye movements are launched, giving rise to oculomotor capture (Mulckhuyse et al., 2009; Saslow, 1967).

In her commentary, Leonard (2021) also concludes that in the target article of Luck et al. (2021) there is little consensus about a unitary mechanism of control (i.e., control state). Instead of seeking a singular entity of control, she argues to consider the mechanism of biased competition operating across multiple levels throughout cortex (Desimone & Duncan, 1995) as a way to understand the dynamics of attentional selection.

In their commentary, Schmid et al. (2021) indicate that participants are usually not aware that attention or their eyes were captured (see Belopolsky et al., 2008; Theeuwes et al., 1998), even though recent studies show that when observers are asked each trial whether their eyes were captured some awareness of capture can develop (Adams & Gaspelin, 2021). Importantly, making observers aware that their eyes were captured did not affect subsequent capture by salient items, providing additional evidence that the attentional control state plays basically no role.

In their commentary, Henare and Schubö (2021) make a plea for the role of task set, arguing for the use of a voluntary choice paradigm in which participants can choose each trial between one of two tasks (in this case search or categorization). It is argued that this should give better attentional control, even though the preliminary RT data presented in the commentary seem to suggest otherwise, as attentional capture was equally strong in the voluntary and non-voluntary task.

Overall, findings that there is little awareness of capture (Belopolsky et al., 2008), little awareness of the learned regularities present in the display (Wang & Theeuwes, 2018b), little to no influence of task set, little to no influence of motivation and the finding that passive exposure to displays with regularities is sufficient to learn them (Duncan & Theeuwes, 2020; Won & Geng, 2020) all indicate that the control state has little to no effect on these processes. This suggests that there are many similarities between bottom-up attentional capture and capture due to selection history (see Theeuwes, 2019). As we argued (Theeuwes, 2018), both these processes are automatic, implicit, operate largely outside awareness and run off without much, if any, attentional control. Similar ideas were put forward in the classic work of Shiffrin and Schneider (1977), who argued that an automatic process represents the activation of a sequence of nodes that “nearly always becomes active in response to a particular input configuration,” and that “is activated automatically without the necessity for active control or attention by the subject” (p. 2).

Theories and frameworks

In their commentary, Slagter and van Moorselaar (2021) argue that the well-known account of predictive coding (Friston, 2010) may provide a unified theoretical perspective that can account for the findings on attentional capture, including learned distractor suppression and activation. They argue that “capture is a logical consequence of the overall imperative of the brain to predict what sensory signals provide precise information to achieve goal-directed behavior”. Even though this presents an interesting viewpoint, it is unlikely that predictive coding can explain attentional capture. Indeed, as argued by de Lange et al. (2018) in their influential TICS paper on predictive coding, “predictions are the result of an initial bottom-up analysis and are only formed after the first wave of feedforward activity. Such ‘predictions’ are thus hypotheses about the current sensory input, rather than forecasts of what is coming next” (p. 773). So predictive coding works well for all top-down processes following attentional capture (i.e., following initial bottom-up analysis) but has little to do with explaining why irrelevant salient objects capture attention in the first place. Predictive coding works particularly well to explain perception, but has trouble explaining selection. Indeed, predictive coding works well to explain, for example, difficulty in disengaging attention; if the object that is selected is unlike what was expected, it will take time to disengage attention. Yet, this does not mean that attention is drawn to objects that are unexpected within a scene. Even though the classic work of Loftus and Mackworth (1978) on scene viewing suggested that participants fixate semantically inconsistent objects earlier than consistent objects, this finding could not be replicated in more recent studies (Henderson et al., 1999).

The trouble that predictive coding has with selection is also clear from basic findings of attentional research. On the one hand, from Posner cueing tasks we know that we attend to cued objects that are expected; on the other hand, from the additional singleton paradigm we know that we are more likely to attend to objects that are unexpected. Indeed, infrequent distractors capture attention more than frequent distractors, suggesting that unexpected events capture attention (Müller et al., 2009). It cannot be that the same theory predicts more attention to both expected and unexpected objects.

In their commentary, Pearson et al. (2021) describe value-modulated attentional capture (VMAC), which refers to the phenomenon that a distractor that is associated with reward is more likely to capture gaze and attention, even when attending to the distractor is counterproductive. It basically indicates that high-reward distractors have increased attentional priority. At the same time, studies on selection history have shown the opposite: experience with a salient distractor results in proactive suppression (Huang et al., 2021). Recent studies examined the interaction between these opposing forces (Kim & Anderson, 2021; Le Pelley et al., 2020) and showed that the effects of distractor value and location suppression were additive, suggesting that the competing influences on the priority map operate independently (see also Gao & Theeuwes, 2021 for a similar finding). Importantly, however, as outlined by Pearson et al. (2021), it is unclear how and when these modulatory weights are applied to sensory input, how the effect of the spatial priority map unfolds over time and/or whether these effects are mostly proactive or reactive.

Salience

As is clear from Luck et al. (2021), there is little agreement on how to measure and define salience. In the commentaries different viewpoints are put forward to explain the results of Wang and Theeuwes (2020), who argued that it is only possible to suppress a feature signal when both target and distractor are not salient (see also Theeuwes, 2004 for a similar argument). With respect to physical salience, it is assumed that salience computations take place automatically, across the visual field and independent of task set, representing how different a particular stimulus is from neighbouring stimuli in low-level visual features (e.g., colour, line orientation, size, luminance, etc.; see Itti & Koch, 2001). Aspects such as local feature contrast (i.e., how different a stimulus is from its directly neighbouring stimuli; Nothdurft, 1993) and distractor-distractor similarity (i.e., how heterogeneous a visual display is; Duncan & Humphreys, 1989) play a crucial role.

From the commentaries it is clear that salience is even more complex. Kryklywy et al. (2021) point to affective salience, referring to the notion that emotional history can affect attentional control (Todd & Manaligod, 2018). For example, using the additional singleton paradigm, Schmidt et al. (2015) showed more attentional capture for a colour singleton that was associated with receiving an electric shock than for that very same colour singleton when it was not associated with an electric shock. The same physical stimulus resulted in more capture because participants had learned to associate the colour singleton with receiving a shock. Similarly, Nissens et al. (2017) showed more oculomotor capture by an object that signalled the possibility of receiving an electric shock relative to an object signalling no shock. People cannot help looking at objects that they fear, illustrating the role of emotional history in attentional capture. In their commentary, Most and Curby (2021) also point to emotionally driven biases, particularly in relation to individual differences (referred to as “perceptual experience”), and highlight that the proposed framework of Luck et al. (2021) provides an organized taxonomy of attentional mechanisms that can be linked to clinically relevant individual differences.

Finally, as Most and Curby (2021) indicate, the study of individual differences in attentional capture is an emerging field. For example, Zhang et al. (2021) report on a study by Abagis (2020), who investigated individual differences in attentional capture using the additional singleton paradigm. Even though the capture effect was large, the split-half reliability was modest (0.45), a result that is quite similar to that of a recent study in which we employed split-half and test-retest methods using measures of attentional capture and distractor suppression within the same session and over a two-month period (Ivanov et al., in prep). Moreover, we recently re-analyzed the ERP data of Wang et al. (2019) in which participants learned to suppress a location. The results showed that of the 18 participants there were about 12 so-called learners who showed a strong, reliable PD (signifying suppression) and about 6 non-learners who showed little learning and no PD. Investigating individual differences may be a way to answer outstanding questions.
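
For readers unfamiliar with split-half reliability, the following is a minimal sketch of one common way to compute it. It is not the procedure used in the studies cited above, and all names and values are hypothetical. It assumes that each participant's trials have already been split into two halves (for example odd versus even trials, itself an assumption) and that a capture effect in ms has been computed for each half. The two sets of effects are then correlated across participants, and the Spearman-Brown formula estimates the reliability of the full-length measure.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r):
    """Estimate full-test reliability from a half-test correlation r."""
    return 2 * r / (1 + r)

# Hypothetical per-participant capture effects (ms), one value per half.
half_a = [40, 55, 30, 62, 48, 35]
half_b = [45, 50, 38, 58, 41, 44]

r_half = pearson(half_a, half_b)
reliability = spearman_brown(r_half)
print(f"half-test r = {r_half:.2f}, corrected reliability = {reliability:.2f}")
```

Whether a reported split-half value is the raw half-test correlation or the corrected estimate depends on the study, so values from different papers are comparable only when the same convention is used.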

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

JT was supported by a H2020 European Research Council (ERC) advanced grant 833029 – [LEARNATTEND].

References