
Auditory distraction can be studied online! A direct comparison between in-person and online experimentation*

Pages 307-324 | Received 03 Feb 2020, Accepted 19 Dec 2021, Published online: 26 Apr 2022

ABSTRACT

The irrelevant sound effect (ISE), the well-replicated finding that the presence of to-be-ignored sound disrupts short-term memory for serially presented visual items, is an important benchmark finding within cognitive psychology. The ISE has proven useful in evaluating the structure, function and development of short-term memory. This preregistered report focused on a methodological examination of the paradigm typically used to study the ISE and sought to determine whether the ISE can be reliably studied using the increasingly popular method of online testing. Comparing Psychology students tested in-person, Psychology students tested online and participants from an online panel, the results demonstrated successful reproduction of the key signature effects of auditory distraction (the changing-state effect and the steady-state effect), albeit with smaller effects in the online panel. Our results confirmed the viability of online data collection for auditory distraction research and provided important insights for the accumulation and maintenance of high data quality in internet-based experimentation.

KEYWORDS: auditory processing; short-term memory; working memory

Introduction

The present study aimed to establish whether key phenomena of auditory distraction can be studied using the online recruitment of participants. Internet-based experiments are increasingly used in cognitive psychology (e.g. Leding, Citation2019) because they provide quick access to a large and diverse pool of participants, which has clear advantages for improving the statistical power of experiments and increasing the representativeness of the data. However, studying auditory distraction online imposes unique challenges, as it allows less experimental control over the presentation of the stimulus material. Here, we provide an empirical test of whether the nature of the data collection method (in-person, online) moderates key findings of auditory distraction. We present this empirical approach to draw conclusions about the feasibility of online data collection in auditory distraction research, along with some guidance for maintaining high data quality in online experimentation. The importance of online experimentation has since increased in ways that no one could have anticipated prior to the global Covid-19 pandemic.

Due to the omnipresence of sound in the society we live in, it is rare to undertake a cognitive task in a quiet environment. This is perhaps even more true nowadays, when smartphones and laptops allow us to work and study almost anywhere, whether on the train, in a café or at the beach. This portability implies that demanding cognitive activities are often performed under the influence of task-irrelevant sounds such as background speech. It is therefore of great relevance to understand how cognitive processes are affected by the auditory environment. It has been known for some time that task-irrelevant sounds, even if ignored, break through the barriers of selective attention and typically have adverse effects on cognition. Since its initial discovery over forty years ago (Colle & Welsh, Citation1976), the vulnerability of visual-verbal short-term memory to disruption by the mere presence of task-irrelevant sound has become a well-established and easily replicable phenomenon (Elliott, Citation2002; Jones & Macken, Citation1993; Neath, Citation2000; see also Campbell et al., Citation2002, for an empirical demonstration of an auditory-verbal version). This irrelevant sound effect (Beaman & Jones, Citation1997) has excellent test-retest reliability (Ellermeier & Zimmer, Citation1997) and has recently been included in the list of the most important benchmark phenomena within cognitive psychology (Oberauer et al., Citation2018). In part, this status reflects the fact that research on the irrelevant sound effect has informed theories of the structure (Salamé & Baddeley, Citation1982), function (Cowan, Citation1995) and development (Elliott, Citation2002) of working memory, as well as the interaction between speech-planning (motoric) and perceptual processes (Hughes & Marsh, Citation2017; Jones et al., Citation2004). This research is also of great applied relevance for the acoustical design of productive learning and work environments (for reviews, see Banbury et al., Citation2001; Beaman, Citation2005), and for the development of effective noise abatement measures that protect students and workers from the adverse consequences of task-irrelevant sound (e.g. Schlittmeier & Hellbrück, Citation2009). Such protection is needed, as individual differences in cognitive control do not predict the size of the disruption from changing-state sounds in the context of visual-verbal serial recall (Elliott et al., Citation2020; Hughes et al., Citation2013; Körner et al., Citation2017).

The standard paradigm for examining the effect of background noise on performance is the serial recall paradigm in which short lists of digits, consonants or words are visually presented, and have to be recalled in the correct serial order, either immediately after their presentation or after a short retention interval. It is well-established that serial recall is impaired when auditory distractors such as speech (Salamé & Baddeley, Citation1982), tones (Jones & Macken, Citation1993) or environmental sounds (Klatte et al., Citation2010) have to be ignored. Perhaps counterintuitively, this disruptive effect of irrelevant sounds seems to be independent of the loudness of the auditory distractors (Colle & Welsh, Citation1976; Ellermeier & Hellbrück, Citation1998; at least within the range studied [40 to 74 dB(A)], which should not cause damage to hearing), and is mainly determined by changes in the to-be-ignored sound. The key empirical signature of the irrelevant sound effect is the changing-state effect (Jones et al., Citation1992). While repeated sounds may cause a measurable amount of auditory distraction relative to quiet (Bell et al., Citation2019), the disruption of short-term memory is much larger when auditory distractors differ from each other (Jones & Macken, Citation1993; Jones et al., Citation1992).

Research on the changing-state effect has proven influential; at the time of writing the current manuscript, Jones et al. (Citation1992) has been cited 368 times, while Jones and Macken (Citation1993) has been cited 590 times according to Google Scholar. To illustrate its influence, one of the simplest manipulations of changing-state that has provided robust evidence of a changing-state effect (e.g. Jones et al., Citation1992) is to compare the disruptive effect of a sequence of different auditorily presented letters (e.g. ABCDEFGH) with that of a sequence of repeated letters (e.g. AAAAAAAA). While speech distractors have proven to provide particularly strong evidence of distraction (Ellermeier & Zimmer, Citation2014), robust changing-state effects have also been observed with non-speech distractor materials such as tones (Jones & Macken, Citation1993) and music (Klatte et al., Citation1995; Schlittmeier et al., Citation2008; Schweppe & Knigge, Citation2020). However, although qualitatively similar in their effects on performance, quantitatively, the disruptive impact of speech distractors is typically greater than for non-speech distractors (LeCompte et al., Citation1997), which may be due to the relatively greater acoustic complexity of speech compared to non-speech signals (Tremblay et al., Citation2000).

There are two main theoretical views concerning the mechanisms underpinning the changing-state effect: the unitary attentional account and the duplex-mechanism account. Very briefly, unitary views of auditory distraction propose that it is caused by attentional capture (Bell et al., Citation2010; Bell et al., Citation2012; Cowan, Citation1995; Elliott, Citation2002). The duplex account assumes that, depending on the acoustic nature of the irrelevant sound, auditory distraction can occur via attentional capture or via interference-by-process. The latter results from a clash between deliberate serial-order processing applied to the visual-verbal items via serial rehearsal and the automatic seriation of changing acoustic elements within the irrelevant sound as part of the auditory streaming process (cf. Bregman, Citation1990; e.g. Hughes, Citation2014; Hughes & Marsh, Citation2020; Hughes et al., Citation2007). Both unitary and duplex views predict that changing-state sounds will be more disruptive than steady-state sounds; however, the predictions stem from different underlying mechanisms (i.e. attentional capture according to the unitary account and interference-by-process according to the duplex-mechanism account). Thus, beyond its benchmark status within models of working memory (Oberauer et al., Citation2018), the changing-state effect holds a prominent place in theoretical models of auditory distraction, and it is critical to determine the viability of online investigation of the effect as a starting point for continued research on auditory distraction.

Even before the global Covid-19 pandemic, the use of online data collection had become increasingly popular in cognitive psychology (e.g. Leding, Citation2019). Previously, research on auditory distraction was typically undertaken in a laboratory environment that afforded researchers full control over the presentation of the auditory distractors and other aspects of the experimental situation. Cognitive psychologists may have been reluctant to sacrifice experimental control, but online data collection offers a number of advantages over traditional laboratory-based studies that make it a highly attractive tool for researchers (e.g. Benfield & Szlemko, Citation2006; Mason & Suri, Citation2012; Reips, Citation2008). Given the current replicability crisis (Open Science Collaboration, Citation2015), it is increasingly recognised that adequate statistical power is essential for obtaining reliable results (Brysbaert, Citation2019). Appropriate statistical power is facilitated by online data collection, which gives researchers quick access to a large pool of participants and also facilitates collaborative data collection across different universities. Online data collection also allows researchers to gather data from a more diverse pool of participants. This factor may be relevant for auditory distraction, as university students who are young and well educated may be better equipped to deal with auditory distraction than older or less educated individuals (Sörqvist, Citation2010; but see Körner et al., Citation2017). A greater diversity of the sample is particularly desirable for research on inter-individual differences in auditory distraction, wherein a restriction of variance may mask potentially meaningful associations with other variables. Moreover, allowing participation in experiments without the use of public or private transport to laboratories can reduce the carbon footprint of the participants, and can provide a way to maintain social distancing during the Covid-19 pandemic.

However, examining auditory distraction online also poses several challenges that make it difficult to predict whether online data collection can be a useful tool in this area of research (e.g. Lefever et al., Citation2007; Reips, Citation2008). Most of these challenges relate to a lack of experimental control. With regard to technical factors, online data collection offers less control over the delivery of the irrelevant sound, that is, whether the sound is presented over headphones or loudspeakers and the quality of the presentation. Given that participants may use smartphones, laptops or desktop computers, it also allows less control over how the visual targets are displayed. Finally, the internet connection may affect data transmission, so that the timing of the visual and auditory stimuli may be less precise. With regard to environmental factors, it is harder than in the laboratory to ensure that the environment is free of other visual or auditory distractions, as the experiment may be performed on a smartphone in a crowded environment such as a train station. With regard to human behaviour, researchers have no direct way of detecting whether participants fail to comply with instructions. For instance, participants may turn off the volume or take off the headphones to avoid distraction. Such behaviour is only rarely observed in the laboratory, but cheating may be more tempting when participants are not monitored by a research assistant (e.g. Jensen & Thomsen, Citation2014). It is also possible that, due to the lack of direct participant-researcher interaction, participants feel less motivated in online studies (which may, for example, lead to failures to complete the study). In sum, without appropriate safeguards, it is unclear whether data collected online are of the same quality as data obtained in the laboratory.

Here, we take an empirical approach towards the subject by performing the same experiment online and in the laboratory, and by comparing the size of the changing-state effect (CSE; steady-state sounds minus changing-state sounds) between the online sample and the laboratory sample. These comparisons should show whether there are systematic differences between the data obtained online and in the laboratory. As supplementary analyses, we also compared the size of the steady-state effect (SSE; silence minus steady-state sounds) and the size of the irrelevant sound effect (ISE; serial recall performance in silence minus changing-state sounds) between the online and in-person samples. Furthermore, we explored different ways to remedy the potential limitations of online testing outlined above. The aims of the study were to assess whether it is feasible to conduct auditory distraction research online and, if so, to provide recommendations about how to ensure high data quality in web-based studies.
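For concreteness, the three difference scores are simple contrasts between each participant's mean serial recall scores in the three conditions. The short R sketch below illustrates the computation; the data frame and its column names (silence, steady, changing) are hypothetical placeholders rather than the authors' actual data structure.

```r
# Minimal sketch: per-participant distraction effects computed from mean
# serial recall scores in each condition. Values and column names are
# hypothetical placeholders.
scores <- data.frame(
  id       = 1:4,
  silence  = c(0.80, 0.72, 0.65, 0.90),  # proportion correct in quiet
  steady   = c(0.76, 0.70, 0.60, 0.88),  # steady-state sound
  changing = c(0.62, 0.55, 0.50, 0.75)   # changing-state sound
)

scores$CSE <- scores$steady  - scores$changing  # changing-state effect
scores$SSE <- scores$silence - scores$steady    # steady-state effect
scores$ISE <- scores$silence - scores$changing  # irrelevant sound effect

colMeans(scores[, c("CSE", "SSE", "ISE")])      # group-level effect estimates
```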

Preregistered methods

Preregistered data collection plan

Originally, we planned to collect the data of two groups of participants in a between-subjects design, in two main recruitment locations, for a total of four groups. In one of the recruitment locations, groups of participants were to be drawn from the same participant pool for the in-person and online comparison. Participants at Louisiana State University (LSU) were to be drawn from the Department of Psychology participant pool. The participant pool at LSU is created by offering either course credit or extra credit in Psychology courses, and the participants are typically between 18 and 25 years of age. The pool is approximately 75–80% female in a given semester. This method of recruiting participants helped to alleviate any concerns regarding sample differences between the online and in-person versions of the study.

In a second recruitment approach, we originally planned to recruit university participants in-person (but see below), and the second group of participants (the online group) was to be recruited via Prolific Academic (https://www.prolific.co) to represent non-students from the general online population. This second approach served to allow comparisons between the population of Psychology students and the broader population of online participants. For the Prolific Academic sample, the “Custom Screening” option was chosen. Pre-screened exclusion criteria included “Student Status”, “Dyslexia, Dyspraxia, ADHD or any other related literacy difficulties”, “NHS mental health support”, “mild cognitive impairment/dementia”, “antidepressants”, “mental illness daily impact”, “autistic spectrum disorder” and “mental health/illness/condition – ongoing”. Additional eligibility criteria included self-report of normal or corrected-to-normal vision and no hearing loss or difficulties, being 18–30 years of age, being of UK nationality, born and living in the UK, and speaking English as their first language. Finally, participants were eligible if their approval rate on Prolific was greater than 95%, to ensure a high quality of data.

We incorporated inclusion and exclusion criteria. Passing a headphone check task was an inclusion criterion. Exclusion criteria were self-reported hearing loss, missing a “catch” trial at the end of the ISE task and recall performance more than 3 SD below the sample mean of performance in the silent condition in each of the recruitment groups (online and in-person groups). Both the headphone check task and the catch trial are described in more detail below. If participants reported an obvious failure to comply with the task instructions in either the online or in-person groups (e.g. removing headphones, turning off headphone volume, undertaking the study in the presence of other auditory and visual distractors), their data were to be excluded from the analysis. This information was obtained from the post-experiment questionnaire. Should a participant fail to complete the post-experiment questionnaire, they were to be excluded as well. The number of exclusions and reasons for exclusion are reported below.

Sample size. We planned to utilise a sequential design with a maximal number of participants, using Bayes factors (e.g. Schönbrodt & Wagenmakers, Citation2018; Schönbrodt et al., Citation2017). Numerous prior studies have provided estimates of the effect size of the changing-state effect, but to our knowledge there are no published studies using an online methodology. The choice to use a Bayesian framework for statistical inference and for designing the experiment was motivated by evidence that Bayesian statistical analyses are not influenced by experimenters’ intentions and allow for sequential testing (see, e.g. Berger & Berry, Citation1988; Dienes, Citation2016; Rouder, Citation2014). In order to reduce the rate of false positive evidence that can occur with early termination of sequential designs (Schönbrodt & Wagenmakers, Citation2018), a minimum number of participants was to be recruited before conducting any analysis. We planned to recruit forty participants in each methodology for a total of 80 participants within each of the two separate recruitment approaches (i.e. a minimum sample of 160 in total, but see below).

For each sample, we planned to compare performance in the changing- and steady-state conditions via Bayesian one-sided, paired-samples t-tests, predicting larger scores in the steady-state condition. The analysis was to be performed with the open statistical software JASP (version 0.10.0; JASP Team, Citation2018),1 using the default prior for the t-test, corresponding to a Cauchy distribution with a width of .707. This approach yielded, within each recruitment group separately, a Bayes factor (BF), computed with the BayesFactor package in R (Morey & Rouder, Citation2015), that can be interpreted as the relative predictive performance of two competing hypotheses (i.e. a null hypothesis predicting no CSE and an alternative hypothesis predicting a positive CSE) in explaining the data (van Doorn et al., Citation2019).

For instance, a BF of 3 in favour of the alternative hypothesis indicates that the observed data are three times more likely under the alternative hypothesis than under the null hypothesis (see, e.g. Wagenmakers et al., Citation2018). We used a BF of 7, in favour of either the null (BF01) or the alternative hypothesis (BF10), to define our stopping rule, as well as a guideline for maximal recruitment. Once we had reached a minimum sample of 40 participants in each group, we planned to examine the BF for the t-test described above for each group. If it was < 7 in one of the groups, we planned to continue to recruit participants in batches of 10 within each group and to perform the analysis again. However, we planned to stop once we had reached a maximum of 100 participants in each of the groups, for a maximal total of 200 within each recruitment approach. Generally, a BF between 1 and 3 is considered “anecdotal evidence” for the tested hypothesis, whereas a BF between 3 and 10, and a BF larger than 10, is considered “moderate” and “strong” evidence, respectively (Lee & Wagenmakers, Citation2013).
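To make the sequential rule concrete, the sketch below shows how such a directional paired-samples Bayes factor and the stopping check could be computed with the BayesFactor package in R. The data vectors here are simulated placeholders; the reported analyses were run in JASP, which uses the same default Cauchy prior.

```r
# Sketch of the planned sequential check using the BayesFactor package.
# steady and changing are placeholder vectors of per-participant recall scores.
library(BayesFactor)

set.seed(1)
n        <- 40
steady   <- rnorm(n, mean = 0.70, sd = 0.12)
changing <- steady - rnorm(n, mean = 0.06, sd = 0.10)  # simulated positive CSE

# One-sided paired t-test: H1 predicts steady > changing (a positive CSE),
# with the default Cauchy prior width of 0.707 on the effect size.
bf <- ttestBF(x = steady, y = changing, paired = TRUE,
              nullInterval = c(0, Inf), rscale = 0.707)
bf10 <- extractBF(bf)$bf[1]   # BF for the directional alternative vs. the null
bf01 <- 1 / bf10

# Stopping rule: stop once BF10 >= 7 or BF01 >= 7, or once n reaches 100;
# otherwise recruit another batch of 10 participants and re-run the test.
stop_now <- (bf10 >= 7) || (bf01 >= 7) || (n >= 100)
```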

The sample size required for this study was determined based on simulations performed with the BFDA package (Schönbrodt & Stefan, Citation2019) running in R (R Core Team, Citation2021). As a first step, we simulated hypothetical paired-samples studies using the BFDA.sim function, both under H1 and under H0, with 10,000 simulations for each. We simulated an expected effect size of Cohen’s d = 0.5 for the H1 studies and Cohen’s d = 0 for the H0 studies. The expected effect size was determined based on a conceptual replication (Marsh et al., Citation2022) of the CSE reported in the visual-verbal condition of Jones et al. (Citation1995; Experiment 4), which yielded an effect size of roughly Cohen’s d = 1. As one may expect the CSE to be smaller in an online setting, we adopted a conservative approach by simulating an expected effect size of Cohen’s d = 0.5. The minimum and maximum sample sizes for each simulation were 10 and 300, respectively. After each batch of 10 simulated participants (from 10 to 300), a Bayesian paired-samples t-test with a positive directional alternative hypothesis and a Cauchy prior of width 0.707 centred on 0 was conducted. The next step was to analyse the simulations using the BFDA.analyze function to determine a sample size associated with a sufficient probability of detecting the expected effect. As shown in the top part of Figure 1, when the minimum and maximum sample sizes were set to 40 and 100, respectively, roughly 99% of the H1 simulated studies reached the expected boundary (i.e. BF10 ≥ 7). For the H0 simulated studies (see the bottom part of Figure 1), when the minimum and maximum sample sizes were set to 40 and 100, respectively, 75% of the studies reached the expected boundary (i.e. BF01 ≥ 7), and 13% of the studies reached the limit of 100 participants but still had a BF01 > 3, which can be interpreted as (moderate) evidence for the null.
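The following sketch illustrates this type of design analysis. It assumes the BFDA.sim and BFDA.analyze interface described in the BFDA package vignette (argument names should be checked against the package version used); the authors' actual simulation scripts are available on the OSF page, and the number of iterations is reduced here for speed.

```r
# Sketch of a Bayes Factor Design Analysis (Schönbrodt & Stefan, 2019).
# B is reduced here; the reported simulations used 10,000 iterations per hypothesis.
library(BFDA)

cauchy_prior <- list("Cauchy", list(prior.location = 0, prior.scale = sqrt(2) / 2))

sim_h1 <- BFDA.sim(expected.ES = 0.5, type = "t.paired", prior = cauchy_prior,
                   n.min = 10, n.max = 300, stepsize = 10,
                   alternative = "greater", design = "sequential",
                   B = 1000, verbose = FALSE)

sim_h0 <- BFDA.sim(expected.ES = 0, type = "t.paired", prior = cauchy_prior,
                   n.min = 10, n.max = 300, stepsize = 10,
                   alternative = "greater", design = "sequential",
                   B = 1000, verbose = FALSE)

# Proportion of simulated studies hitting the BF boundary of 7 when the
# sequential design is constrained to n.min = 40 and n.max = 100.
BFDA.analyze(sim_h1, design = "sequential", n.min = 40, n.max = 100, boundary = 7)
BFDA.analyze(sim_h0, design = "sequential", n.min = 40, n.max = 100, boundary = 7)
```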

Figure 1. Results of the simulations using the Bayes Factor Design Analysis (BFDA) package in R, illustrating the benefits of defining the minimum and maximum sample size ranges. Top: the proportion of H1 simulated studies (y-axis) that reached the BF01 boundary (solid line, false negative) and the BF10 boundary (dashed line, true positive), as a function of the maximum sample size (x-axis); bottom: the proportion of H0 simulated studies (y-axis) that reached the BF01 boundary (solid line, true negative) and the BF10 boundary (dashed line, false positive), as a function of the maximum sample size (x-axis).


To sum up, the simulations revealed that using a Bayesian sequential design with minimum and maximum sample sizes of 40 and 100 participants, respectively, should lead to the high probability of detecting either a true CSE (characterised by a Cohen’s d of at least 0.5) or a true null effect. Indeed, in the case of H0 simulated studies, only 2% of the studies led to wrongly supporting the alternative hypothesis, while 99% of the studies correctly supported the alternative hypothesis in H1 simulations.

Materials

The experimental programme was built in lab.js, a graphical interface for creating JavaScript experiments (Henninger et al., Citation2019). The experiment was presented to online and in-person participants via OpenLab (https://open-lab.online/).

Headphone check task. Because a portion of our participants were tested in an online environment, it was critical that we could determine whether participants set the volume of the sounds to a comfortable listening level and wore headphones for the entire duration of the experiment. Woods et al. (Citation2017) devised a screening task to determine whether participants comply with the instruction to wear headphones. Results from the study by Woods et al. indicated that, in an in-person comparison of participants who were either wearing headphones or listening over loudspeakers, “20 of 20 participants wearing headphones passed the test, whereas 19 of 20 participants listening over loudspeakers did not.” (p. 2068).

Procedure. Participants in both groups were presented with these instructions:

Please close any other applications on your device, and please put away and silence your cell phone. It is important to minimise any distractions in your environment, so that you can concentrate on this task. Begin this task when you know that you have at least 30 minutes of uninterrupted time to complete it. Please do not take your headphones off, and please do not adjust the volume until the study is completed. It is important that you follow the instructions, as the data will be published as part of a research project.

Upon pressing the spacebar to continue, a prompt to adjust the volume of the sounds to a comfortable listening level appeared, and participants were asked to put on their headphones. Then, they were presented with a series of 6 trials in which three tones were presented, and the participants had to use the mouse to click on the number representing which tone was the quietest, out of three possible choices (“Which of the three sounds was the softest [quietest]?”). Participants had to respond correctly on 5 out of the 6 trials to proceed, and no feedback was given to them on their choices. If they did not meet this criterion, a screen appeared that said, “Sorry! Your system does not provide the audio fidelity needed to complete this study. We are very sorry, but you cannot continue”.

Number of attempts. To ensure that participants were given multiple chances to complete the headphone check and then move on to the ISE task, participants were given up to 5 attempts to complete the headphone check before the experiment terminated.

Stimuli. The headphone check programme included a 200 Hz tone. Two manipulations of the original tone allowed the programme to differentiate between headphone and loudspeaker listeners: (1) phase-reversing one tone between the stereo channels, and (2) decreasing the level of one tone by 6 dB.
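The base R sketch below illustrates how the three stimuli of this kind of check can be constructed. It is an illustrative re-implementation, not the original lab.js code; the sampling rate and duration are assumptions.

```r
# Sketch of headphone-check stimulus construction (after Woods et al., 2017):
# a 200 Hz tone, the same tone phase-reversed in one stereo channel, and the
# same tone attenuated by 6 dB. Over loudspeakers the antiphase tone partially
# cancels in the air and tends to sound quietest; over headphones the -6 dB
# tone is the quietest, which is what the check exploits.
sr   <- 44100                              # sampling rate in Hz (assumed)
dur  <- 1                                  # duration in seconds (assumed)
time <- seq(0, dur, length.out = sr * dur)
tone <- sin(2 * pi * 200 * time)           # 200 Hz sine wave

standard  <- cbind(left = tone, right = tone)                  # diotic reference
antiphase <- cbind(left = tone, right = -tone)                 # phase-reversed channel
quieter   <- cbind(left = tone, right = tone) * 10^(-6 / 20)   # -6 dB attenuation

# On each trial the three tones are presented in a random order and the
# participant indicates which one sounded quietest.
trial_order <- sample(c("standard", "antiphase", "quieter"))
```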

Irrelevant Sounds Task

Participants completed a recall paradigm with irrelevant sounds (changing-state sounds, steady-state sounds and silence), using a reconstruction of order response.

Stimuli. The changing-state sounds consisted of different spoken letters, and the steady-state sounds consisted of the repetition of a single letter (for example, “C, C, C … ” on one trial or “K, K, K … ” on another trial). Letters were selected from the set of “A C F H I J K L N O Q R S U X Y”. For changing-state trials, letters were drawn randomly without replacement from the letter set. Irrelevant items were presented at a rate of two per second, and the onset of the first sound was simultaneous with the onset of the first visual item. The to-be-ignored letters were digitally recorded in a synthesised female voice using “Polly” from the free Amazon Web Service. Each letter was edited to 250 ms.
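A minimal R sketch of how distractor sequences of this kind can be generated is given below. The original experiment was implemented in lab.js (JavaScript), so this is an illustrative re-implementation, and the number of distractor tokens per trial (n_distractors) is a placeholder rather than a value stated in the text.

```r
# Sketch of distractor sequence generation for one trial of each condition.
letter_set <- c("A","C","F","H","I","J","K","L","N","O","Q","R","S","U","X","Y")
n_distractors <- 16   # placeholder: tokens per trial at two per second

# Changing-state: different letters drawn randomly without replacement.
changing_state <- sample(letter_set, n_distractors, replace = FALSE)

# Steady-state: one randomly chosen letter repeated throughout the trial.
steady_state <- rep(sample(letter_set, 1), n_distractors)
```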

Procedure. Digits were presented in the centre of the screen in black Arial font on a white background. The approximate size was 72-point font, estimated from the pixel size used in the browser-based programme. One digit was presented at a time and remained onscreen for 800 ms, with a 200 ms blank interval before the next digit appeared. The digits were chosen from the set 1 to 8, the list length was 8 digits, and digits were not repeated. Digit sequences were selected under the following constraints: lists could not start with the digit 1, and successive digits were not numerically adjacent (2 followed by 3, or the reverse, was not possible). After digit presentation, participants were prompted to recall the list, and the set of digits appeared on the screen in canonical order. Participants were instructed to select the digits that they saw in the order in which they were presented, using a mouse-driven pointer. The next trial was initiated only after participants had selected all 8 items. Participants were able to click on each digit only once. Clicking on a digit caused that digit to disappear from the screen, providing feedback to the participant that the digit had been selected. There were 60 trials in all, with 20 in each of the distractor conditions (Jones et al., Citation1993). Sounds were presented quasi-randomly, with the constraint that one of each type of distractor condition was selected randomly before going on to the next set of three distractor conditions; predictability of the type of sound sequence does not influence the size of the CSE (Marsh et al., Citation2014). After the experimental trials were completed, participants heard a series of three repeated letters and were asked to type the last letter that they heard. This final manipulation served as the “catch trial” to ensure that participants kept their headphones on for the duration of the experiment.
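One way to satisfy the stated list constraints is simple rejection sampling, sketched below in R. This is an illustration of the constraints rather than the authors' lab.js implementation.

```r
# Sketch of to-be-remembered list generation under the stated constraints:
# 8 digits drawn from 1-8 without repetition, the list does not start with 1,
# and no two successive digits are numerically adjacent (e.g. 2 then 3, or 3 then 2).
make_list <- function() {
  repeat {
    candidate <- sample(1:8, 8, replace = FALSE)
    if (candidate[1] != 1 && all(abs(diff(candidate)) != 1)) return(candidate)
  }
}

set.seed(42)
make_list()   # one valid 8-digit list
```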

For the in-person testing at the LSU Psychology department, participants were seated approximately 50 cm from a 17-inch computer screen connected to a Dell desktop computer. In this setup, the visual digits subtended a visual angle of approximately 3.4°. Participants wore over-the-ear Sennheiser HD 229 headphones connected to the computer. Participants were tested individually.

Post-experiment questions. To provide insight into participants’ general compliance with task-instruction and motivation on the serial recall task, the following questions were asked:

  1. Did you have any help from another person when remembering the digits? Y/N

  2. Did you use any external help (e.g. paper and pencil) to remember the digits? Y/N

  3. Did you say the digits aloud when trying to remember them? Y/N

  4. Did you turn off the volume on your headphones during the task? Y/N

  5. Did you remove or unplug your headphones during the task? Y/N

  6. Type in the number 2. (attentional check/catch response) Free-field response

  7. While you were completing the study, were there any external sources of visual or auditory distraction (e.g. other people speaking in the same room, a running video, a song playing in the background, etc.)? Y/N

    • If Y, what was the source of distraction? Free-field response

  8. What equipment did you use to do the experiment?

    1. Desktop computer

    2. Laptop computer

    3. Tablet

    4. Smartphone

  9. What type of headphones did you use to play the sounds?

    1. In-ear

    2. On-ear

    3. Over-ear

  10. What device did you use to record your responses?

    1. Mouse

    2. Trackpad

    3. Touchscreen

  11. How motivated were you to obtain the best test-score possible?

    1. Lowest motivation

    2. Low motivation

    3. Average motivation

    4. High motivation

    5. Highest motivation

  12. How concentrated were you on the task?

    1. Lowest concentration

    2. Low concentration

    3. Average concentration

    4. High concentration

    5. Highest concentration

  13. When performing the task were you switching between different tasks or browsers? Y/N

  14. Did you experience any technical difficulties during the study (e.g. problems with the internet connection, delays in presentation, etc.)? Y/N

  15. What is the current time at your location (please specify am or pm)? Free-field response

  16. If you reported that you have hearing loss at the start of the study, please tell us more about this now. Free-field response

Preregistered analysis plan

Headphone check task. The headphone check task provided a pass or fail outcome. Only if participants passed the headphone screening did they continue on to the rest of the experiment. As mentioned above, they were given up to 5 attempts to pass the headphone screening.

Irrelevant sounds task. Performance was scored according to a strict serial recall criterion (that is, only digits reported at the correct serial position were scored as correct).
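Strict serial scoring can be illustrated with a short R sketch; the example lists are hypothetical.

```r
# Sketch of strict serial-position scoring: an item is correct only if the
# digit reported at a given position matches the digit presented at that position.
score_trial <- function(presented, recalled) {
  sum(presented == recalled)          # number of items at their correct position
}

presented <- c(4, 7, 2, 8, 5, 1, 6, 3)
recalled  <- c(4, 7, 2, 5, 8, 1, 6, 3)
score_trial(presented, recalled)      # 6 of 8 items at the correct position
```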

The main analyses of interest were to determine (1) the magnitude of the CSE in the entire sample as well as (2) the comparison of the magnitude of the CSE across the different types of administration. We planned to run a 3 × 2 mixed Bayesian ANOVA with distractor condition (silence, steady-state, changing-state) and data collection procedure (in-person, online) as independent variables and serial recall performance collapsed across serial position as dependent variable to determine (a) whether there was a main effect of auditory distraction and (b) whether auditory distraction differed as a function of the data collection procedure. This analysis was planned to be conducted with default priors for ANOVA, with a Cauchy distribution with a width of 0.5 and 1 for fixed and random effects, respectively.

We planned to report results from the analysis of effects provided in JASP, that is, a model averaging technique providing inclusion BF representing evidence for a specific effect averaged across all the models containing the effect of interest. This same analysis was planned to be conducted for each of the recruitment approaches.

In case the results supported the presence of an interaction between the distractor condition and the data collection procedure (i.e. an inclusion BF > 3 for the interaction effect), we planned to examine whether the size of the three distraction effects (CSE: steady-state sounds minus changing-state sounds; SSE: silence minus steady-state sounds; ISE: silence minus changing-state sounds) was consistent across the in-person versus online administration methods, separately for each recruitment approach. This was to be tested with multiple independent-samples Bayesian t-tests. We planned to use undirected tests with a default prior taking the form of a Cauchy distribution with a width of .707.
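The sketch below illustrates the planned model with the BayesFactor package (JASP, which was used for the reported analyses, builds on the same package). The data are simulated and all variable names are hypothetical; inclusion BFs as reported in the Results are obtained by averaging over the models compared here, which JASP does automatically.

```r
# Sketch of the preregistered mixed Bayesian ANOVA and a follow-up t-test,
# using simulated placeholder data in long format (one row per participant
# and distractor condition).
library(BayesFactor)
set.seed(123)

n_per_group <- 40
long <- expand.grid(id        = factor(1:(2 * n_per_group)),
                    condition = factor(c("silence", "steady", "changing")))
long$group <- factor(ifelse(as.integer(as.character(long$id)) <= n_per_group,
                            "in_person", "online"))
cond_shift  <- c(silence = 0.05, steady = 0.02, changing = -0.05)
long$recall <- as.numeric(0.70 + cond_shift[as.character(long$condition)] +
                            rnorm(nrow(long), sd = 0.10))

# Distractor condition (within) x data collection procedure (between), with
# participants as a random factor and Cauchy priors of width 0.5 (fixed
# effects) and 1 (random effects), as preregistered.
bf_models <- anovaBF(recall ~ condition * group + id, data = long,
                     whichRandom = "id", rscaleFixed = 0.5, rscaleRandom = 1)
bf_models   # model comparison; inclusion BFs average over these models

# Follow-up (if the interaction is supported): undirected independent-samples
# t-test on the per-participant CSE, Cauchy width .707.
wide <- reshape(long, idvar = c("id", "group"), timevar = "condition",
                direction = "wide")
wide$CSE <- wide$recall.steady - wide$recall.changing
ttestBF(x = wide$CSE[wide$group == "in_person"],
        y = wide$CSE[wide$group == "online"], rscale = 0.707)
```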

Changes to the participant recruitment and the analysis plan

Due to the global Covid-19 pandemic, we had to make changes to the participant recruitment plan that also required some changes to the data analysis. Most importantly, it was only possible to collect in-person data at one recruitment location (LSU), so that only three groups of participants (Psychology students in-person, Psychology students online, online panel) were tested. This allowed us to compare the data of participants from the same participant pool (Psychology students) between the two types of data collection procedures (in-person, online), and it allowed us to compare a typical student sample (Psychology students) with a sample from an online panel. Due to changes in our recruitment procedures because of the pandemic, we ended up with sample sizes that are somewhat larger than planned. A total of 181 participants signed up to participate in the two groups from the LSU Psychology department (in-person = 64; online = 98; an additional 5 participants in the in-person group and 14 in the online group failed the headphone check and did not continue). Participant sign-ups were posted in batches and, consistent with our preregistered plan, the data were examined for exclusionary criteria to determine whether additional recruitment would be needed to obtain a sample of 40 participants who could be included in the analyses. After applying the preregistered exclusion criteria, we analysed data from 42 and 46 participants in the online and in-person Psychology student groups from LSU, respectively, which was slightly over 40 due to the way the participants signed up in batches.

For the online panel, we did not stop data collection after 40 but continued until we had collected 140 data sets despite the fact that our stopping criterion was already met after 40 valid data sets were collected. In the main comparison of the distraction effects among the different groups, we included only the 40 valid data sets that were first collected from this sample, in line with the preregistered stopping rule. However, the full data set is used in a separate supplementary analysis to explore how the pattern of results may change with different exclusion criteria.

Due to the changes in the participant recruitment procedure, it was necessary to deviate from the preregistered analysis by performing a 3 (silence, steady-state, changing-state) × 3 (Psychology students in-person, Psychology students online, online panel) analysis to compare the effects of auditory distraction as a function of the data collection site/procedure. Note that, despite these changes, we were able to compare whether the effects varied as a function of the data collection method between Psychology students who were tested online and in-person, and we were able to compare whether the effects differed as a function of the population from which the participants were drawn (Psychology students, online panel). We were thus able to perform all of the relevant comparisons detailed in our preregistered analyses, despite the changes in the participant-recruitment plan. This comparison of the distraction effects among the different groups included valid data sets from 46 Psychology students who were tested in-person, 42 Psychology students who were tested online and the 40 valid data sets from the online panel that were collected first.

In addition to these main analyses, we also conducted separate supplementary analyses to explore how the pattern of results may change with different exclusion criteria. In this supplementary analysis, we started by including the full sets of data that were collected from the different samples and proceeded with examining smaller subsets of data by successively applying progressively stricter exclusion criteria. This analysis served to test whether the careful inspection and preprocessing of the data affect the measurement of the distraction effects. Note that the supplementary analysis included the full data set that was collected using the Prolific Academic online panel.

Results

Main comparison of the distraction effects among the different groups

We ran a 3 × 3 mixed Bayesian ANOVA on serial recall performance collapsed across serial position, with the distractor condition as a 3-level within-participants variable (silence, steady-state, changing-state) and the data collection site/procedure as a 3-level between-participant variable (Psychology students in-person, Psychology students online, online panel). As described above, these results are reported using the BFInclusion model averaging technique. The analysis of effects yielded decisive evidence for the inclusion of the two main effects (distractor condition: BFInclusion = 1.11e+14; data collection site/procedure: BFInclusion = 528.36) and very strong evidence for the interaction between the two (BFInclusion = 51.29). Given the very strong support for the presence of an interaction between data collection site/procedure and distractor condition, the interaction was explored by first comparing the different distraction effects (CSE, SSE and ISE) across the different data collection sites/procedures.

As indicated in the analysis plan, we compared the CSE (steady-state minus changing-state conditions), SSE (silence minus steady-state conditions) and ISE (silence minus changing-state conditions) across the different data collection sites/procedures using undirected Bayesian t-tests for independent samples (effect sizes of the auditory distraction effects for each group are available in Table 1). The comparison between the two samples of Psychology students (in-person, online) revealed virtually no difference with regard to the different distraction effects (CSE: BF10 = 1.27; SSE: BF01 = 3.49; ISE: BF01 = 2.96). The comparison between the Psychology students who completed the experiment online and the online panel provided results similar to the comparison between the two Psychology student samples. Even though Table 1 suggests that the CSE and SSE are larger in the Psychology students online sample than in the online panel, we observed no evidence for a difference between the distraction effects (CSE: BF01 = 1.40; SSE: BF01 = 3.13; ISE: BF10 = 1.54). Finally, the comparison between the in-person Psychology students and the online panel yielded strong evidence for a larger CSE (BF10 = 23.64) and ISE (BF10 = 11.97) in the Psychology student in-person sample, whereas there was moderate evidence for an absence of a difference between the samples with regard to the SSE (BF01 = 4.37).

Figure 2. Serial recall performance collapsed across serial position (y-axis) as a function of distractor condition (colour) and sample/procedure of collection (x-axis). Boxes are box-plots with ranges going from the first to the third quartile; horizontal lines correspond to the median; vertical lines range from the first quartile minus 1.5 times the inter-quartile range to the third quartile plus 1.5 times the inter-quartile range; points are individual data points and the data distribution is plotted vertically.


Table 1. Effect sizes by data collection site/procedure.

Exploratory analyses of the exclusion criteria

Due to the large difference between the number of participants included in the final analysis of the online groups (after applying all of the preregistered exclusion criteria) and the total number of data sets collected in those groups (Psychology students online: 42 analysed and 98 in the full sample; online panel: 40 analysed and 140 in the full sample), we took the opportunity to analyse the effects of applying different combinations of exclusion criteria on the effect sizes of the auditory distraction effects. Figure 3 shows that, in the post-experiment questionnaire, 17% (Psychology students online) and 18% (online panel) of the participants reported having used overt rehearsal as the only exclusion criterion they met. We also observed that 11% of the participants in the Psychology students online sample reported external distraction as the only exclusion criterion. The percentage of participants failing the final audio check was very low in the online panel (4%) but was 14% in the Psychology student online sample. Finally, responding “yes” to any other question or combination of questions was rare.

Figure 3. Analysis of response to the post-experiment questionnaires. The different questions are listed on the left with a horizontal bar representing the number of participants having responded “yes” to the question. The vertical bars represent the number of participants having responded “yes” to specific combinations of questions, each being represented by dark grey connected dots.


Taking this information into consideration, we decided to apply the following filters: (1) including all the participants tested (full sample); (2) excluding from the full sample participants who reported a technical issue or not using a computer (no technical issue sample); (3) excluding from the no technical issue sample participants who reported behaviours considered as cheating, that is, not passing the audio check, reporting external help or help from another person, or turning the sound off or unplugging the headphones (no cheating sample); (4) excluding from the no cheating sample participants who reported external distraction or switching between screens during the experiment (no distraction sample); (5) excluding from the no distraction sample participants who reported the use of overt rehearsal (no overt rehearsal sample, online panel only); and (6) applying all the preregistered exclusion criteria that were used in the main comparison of the distraction effects among the different groups reported above (as planned sample).
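The nested structure of these filters, where each sample is a subset of the previous one, can be sketched in R as below. The logical columns are hypothetical flags coded from the post-experiment questionnaire, not the authors' actual variable names.

```r
# Sketch of the nested filtering scheme; TRUE means the participant reported
# the behaviour in question. All values and column names are hypothetical.
full <- data.frame(
  id              = 1:6,
  technical_issue = c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE),
  used_computer   = c(TRUE,  TRUE,  TRUE,  FALSE, TRUE,  TRUE),
  cheated         = c(FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE),  # audio check, help, sound off
  distracted      = c(FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE),  # distraction, screen switching
  overt_rehearsal = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

no_technical_issue <- subset(full, !technical_issue & used_computer)
no_cheating        <- subset(no_technical_issue, !cheated)
no_distraction     <- subset(no_cheating, !distracted)
no_overt_rehearsal <- subset(no_distraction, !overt_rehearsal)  # online panel only

# Sample size remaining at each filtering step.
sapply(list(full = full, no_technical_issue = no_technical_issue,
            no_cheating = no_cheating, no_distraction = no_distraction,
            no_overt_rehearsal = no_overt_rehearsal), nrow)
```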

Table 2 indicates that, for the Psychology students, the effect sizes of the different auditory distraction effects do not vary as a function of the type of exclusion criteria applied to the sample, except for the CSE, for which the effect size progressively decreased with the inclusion of more participants who met exclusion criteria (from 0.98 in the as planned sample to 0.69 in the full sample). Looking at Table 3, the pattern is even clearer for the online panel, as all of the effects (including the CSE) are virtually the same size regardless of whether data are excluded based on the preregistered exclusion criteria or not. Overall, the results indicated that the level of strictness applied when cleaning the data had a minimal impact on the effect sizes observed. Notably, the BFs tend to increase with sample size, as larger samples provide more evidence in favour of the effects.

Table 2. Effect sizes of auditory distraction effects as a function of different applications of exclusion criteria in the psychology students online sample.

Table 3. Effect sizes of auditory distraction effects as a function of different applications of exclusion criteria in the online panel.

Discussion

The main question of the present study was whether the effects of auditory distraction can be studied online. The results clearly demonstrate that they can. The ISE was replicated both in the online sample consisting of Psychology students and in the Prolific Academic sample (i.e. the online panel). What is more, key empirical signatures of auditory distraction were replicated in the online settings. Most importantly, the CSE (e.g. Jones et al., Citation1992) proved robust even with moderate sample sizes in the online experiments. Interestingly, there was also evidence of an SSE both in the in-person experiment and in the online studies, confirming the recent observation that the SSE can be robustly obtained with sample sizes of 40 or more participants (Bell et al., Citation2019). The results thus unambiguously demonstrate that key findings of auditory distraction can be replicated online. This is notable because examining auditory distraction in online settings can be seen as quite challenging, given the low degree of experimental control over the presentation of the stimulus material and over the compliance of the participants.

So far, we have focused only on whether key signature effects of auditory distraction can be obtained in online settings, as this was the main goal of the preregistered report. However, it is, of course, possible that, even though the effects can be detected, they are considerably smaller than the effects obtained in the laboratory. When comparing the size of the effects between the in-person sample and the online samples, there was no firm evidence that effects of auditory distraction were generally smaller in the online setting than in the laboratory. When comparing the size of the CSE between Psychology students who were tested in-person versus online, it became evident that the effects were of about the same size. The results thus allowed us to reject the hypothesis that effects obtained online are considerably smaller than those obtained in-person. Potential reservations against online experimentation thus seem unsubstantiated, even when it involves the presentation of auditory distractors.

There are, however, noticeable differences between the Psychology students who were tested in-person and the online panel. First, the online panel performed somewhat better on the serial recall task than the Psychology student samples. Second, and most importantly, the effects of auditory distraction were robustly obtained in the online panel but were smaller than those shown by the Psychology students. At first glance, it may be tempting to attribute these differences to the specific characteristics of the samples that were tested. Although we cannot rule out that individual differences between the LSU Psychology students and the Prolific Academic online panel in terms of working memory capacity, task engagement or general intelligence drive the differences in the size of the effects observed, we consider this unlikely. This is because a raft of previous studies has demonstrated that there is usually no relationship between these intrinsic factors and the magnitude of the CSE (Hughes et al., Citation2013; Körner et al., Citation2017; Sörqvist, Citation2010).

We can only speculate about other differences between the Psychology student samples and the online panel that could moderate the magnitude of the CSE. Potential differences of importance include those pertaining to the technical equipment used by the different samples. It seems possible that the Psychology students used different technical equipment than the online panel. Auditory distraction crucially depends on acoustic factors (Schlittmeier et al., Citation2012), such as the fidelity of the auditory signal (e.g. Dorsi et al., Citation2018); thus, differences in the equipment delivering the sound to the ears of the participants could moderate the magnitude of the effects observed. Other potential moderating factors include differences between the participant samples in prior experience of taking part in psychological studies. Unlike the online panel, the online LSU Psychology student sample had previous experience of taking part in Psychology studies on campus under controlled laboratory settings. It is possible that the online Psychology student sample, unlike the online panel, endeavoured to “recreate” a controlled environment within an off-campus setting, limiting, for example, their exposure to extraneous noise and visual distractions that might weaken the auditory distraction effects under observation. Furthermore, through participating in previous Psychology studies or via exposure to course material, the Psychology students, unlike the online panel, may have adopted mnemonic strategies such as serial rehearsal that render them particularly vulnerable to disruption via the changing-state properties of the task-irrelevant sound (Beaman & Jones, Citation1997). The differences between samples could thus be due to a range of different factors, some intrinsic (use of different mnemonic strategies) and some extrinsic (use of different technical equipment, different testing environments), so these differences between samples must be interpreted with caution.

The discussion about the potential reasons why the CSE was somewhat larger for the Psychology students than for the online panel from the Prolific Academic sample should not detract from our main research question. The answer to whether auditory distraction can be studied online is a resounding yes: it is definitely possible to reproduce key findings of auditory distraction in online settings, regardless of whether the sample is an online panel drawn from the general public or students enrolled on Psychology courses. First, online and in-person testing did not differ when the same population was tested. Second, the absolute size of effects is rarely compared directly between Psychology students and online panels, so the critical question is whether the effects can be replicated, and it is of less importance whether the effect size is exactly the same. We therefore cannot recommend against online testing on the basis of the present results.

In addition to establishing that auditory distraction can be studied online, further analysis of our results yields a number of recommendations concerning the required sample size and how closely online results should be scrutinised. In relation to sample size, it might seem intuitive that a larger sample is required to detect effects of interest with online versus in-person testing. However, the auditory distraction effect sizes were of comparable magnitude for the Psychology students tested in-person and online. Further, with regard to our online Prolific Academic sample, our initial sample of 40 participants already replicated the CSE, and adding another 57 participants (achieving a sample size of 97) resulted in effect sizes of comparable magnitude to the initial sample (see the comparison between the no overt rehearsal and as planned samples for the different auditory distraction effects in Table 3). Our adoption of a BF > 7 as a stopping criterion, with t-tests conducted after each additional batch of 10 participants above the minimum sample of 40 should a BF > 7 not yet be reached, offers a cost-effective solution when the effects under exploration are robust. We note that such a method is suitable for Bayesian statistical inference as adopted here, but not for Frequentist statistical inference, where such data peeking is known to increase the Type 1 error (false positive) rate, thereby requiring changes to significance thresholds (Lang, Citation2017).

Given that we assessed a number of exclusion criteria within our study, we were able to investigate to what extent the pattern of results and the magnitude of effect sizes differed in the online samples when participants excluded from the original analyses were included in the supplementary analyses. Tables 2 and 3 present the data obtained after filtering using the different combinations of exclusion criteria. What is striking from these data is that exercising the various exclusion criteria has little effect on the magnitude of the auditory distraction effects reported and their associated effect sizes (the BFs, of course, change [increase] with sample size, as the larger sample provides more evidence). The data suggest that there was little, if any, gain from closely scrutinising the data. Since exercising exclusion criteria reduces the sample size and thereby decreases the BFs reflecting the evidence in favour of a hypothesis in Bayesian statistics, or the statistical power to detect effects in Frequentist statistics, there is clearly a trade-off to be made between the level of scrutiny applied (e.g. participants excluded) and the resource requirements involved in additional participant recruitment. Perhaps somewhat counterintuitively, our data suggest that careful inspection and preprocessing of data from online testing, guided by extensive exclusion criteria based on participants’ self-reports, may be ineffective (or even counterproductive) in improving online hypothesis testing. One may thus refrain from excluding large proportions of data prior to analysis, even when analysing data from online studies.

The capability of online testing offers researchers the opportunity to rapidly collect data. It also allows researchers from different universities to cooperate and, in some circumstances, to use a common software platform for running studies and collecting data, thus aiding cooperation among research groups. Online experimentation may thus complement researchers’ tool set in various ways, for example, by facilitating the inclusion of more diverse participants and by promoting the use of large sample sizes. Online testing is an important tool at the disposal of researchers, and not only during the Covid-19 pandemic. In the face of global challenges such as the Covid-19 pandemic and climate change, online testing offers a way of continuing research while maintaining social distance and reducing carbon footprints. Yet a number of challenges are associated with online testing, such as the compliance of participants and a reduction of experimental control over the presentation of the stimulus material and the participants’ surroundings. As with any new technology, online experimentation may be met with some degree of scepticism. This scepticism is justified as long as there is uncertainty about the comparability of online and in-person data. Unique challenges may arise, for example, when presenting auditory stimuli or testing distraction online. Therefore, it is important to provide direct evidence on whether results from in-person laboratory settings can be compared with online testing. Here we provide the groundwork demonstrating that it is possible to examine the effects of auditory distraction online, and we encourage, and expect to see, more studies using online experimentation in the future.

Open practices statement

The materials, simulations, data, analysis scripts, and statistical output files for this project are available on the OSF page https://osf.io/kpc9d.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

* Stage 2 Manuscript in the Journal of Cognitive Psychology. The Stage 1 manuscript had a different title, “Can the Irrelevant Sound Effect be Studied Online?”; however, we used a new title for the Stage 2 manuscript to reflect the findings.

** Journal submission portal would not allow entry of specific key words, therefore “auditory processing”, “short-term memory” and “working memory” were chosen.

1 Statistical analyses were finally performed using version 0.14 of the JASP statistical software, but with the same priors as described in the preregistration of the study.

References