REGULAR ARTICLES

Neural correlates of spoken word production in semantic and phonological blocked cyclic naming

Pages 575-586 | Received 07 Apr 2017, Accepted 13 Oct 2017, Published online: 01 Nov 2017

ABSTRACT

The blocked cyclic naming paradigm has been increasingly employed to investigate the mechanisms underlying spoken word production. Semantic homogeneity typically elicits longer naming latencies than heterogeneity; however, it is debated whether competitive lexical selection or incremental learning underlies this effect. The current study manipulated both semantic and phonological homogeneity and used behavioural and electrophysiological measurements to provide evidence that can distinguish between the two accounts. Results show that naming latencies are longer in semantically homogeneous blocks, but shorter in phonologically homogeneous blocks, relative to heterogeneity. The semantic factor significantly modulates electrophysiological waveforms from 200 ms and the phonological factor from 350 ms after picture presentation. A positive component was demonstrated in both manipulations, possibly reflecting a task-related top-down bias in performing blocked cyclic naming. These results provide novel insights into the neural correlates of blocked cyclic naming and further contribute to the understanding of spoken word production.

Introduction

The blocked cyclic naming paradigm has been increasingly used as a tool to test lexical selection mechanisms during spoken word production. In this paradigm, participants repeatedly name a small set of pictures in a cyclic manner, either in a homogeneous block (e.g. apple, mango, peach, lemon) or in a heterogeneous block (e.g. apple, chair, duck, bus) (Damian, Vigliocco, & Levelt, 2001). Speakers are typically slower to name pictures in the semantically homogeneous block than in the semantically heterogeneous block (e.g. Abdel Rahman & Melinger, 2009; Belke, Meyer, & Damian, 2005; Damian et al., 2001; Damian & Als, 2005; but see Navarrete, Del Prato, Peressotti, & Mahon, 2014). This is called the semantic blocking effect.

The blocked cyclic naming paradigm is complex in that it involves multiple cognitive components, such as language-specific skills as well as top-down control strategies (e.g. lexical selection, priming, learning, task-representation; Belke et al., 2005; Belke & Stielow, 2013; Oppenheim, Dell, & Schwartz, 2010; Shao, Roelofs, Martin, & Meyer, 2015; see Belke, 2017 for a review). Therefore, it is critical to understand the mechanisms involved in the blocked cyclic naming paradigm in order to use it effectively as a tool to investigate language processing.

One account argues that the underlying mechanism responsible for the semantic blocking effect is competitive lexical selection (Belke et al., 2005; also derived from Howard, Nickels, Coltheart, & Cole-Virtue, 2006). Specifically, the previously named picture (e.g. apple) becomes highly active and competes for selection during the subsequent production of a semantically-related target (e.g. mango).

An alternative account argues that competition during lexical selection is not required to produce the semantic blocking effect (Navarrete et al., 2014; also derived from Oppenheim et al., 2010). Instead, such an effect can be explained by an incremental learning mechanism (Oppenheim et al., 2010). This error-based learning mechanism strengthens the connections between the semantic features (e.g. fruit, yellow) and to-be-produced words (e.g. mango) while also weakening the connections between the semantic features and competitors (e.g. peach, apple; cf. Spalek, Damian, & Bölte, 2013). This is referred to as “the dark side of incremental learning” (Oppenheim et al., 2010). Navarrete and colleagues (2014) claim that the difference in naming latencies in the blocked cyclic paradigm is caused by differential priming effects arising from the underlying incremental learning mechanism. More specifically, in the semantically homogeneous blocks, the connections between the semantic features and target words are weakened for semantically homogeneous words within one cycle, but strengthened for the cyclically repeated target words within a block. By contrast, in the semantically heterogeneous blocks, the connections are always strengthened for the repeated items (i.e. repetition priming). Consequently, naming latencies in the semantically heterogeneous blocks are faster relative to those in the semantically homogeneous blocks where less repetition priming occurs. Navarrete et al. (2014) conclude that competitive lexical selection is not required to account for the semantic blocking effect.

Navarrete et al.’s (2014) account predicts less repetition priming in the semantically homogeneous blocks compared to the heterogeneous blocks. In language comprehension studies, repetition priming is generally reflected by an attenuated N400 effect (e.g. Rugg, 1985, 1990; see e.g. Misra & Holcomb, 2003 for discussion). In spoken word production, however, repetition priming can influence multiple planning stages (Belke et al., 2005). Based on the incremental learning account (Oppenheim et al., 2010), the adjustment of connections between the semantic features and the lemma is likely to be a process before lexical selection. In other words, the adjustment of connections may take place during the mapping from the conceptual level to the semantic level.

Recently, studies have made use of electrophysiological and neuroimaging measurements to provide further insights into this debate but have yielded inconsistent findings. By recording the participants’ electrophysiological activation in a combination of the picture-word interference and blocked cyclic naming paradigms, Aristei and colleagues (Aristei, Melinger, & Abdel Rahman, 2011) found that the semantic blocking effect takes place at around 200 ms after picture presentation. This temporal locus is in line with the locus of lexical selection based on meta-analyses of the temporal and spatial signatures of word production components (Indefrey, 2011; Indefrey & Levelt, 2004). The electrophysiological effect starting around 200 ms after picture presentation is not easily reconciled with Navarrete et al.’s (2014) account based on the incremental learning mechanism (Oppenheim et al., 2010).

Alternatively, Janssen and colleagues (Janssen, Hernández-Cabrera, Van der Meij, & Barber, 2015) found a post-retrieval locus of the electrophysiological effect corresponding to the semantic blocking effect represented by longer naming latencies. Janssen et al. (2015) interpreted the “late” effect as a conflict resolution component reflecting an underlying cognitive control mechanism. Therefore, it is still unclear exactly when the semantic blocking effect takes place during spoken word production.

Furthermore, using neuroimaging and neuropsychological methods, Schnur and colleagues found the semantic blocking effect to be associated with the activities in Broca’s area, which corresponds to competition among lexical selection candidates (Schnur et al., 2009; Schnur, Schwartz, Brecher, & Hodgson, 2006). These findings lend support to the competitive lexical selection account. To our knowledge, no supporting electrophysiological evidence has been reported for the incremental learning account so far.

Besides the disagreements on the level of lexical-semantic encoding, another motivation for carrying out the current study is the small number of studies looking into neural mechanisms underlying phonological encoding, which is also a critical stage in spoken word production (Indefrey, 2011; Indefrey & Levelt, 2004). The general finding is that when items form a homogeneous block in terms of their onset segments (e.g. coat, cat, cook), naming is facilitated compared to a heterogeneous block, suggesting either facilitation at the word-form encoding stage during speech production or strategic preparation due to high predictability (e.g. Breining, Nozari, & Rapp, 2016; Damian, 2003; Meyer, 1991; Roelofs, 1999; Schnur et al., 2009). However, inhibitory effects have also been observed when the position of the overlapping segment is not the onset (Breining et al., 2016). Breining and colleagues (2016) suggest a common mechanism responsible for the semantic blocking effect as well as the phonological effect. Specifically, both the inhibitory phonological effect and the inhibitory semantic blocking effect are “similarity-based” and the interference arises due to the distributed semantic or segmental feature overlap during repeated retrieval. In other words, the incremental learning mechanism accounts for the phonological effect in a similar way to the semantic blocking effect in the blocked cyclic naming paradigm (Breining et al., 2016).

The present study

The present study aims to contribute to the discussion concerning accounts of encoding in spoken word production by drawing on evidence from the blocked cyclic naming paradigm. With this aim, we probe the semantic blocking effect and the phonological facilitation effect with behavioural and electrophysiological measurements. We hope that by finding the neural correlates of the semantic blocking effect and the phonological facilitation effect, we can better understand the mechanisms underlying spoken word production as reflected by the blocked cyclic naming paradigm.

We present items in semantically homogeneous and heterogeneous blocks, “homogeneous” meaning that items are congruent in terms of their semantic category and “heterogeneous” meaning that they are incongruent. Besides semantic congruency, we also investigate phonological congruency: in phonologically homogeneous blocks, items overlap in their onset segment, while in phonologically heterogeneous blocks they do not. Based on the results from previous studies, we expect to observe longer naming latencies in the semantically homogeneous blocks relative to the semantically heterogeneous blocks (e.g. Abdel Rahman & Melinger, 2009; Belke, 2017; Belke et al., 2005; Damian et al., 2001; Damian & Als, 2005; but see Navarrete et al., 2014), and shorter naming latencies in the phonologically homogeneous blocks relative to the phonologically heterogeneous blocks (e.g. Damian, 2003; Meyer, 1991; Roelofs, 1999; Schnur et al., 2009).

In terms of the electrophysiological data, if competitive lexical selection is involved, we expect to observe a difference in event-related potentials (ERPs) between semantically homogeneous and heterogeneous blocks starting around 200 ms after picture presentation (e.g. Aristei et al., 2011; Indefrey, 2011; Indefrey & Levelt, 2004). Alternatively, as introduced above, Navarrete et al.’s (2014) account based on incremental learning (see also Oppenheim et al., 2010) would predict an ERP effect at the stage of mapping from the conceptual to the semantic representation. Based on the predictions of the meta-analysis studies (Indefrey, 2011; Indefrey & Levelt, 2004), an ERP effect before 200 ms is then expected (Navarrete et al., 2014; Oppenheim et al., 2010). In terms of the polarity of the expected effect, less repetition priming should elicit a stronger negative effect in the semantically homogeneous condition relative to the heterogeneous condition (e.g. Rugg, 1985, 1990; see e.g. Misra & Holcomb, 2003 for discussion). Unfortunately, to our knowledge, no ERP study has yet investigated the polarity of repetition priming in speech production or determined the locus of phonological facilitation in blocked cyclic naming. If the phonological facilitation effect reflects facilitation at the phonological form encoding stage, we expect to observe ERP differences between phonologically homogeneous and heterogeneous blocks at around 355–400 ms after picture presentation (calculated based on a meta-analysis of the neural correlates of phonological code retrieval and syllabification stages; see Indefrey, 2011 for details).
Alternatively, if the incremental learning mechanism underlies phonological encoding and the effect takes place at the stage of lexical-segmental mapping (as proposed by Breining et al., 2016), we expect a stronger negative effect between 275 and 355 ms in the phonologically homogeneous blocks relative to the heterogeneous blocks, based on the predictions of the meta-analysis studies (Indefrey, 2011; Indefrey & Levelt, 2004).

Methods

Participants

Thirty-two native speakers of Mandarin Chinese living in Beijing participated in the study (15 female; mean age = 22.3 years, SD = 3.8 years). They were all right-handed and had normal or corrected-to-normal vision and no history of neurological or language impairment. All participants gave informed consent and received 100 RMB for their participation.

Materials

Thirty-two black-and-white line drawings of common objects were selected from the CRL International Picture Naming Project (Bates et al., 2000) and other standardised picture databases (Snodgrass & Vanderwart, 1980; Zhang & Yang, 2003). Pictures were standardised to 300 by 300 pixels and appeared in the centre of the screen as black line drawings on a white background. The target pictures were homogeneous in terms of word length (number of syllables, mean = 2.04, SD = .43); and, based on ratings on a 5-point Likert scale, concept familiarity (mean = 4.63, SD = .29), visual complexity (mean = 2.43, SD = .68), subjective word frequency (mean = 3.04, SD = .85), age of acquisition (mean = 5.02 years, SD = 2.78), and name agreement (the percentage of participants giving the most common name, mean = .81, SD = .12; see Liu, Hao, Li, & Shu, 2011 for details of the norming measurements).

Sixteen of the pictures were selected and combined to create four semantically homogeneous blocks (henceforth S+) with four pictures in each block. The pictures in each block were repeated four times in a cyclic manner. As noted above, the pictures in a semantically homogeneous block belonged to the same semantic category, such as 眼睛 (/yan3jing1/, [eye]), 耳朵 (/er3duo1/, [ear]), 胳膊 (/ge1bo0/, [arm]), 肩膀 (/jian1bang3/, [shoulder]). The four blocks contained items belonging to the semantic categories of animals, clothing, body parts and furniture, respectively. The same sixteen pictures were shuffled and combined to create four semantically heterogeneous blocks (henceforth S-). Twenty native Mandarin speakers who did not participate in the naming experiment were asked to rate the semantic relatedness (in terms of semantic category) of each set of four pictures. The average rating scores were 4.98 (S+) and 1.6 (S-) on a 1-to-5 scale, suggesting that the semantically homogeneous blocks were semantically related and the semantically heterogeneous blocks were semantically unrelated.

The other sixteen pictures were selected and combined to create four phonologically homogeneous blocks (henceforth P+) with four pictures in each block. The picture names in a phonologically homogeneous block overlapped in their phonological onsets in terms of syllable structure, such as 吉他 (/ji2ta1/, [guitar]), 剪刀 (/jian3dao1/, [scissors]), 镜子 (/jing4zi0/, [mirror]), 金字塔 (/jin1zi4ta3/, [pyramid]). There was no overlap in lexical tones. All sixteen pictures were then shuffled and combined to create four phonologically heterogeneous blocks (henceforth P-). The target pictures were considered semantically unrelated based on the rating scores of semantic relatedness: 1.54 (P+) and 1.32 (P-) on a 1-to-5 scale.

In total, there were sixteen experimental blocks (semantic: 4 homogeneous and 4 heterogeneous; phonological: 4 homogeneous and 4 heterogeneous), resulting in 256 experimental trials (16 blocks × 4 pictures × 4 cycles). Within each block, each picture was repeated in a pseudo-randomized cyclic manner, i.e. each picture appeared once in each position of the cycle. The sequence of blocks was pseudo-randomized using Mix (Van Casteren & Davis, 2006) so that the same block condition did not appear in two consecutive blocks.
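The block-order constraint described above can be illustrated with a minimal sketch. It is not the algorithm used by Mix; it simply reshuffles the sixteen blocks until no condition occurs twice in a row. The function names, the seed and the `max_attempts` limit are assumptions made for illustration only.

```python
import random

CONDITIONS = ["S+", "S-", "P+", "P-"]
BLOCKS_PER_CONDITION = 4


def no_adjacent_repeats(order):
    """Return True if no two consecutive blocks share a condition."""
    return all(a != b for a, b in zip(order, order[1:]))


def constrained_block_order(seed=0, max_attempts=100_000):
    """Reshuffle the 16 blocks until no condition repeats back to back.

    Hypothetical helper: plain rejection sampling, not the Mix procedure.
    """
    rng = random.Random(seed)
    blocks = [c for c in CONDITIONS for _ in range(BLOCKS_PER_CONDITION)]
    for _ in range(max_attempts):
        rng.shuffle(blocks)
        if no_adjacent_repeats(blocks):
            return list(blocks)
    raise RuntimeError("no valid order found; raise max_attempts")


order = constrained_block_order(seed=1)
print(order)
```

Rejection sampling is adequate here because a noticeable share of random orders already satisfies the constraint; a dedicated tool is preferable when several constraints have to hold at once.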

Procedure and apparatus

Participants were seated in front of a monitor at a distance of approximately 50 cm in a soundproof booth. The stimuli were presented using the software E-prime 2.0 and the reaction times (RTs) were measured online by a voice-key connected to a PST serial response box. The participants’ vocal responses were recorded using a microphone. Incorrect responses were coded manually. Mis-triggered RTs were inspected and corrected manually using the CheckVocal programme (Protopapas, 2007).

Before the experiment, the participants were familiarised with the pictures and the names used in the experiment. Each picture was presented once in the centre of the screen for 2 s. Following the familiarisation, there was a practice session where participants were asked to name the pictures. On each practice trial, a fixation cross appeared in the centre of the screen for 500 ms, followed by a jittered blank screen for 500, 600 or 750 ms. Then, the target picture appeared and lasted until the voice-key was triggered or a 2-s limit was exceeded, followed by another blank screen (2 s). Responses that deviated from the names given in the familiarisation phase were corrected by the experimenter.

The experimental trial procedure was the same as that of the practice trials. There were four warm-up trials preceding each experimental list, with pictures that were not included as targets. There were self-paced breaks between blocks. The whole experiment lasted about one hour, comprising 30 min setting up the electroencephalogram (EEG) equipment and a 30-minute experimental session.

Electroencephalogram recording and data pre-processing

Participants’ EEG was recorded simultaneously with 64 Ag/AgCl electrodes using BrainCap (Brain Products GmbH, Germany), following the international 10–20 system. Two EOG electrodes were placed beneath the left eye and at the external canthus of the right eye to record eye movements. On-line recording was referenced to the electrode “AFz” and the signals were recorded at a sampling rate of 500 Hz. The signals were preprocessed using the Matlab toolbox Fieldtrip (Oostenveld, Fries, Maris, & Schoffelen, 2011). The signals were re-referenced offline to the average of all channels and the data from peripheral electrode sites were excluded to avoid possible contamination by muscle activity. The signals of the remaining channels (59) were then band-pass filtered from 0.1 to 30 Hz. ERPs were time-locked to the onset of target pictures and were first segmented from −500 to 1000 ms. Artifact rejection was first implemented using the visual artifact rejection function in Fieldtrip to remove segments with variance values greater than 1000 µV² (with the threshold determined based on visual inspection of all participants’ recordings). Next, an independent component analysis (ICA) was performed in Fieldtrip (code based on a function in EEGLAB; Delorme & Makeig, 2004) to remove the eye-movement artifacts. At most two components per dataset were identified as vertical and horizontal eye movements and removed from the EEG signals for further analysis. The trials were then segmented from −350 to 650 ms with a −350 ms to −50 ms pre-stimulus baseline. Trials with amplitudes exceeding ± 100 μV within each trial, or exceeding 5 standard deviations of a participant’s mean amplitude of all trials, were considered outliers and rejected from the datasets (the cut-off SD value was determined based on visual inspection of five participants’ recordings).
Datasets from ten participants were excluded due to an insufficient number of remaining trials after artifact rejection and technical problems, leaving twenty-two effective datasets (11 female; mean age = 22.5 years, SD = 3.8).
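The final segmentation step (baseline correction over −350 to −50 ms and the ± 100 μV amplitude criterion) can be sketched for a single-channel epoch as follows. This is a plain-Python illustration under stated assumptions, not the Fieldtrip pipeline used in the study: the helper names are hypothetical, the epoch is a simple list of microvolt values sampled at 500 Hz, and the 5 SD criterion is not implemented.

```python
SAMPLING_RATE_HZ = 500       # sampling rate given in the recording description
EPOCH_START_MS = -350        # latency of the first sample in each segment
BASELINE_MS = (-350, -50)    # pre-stimulus baseline window
AMPLITUDE_LIMIT_UV = 100.0   # absolute amplitude rejection threshold


def ms_to_index(t_ms):
    """Convert a latency in ms to a sample index within the epoch."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the baseline window from every sample."""
    start, stop = ms_to_index(BASELINE_MS[0]), ms_to_index(BASELINE_MS[1])
    baseline = sum(epoch[start:stop]) / (stop - start)
    return [value - baseline for value in epoch]


def exceeds_limit(epoch):
    """Return True if any sample exceeds the absolute amplitude limit."""
    return any(abs(value) > AMPLITUDE_LIMIT_UV for value in epoch)


# Toy epoch: 500 samples (-350 to 650 ms) with a 5 uV offset and one spike.
epoch = [5.0] * 500
epoch[300] = 120.0
corrected = baseline_correct(epoch)
print(corrected[0], corrected[300], exceeds_limit(corrected))
```

Correcting the baseline before applying the amplitude criterion means that the threshold is evaluated relative to the pre-stimulus level rather than to any slow offset in the recording.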

Statistical analysis

A total of 2.72% of all data points (5632) were removed from the behavioural data analysis. This included: (a) incorrect responses; (b) responses with hesitations; (c) voice-key failures (the first three types were considered as errors; the error rate was 2.45% and considered not informative enough for further analysis); (d) outliers (RTs shorter than 200 ms or longer than 1300 ms; 0.27%). Data (both behavioural and EEG) from the first cycle in each semantic block were also excluded, following a common approach in the blocked cyclic naming paradigm (e.g. Belke et al., 2005).

Altogether 16.36% of all the experimental trials were removed from the ERP data analysis including error trials (2.45%) and segments rejected during artifact rejection (13.91%). There were in total 4122 trials left for the following analysis. Repeated measures ANOVAs were performed on both behavioural and EEG data.
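As a consistency check on the reported counts, the arithmetic can be sketched as below. The 5632 behavioural data points match 22 participants with 256 trials each. The ERP figure is reproduced only under an assumption that the text does not state explicitly, namely that the 16.36% is taken over the trials remaining once the first cycle of the eight semantic blocks has been dropped.

```python
participants = 22            # effective datasets after exclusions
pictures_per_block = 4
cycles = 4

# Behavioural data points: 16 blocks x 4 pictures x 4 cycles per participant.
trials_per_participant = 16 * pictures_per_block * cycles
behavioural_total = participants * trials_per_participant

# Assumption: the first cycle is dropped in the 8 semantic blocks only.
semantic_trials = 8 * pictures_per_block * (cycles - 1)
phonological_trials = 8 * pictures_per_block * cycles
erp_total = participants * (semantic_trials + phonological_trials)
erp_remaining = round(erp_total * (1 - 0.1636))

print(behavioural_total)   # 5632
print(erp_total)           # 4928
print(erp_remaining)       # 4122
```

Under that assumption the result agrees with the 4122 trials reported for the ERP analysis.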

Results

Semantic effects

In the behavioural data analyses, by-subject and by-item repeated-measures ANOVAs were performed with block condition (2 levels: homogeneous vs. heterogeneous) and presentation cycle (3 levels) as two factors. The interaction between the two factors was also included in the models. There was a main effect of semantic relatedness, F1(1, 21) = 28.315, p < .0001, ηp² = .574; F2(1, 15) = 20.878, p < .001, ηp² = .582, demonstrating the semantic blocking effect, i.e. longer RTs in the semantically homogeneous blocks than in the heterogeneous blocks (27 ms; Figure 1). There was no significant effect of presentation cycle, F1(2, 42) = 1.214, p = .307, ηp² = .055; F2(2, 30) = .683, p = .513, ηp² = .044. The interaction between block condition and presentation cycle was not significant, F1(2, 42) = .902, p = .413, ηp² = .041; F2(2, 30) = .583, p = .565, ηp² = .037.
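The effect sizes reported alongside each F statistic appear to be consistent with partial eta squared, which can be recovered from F and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this standard formula to the values above; the function name is a hypothetical helper, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of semantic relatedness, by subjects and by items.
print(round(partial_eta_squared(28.315, 1, 21), 3))  # 0.574
print(round(partial_eta_squared(20.878, 1, 15), 3))  # 0.582
# Main effect of presentation cycle, by subjects.
print(round(partial_eta_squared(1.214, 2, 42), 3))   # 0.055
```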

Figure 1. The semantic blocking effect in reaction times. Data from the first cycle were excluded (following Belke et al., 2005).


EEG data were also submitted to repeated-measures ANOVAs, with the mean amplitudes for every consecutive 50 ms time window from 0 to 550 ms as the dependent variable and the region of interest (henceforth ROI; 4 levels: left-anterior: F1, F3, F5, FC3, FC5; right-anterior: F2, F4, F6, FC4, FC6; left-posterior: P1, P3, P5, CP3, CP5; right-posterior: P2, P4, P6, CP4, CP6) and block condition (2 levels: semantically homogeneous versus heterogeneous) as the independent variables (following a similar approach in e.g. Aristei et al., 2011; Costa, Strijkers, Martin, & Thierry, 2009; Dell’Acqua et al., 2010; Jescheniak, Hahne, & Schriefers, 2003). The results showed that in the early time windows (i.e. 0–50 ms, 50–100 ms, 100–150 ms and 150–200 ms), there was only a main effect of ROI, p-values < .01, indicating that the mean amplitudes were significantly different between ROIs. Neither the effect of semantic relatedness nor the interaction between ROI and semantic relatedness reached significance.
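The dependent variable described above, a mean amplitude per ROI for each of the eleven 50 ms windows between 0 and 550 ms, can be sketched in plain Python. The electrode groupings are taken from the text; the data layout (a dictionary mapping channel names to lists of microvolt samples covering −350 to 650 ms at 500 Hz) and the function names are assumptions made for illustration.

```python
ROIS = {
    "left-anterior": ["F1", "F3", "F5", "FC3", "FC5"],
    "right-anterior": ["F2", "F4", "F6", "FC4", "FC6"],
    "left-posterior": ["P1", "P3", "P5", "CP3", "CP5"],
    "right-posterior": ["P2", "P4", "P6", "CP4", "CP6"],
}
WINDOWS_MS = [(start, start + 50) for start in range(0, 550, 50)]  # 11 windows


def window_mean(erp, start_ms, stop_ms, epoch_start_ms=-350, rate_hz=500):
    """Mean amplitude of one waveform between two latencies (in ms)."""
    first = round((start_ms - epoch_start_ms) * rate_hz / 1000)
    last = round((stop_ms - epoch_start_ms) * rate_hz / 1000)
    segment = erp[first:last]
    return sum(segment) / len(segment)


def roi_window_means(erps_by_channel, roi):
    """Average the ROI's channels, then take the mean in each window."""
    channels = ROIS[roi]
    n_samples = len(erps_by_channel[channels[0]])
    averaged = [
        sum(erps_by_channel[ch][i] for ch in channels) / len(channels)
        for i in range(n_samples)
    ]
    return [window_mean(averaged, a, b) for a, b in WINDOWS_MS]


# Toy data: five flat waveforms at 1..5 uV, so every window mean is 3.0.
toy = {ch: [float(k + 1)] * 500 for k, ch in enumerate(ROIS["left-anterior"])}
print(roi_window_means(toy, "left-anterior"))
```

One such value per participant, condition, ROI and window would then form the input to the repeated-measures ANOVA for that window.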

Between 200 and 500 ms, there was a main effect of ROI, F-values > 11.0, p-values < .01. The interaction between ROI and semantic relatedness was significant, F-values > 4.5, p-values < .03. There was a trend towards an interaction between ROI and semantic relatedness between 500 and 550 ms, F = 2.9, p = .08. The mean amplitudes per ROI in the semantically homogeneous and heterogeneous conditions were then submitted to pair-wise t-tests, summarised in Figure 2c. Generally, in the anterior regions, the S- condition elicited more negativities than the S+ condition (see Figure 2a). In the posterior regions, the S- condition elicited more positivities than the S+ condition (see Figure 2b). The pattern was consistent within 200–550 ms (see Figure 2). The detailed effects in each ROI are summarised in Figure 2c.

Figure 2. The grand average ERPs of the semantically homogeneous (S+) and heterogeneous (S-) conditions. The top graph (a) depicts the ERPs from a representative anterior electrode FC4, with more negativities in the S- than S+ condition. The middle graph (b) depicts the ERPs from a representative posterior electrode Pz, with more positivities in the S- than the S+ condition. The bar graph (c) summarises the p-values resulting from the pairwise t-tests on the mean amplitudes within each time window per ROI in the semantic blocks. The red line refers to the significance level .05. Four ROIs are represented: left-anterior (blue), right-anterior (green), left-posterior (yellow) and right-posterior (orange).


Phonological effects

In the behavioural data analyses, by-subject and by-item repeated-measures ANOVAs were performed with block condition (2 levels: homogeneous versus heterogeneous) and presentation cycle (4 levels) as two factors. The interaction between the two factors was also included in the models. There was a main effect of phonological relatedness, F1(1, 21) = 11.111, p = .003, ηp² = .346; F2(1, 15) = 11.250, p = .004, ηp² = .429, indicating phonological facilitation, with shorter RTs in the phonologically homogeneous blocks than in the heterogeneous blocks (−13 ms). There was also a main effect of presentation cycle, F1(3, 63) = 50.085, p < .0001, ηp² = .705; F2(3, 45) = 51.976, p < .0001, ηp² = .776, indicating that RTs in the later cycles were shorter than in the earlier cycles (see Figure 3). The interaction between block condition and presentation cycle was not significant, F1(3, 63) = .754, p = .524, ηp² = .035; F2(3, 45) = .893, p = .452, ηp² = .056.

Figure 3. The phonological facilitation effect in reaction times across presentation cycles.


In EEG analyses, between 0 and 350 ms, there was only a main effect of ROI, F-values >3.4, p-values < .01, indicating the mean amplitudes were significantly different between ROIs. Neither the effect of phonological relatedness nor the interaction between ROI and phonological relatedness reached significance.

Between 350 and 500 ms, there was a main effect of ROI, F-values > 13, p-values < .0001, and a significant interaction between ROI and phonological relatedness between 350 and 550 ms, F-values > 1.8, p-values < .05. The mean amplitudes per ROI in the phonologically homogeneous and heterogeneous conditions were then submitted to pair-wise t-tests, summarised in Figure 4c. The topographic distribution for phonological effects showed a similar pattern to that of the semantic effects. In the anterior regions, the P- condition elicited more negativities than the P+ condition from 400 to 550 ms (see Figure 4a). In the posterior regions, the P- condition elicited more positivities than the P+ condition from 350 to 550 ms (see Figure 4b). The detailed effects in each ROI are summarised in Figure 4c.

Figure 4. The grand average ERPs of the phonologically homogeneous (P+) and heterogeneous (P-) conditions. The top graph (a) depicts the ERPs from a representative anterior electrode FC4, with more negativities in the P- than P+ condition. The middle graph (b) depicts the ERPs from a representative posterior electrode Pz, with more positivities in the P- than the P+ condition. The bar graph (c) summarises the p-values resulting from the pairwise t-tests on the mean amplitudes within each time window per ROI in the phonological blocks. The red line refers to the significance level .05. Four ROIs are represented: left-anterior (blue), right-anterior (green), left-posterior (yellow) and right-posterior (orange).


Post-hoc analyses

Multiple hypothesis tests are susceptible to Type I and Type II errors, also known as false positives and false negatives, respectively. In the present study, we employed the Holm–Bonferroni method (Holm, 1979) to correct for multiple ANOVA tests on consecutive time windows. Note that this correction method is rather conservative (correcting for 11 tests) and may therefore produce false negatives while controlling for false positives.
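For readers unfamiliar with the procedure, the Holm–Bonferroni step-down rule can be sketched as follows: the p-values are sorted in ascending order, the k-th smallest is compared with α / (m − k + 1), and testing stops at the first p-value that fails. The function name and the example p-values are illustrative assumptions, not values from the present data.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject/retain decision per p-value (Holm, 1979)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # rank is 0-based, so the threshold is alpha / (m - k + 1) for k = rank + 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Illustrative p-values only.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
# With 11 tests, the smallest p-value must not exceed alpha / 11.
print(round(0.05 / 11, 4))
```

With eleven time windows, the smallest p-value has to fall at or below roughly .0045 for any window to survive, which illustrates why the procedure can turn individually significant windows into non-significant ones.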

After the Holm–Bonferroni correction, in the semantic blocks, the interaction between ROI and semantic relatedness within 300–350 ms was no longer significant. However, visual inspection of the grand averages of the ERP waveforms suggests that the semantic ERP effect is more likely to start at 200 ms and continue until around 550 ms than to be composed of two ERP components with a break at 300–350 ms. The correction for the multiple t-tests on separate ROIs yielded the same pattern of results, with the semantically heterogeneous condition eliciting more negativities in the anterior region between 200 and 550 ms and more positivities in the posterior region between 200 and 550 ms, relative to the semantically homogeneous condition.

Regarding the phonological effects, the interaction between ROI and phonological relatedness remained significant from 450 to 550 ms, which was a smaller time window compared to that before the correction. Nevertheless, the correction for the multiple t-tests on separate ROIs yielded the same pattern of results, with the phonologically heterogeneous condition eliciting more negativities from 400 to 550 ms in the anterior region and more positivities in the posterior region from 350 to 550 ms, relative to the phonologically homogeneous condition.

Discussion

Employing behavioural and electrophysiological measurements, we investigated both the behavioural and neural correlates of spoken word production in the blocked cyclic naming paradigm. We observed both the semantic blocking effect and the phonological facilitation effect: Reaction times (RTs) in the semantically homogeneous blocks were longer than those in the semantically heterogeneous blocks, in line with previous findings (e.g. Abdel Rahman & Melinger, 2009; Belke, 2017; Belke et al., 2005; Damian et al., 2001; Damian & Als, 2005). Furthermore, shorter RTs were observed in the phonologically homogeneous blocks relative to the phonologically heterogeneous blocks, which is in line with the phonological facilitation effect shown in previous studies (e.g. Damian, 2003; Roelofs, 1999; but see Damian & Dumay, 2009 for an inhibitory effect). In the electrophysiological data, semantic relatedness modulated the ERP waveforms from about 200 ms and phonological relatedness from about 350 ms after the picture presentation.

In the semantic blocks, significant ERP effects were observed from around 200–550 ms after picture presentation, indicating the effect takes place during lexical selection (Belke et al., Citation2005; Belke, Shao, & Meyer, Citation2017; see Indefrey, Citation2011; Indefrey & Levelt, Citation2004). Generally, the semantically heterogeneous condition elicited more negativities in the anterior region and more positivities in the posterior region within the same time window (i.e. 200–550 ms). The results are thus at odds with the account put forward by Navarrete et al. (Citation2014) based on the incremental learning mechanism which predicts an ERP effect before 200 ms. Our ERP effect in the anterior region bears similarity to the negative effect observed in Cycles 1, 2 and 3 between 250 and 400 ms in Janssen et al. (Citation2015), with the heterogeneous condition eliciting more negativities. Janssen et al. (Citation2015) interpreted the negative component as reflecting the ease of integrating semantic information in different semantic contexts, with the heterogeneous blocks as the more difficult condition (Lau, Philips, & Poeppel, Citation2008). This negative ERP component possibly also reflects the ease of retrieving semantic information from memory (Kutas & Federmeier, Citation2011; cf. Janssen et al., Citation2015).

The phonologically heterogeneous condition also elicited more negativities in the anterior region and more positivities in the posterior region. Specifically, significant ERP effects were found from around 400–550 ms in the anterior region and from 350 to 550 ms in the posterior region. The topographic distribution for the phonological effects is similar to that for the semantic effects. As explained in the Introduction, if incremental learning underlay phonological encoding (Breining et al., Citation2016), a stronger negative effect between 275 and 355 ms would have been observed in the phonologically homogeneous blocks relative to the heterogeneous blocks. However, the present study yielded attenuated negative effects around 400 ms after picture presentation for the phonologically homogeneous condition, contrary to the prediction of the incremental learning account.

The ERP effect in the anterior region resembles the ERP effect associated with phonological priming in the auditory lexical decision task (e.g. Praamstra, Meyer, & Levelt, Citation1994), with greater phonological mismatch (cf. our phonologically heterogeneous condition) eliciting more negativities. This negative effect also resembles the one found in the semantic blocks, but with a much later onset (i.e. 200 ms for the semantic condition and 400 ms for the phonological condition). This timing difference is in line with the serial time course proposed for semantic and phonological processing in word production on the basis of, for example, the go/no-go task (Van Turennout, Hagoort, & Brown, Citation1997) and the picture-word interference task (Zhu, Damian, & Zhang, Citation2015). Note that the onset of the phonological effect in our data overlaps for at least 50 ms with the time window where the semantic effect is found, suggesting that semantic processing precedes phonological processing, but more likely in a cascading or less strictly serial manner.

It is worth noting that in both the semantic and phonological blocks, a positive component was observed in the posterior region. The ERP effect observed in the posterior region in the semantic blocks has the same polarity as the positive component in Janssen et al. (Citation2015) (i.e. Cycles 2–4) with the heterogeneous condition eliciting more positivities, although the current study found an earlier temporal locus (i.e. 200–550 ms) than that in Janssen et al. (Citation2015) (i.e. 500–750 ms). Janssen et al. (Citation2015) interpreted the positive component as reflecting conflict resolution after lexical retrieval, corresponding to the interference effect observed in Cycles 2–4 in their study. The average RT in Janssen et al. (Citation2015) is 650 ms, which falls within the time window where the positive component is observed in their study. However, in the current study, the average RT in the semantic blocks is 624 ms, which falls outside the time window for our observed positive component. Thus, given its early onset (i.e. around 200 ms), this effect is unlikely to reflect post-lexical processes. Given that the positive component in the posterior region is relatively late (peaking around 450 ms), a cautionary note is that it may be subject to the influence of speech-related artifacts. Although low-pass filtering and artifact rejection are expected to remove any speech-related artifacts, other solutions for speech-related artifact rejection have also been proposed, such as the SAR-ICA procedure in Porcaro, Medaglia, and Krott (Citation2015). Thus, replication of the current finding, preferably with more than one artifact rejection method, would be important for future research.

The positive ERP component in the posterior region is close to the P3b component, which reflects cognitive workload and/or differences in the probability of pictures seen in homogeneous versus heterogeneous blocks (e.g. Donchin, Citation1981). The P3b wave “depends on the probability of the task-defined category of stimulus” (Luck, Citation2005, p. 44). The items in the homogeneous blocks are more predictable within the context of the task than items in the heterogeneous blocks (either semantically or phonologically). Alternatively, this component may correspond to a novel component related to task representation in the blocked cyclic naming paradigm, as argued by Belke and colleagues (Belke, Citation2008, Citation2017; Belke & Stielow, Citation2013). The component was proposed based on the observation that when participants have to perform a concurrent digit-retention task, their performance is affected in the blocked cyclic naming task, but not in continuous naming. Belke and Stielow (Citation2013) pointed out that, in contrast to continuous naming, the blocked design (homogeneous vs. heterogeneous blocks) means that participants are able to formulate a task-relevant representation and adopt a top-down bias. According to Belke and Stielow’s (Citation2013) account, participants can bias the level of activation of words by memorising the picture set after the first cycle. In the heterogeneous context, the bias-selection mechanism is more efficient because the participants bias only one candidate per semantic category. In the homogeneous context, however, the bias does not help resolve the competition during lexical selection, so it is more effortful to name pictures in the homogeneous blocks. Ultimately, this account and the probability account are not mutually exclusive. The ERP effects in the posterior region, however, are not easily explained by the account put forward by Navarrete et al. (Citation2014). The reason is that greater priming or ease of adjusting the connections between semantic-lexical features and lexical-segmental features in the heterogeneous blocks would predict an attenuated ERP effect for the heterogeneous condition, rather than the homogeneous condition as observed in the current study.

In summary, in the current study both the semantic blocking effect and the phonological facilitation effect were observed in behavioural and electrophysiological data. Distinct but similar ERP effects in the posterior region were observed in both semantic and phonological blocks, with the heterogeneous condition showing more positivities. The ERP effects run against the account put forward by Navarrete et al. (Citation2014) based on the incremental learning mechanism (Oppenheim et al., Citation2010). The positive component is likely to reflect greater cognitive workload or lower predictability of stimuli, and may arise from a task-related top-down selection bias. These results shed light on the neural correlates of blocked cyclic naming and provide novel evidence for our further understanding of the semantic and phonological processing involved in spoken word production.

Acknowledgement

We thank Frank Mertz for help with the Matlab script. We thank Elly Dutton for proofreading this manuscript.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by grants from the “Talent & Training China-Netherlands” programme.

Notes

1 In the present study, we will refer to this slowing-down effect observed in the blocked cyclic naming paradigm as the semantic blocking effect to differentiate it from semantic interference effects in the cumulative semantic interference paradigm (Costa et al., Citation2009; Howard et al., Citation2006; Navarrete, Mahon, & Caramazza, Citation2010) or the picture-word interference paradigm (e.g., Glaser & Düngelhoff, Citation1984; Schriefers, Meyer, & Levelt, Citation1990).

References