
The discontinuity model: Statistical and grammatical learning in adult second-language acquisition

Pages 387-415 | Received 22 Jan 2018, Accepted 09 Nov 2018, Published online: 20 Mar 2019

ABSTRACT

The Discontinuity Model (DM) described in this article proposes that adults can learn part of L2 morphosyntax twice, in two different ways. The same item can be learned as the product of generation by a rule or as a modification of a template already stored in memory. These learning modalities, which are often seen as opposed in language theory, integrate and superpose in adult SLA. Learners resort to grammatical rules and statistical templates under different circumstances during language processing. Ontogenetically, while in L1 acquisition the natural endowment for language constrains statistical learners’ capacity by narrowing the hypothesis space, in adult SLA statistics can reopen the window of opportunity for grammar and drive adult learners to derive part of L2 morphosyntax. This article proposes a computational and psycholinguistic model of how this might occur. According to this model, skewness between transitional probabilities (TPs) represents the triggering factor in both L1 and L2 acquisition. Just as fluctuation in TPs drives children to individuate the words in a speech stream, so skewness between TPs drives adult learners to discover the grammatical features that are hidden in asymmetric chunks.

Acknowledgments

I would like to thank Michael Long for his friendly support and for providing insightful discussion about the DM in the last three years. I would also like to thank Kiel Christianson and another anonymous reviewer for their time and competence.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1 There is no agreement about the meanings of the N400 and P600 effects. The N400 has been related not only to frequency and statistical learning (SL) but also to semantic memory retrieval and to the preactivation of information in semantic memory (Kutas & Federmeier Citation2011; Wlotko & Federmeier Citation2013). The P600 has been associated not only with reanalysis processes but also with more general controlled processes (e.g., task effects, context, the frequency of occurrence of anomalies within the experimental task itself) and with strong semantic violations, as well as with violations of event-level predictions (Kolk et al. Citation2003; Schacht et al. Citation2014; Kuperberg Citation2007). The most recent accounts of the P600 also associate it with the P300 component (Sassenhagen et al. Citation2014).

2 Discontinuity is not equivalent to “nonlinearity,” and the DM differs from theories of nonlinear changes, such as Dynamic System Theory (DST). These theories focus on nonlinearity in learner production and try to account for discontinuities, jumps, U-shaped trajectories, and backsliding by using appropriate statistical models (de Bot et al. Citation2007). In these theories, nonlinear development is described by a continuous, rather than discontinuous, function. Nonlinearity—unlike discontinuity—does not imply the presence of superposing developmental trajectories.

3 Superposition—meant as the continuing simultaneous functioning of the two learning procedures—differs from “synchronic variation,” an expression used in SLA studies to refer to the continued, albeit gradually decreasing, coexistence of target-like and non-target-like forms that characterizes even advanced stages of acquisition. The difference lies in the fact that superposition concerns only target-like forms.

4 Unlike in this article, some authors use “chunk” to refer just to the mental representations of all multiword units in long-term memory. Such authors do not distinguish between symmetric and asymmetric chunks.

5 Multiword expressions are often defined as strings of words characterized by different degrees of fixedness and idiomaticity that act as a single unit at some level of linguistic analysis (Wray Citation2017).

6 For details on the statistical procedure utilized to calculate skewness holding between the components of this and of other chunks in written and spoken contemporary Italian, see Appendix section 4.

7 For a list of reference corpora of spoken and written Italian utilized in this study, see Appendix section 1.

8 I adopt Thompson and Newport’s (Citation2007:4) definition: “Transitional probability is a conditional probability statistic that measures the predictiveness of adjacent elements.” Other conditionalized statistics, such as mutual information, conditional entropy, z-scores, and t-scores, include additional information in the formula, such as directionality of the effect and the size of the reference corpus, but they all share the general assumption that the overall probability of co-occurrence of two events equals the ratio between the joint and the disjoint probabilities of the events.
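The quoted definition can be made concrete with a short computation. The sketch below (not part of the original study; the toy word stream and function name are my own) computes forward transitional probabilities, TP(X→Y) = count(XY) / count(X), over a token sequence:

```python
from collections import Counter

def transitional_probabilities(tokens):
    """Forward TP(X -> Y) = count(X followed by Y) / count(X)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Only tokens that have a successor can serve as the conditioning word X.
    word_counts = Counter(tokens[:-1])
    return {(x, y): c / word_counts[x] for (x, y), c in pair_counts.items()}

# Toy stream: "pretty" is always followed by "baby" (TP = 1.0),
# while "the" precedes several different words (TP < 1.0).
stream = "the dog saw the pretty baby and the pretty baby smiled".split()
tps = transitional_probabilities(stream)
print(tps[("pretty", "baby")])  # 1.0
print(tps[("the", "dog")])
```

High-TP pairs like *pretty baby* are the kind of cohesive sequence that statistical learners treat as a unit, whereas the low TP after *the* marks a point of unpredictability.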

9 Light verb constructions are formed by a verb devoid of semantic content (e.g., get, do, take in English) plus an element (e.g., a noun such as criticism, cleaning, exam) that carries the meaning of the whole expression (get harsh criticism, do the cleaning, take an exam; Grimshaw & Mester Citation1988).

10 A reviewer pointed out that the DM fits nicely with Romance and Germanic languages but becomes less convincing when applied to typologically different families—for instance, highly agglutinative languages or Semitic languages with root-and-pattern morphology—because in those cases it is less evident what may work as a cue for chunking (whether bound morphemes or roots) or how the asymmetry between the components of chunks can be calculated, given that such components are perhaps less clearly decipherable and detachable from one another in the input stream by L2 learners. One possibility is that, in such languages, the functioning of procedural memory and the availability of the combinatorial skills it supports become even more crucial for the learning task. The separateness among components that procedural memory needs in order to disentangle chunks and rearrange them into new combinations is warranted by the uneven distribution of these elements in the input. It does not matter whether such uneven distribution is encoded in the stem–affix relationship, as in most Standard Average European languages, or is interpreted at a different layer of linguistic description, such as the alternation between the consonants of the root and the vowels.

11 Tomasello (Citation2003) does not mention the passato prossimo among what he calls “constructions.” However, Joan Bybee (personal communication, April 2010) suggests that aux+PastP does constitute a construction, as “it is a cognitive unit which develops through the speaker’s experience with language.”

12 “A collocation is a word combination whose semantic and/or syntactic properties cannot be fully predicted from those of its components, and which therefore has to be listed in a lexicon” (Evert Citation2005:17). According to Ellis and Ogden (Citation2017:606), collocations are “words with particular selection preference” (see also Matsuno Citation2017).

13 For example, the combination [modal auxiliary verb + budge] ‘will/won’t budge’ is a colligation because the English verb budge is attracted to this specific construction (Sinclair Citation1998:13; Hunston Citation2001:13; Tognini-Bonelli Citation2001).

14 http://corpora.dslo.unibo.it/coris_ita.html.

15 A reviewer observed that people might not be very good at remembering the word immediately preceding the word they are currently hearing or reading. As for reading, there is evidence that people automatically preprocess upcoming words and therefore make shorter fixations on words that were visible in the parafovea during preceding fixations (Li et al. Citation2015). For obvious reasons, the mechanism that would allow speakers to keep track of a preceding word while processing the next one—if any such mechanism exists—must be different when the word is only heard. For instance, I know of no studies using auditory n-back tasks to test speakers’ capacity to retain memory of preceding words in a sequence. I agree that it would be a fascinating series of experiments to run.

16 For example, according to the formula Tn = T1 · n^(−a), with a ≈ 0.4, where Tn = the time to perform a task after n trials; T1 = the time to perform the task on the first trial; n = the number of trials.
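The formula in this note has the familiar power-law shape: speed-up is steep over the first few trials and then flattens. A minimal sketch (the value T1 = 10 seconds and the function name are illustrative assumptions, not figures from the article):

```python
def practice_time(t1, n, a=0.4):
    """Predicted time on trial n under Tn = T1 * n**(-a)."""
    return t1 * n ** -a

# With T1 = 10 s, most of the speed-up happens early:
for n in (1, 10, 100, 1000):
    print(n, round(practice_time(10.0, n), 2))
```

Because the exponent is small (a ≈ 0.4), each tenfold increase in practice multiplies the time by the same factor (10^−0.4 ≈ 0.4), so later gains require ever more trials.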

17 A pivot is the word around which other words in the sentence revolve. The term pivot is more theoretically neutral than the term head, even though some authors (e.g., Malec Citation2010:129) utilize the expression collocational head as an equivalent of statistical pivot. A statistical pivot is always defined by its context. A grammatical pivot is defined by its labels (see the following).

18 In chunk (1b) in section 3.1, the statistical pivot is a meno, in chunk (1c) the statistical pivot is stai, in chunk (1d) it is arrivata, in chunk (1e) it is paura, in chunk (1f) it is colazione, and in chunk (1g) it is ridere.

19 In chunk (1c) in section 3.1, the grammatical pivot is come and the label is “Interrogative”; in chunk (1d), the grammatical pivot is è and the label “Unaccusative”; in chunks (1e) and (1g), the grammatical pivots are mette and fa and the label “Causative.”

20 The DM assumes that statistically learned representations are processed with statistical processing mechanisms and that grammatically learned representations are processed with grammatical processing mechanisms. An anonymous reviewer pointed out that it is not logically necessary that there is a one-to-one mapping between learning and processing mechanisms. I acknowledge that the issue exists and that not even ERP data can disentangle it, given that the P600 is no longer directly associated exclusively with grammatical processing.

21 Native speakers of Italian do possess this kind of implicit knowledge. A sample of 176 Italian native speakers (mean age 21;03) at the University of Pavia (Italy) were asked to select A or E for the predicate correre ‘run’ at passato prossimo. When the predicate was followed by prepositional phrases such as those in the variation sets described previously, 73 participants out of 76 selected A, as expected.

22 In this article, the structurally overlapping sentences that make up a variation set do not necessarily need to be uttered within a very short time span—for instance, in the same speech turn—in order to have a learning effect.

23 As constituting “an obligatory response to input” (Batterink et al. Citation2015).

24 In Ullman’s version of the DP model, the word lexicon corresponds to what Paradis (Citation2009:15‒19) calls “vocabulary.” Paradis (Citation2004, Citation2009) in contrast defines the lexicon as the complete, structural set of variously combining items forming a network of combinable stems and affixes. In more recent articles, Paradis seems to agree with Ullman that lexical items are comparable to vocabulary items defined as “form-meaning pairs,” which are stored in declarative memory (Paradis Citation2013). In this article, by “lexicon” I mean the closed set of form-meaning pairs that are stored as unanalyzed wholes in the declarative memory system, as in Ullman’s version of the declarative/procedural model.

25 Michael Ullman, personal communication, October 2017, Siena, Italy.

26 A reviewer correctly pointed out that studies exist (e.g., Luke & Christianson Citation2016) demonstrating that in language processing the predictive capacity could be either limited or very selective (e.g., semantic and morphosyntactic information can be highly predictable even when word identity is not).

27 Evidence is indirect because only the possible effects of gemination in performance data are disclosed, not how SL and GL impact online processing.

28 This technique of course does not allow us to place the shift from SL to GL at an exact point in the developmental path nor to establish whether such a shift was gradual or instantaneous.

29 Unlike the DM, Pinker (Citation1998) uses the terms combinatorial and non-combinatorial to refer, respectively, to grammatical and to lexical (idiosyncratic, nonderivable) grammar.

30 A reviewer pointed out that take a drink is fine.
