
Is Regularization Uniform across Linguistic Levels? Comparing Learning and Production of Unconditioned Probabilistic Variation in Morphology and Word Order

ABSTRACT

Languages exhibit variation at all linguistic levels, from phonology, to the lexicon, to syntax. Importantly, that variation tends to be (at least partially) conditioned on some aspect of the social or linguistic context. When variation is unconditioned, language learners regularize it – removing some or all variants, or conditioning variant use on context. Previous studies using artificial language learning experiments have documented regularizing behavior in the learning of lexical, morphological, and syntactic variation. These studies implicitly assume that regularization reflects uniform mechanisms and processes across linguistic levels. However, studies on natural language learning and pidgin/creole formation suggest that morphological and syntactic variation may be treated differently. In particular, there is evidence that morphological variation may be more susceptible to regularization. Here we provide the first systematic comparison of the strength of regularization across these two linguistic levels. In line with previous studies, we find that the presence of a favored variant can induce different degrees of regularization. However, when input languages are carefully matched – with comparable initial variability, and no variant-specific biases – regularization can be comparable across morphology and word order. This is the case regardless of whether the task is explicitly communicative. Overall, our findings suggest an overarching regularizing mechanism at work, with apparent differences among levels likely due to differences in inherent complexity or variant-specific biases. Differences between production and encoding in our tasks further suggest this overarching mechanism is driven by production.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 It is also worth mentioning that morphological and syntactic phenomena are not always clearly differentiated, particularly in the context of language acquisition and change.

2 Similarity between a correct answer in the input language and the response provided on each trial was calculated using normalized Damerau-Levenshtein edit distance (Damerau, 1964; Levenshtein, 1966). We excluded participants’ data with an average distance of more than two edits (i.e., typos) per response (excluding bare nouns), or with greater than 20% of descriptions in which a word was omitted entirely or inserted (i.e., descriptions which had fewer or more words than required).
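A minimal sketch of this kind of edit-distance check is given below; it uses the optimal-string-alignment variant of Damerau-Levenshtein distance, and normalizing by the length of the longer string is our illustrative assumption rather than a detail reported in the study.

    # Sketch (hypothetical): Damerau-Levenshtein distance (optimal string alignment),
    # normalized by the length of the longer string (normalization choice assumed).
    def dl_distance(a: str, b: str) -> int:
        d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            d[i][0] = i
        for j in range(len(b) + 1):
            d[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
                if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
        return d[len(a)][len(b)]

    def normalized_dl(a: str, b: str) -> float:
        return dl_distance(a, b) / max(len(a), len(b), 1)

    # e.g., normalized_dl("nepli", "nefri") == 0.4 (two substitutions over five characters)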

3 The languages were designed to be small so as to minimize the effort of learning the lexicon and keep the experiment to under an hour when extended to include dyadic communication in Experiment 2.

4 This type of morphological variation could result from languages with rich inflectional morphology within noun phrases, such as Swedish where adjectives and determiners agree in gender and number with the head noun: e.g., en grön stol (“a green chair”) and min gröna stol (“my green chair”), or mitt gröna bord (“my green table”) and mina gröna bord (“my green tables”).

5 Hupp et al. (2009) find that speakers of a suffixal language such as English are more likely to treat two words as referring to the same referent if they differ in their endings, rather than in their beginnings. Further, Martin and Culbertson (2020) show that this preference is not present in speakers of prefixal languages such as Kitharaka.

6 These are not included in .

7 Individual trials in which words were inserted or omitted were also excluded from analysis: the mean proportion of excluded trials (out of 50) per participant was 0.014 (SD = 0.042) and 0.010 (SD = 0.029) for the Morphology and Word Order conditions respectively.

8 Typos were generally corrected to the closest vocabulary item, that is, the vocabulary item with the lowest Damerau-Levenshtein distance (Damerau, 1964; Levenshtein, 1966); if there was not a single closest vocabulary item, they were not corrected. More specifically, we corrected one-off misspellings with the correct initial syllable and final vowel (e.g., nepli instead of nefri or kolpra instead of kogla) or the systematic misspelling of lexical items (e.g., consistent use of kolga or korpa instead of kogla). Note that if a participant produced two variants (more than once each) that could be corrected to the same vocabulary item (e.g., konga and kolga), only one variant was corrected (i.e., the variant with the lowest edit distance, or the most frequent one otherwise); in cases where one of these variants was the target one, the others were not corrected (e.g., konga would not be corrected if kogla was also used more than once). However, we also allowed for innovations by participants. For example, in the Morphology condition, we retained additional variants introduced by a given participant that did not fall within the aforementioned categories of typo. Similarly, in the Word Order condition, productions of two-modifier phrase word orders which were not present in the input were not corrected. These additional variants introduced by participants could therefore increase their entropy scores beyond the maximal entropy of the distribution of variants in the input language. After correction, the mean proportions of trials with innovative lexical items and word orders per participant in the Morphology and Word Order conditions, respectively, were 0.042 (SD = 0.104) and 0.049 (SD = 0.108), which amount to an average of approximately two innovative trials (out of 50) per participant; note that we only count those innovations that increase the number of variants participants produce beyond the input number, because these are the only relevant cases for the entropy measure.
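A minimal sketch of the nearest-item correction policy, reusing the dl_distance function sketched above (the vocabulary list shown is an illustrative subset, and the sketch omits the frequency-based criteria described in this note):

    # Sketch (hypothetical): correct a produced form to the unique nearest vocabulary
    # item; forms that are equidistant from two or more items are left uncorrected.
    VOCAB = ["nefri", "kogla"]  # illustrative subset of the input lexicon

    def correct_typo(form: str, vocab=VOCAB) -> str:
        if form in vocab:
            return form
        dists = {v: dl_distance(form, v) for v in vocab}
        best = min(dists.values())
        nearest = [v for v, d in dists.items() if d == best]
        return nearest[0] if len(nearest) == 1 else form

    # e.g., correct_typo("nepli") -> "nefri"; correct_typo("kolpra") -> "kogla"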

9 We previously ruled out the possibility that participants were producing conditioned variation: the mutual information between the different variants and the different nouns was consistently around 0, indicating that there is no relationship between the variant that was used and the noun it was used with.
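As an illustration, mutual information over a variant-by-noun co-occurrence table can be computed along the following lines (a sketch with made-up counts, not the analysis script used in the study):

    # Sketch (hypothetical): mutual information (in bits) between variant and noun,
    # estimated from a co-occurrence count table; values near 0 indicate that
    # variant choice is not conditioned on the noun.
    import numpy as np

    def mutual_information(counts: np.ndarray) -> float:
        p = counts / counts.sum()           # joint distribution P(variant, noun)
        pv = p.sum(axis=1, keepdims=True)   # marginal P(variant)
        pn = p.sum(axis=0, keepdims=True)   # marginal P(noun)
        nz = p > 0
        return float((p[nz] * np.log2(p[nz] / (pv @ pn)[nz])).sum())

    # Unconditioned variation: every noun takes each variant in the same proportions.
    counts = np.array([[6, 6, 6],    # rows: variants
                       [4, 4, 4]])   # columns: nouns
    # mutual_information(counts) == 0.0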

10 The most regular language which is still expressive (i.e., contains a unique description for each picture) would consist of three different variants, one Num Only (e.g., N nefri), one Adj Only (e.g., N kogla) and one two-modifier (e.g., N kogla nefri). As the most regular system would contain a single variant per phrase type, the minimum entropy for the set of productions for a given phrase type considered individually is 0. The final production phase consisted of 50 trials (excluding the 2 Noun trials), divided into 20 one-modifier trials (half Num and half Adj) and 30 two-modifier trials: the entropy lower bound for the overall language (i.e., not treating each phrase type separately) is thus 1.37 bits (represented as a solid vertical line in ). The overall input entropy for the same number of trials would be 2.72 bits (represented as a dotted vertical line in ).
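For reference, the 1.37-bit lower bound is simply the Shannon entropy of the trial distribution over the three retained variants (10 Num Only, 10 Adj Only and 30 two-modifier trials out of 50):

$H = -\left(\tfrac{10}{50}\log_2\tfrac{10}{50} + \tfrac{10}{50}\log_2\tfrac{10}{50} + \tfrac{30}{50}\log_2\tfrac{30}{50}\right) \approx 0.464 + 0.464 + 0.442 \approx 1.37\ \text{bits}.$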

11 Reverse Helmert coding compares levels of a variable with the mean of the previous levels of the variable, the intercept being the grand mean. It allows us to compare the means of the one-modifier phrase types to each other, and the mean of the two-modifier phrases against the mean of those one-modifier phrases.

12 Using reverse Helmert coding with two levels is equivalent to using simple contrast coding, which compares each level to the reference level but, unlike treatment coding, sets the intercept to the grand mean.

13 A model including by-Subject random slopes for Phrase Type is over-fitted given the parameters and power we have, and thus we report the simpler model.

14 Additionally, we ran a mixed-effects logistic regression to test whether the proportions of prenominal two-modifier phrases differed from those predicted from the product of Num N and Adj N productions within a participant’s use. If a participant produced the variant Num N 40% of the time and the variant Adj N also 40% of the time, and they combined them to produce two-modifier variants proportionally, they would produce the Num Adj N, N Adj Num, Num N Adj and Adj N Num orders 16%, 36%, 24% and 24% of the time respectively. With this model, we wanted to test whether the proportion of two-modifier prenominal variants was proportional to the proportion of prenominal productions of one-modifier phrases. Results show no difference between the actual output proportions of two-modifier phrase variants and the predicted proportional productions from one-modifier variants (β = 0.345, SE = 0.682, p = 0.613). These results at least do not contradict a non-trivial relationship between prenominal modification in one-modifier and two-modifier variants.
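The predicted proportions in this example follow from treating the placement of Num and Adj as independent choices; in sketch form:

    # Sketch (hypothetical): predicted two-modifier orders if Num and Adj placement
    # are combined independently from their one-modifier proportions.
    p_num_pre, p_adj_pre = 0.4, 0.4
    predicted = {
        "Num Adj N": p_num_pre * p_adj_pre,              # both prenominal: 0.16
        "N Adj Num": (1 - p_num_pre) * (1 - p_adj_pre),  # both postnominal: 0.36
        "Num N Adj": p_num_pre * (1 - p_adj_pre),        # only Num prenominal: 0.24
        "Adj N Num": (1 - p_num_pre) * p_adj_pre,        # only Adj prenominal: 0.24
    }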

15 Note that although N Num Adj (non-isomorphic) is dispreferred relative to Num Adj N by English participants in the lab, recent work shows that this dispreference is not strong (Martin, Holtz, Abels, Adger & Culbertson, 2020).

16 Individual trials whose phrases did not contain the right number of words were excluded from analysis, as in Experiment 1a: the mean proportion of excluded trials per participant was 0.006 (SD = 0.012). Participants were also allowed to introduce new variants not present in the input: the mean proportion of trials with innovative word orders per participant in Word Order 2 was 0.06 (SD = 0.11), which amounts to an average of three innovative trials (out of 50) per participant.

17 We used the bayestestR library (Makowski et al., 2019) to further explore the strength of the evidence in favor of the null hypothesis – that entropy drop scores do not differ across conditions – by comparing a regression model as presented in the main text (but only with the data from Morphology and Word Order 2) with an intercept-only model (i.e., not containing condition as a predictor). Models are compared by their BIC measures, allowing a Bayesian comparison of non-nested frequentist models (Wagenmakers, 2007). The results reveal a Bayes factor in favor of the full model of < 0.01. This indicates very strong evidence in favor of the null (intercept-only) model over the full model, suggesting that we can have high confidence in the similarity in entropy scores across conditions.
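The BIC-based approximation behind this comparison (Wagenmakers, 2007) can be sketched as follows; the BIC values shown are illustrative, not those of the fitted models:

    # Sketch (hypothetical): approximate Bayes factor for the full model over the
    # null model from their BICs (Wagenmakers, 2007): BF = exp((BIC_null - BIC_full) / 2).
    import math

    def bf_full_over_null(bic_full: float, bic_null: float) -> float:
        return math.exp((bic_null - bic_full) / 2)

    # Illustrative values: when the intercept-only model has the markedly lower BIC,
    # the Bayes factor in favor of the full model is well below 1.
    # e.g., bf_full_over_null(bic_full=120.0, bic_null=110.0) is approximately 0.007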

18 Further, we found a Bayes Factor in favor of the full model (over an alternative not including condition, phrase type, or experiment as fixed effects) of < 0.01. This again points to very strong evidence for the null (intercept-only) model, and allows us to conclude with confidence that entropy drop scores are similar across conditions and phrase types, as well as across isolate production and dyadic communication.

19 The Jensen-Shannon distance is the square root of the Jensen-Shannon divergence, which is a symmetrized version of the more general Kullback-Leibler divergence. Let P and Q be two probability distributions. The Kullback-Leibler divergence is defined as:

(2) $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i)\,\log_2 \frac{P(i)}{Q(i)}.$

We then symmetrize this expression and take the square root to obtain the Jensen-Shannon distance, given by

(3) $\mathrm{JSD}(P \,\|\, Q) = \sqrt{\frac{D_{\mathrm{KL}}(P \,\|\, M) + D_{\mathrm{KL}}(Q \,\|\, M)}{2}},$

where $M = (P + Q)/2$.
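A direct numerical translation of these definitions, for illustration (scipy.spatial.distance.jensenshannon with base=2 computes the same quantity):

    # Sketch: Jensen-Shannon distance between two discrete distributions,
    # following equations (2) and (3) with base-2 logarithms.
    import numpy as np

    def kl(p: np.ndarray, q: np.ndarray) -> float:
        nz = p > 0
        return float((p[nz] * np.log2(p[nz] / q[nz])).sum())

    def js_distance(p: np.ndarray, q: np.ndarray) -> float:
        m = (p + q) / 2
        return float(np.sqrt((kl(p, m) + kl(q, m)) / 2))

    # Identical distributions give 0; non-overlapping distributions give the maximum of 1.
    # e.g., js_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])) == 1.0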

20 Note that the ungrammatical variants judged by participants in the word order conditions differ slightly across experiments, due to our exclusion of prenominal two-modifier variants in Experiments 1b and 2. In Experiment 1a, word order violations were two-modifier phrases with either Adj Num N or N Num Adj order. In Experiments 1b and 2, the ungrammatical word orders were Adj Num N and Num Adj N. In both cases, however, failure to reject ungrammatical word order variants was highly systematic: in the task, there were two relevant trials out of a total of 24, resulting in a score of 22/24 (0.917) for all participants.

21 We converted the input and reported relative frequencies into dummy-coded absolute frequencies, weighted according to the number of input and output trials respectively. For example, if a participant reported that the two-modifier majority variant N Adj Num occurred 70% of the time, and they produced that variant 30 times during production, we would code 0.7 × 30 presences and 0.3 × 30 absences of the majority input variant respectively.
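In sketch form, using the numbers from the example above:

    # Sketch: convert a reported relative frequency into weighted, dummy-coded
    # presence/absence counts of the majority input variant.
    reported_freq = 0.7   # reported frequency of the majority variant (N Adj Num)
    n_trials = 30         # number of relevant production trials used as the weight
    presences = reported_freq * n_trials         # 0.7 x 30 = 21 "presences"
    absences = (1 - reported_freq) * n_trials    # 0.3 x 30 = 9 "absences"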

22 Although we cannot say whether participants perceive the variation as lexical or morphological, the errors that participants produce in the Morphology conditions indicate that they learn the initial common phonemes between variants (or “stems”) better than the endings (or “suffixes”); 70% of the errors contained the right “stems”.
