
Gradual syntactic triggering: The gradient parameter hypothesis

Pages 65-96 | Received 02 Apr 2019, Accepted 23 Jun 2020, Published online: 07 Oct 2020
 

ABSTRACT

In this article, we propose a reconceptualization of the principles and parameters (P&P) framework. We argue that in lieu of discrete parameter values, a parameter value exists on a gradient plane that encodes a learner’s confidence that a particular parametric structure licenses the utterances in the learner’s linguistic input. Crucially, this gradient parameter hypothesis obviates the need for default parameter values. Default parameter values can be effective from the perspective of linguistic learnability but lack empirical and theoretical consistency. We present findings from a computational implementation of a gradient P&P learner. The findings suggest that the gradient parameter hypothesis provides the basis for a viable alternative to existing computational models of language acquisition in the classic P&P paradigm. We close with a brief discussion of how a gradient parameter space offers a path to address shortcomings that have been attributed to the P&P framework.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 Or nearly the same set, taking into account microvariation within a linguistic community.

2 For simplicity, as is standard in linguistic L1 learnability studies, the linguistic (or language) environment is assumed to be monolingual.

3 The concept of a subset principle was first introduced in Gold (Citation1967), though not by that name.

4 Here, we use utterance and sentence interchangeably to mean an instance of verbal language that a child learner would encounter. When discussing simulation studies of an implemented model, we also use sentence to mean an instance of an artificial language word order pattern that a model of child language learning consumes.

5 See Pearl & Lidz (Citation2013:247–249) for an overview of a study presented in Han, Lidz & Musolino (Citation2007) showing that children do make such an arbitrary choice concerning verb raising in Korean, despite the absence of evidence either for or against verb raising.

6 How the learner handles this unambiguous evidence is specific to the learner.

7 Local irrelevance is related to the global irrelevance discussed previously: global irrelevance entails local irrelevance, but the reverse does not hold.

8 Our definition of approximately correct differs from that of typical PAC-learning models: we take “approximately correct” to mean an approximation of a discrete parameter value on a continuous parameter gradient (see discussion in section 3.1).

9 This is a simplified picture. For some models, rather than discarding or retaining grammars, hypothesis grammars or grammatical choices are made more or less likely to be entertained by the learner in the future.

10 Only unambiguous e-triggers can be used to implement a classic triggering model. One addition we make to the e-triggers presented in Sakas & Fodor is the development and use of ambiguous e-triggers.

11 From this point forward, we use trigger, e-trigger, and e-trigger schema interchangeably. A trigger refers to an observable pattern in the surface form. We will consistently use the term i-trigger in the sense outlined previously.

12 Both studies (Sakas & Fodor Citation2001 & Citation2012) largely put aside the issue of noisy input (e.g., caregiver misspeaks), as does the simulation presented in section 6. However, it is an interesting question to ask: How do learners in a gradient space perform in noisy environments?

13 Sakas and Fodor understood that this mapping is a nontrivial assumption. See discussion in section 3 of S&F. I-triggers and e-triggers are directly tied to Chomsky’s notions of I-language and E-language, a distinction that has been a central issue in generative linguistics for decades.

14 One reviewer astutely pointed out that in light of current minimalist theories, movement might not be so unattractive as a default value. The work presented here does not take a stance on the desirability of any particular default value but rather shows that learning is possible in the absence of defaults.

15 There are other paradigms that can model stronger and weaker tendencies, e.g., microparameters (Kayne Citation1994) and linear optimality theory (Keller Citation2000, Citation2006).

16 Gould (Citation2017) also presents a family of models that can learn in domains with superset languages by balancing the effects of ambiguous and unambiguous input during acquisition. See discussion in section 3.2.1.

17 In the case that a sentence contains more than one viable trigger (though see footnote 34), the NDL arbitrarily picks one, giving preference to unambiguous triggers over ambiguous ones.
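A minimal sketch (in Python; the function, trigger names, and data are our own illustrative assumptions, not the authors’ implementation) of the choice just described: when one sentence licenses several triggers, the learner picks one, preferring unambiguous triggers, and “arbitrarily” can be realized as “first in a fixed order” so that the learner remains deterministic.

    def pick_trigger(triggers):
        """triggers: list of (name, is_unambiguous) pairs found in one sentence."""
        unambiguous = [t for t in triggers if t[1]]
        pool = unambiguous if unambiguous else triggers
        return pool[0] if pool else None

    # An ambiguous and an unambiguous trigger in the same sentence:
    print(pick_trigger([("topic-drop", False), ("wh-front", True)]))
    # -> ('wh-front', True)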

18 Another difference from previous models is that the NDL is statistical but at the same time deterministic. That is, there is no randomness in NDL learning: any NDL learner given the same fixed set of inputs will always converge to exactly the same place on the gradient for all parameters. This notion of determinism is different from the one used in parsing and learning, where once a deterministic learner or parser makes a choice, the choice cannot be revised.
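To make the statistical-but-deterministic point concrete, here is a minimal sketch under assumed update rules (the class, rates, and update form below are illustrative assumptions, not the NDL’s actual implementation): each confidence value is a pure function of the input stream, so two learners fed the same fixed inputs land on exactly the same points on the gradient.

    RATE_UNAMBIGUOUS = 0.02   # assumed rate for unambiguous triggers
    RATE_AMBIGUOUS = 0.002    # assumed (smaller) rate for ambiguous triggers

    class GradientLearner:
        def __init__(self, n_params):
            self.c = [0.5] * n_params  # confidence values start mid-gradient

        def update(self, param, direction, unambiguous):
            # Deterministic: no sampling; the new value is a pure function
            # of the old value and the incoming trigger (direction is the
            # endpoint target value, 0 or 1).
            rate = RATE_UNAMBIGUOUS if unambiguous else RATE_AMBIGUOUS
            self.c[param] += rate * (direction - self.c[param])

    # Two learners consuming the same fixed input stream end up identical.
    a, b = GradientLearner(3), GradientLearner(3)
    stream = [(0, 1, True), (1, 0, False), (0, 1, True)]
    for evt in stream:
        a.update(*evt)
        b.update(*evt)
    assert a.c == b.c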

19 For the study presented in section 6, we only needed to condition on target parameter values that have unambiguous triggers for both the 0 and 1 target values. It is an interesting empirical question whether conditioning on parameters with only ambiguous triggers would be effective. More generally, it is interesting to investigate how changing the balance between ambiguous and unambiguous evidence would affect learning; we leave these questions for future work.

20 See Hornstein’s (Citation2016) blog post for interesting discussion of Yang (Citation2017) on negative versus positive evidence and the repercussions of both.

21 Clearly, the success and speed of the NDL depend on the specific learning rates chosen.

22 The distinction between the use of positive ambiguous evidence and indirect negative evidence is stronger where subset-superset relationships involve interactions between more than a single parameter (Yang, p.c.), but that point is not relevant to the specific NDL findings presented in this article.

23 There have been many proposals to heuristically curtail the potentially damaging effects of randomness in P&P acquisition models: for example, the Single Value Constraint of Clark (Citation1989, Citation1992), which was picked up by Gibson & Wexler (Citation1994), and the use of decoding by the Guessing STL model of Fodor & Sakas (Citation2004). However, to the best of our knowledge, only the NDL and the model outlined in Kapur (Citation1994) are statistical but nonrandom.

24 See Sugisaki & Snyder (Citation2006a) for discussion of how children do not toggle back-and-forth between preposition stranding and pied-piping during acquisition as might be predicted by the VL model.

25 Note that the parser would require comprehensive knowledge of domain structure, e.g., what parameter combinations are disallowed, if any.

26 An early version of the VL provably converges to (a correct) final grammar (Straus Citation2008).

27 One reviewer noted that although the VL chooses discrete grammars, one could view the states of both a VL and an NDL learner as similarly embodied in the VL’s weights and the NDL’s gradients, respectively. One could adopt this view (though we would take a VL learner’s state to be the weights together with its current grammar hypothesis). But for the NDL, the state of the learner is its current hypothesis, whereas the state of a VL learner serves to help guide its probabilistic search of the hypothesis space.

28 It should be noted that, unlike for Bayesian learning, to the best of our knowledge there is no mathematical treatment of Gould’s learner in the general case.

29 If implemented, smoothing would likely be employed to keep the ratio real-valued.
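For concreteness, here is a sketch of the kind of additive (Laplace) smoothing the footnote alludes to, under our own assumptions about the ratio’s form: adding a small constant to numerator and denominator keeps the ratio finite and well defined even before any evidence has been seen.

    def smoothed_ratio(count_a, count_b, alpha=1.0):
        # alpha > 0 guarantees a real-valued ratio even when both counts are 0
        return (count_a + alpha) / (count_a + count_b + 2 * alpha)

    print(smoothed_ratio(0, 0))  # 0.5 rather than an undefined 0/0
    print(smoothed_ratio(7, 1))  # 0.8: near the raw 7/8, pulled toward 0.5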

30 Yang points to related work by Osherson, Stob & Weinstein (Citation1986) that suggests that the sets required to make use of indirect negative evidence in order to implement the Subset Principle may not be computable for infinite domains such as natural language.

31 The domain adheres to two linguistically natural constraints on parameter combinations (described when relevant in the following sections), restricting the allowable CoLAG grammars from 2¹³ = 8,192 to 3,072.

32 In CoLAG, tense in declaratives and questions is assumed to be inextricably attached to either a lexical Aux or Verb, which, in the absence of Affix Hopping, must be dominated by I or C.

33 See footnote 14.

34 Note that in both this work and in S&F, the e-triggers described are not intended to be exhaustive of all the available triggers in the CoLAG domain. Rather, they are meant to be partial but sufficient evidence for successful learning. The decision to restrict the NDL to only a partial collection of the many e-triggers available was made in part for simplicity and in part to limit the computational load placed on the learner. One reviewer made the point that if the NDL made use of a different set of triggers, learning would likely proceed differently. This is true, but it would not affect the overall outcome of learning.

35 In contrast, S&F tried to disentangle this interaction and, given movement as the default, succeeded; in other cases, though, S&F accepted weak equivalence as sufficient for successful learning.

36 The generality of the HIP parameter in CoLAG is a simplification that applies to many but not all natural languages. Some languages exhibit different headedness directionality among their IP, NegP, PP, and VP subtrees.

37 The CoLAG UG stipulates that C must always be filled, if not with ka or a verbal item (Aux or Verb), then with a null complementizer provided by UG.

38 CoLAG has a strict obliqueness order in the VP dictated by the head position (HIP) parameter. In head-initial languages the order is V O1 O2 P O3 Adv, and in head-final languages the order is reversed. Any element occurring in its noncanonical position is considered out of obliqueness order. Thus, obliqueness can be used to determine topicalization. This is the first time we require conditioning in an e-trigger definition. How conditioning is employed by the NDL is discussed generally in section 3.2 and specifically concerning affix hopping in section 5.2.6 and ItoC movement in section 5.2.7.
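A minimal sketch of an obliqueness check, under assumed token representations (the list and function names are ours, not CoLAG’s): a VP whose elements’ canonical ranks ever decrease contains an element out of obliqueness order, which on this definition signals topicalization.

    HEAD_INITIAL_ORDER = ["V", "O1", "O2", "P", "O3", "Adv"]

    def out_of_obliqueness_order(vp_tokens, head_initial=True):
        canon = HEAD_INITIAL_ORDER if head_initial else HEAD_INITIAL_ORDER[::-1]
        ranks = [canon.index(t) for t in vp_tokens if t in canon]
        # Canonical VPs have nondecreasing ranks; any inversion is noncanonical.
        return any(later < earlier for earlier, later in zip(ranks, ranks[1:]))

    print(out_of_obliqueness_order(["V", "O1", "Adv"]))  # False: canonical order
    print(out_of_obliqueness_order(["O2", "V", "O1"]))   # True: O2 is fronted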

39 Existence of the overt topic is also an ambiguous trigger for no null topic. See for an outline of subject-oriented and topic-oriented languages in CoLAG.

40 S&F reject sentences containing all VP complements as suitable triggers because in natural language there is no finite set of VP complements. In this work, we take such sentences in CoLAG to be a proxy for verb phrases with a large number of complements. We don’t see this as problematic since, unlike S&F, who consider such sentences unambiguous evidence (for optional topic), we take them as ambiguous evidence.

41 Here again, conditioning is required in the e-trigger definition (cf. footnote 38). How conditioning is employed by the NDL is discussed generally in section 3.2 and specifically concerning ItoC movement in section 5.3.1.

42 Recall that the NDL uses only unambiguous e-triggers to set these three parameters; such triggers quickly lead the NDL toward the correct target value and never lead it in the wrong direction. This is consistent with empirical psycholinguistic evidence (e.g., Christophe et al. Citation2003).

43 In our implementation, we do not exhaustively check for each element. We consider two specific elements (e.g., Aux and S, or Aux and Never) for each language family. As mentioned earlier, the e-triggers used in this work are not exhaustive of those that exist in the domain.

44 The VP-edge triggers for ItoC movement are the same as the VP-edge triggers discussed previously for affix hopping. This doesn’t mean that all CIVOS and SOVIC languages have affix hopping. Languages without affix hopping (and without VtoI movement) must have an Aux in every declarative sentence. Due to the adjacency of I and C in these languages, the setting of ItoC movement is globally irrelevant in CIVOS and SOVIC languages that have neither affix hopping nor VtoI movement.

45 We use the term sentence here to mean a word-order pattern in CoLAG.

46 Although these are default starting points on the gradient, they are distinctly different from traditional parametric defaults in which parameters are initialized to either one value or the other.

47 It should be noted that the sentences that make up a CoLAG target language exist prior to a simulation run; they are not generated during the course of the simulation.

48 In the following section, we discuss potential advantages of NDL learning in the case that the learner arrives at points further from the target endpoints.

49 As mentioned in the previous section, the simulation effectively used a random uniform distribution of sentences, i.e., each sentence in a target language was equally likely to occur in an e-child’s linguistic environment.

50 Within a CoLAG language, whether or not a sentence contains a subject is arbitrary, unlike natural languages, where there are within-language constraints on dropped subjects (though, as in natural language, the percentage of dropped subjects varies between CoLAG languages; the variation is due to interactions between Null Subject and other parameters).

51 For example, defaults might well be incorporated into an implementation of the NDL and could prove useful in modeling a child’s longitudinal trajectory during acquisition (e.g., Sugisaki & Snyder Citation2003, Citation2006b).

52 As mentioned, we use converge in the nontechnical sense of “the value on the gradient arrived at by the end of a simulation run.” Technically, updating the C-values of the NDL is asymptotic, and the learner will never converge on one of the endpoint target values.
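A small illustration of the asymptotic behavior just described, using an assumed update of the same shape as the earlier sketch (not the NDL’s actual rule): in exact arithmetic, repeated updates drive a C-value arbitrarily close to an endpoint without ever reaching it.

    c, rate = 0.5, 0.05
    for _ in range(100):
        c += rate * (1.0 - c)  # move toward the endpoint target value 1
    print(c)        # ~0.997: close to, but short of, the endpoint
    print(c < 1.0)  # True; in exact arithmetic c = 1 - 0.5 * 0.95**100 < 1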
