Abstract
Word learning is a lifelong activity constrained by cognitive biases that people possess at particular points in development. Age of acquisition (AoA) is a psycholinguistic variable that may prove useful for gauging the relative weighting of different phonological, semantic, and morphological factors at different phases of language acquisition and development. Our aim here was to evaluate AoA as a statistical tool for taking “snapshots” of cognitive development. We examined a large corpus of English nouns (n = 1,381) with AoA as the outcome variable in three separate multivariate regressions, each covering a different age range (early, middle, late). Predictors included perceptual (e.g., imagery), phonological (e.g., phonological neighborhood density), and lexical (e.g., word length) factors. Different combinations of predictors accounted for significant proportions of the variance in the different AoA ranges. For example, imageability and frequency were stronger predictors of early relative to late word learning. These corpus analyses support a hybrid model of word learning in which multiple perceptual and linguistic factors are differentially weighted over time. This statistical approach may provide independent corroboration of and motivation for experimental studies in language learning and cognitive development.
ACKNOWLEDGMENT
We would like to thank Dr. Kathy Hirsh-Pasek for her comments and suggestions on an earlier version of this manuscript.
Notes
1. We recognize that there are alternative metrics of word frequency using hypertext (Lund & Burgess, 1996, as provided in the English Lexicon Project [elexicon.wustl.edu]; Balota et al., 2007) and film subtitles (Brysbaert & New, 2009) that may be superior to Kučera and Francis's (1982) text-based frequency estimates in some respects. However, use of these alternative metrics had the potential to create serious collinearity problems with the other predictor variables in this analysis. If anything, using the Kučera and Francis norms in the present analyses would have put our hypotheses at a disadvantage, yet this ultimately did not appear to affect our ability to predict words’ AoA.
2. Prior to analysis, variables were examined by means of various programs provided in the Statistical Package for the Social Sciences for accuracy of data entry, missing values, and fit between the distributions of each variable and the assumptions of multivariate analysis. The results of the evaluation of assumptions did not identify missing values for any of the variables. In addition, no violations of homoscedasticity and no significant violations of normality were observed. Regarding the assumptions of multicollinearity and singularity, squared multiple correlations (SMC) were calculated for each variable and converted to tolerances (1 − SMC). None of the tolerances approached 0, hence satisfying the singularity and multicollinearity assumptions. Additionally, all variables entered the regression equation without violating the default value for tolerance, which further resolved doubts about possible multicollinearity and singularity among the independent variables. Finally, the highest correlations among the variables did not exceed r = .65, which further supports the conclusion that the multivariate assumptions were adequately satisfied (see Tabachnick & Fidell, 2001, pp. 56–110).
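The tolerance check described in Note 2 can be illustrated with a short sketch. This is not the SPSS procedure used in the analyses; the function name `tolerances`, the use of NumPy, and the synthetic data are all assumptions made purely for illustration. Each predictor is regressed on the remaining predictors, the resulting R² serves as its squared multiple correlation (SMC), and the tolerance is 1 − SMC.

```python
import numpy as np


def tolerances(X):
    """Return the tolerance (1 - SMC) of each column of predictor matrix X.

    Hypothetical helper for illustration: SMC is taken as the R^2 obtained
    by regressing one predictor on all of the others (with an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        tol[j] = ss_res / ss_tot  # equals 1 - R^2, i.e. 1 - SMC
    return tol


# Synthetic data (not the study's corpus): two independent predictors,
# plus a third that is almost a copy of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)

X_independent = np.column_stack([a, b])
X_collinear = np.column_stack([a, b, c])

tol_independent = tolerances(X_independent)  # both values close to 1
tol_collinear = tolerances(X_collinear)      # first and third close to 0

print(tol_independent)
print(tol_collinear)
```

In this sketch, a tolerance near 0 flags a predictor that is almost fully explained by the others, which is the condition Note 2 reports as absent from the data.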
*p ≤ .05. **p < .01.
a Unique variability = .31; shared variability = .12.
AoA = age of acquisition; FAM = familiarity; IMAG = imageability; FREQ = log of text frequency; NSYL = number of syllables; NCON = number of consonant clusters; NMRPH = number of morphemes; ETYM = etymology; DENS = phonological neighborhood density; B = unstandardized regression coefficients; β = standardized regression coefficients; sr² = squared semipartial correlations.
*p < .05. **p < .01.
a Unique variability = .19; shared variability = .01.
AoA = age of acquisition; FAM = familiarity; IMAG = imageability; NSYL = number of syllables; NCON = number of consonant clusters; NMRPH = number of morphemes; ETYM = etymology; DENS = phonological neighborhood density; STRS = stress; B = unstandardized regression coefficients; β = standardized regression coefficients; sr² = squared semipartial correlations.
*p < .05. **p < .01.
a Unique variability = .19; shared variability = .28.
AoA = age of acquisition; FAM = familiarity; IMAG = imageability; FREQ = log of text frequency; NSYL = number of syllables; NMRPH = number of morphemes; DENS = phonological neighborhood density; STRS = stress; B = unstandardized regression coefficients; β = standardized regression coefficients; sr² = squared semipartial correlations.