ABSTRACT
We present an open-access analytic tool that allows researchers to combine, and simultaneously control for, language data from the child and the caregiver, across multiple languages and multiple time points, in order to make inferences about the social and cognitive factors that shape language development. We demonstrate how the tool works in three domains of language learning and across six languages. The results demonstrate the usefulness of this approach and provide deeper insight into three areas of language production and acquisition: egocentric language use, the learnability of nouns versus verbs, and imageability. We have made the Frequency Filter tool freely available as an R package for other researchers to use at https://github.com/rosemm/FrequencyFilter.
Acknowledgments
We thank the anonymous reviewer, whose attention to detail, patience, and expertise substantially improved this paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1. The %mor line is a separate line of information in the corpus data that codes morphemic segments by type and part of speech. For example, the utterance "*MAR: I wanted a toy." is coded as "%mor: PRO|I&1S V|want-PAST DET|a&INDEF N|toy." More technical detail can be found in the CHAT manual: https://talkbank.org/manuals/CHAT.pdf
2. This is the training tier for the POST tagger. It has the same form as the %mor line.
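As an illustration of the coding described in Note 1, the following minimal Python sketch splits the example %mor line into part of speech, stem, and feature markers. It is not part of the Frequency Filter package; the function names parse_mor_token and parse_mor_line are hypothetical, and the sketch assumes only the subset of the notation visible in the example (a "|" between part of speech and stem, with "&" and "-" introducing features). The CHAT manual describes further markers that are not handled here.

```python
import re


def parse_mor_token(token):
    """Split one %mor token, e.g. 'V|want-PAST', into its parts.

    Hypothetical helper: handles only the 'POS|stem' pattern with
    optional '&' and '-' feature markers, as seen in the example.
    """
    pos, _, rest = token.partition("|")
    # The stem runs up to the first '&' or '-' marker, if there is one.
    match = re.match(r"([^&-]*)(.*)", rest)
    stem, tail = match.group(1), match.group(2)
    # Each feature is returned together with its leading marker.
    features = re.findall(r"[&-][^&-]+", tail)
    return {"pos": pos, "stem": stem, "features": features}


def parse_mor_line(line):
    """Parse a whole %mor line into a list of token dictionaries."""
    # Drop the '%mor:' tier label if present.
    body = line.split(":", 1)[1] if line.startswith("%mor") else line
    # Remove the utterance-final period, then split on whitespace.
    tokens = body.strip().rstrip(".").split()
    return [parse_mor_token(t) for t in tokens]


parsed = parse_mor_line("%mor: PRO|I&1S V|want-PAST DET|a&INDEF N|toy.")
for item in parsed:
    print(item["pos"], item["stem"], item["features"])
```

Keeping the leading marker on each feature preserves the distinction the notation draws between the "&" and "-" forms, which a downstream filter by part of speech or morphology might need.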