ABSTRACT
Hashimoto (2021) reported a correlation of −.50 (r² = .25) between word frequency rank and difficulty, concluding that the construct of modern vocabulary size tests is questionable. In this response we show that the relationship between frequency and difficulty is clear, albeit non-linear, and demonstrate that if a wider range of frequencies is tested and log transformations are applied, the correlation can approach .80. Finally, while we acknowledge the great promise of knowledge-based word lists, we note that a strong correlation between difficulty and frequency is not, in fact, the primary reason size tests are organized by frequency.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 As with Hashimoto (2021) and as suggested by Plonsky (2013), r² is used as the effect size of consequence as it describes the amount of variance shared between the variables (i.e., the strength of association).
2 Meaning recall responses were also available for the data set used in this paper. The correlation with COCA rank was −0.542, compared to −0.533 for Yes/No responses, and the difference was not statistically significant (Steiger’s z = 0.245, p > 0.8). While it is possible that a larger set of items could establish a significant difference, this result suggests that, relative to correlations of two proficiency levels for the same learner, test item format does not make as large a difference to correlations with frequency data.
3 A sensitivity power analysis assuming a 5% alpha and a 20% beta threshold revealed that our sample size was powered to detect a minimum effect size of r = .32 (r² = .10), which is reasonable vis-à-vis Hashimoto’s value of most interest (r = .50).
4 The data were created by Parr (n.d.) and retrieved from https://codepen.io/adrianparr/pen/jwmjmv?js-preprocessor=babel.
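
As an illustration of the sensitivity power analysis described in Note 3, the following is a minimal sketch of how a minimum detectable correlation can be approximated from a sample size. It assumes a two-sided test and uses Fisher's z approximation, which may differ from the procedure or software the authors actually used. The function name `min_detectable_r` and the sample size `n_hypothetical = 75` are hypothetical and chosen only for illustration; the source does not report the sample size here.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Approximate the smallest |r| detectable with the given power.

    Uses Fisher's z approximation for a two-sided test of H0: rho = 0.
    This is an assumed method, not necessarily the one used in the paper.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3 for Fisher's z")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for power = 1 - beta
    # Fisher's z has standard error 1 / sqrt(n - 3); convert back to r with tanh.
    return math.tanh((z_alpha + z_beta) / math.sqrt(n - 3))


# Hypothetical sample size, for illustration only.
n_hypothetical = 75
r_min = min_detectable_r(n_hypothetical)
r_squared = r_min ** 2  # shared variance, the effect size described in Note 1
print(f"n = {n_hypothetical}: minimum detectable r = {r_min:.2f}, r^2 = {r_squared:.2f}")
```

Under these assumptions, a larger sample lowers the minimum detectable correlation, so any observed correlation whose magnitude exceeds that threshold falls within the range the design is powered to detect.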