Regular articles

The role of lexical variables in the visual recognition of Chinese characters: A megastudy analysis

Pages 1541-1570 | Received 21 Jan 2014, Accepted 18 Oct 2014, Published online: 27 Jan 2015
 

Abstract

Logographic Chinese orthography partially represents both phonology and semantics. By capturing the online processing of a large pool of Chinese characters, we were able to examine the relative salience of specific lexical variables in the reading of this nonalphabetic script. Using a sample of native mainland Chinese speakers (N = 35), we collated lexical decision latencies for 1560 single characters into a database and then explored the effects of a comprehensive range of variables. Hierarchical regression analyses determined the unique item-level variance explained by orthographic (frequency, stroke count), semantic (age of learning, imageability, number of meanings), and phonological (consistency, phonological frequency) factors. Orthographic and semantic variables each accounted for more collective variance than the phonological variables. Significant main effects were also observed for the individual orthographic and semantic predictors. These results are consistent with the idea that skilled readers tend to rely on orthographic and semantic information when processing visually presented characters. This megastudy approach marks an important extension to existing work on Chinese character recognition, which has hitherto relied on factorial designs. Collectively, the findings reported here provide a useful set of empirical constraints for future computational models of character recognition.
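The "unique item-level variance" attributed to a block of predictors can be illustrated as the increase in R² (ΔR²) when that block is entered last, after all other blocks. The Python sketch below is a minimal illustration assuming ordinary least squares via NumPy; the data are synthetic and the variable names are hypothetical, so it is not the authors' analysis code and does not reproduce their results.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on X, with an intercept column added."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def unique_block_variance(blocks, y):
    """Delta-R^2 for each block when it is entered last, after all other blocks."""
    full = r_squared(np.column_stack(list(blocks.values())), y)
    unique = {}
    for name in blocks:
        others = [X for other, X in blocks.items() if other != name]
        reduced = r_squared(np.column_stack(others), y) if others else 0.0
        unique[name] = full - reduced
    return unique


# Synthetic item-level data (hypothetical, NOT the study's data): one row per character.
rng = np.random.default_rng(0)
n_items = 1560
orthographic = rng.normal(size=(n_items, 2))  # stand-ins for frequency, stroke count
semantic = rng.normal(size=(n_items, 3))      # stand-ins for age of learning, imageability, meanings
phonological = rng.normal(size=(n_items, 2))  # stand-ins for consistency, phonological frequency
rt = (600 - 30 * orthographic[:, 0] + 10 * orthographic[:, 1]
      - 15 * semantic[:, 0] + 2 * phonological[:, 0]
      + rng.normal(scale=40, size=n_items))

result = unique_block_variance(
    {"orthographic": orthographic, "semantic": semantic, "phonological": phonological}, rt)
for block, delta_r2 in result.items():
    print(f"{block}: unique R^2 = {delta_r2:.3f}")
```

Entering each block last gives a conservative estimate: variance shared between blocks is credited to none of them, which is why the per-block values need not sum to the full-model R².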

Notes

1 There is a more substantial body of research on the effects of number of meanings for two-character Chinese words. In general, these studies report that participants respond faster to polysemous words (e.g., in lexical decision tasks: Experiment 1 in Chen & Peng, Citation2001; Experiments 1 and 2 in Liu & Peng, Citation2005).

2 Yang et al.'s (Citation2009) computational model, trained on a corpus of 4468 characters, is the most extensive to date. However, it is a naming model. The model is also a variant of the lexical constituency model, so the subsequent discussion of the lexical constituency model should also apply to Yang et al. (Citation2009).

3 Number of components is another lexical variable intended to capture a character's visuo-orthographic complexity. Components are combinations of strokes that recur across characters (see Chen & Ye, Citation2009, for a thorough review). However, preliminary analyses showed this variable to be highly correlated with stroke count (r = .66, p < .001). Additionally, the tolerance values for number of strokes and number of components were low, at .536 and .554, respectively. Low tolerance is taken as a statistical diagnostic of possible collinearity (Berk, Citation1977). To avoid multicollinearity, number of components was excluded from the final analysis. In any case, this variable had no predictive effect when it was included as an eighth variable (β = 0.205, p = .43).
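Tolerance for a predictor is 1 − R², where R² comes from regressing that predictor on all the other predictors; its reciprocal is the variance inflation factor (VIF). The NumPy sketch below is a minimal illustration on synthetic data; the variable names `strokes` and `components` are hypothetical stand-ins. With only two predictors, each tolerance reduces to 1 − r², so a correlation of about .66 gives tolerances of about .56. In a model with more predictors, each tolerance is computed against all of the others, so reported values need not equal 1 − r².

```python
import numpy as np


def tolerances(X: np.ndarray) -> np.ndarray:
    """Tolerance of each column: 1 - R^2 from regressing it on all other columns (plus intercept)."""
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        centred = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        tol[j] = 1.0 - r2
    return tol


# Synthetic, hypothetical predictors built to correlate at roughly r = .66.
rng = np.random.default_rng(1)
n = 1560
strokes = rng.normal(size=n)
components = 0.66 * strokes + np.sqrt(1 - 0.66 ** 2) * rng.normal(size=n)
X = np.column_stack([strokes, components])

tol = tolerances(X)
r = np.corrcoef(strokes, components)[0, 1]
print("tolerance:", tol.round(3), "VIF:", (1 / tol).round(3), "1 - r^2:", round(1 - r ** 2, 3))
```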

4 The tolerance values for consistency and regularity were .66 and .57, respectively, and those for phonological frequency and homophone density were .70 and .73, respectively. Although the tolerance values for phonological frequency and homophone density are only moderately low, the two variables are conceptually very similar; thus only one of them was selected (see the main text for a detailed explanation).

5 We are grateful to Y. Liu for sharing a subset of his naming data.

6 We thank an anonymous reviewer for suggesting the second possibility.

7 An additional 2 × 2 within-subjects ANOVA was run, based on the same factorial set-up and stimuli as those used in Study 2 of Yang et al. (Citation2009). Character frequency (middle/low) was crossed with consistency (consistent/inconsistent) for 44 characters common to Yang et al.'s naming study and the Chinese Lexicon Project. The results did not reveal a Frequency × Consistency interaction [F(1, 34) = 2.25, p = .14, MSE = 2923.41, ]. However, the descriptive statistics from this small-scale factorial reanalysis showed a trend similar to the underadditive pattern we observed: reaction times for the low-frequency items were similar across consistency conditions (low-frequency, consistent characters: M = 682.74, SD = 91.43; low-frequency, inconsistent characters: M = 678.48, SD = 99.32), whereas a greater discrepancy in speed was noted for the middle-frequency items. Responses to the middle-frequency, consistent characters (M = 609.65, SD = 71.19) appeared slightly faster than those to the middle-frequency, inconsistent characters (M = 632.79, SD = 79.40). It is therefore possible that the underadditive interaction obtained in this paper could be reproduced in factorial studies using lexical decision. The authors would like to thank J. Yang and Jason Zevin for sharing their stimuli.
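The descriptive pattern in this note can be made explicit by computing the consistency effect (inconsistent minus consistent mean RT) at each frequency level, and the difference between those two effects (the interaction contrast). The short Python sketch below uses only the cell means reported in this note; the function name `consistency_effect` is hypothetical. It is descriptive arithmetic only and does not replace the ANOVA, which found the interaction non-significant. It yields a consistency effect of about 23 ms for middle-frequency items, about −4 ms for low-frequency items, and an interaction contrast of about 27 ms.

```python
# Cell means (ms) reported in this note for the 44-item reanalysis.
means = {
    ("middle", "consistent"): 609.65,
    ("middle", "inconsistent"): 632.79,
    ("low", "consistent"): 682.74,
    ("low", "inconsistent"): 678.48,
}


def consistency_effect(freq: str) -> float:
    """Inconsistent minus consistent mean RT at one frequency level (positive = consistency advantage)."""
    return round(means[(freq, "inconsistent")] - means[(freq, "consistent")], 2)


effect_mid = consistency_effect("middle")
effect_low = consistency_effect("low")
# Interaction contrast: how much larger the consistency effect is for middle- than low-frequency items.
interaction_contrast = round(effect_mid - effect_low, 2)
print(effect_mid, effect_low, interaction_contrast)
```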

8 The metric used by the researchers in support of the multilevel interactive activation framework is published in the Chinese Radical Position Frequency Dictionary (1984). However, there may be only one handwritten copy in existence, which is unlikely to be accessible to most researchers.

9 Xing was one of the primary researchers involved in the project.
