Data analysis of Chinese characters in primary school corpora of Hong Kong and mainland China: preliminary theoretical interpretations

Pages 379-389 | Received 07 Nov 2006, Accepted 12 Jul 2007, Published online: 09 Jul 2009
 

Abstract

Metalinguistic awareness (awareness of the structure of the orthography) has been considered vital for reading acquisition. Recent research has found awareness of phonological regularity and consistency in advanced readers. Evidence based on simplified Chinese suggests an effect of semantic transparency on reading in school readers. Studies based on traditional Chinese have also reported that reading acquisition, including the development of metalinguistic awareness, is affected by script, the properties of characters in school curricula, and the approaches and strategies used in reading training. This paper reports a comparison between corpora of simplified Chinese characters based on primary school textbooks and the updated Hong Kong Corpus of Primary School Chinese (HKCPSC). The two corpora are compared on the proportion of characters in the total curriculum, the ratio of phonetic‐semantic compounds, visual complexity (defined by the number of strokes), and the levels of phonetic regularity and semantic transparency of Chinese characters across grades. Two marked differences are found: the frequency‐weighted proportion of regular characters and the proportion of semantically transparent characters across grades. The relationship between these data and recent findings on reading development in Chinese is discussed.

Notes

1. The term “frequency” here refers to the frequency of occurrence, i.e. the number of times a character occurs in a given context (e.g. a database or a series of textbooks). For example, character X occurs 156 times while character Y occurs 3 times. High‐frequency characters occur more often and low‐frequency characters less often. There are different systems for classifying high‐, mid‐, and low‐frequency characters. In our system, characters are ranked by frequency in descending order. The top 33% of characters are classified as high frequency, the following 33% as mid frequency and the last 33% as low frequency. Characters which do not occur at all are classified as “unfamiliar”.
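The classification described in Note 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify_by_frequency`, the `inventory` argument (the set of characters to be classified), the tie-breaking rule, and the treatment of the three bands as equal thirds of the ranked list are all assumptions made for illustration.

```python
from collections import Counter


def classify_by_frequency(tokens, inventory):
    """Assign each character in `inventory` to a frequency band.

    `tokens` is the sequence of character occurrences in the corpus.
    Both names are hypothetical; the note does not prescribe an interface.
    """
    counts = Counter(tokens)

    # Rank the characters that occur at least once by descending frequency.
    # Ties are broken by the character itself so the result is deterministic
    # (an assumption: the note does not say how ties are handled).
    attested = sorted(
        (ch for ch in set(inventory) if counts[ch] > 0),
        key=lambda ch: (-counts[ch], ch),
    )

    n = len(attested)
    bands = {}
    for rank, ch in enumerate(attested):
        # Split the ranked list into three equal parts (assumed reading
        # of "top 33% / following 33% / last 33%").
        if rank * 3 < n:
            bands[ch] = "high"
        elif rank * 3 < 2 * n:
            bands[ch] = "mid"
        else:
            bands[ch] = "low"

    # Characters in the inventory that never occur in the corpus.
    for ch in set(inventory) - set(attested):
        bands[ch] = "unfamiliar"
    return bands


# Toy corpus with Latin letters standing in for Chinese characters.
toy_tokens = list("aaaaaabbbbbccccdddeef")
toy_bands = classify_by_frequency(toy_tokens, "abcdefg")
```

Because the bands are defined by rank rather than by absolute counts, the same character can fall into different bands in different corpora, which matters when comparing the Hong Kong and mainland corpora.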
