
Data analysis of Chinese characters in primary school corpora of Hong Kong and mainland China: preliminary theoretical interpretations

Pages 379-389 | Received 07 Nov 2006, Accepted 12 Jul 2007, Published online: 09 Jul 2009
 

Abstract

Metalinguistic awareness (an awareness of the structure of orthography) has been considered vital for reading acquisition. Recent research has found awareness of phonological regularity and consistency in advanced readers. Evidence based on simplified Chinese suggests an effect of semantic transparency on reading in school readers. Studies based on traditional Chinese have also reported that reading acquisition, including the development of metalinguistic awareness, is affected by script, the properties of characters in school curricula, and the approaches and strategies used in reading training. This paper reports a comparison between corpora of simplified Chinese characters based on primary school textbooks and the updated Hong Kong Corpus of Primary School Chinese (HKCPSC). The proportion of characters in the total curriculum, the ratio of phonetic‐semantic compounds, visual complexity (defined by the number of strokes) and the levels of phonetic regularity and semantic transparency of Chinese characters across grades in the two corpora are compared. The two marked differences found are in the frequency‐weighted proportion of regular characters and in the proportion of semantically transparent characters across grades. The relationships between the data and recent findings on reading development in Chinese are discussed.

Notes

1. The term “frequency” here refers to the frequency of occurrence, i.e. the number of times a character occurs in a given context (e.g. a database or a series of textbooks). For example, character X occurs 156 times while character Y occurs 3 times. High‐frequency characters occur more often and low‐frequency characters less often. Different systems exist for classifying characters as high‐, mid‐, or low‐frequency. In our system, the frequency values of characters are ranked in descending order. The top 33% of characters are classified as high frequency, the following 33% as mid frequency and the last 33% as low frequency. Characters which do not occur at all are classified as “unfamiliar”.
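The classification in Note 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name `classify_by_frequency`, the `inventory` argument (the set of characters being classified), the tie-breaking rule, and the handling of counts that do not divide evenly by three are all hypothetical choices that the note does not specify.

```python
from collections import Counter


def classify_by_frequency(corpus_text, inventory):
    """Label each character in `inventory` as high, mid, low or unfamiliar.

    Assumptions (not stated in the source): ties in frequency are broken
    by code point order, and thirds are cut by rank position.
    """
    # Count occurrences of inventory characters in the corpus.
    counts = Counter(ch for ch in corpus_text if ch in inventory)

    # Rank attested characters by descending frequency.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    n = len(ranked)

    labels = {}
    for rank, ch in enumerate(ranked):
        if rank * 3 < n:          # first third of the ranking
            labels[ch] = "high"
        elif rank * 3 < 2 * n:    # second third
            labels[ch] = "mid"
        else:                     # final third
            labels[ch] = "low"

    # Characters that never occur in the corpus are "unfamiliar".
    for ch in inventory:
        labels.setdefault(ch, "unfamiliar")
    return labels
```

Calling `classify_by_frequency(text, set_of_characters)` returns a dictionary mapping each character to its band. Cutting by rank rather than by cumulative token count follows the note's wording ("the top 33% of characters"), but a different corpus design could reasonably cut the bands differently.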
