ABSTRACT
This study divides the large-scale written British National Corpus (BNC) and American National Corpus (ANC) into an experimental set and a test set and carries out a dynamic study of English intertextual lexical repetition rates. First, a theoretical mathematical model for calculating intertextual lexical repetition rates is established. Then, based on the experimental set, the standard deviations of the repetition rates of all text pairs and the 95% confidence intervals for the estimated repetition rates of all text pairs are calculated. When the model and its 95% confidence intervals are tested against all the observed intertextual lexical repetition rates in the test set, they are found to have a very small margin of error. The model can therefore be used to estimate intertextual lexical repetition rates, and their possible range of variation, for authentic English texts of different lengths. The findings can be applied in English teaching and learning and in the compilation of English textbooks.
ACKNOWLEDGEMENTS
This work was supported by a grant from the Development Foundation of Humanities and Social Sciences of the Ministry of Education of the People’s Republic of China (No. 12YJA740116). The authors also thank Professor Fan Fengxiang of Dalian Maritime University for his valuable advice and help with the research. Special thanks go to the anonymous reviewers for their insightful and constructive suggestions.