ABSTRACT
This study used a text mining method to investigate the lexical features of PhD theses, and the dynamic changes in those features, across the natural sciences, social sciences and humanities. Four quantitative indices (type-token ratio (TTR), h-point, R1 and writer’s view) were applied to 150 PhD theses, 50 from each disciplinary area. Counter-intuitively, h-point and writer’s view showed no significant variation across disciplines, whereas TTR and R1 revealed sharp contrasts between theses in the humanities and those in the natural sciences. The second half of humanities theses showed a significantly higher level of lexical diversity, as indicated by a higher TTR, while theses in the natural sciences tended to be richer in content words in the first half, as indicated by a higher R1. Theses in the social sciences appeared more moderate, with values falling between those of the other two areas. The study has implications both for widening the application of quantitative linguistic methods and for the teaching and practice of academic writing, especially PhD thesis writing.
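As a rough illustration of two of the indices named above, the following Python sketch computes TTR and a simplified h-point for the first and second halves of a text. It is not the procedure used in the study: the tokenizer, the function names (`tokenize`, `ttr`, `h_point`, `halves`) and the split at the token midpoint are all assumptions made for the example, and the h-point is taken here as the largest rank r whose frequency is at least r, without the interpolation used in some definitions.

```python
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase alphabetic runs.
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens):
    # Type-token ratio: number of distinct word forms divided by total tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def h_point(tokens):
    # Simplified h-point: the largest rank r in the rank-frequency list
    # whose frequency f(r) is still at least r (no interpolation).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    h = 0
    for rank, freq in enumerate(freqs, start=1):
        if freq >= rank:
            h = rank
        else:
            break
    return h


def halves(tokens):
    # Split a token list at its midpoint (hypothetical choice of boundary).
    mid = len(tokens) // 2
    return tokens[:mid], tokens[mid:]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    first, second = halves(tokenize(sample))
    print("TTR first half:", ttr(first))
    print("TTR second half:", ttr(second))
    print("h-point whole text:", h_point(tokenize(sample)))
```

Because TTR falls as a text grows longer, comparing two halves of equal token length keeps the values on a comparable footing.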
Disclosure statement
No potential conflict of interest was reported by the authors.