Abstract
We propose a new methodology for testing whether two writing samples were written by the same author. While many such tests are based on a single index of lexical richness, we propose using an entire profile of such indices. Specifically, we compute a profile of generalized Simpson’s indices for each of the two writing samples and test whether the profiles differ significantly. We validate our methodology on several poems whose authorship is known. We then apply it to test whether the poem ‘Shall I Die?’, which is sometimes attributed to William Shakespeare, was in fact written by him. Further, we provide R code and an R package that make this methodology easy to apply.
Notes
No potential conflict of interest was reported by the authors.
1 The sonnets, along with all works by Shakespeare used in this study, were downloaded from http://www.opensourceshakespeare.org.
2 These begin with the lines ‘Two households, both alike in dignity’, ‘Now old desire doth in his death-bed lie’, and ‘If I profane with my unworthiest hand’.
3 These begin with the lines ‘If love make me forsworn, how shall I swear to love’, ‘So sweet a kiss the golden sun gives not’, ‘Did not the heavenly rhetoric of thine eye’, and ‘On a day–alack the day!’
4 All poems used in this study that were not written by Shakespeare were downloaded from http://www.theotherpages.org/poems/.