Abstract
We apply a successful cognate identification learning model, based on point accepted mutation (PAM)-like matrices, to the task of linguistic phylogenetic inference. We train the system and use the learned parameters to measure the lexical distance between languages. We then estimate phylogenetic trees using distance-based methods on an Indo-European database. Our results correctly reproduce all the established major language groups and subgroups present in the dataset, are compatible with the Indo-European benchmark tree, and also include some of the supported higher-level structures. We also review and compare other studies reported in the literature with regard to recognized aspects of the Indo-European language family.
ACKNOWLEDGEMENTS
We thank Quentin Atkinson for supplying and commenting on the Hittite and Tocharian lists, and Geoff Nicholls for providing some of his papers and datasets.