Original Articles

A critique of the separation base method for genealogical subgrouping, with data from Mixe-Zoquean

Pages 225-264 | Published online: 22 Dec 2011
 

Abstract

Holm (2000) proposes the “separation base” method for determining subgroup relationships in a language family. The method is claimed to be superior to most approaches to lexicostatistics because the latter fall victim to the “proportionality trap”, that is, the assumption that similarity is proportional to closeness of relationship. The principles underlying Holm's method are innovative and not obviously incorrect. However, his only demonstration of the method is with Indo-European. This makes it difficult to interpret the results, because higher-order Indo-European subgrouping remains controversial. In order to have some basis for verification, we have tested the method on Mixe-Zoquean, a well-studied family of Mesoamerica whose subgrouping has been established by two scholars working independently and using the traditional comparative method. The results of our application of Holm's method are significantly different from the currently accepted family tree of Mixe-Zoquean. We identify two basic sources of problems that arise when Holm's approach is applied to our data. The first is reliance on an etymological dictionary of the proto-language in question, which creates problems of circularity that cannot be overcome. The second is that the method is sensitive to the amount of documentation available for the daughter languages, which has a distorting effect on the computed relationships. We then compare the results of Holm's approach with lexicostatistics and show that the latter actually performs quite well, producing a family tree for Mixe-Zoquean very similar to the one arrived at through the comparative method.

Notes

1 We gratefully acknowledge comments from Sheila Embleton and Hans Holm on an earlier version of this paper. The usual disclaimers apply.

2 It is now accepted that some languages, such as creoles, do not fit this model. But these are rather special cases, and we have no reason to presume anything other than ordinary genetic descent for Mixe-Zoquean.

3 Holm (2003, p. 43) argues that “in real families” it is very difficult to isolate shared innovations from other phenomena such as borrowing. It is also never known if a language had an innovating feature in the past and later lost it. Shared retentions are thus more reliable data. We see no reason to believe that shared innovations should be inherently more difficult to recover than shared retentions. In fact, Holm's belief in the linguist's infallible ability to reconstruct the proto-lexicon is one of the major weaknesses of his approach (see Section 4).

4 Thus, the basic data used in Holm's method are very different from the word lists typically used in lexicostatistics. In traditional (“Swadesh”) word lists, the meanings are shared, but not necessarily the form. In contrast, Holm's method uses cognates, which, though related in form, might have different meanings in different languages.

5 In this model, the proto-language is like an initial stock of lexemes of finite size which slowly diminishes over the family's history. It monotonically decreases, because innovations and borrowings are ignored. A problem with this assumption is the existence of recurrent cognation (Brainard, 1970, p. 70; Embleton, 1986, p. 63). A lexeme that is lost might later be reinserted into the language by borrowing, e.g., the borrowing of Latin words into French.

6 N should be rounded down to the nearest integer. If N is already an integer, the next lowest integer is an equally accurate estimate. We have taken the average of the two integers in such cases.
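
The rounding rule in note 6 can be illustrated with a minimal sketch. It assumes, and this is an assumption not stated in the text above, that the estimate has the familiar hypergeometric (capture-recapture) form, i.e. the product of the two languages' retained-cognate counts divided by their number of agreements. The function name `estimate_n` and the parameters `k_x`, `k_y` and `a_xy` are hypothetical labels chosen for illustration.

```python
from fractions import Fraction
from math import floor


def estimate_n(k_x: int, k_y: int, a_xy: int) -> Fraction:
    """Estimate N for one language pair (illustrative sketch only).

    k_x, k_y: number of reconstructed etyma reflected in each language
    a_xy:     number of etyma reflected in both languages (agreements)

    ASSUMPTION: the raw estimate is k_x * k_y / a_xy, the usual
    hypergeometric form; the text above does not spell out the formula.
    """
    if a_xy <= 0:
        raise ValueError("at least one agreement is needed for an estimate")
    raw = Fraction(k_x * k_y, a_xy)  # exact arithmetic, no float rounding
    if raw.denominator == 1:
        # Raw value is already an integer: raw and raw - 1 are equally
        # good estimates, so take the average of the two (note 6).
        return raw - Fraction(1, 2)
    # Otherwise round down to the nearest integer.
    return Fraction(floor(raw))
```

With the invented counts `estimate_n(300, 250, 140)` the raw value is about 535.7, so the estimate is 535; with `estimate_n(300, 200, 100)` the raw value is exactly 600, so the estimate is 599.5. Under this assumed form, the estimate grows as the number of agreements shrinks, so an unusually low agreement count can push it above the size of the reconstructed stock.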

7 The total of 699 Proto-Mixe-Zoquean entries mentioned by Wichmann (1995, p. 230) also includes reconstructed numerals and grammatical morphemes. We exclude these from consideration in this paper, however, as Holm's focus is on lexical items. It should not significantly affect the outcome in any case.

8 We used a program written in Perl to tabulate Wichmann's data and make the calculations. Three of the N estimates in are larger than the original number of 618 reconstructed etyma included in this analysis, viz., those for ChisZ-OlP (626), ChisZ-SaP (643), and ChisZ-TxZ (644). The precise reasons for this are unclear, but they likely result from the unexpectedly low number of agreements attested for these pairs, which is clearly lower even than what would be expected by chance.

9 For the concept of latest separation, Holm refers to the “nearest neighbour method” as described by Embleton (1986, pp. 30–32; 1991, pp. 371–372). This method groups languages together recursively according to lowest dissimilarity. Note that this method need not simply produce a linear ordering, but is also capable of producing trees. Note further that Holm's “earliest separation” cannot be interpreted as the inverse of Embleton's method. It simply makes no sense in terms of reconstruction to group languages together that are highly dissimilar. The earliest separation should thus be considered a different approach, though likely inspired by Embleton's nearest neighbour method.
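
The recursive grouping by lowest dissimilarity described in note 9 can be sketched as follows. This is only an illustration, not Embleton's procedure as published: the note does not say how the dissimilarity between two already-formed groups is measured, so the sketch assumes the minimum over all cross-group pairs (single linkage). The function name and the language labels A to D are hypothetical.

```python
from itertools import combinations


def nearest_neighbour_tree(dissim):
    """Group languages recursively by lowest dissimilarity.

    dissim maps frozenset({lang_a, lang_b}) to a dissimilarity value.
    Returns a nested tuple representing the resulting tree.

    ASSUMPTION: the dissimilarity between two groups is the smallest
    pairwise value between their members (single linkage).
    """
    languages = sorted({lang for pair in dissim for lang in pair})
    # Each cluster is (tree, members); a leaf's tree is its own name.
    clusters = [(lang, (lang,)) for lang in languages]

    def group_dissim(c1, c2):
        return min(dissim[frozenset((a, b))] for a in c1[1] for b in c2[1])

    while len(clusters) > 1:
        # Pick the two clusters that are least dissimilar and merge them.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: group_dissim(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Invented dissimilarities for four hypothetical languages.
example = {
    frozenset(("A", "B")): 0.10,
    frozenset(("C", "D")): 0.20,
    frozenset(("A", "C")): 0.60,
    frozenset(("A", "D")): 0.70,
    frozenset(("B", "C")): 0.65,
    frozenset(("B", "D")): 0.80,
}
tree = nearest_neighbour_tree(example)
```

For these invented values the result is the tree `(("A", "B"), ("C", "D"))` rather than a linear ordering, which is the point made in the note: the procedure can produce trees.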

10 After seeing an earlier version of this paper, Holm replied that this reconstruction was not how he would interpret the data; he claims to see a clear separation between Mixean and Zoquean. However, we still fail to see this anywhere in the N or Bx estimates.

11 In order to establish a root, and hence the historical direction of these splits, we would need an outgroup language for comparison. That is, we would need a language that is historically related to Mixe-Zoquean but not a member of the family. For this purpose we might use Proto-Uto-Aztecan (cf. the end of Section 2). However, too few of the items in the list of 618 Proto-Mixe-Zoquean etyma have potential Uto-Aztecan cognates for us to determine the root with any level of certainty.

12 For this analysis, we used the fitch program from the PHYLIP package.

13 For this analysis, we used the T-Rex program.

14 For this analysis, we used the neighbor program from the PHYLIP package.

15 That this poses a problem to Holm's approach was already recognized by Kendall (1950, p. 42f).

16 Embleton (1986, pp. 22–24) cites various examples in which selectively available knowledge influences the reconstruction.

17 We used the fitch program from the PHYLIP package, which implements this algorithm.

18 In this case, detailed comparative work has been done, and we have good evidence for proto-forms in many cases. However, we are trying here to simulate a situation like that in which traditional lexicostatistics is usually applied, where such work has not yet been carried out.

19 For this analysis, we used the pars program from the PHYLIP package.

20 This particular graph was made with the SplitsTree program, using the NeighborNet algorithm developed by Bryant and Moulton (2002).
