Original Articles

Markov Associativities

Pages 123-137 | Published online: 16 Feb 2007
 

Abstract

Quantifying the concepts of co-occurrence and iterated co-occurrence yields indices of similarity between words or between documents. These similarities are associated with a reversible Markov transition matrix, whose formal properties enable us to define Euclidean distances. These distances in turn make it possible to perform word-document correspondence analysis, as well as classifications of words (or documents) at various co-occurrence orders.

ACKNOWLEDGEMENTS

Thanks to M. Rajman and J.-C. Chappelier for stimulating discussions, and to N. Jufer and S. Durrer for their textual data.

Notes

1 This distribution is unique if n_jk is irreducible, that is, if it does not decompose into two or more components (for instance, one component containing French words that occur only in French documents and another containing German words that occur only in German documents, with no lexical intersection).
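The irreducibility condition in this note can be illustrated with a small connectivity check. The sketch below is not taken from the article: it assumes that n_jk is stored as a nested list `n[j][k]` of word-by-document counts, and that "irreducible" means the bipartite graph linking each word to the documents in which it occurs is connected. The function name `is_irreducible` and the toy matrices are hypothetical.

```python
from collections import deque

def is_irreducible(n):
    """Return True if the word-document graph of the count matrix n is connected.

    n[j][k] is assumed to be the count of word j in document k.
    """
    n_words, n_docs = len(n), len(n[0])
    # Nodes 0..n_words-1 stand for words, the remaining nodes for documents.
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        if node < n_words:
            # A word is linked to every document in which it occurs.
            neighbours = [n_words + k for k in range(n_docs) if n[node][k] > 0]
        else:
            # A document is linked to every word it contains.
            k = node - n_words
            neighbours = [j for j in range(n_words) if n[j][k] > 0]
        for nb in neighbours:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == n_words + n_docs

# Two words sharing one document: a single component.
mixed = [[2, 1, 0],
         [0, 1, 3]]
# Two word groups with no document in common: two components.
split = [[2, 1, 0, 0],
         [1, 3, 0, 0],
         [0, 0, 4, 1]]

print(is_irreducible(mixed))  # True
print(is_irreducible(split))  # False
```

The `split` matrix mimics the French/German situation described in the note: its two blocks never share a word or a document, so a walk started in one block can never reach the other.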

2 Reversibility here characterizes the word-word or document-document association; it does not, of course, refer to the sequential ordering of words inside documents.
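Reversibility in this sense can be checked numerically as detailed balance. The following sketch rests on assumptions that are not spelled out on this page: a document containing word j is drawn with probability proportional to the count of j in it, a word is then drawn from that document with probability proportional to its count there, and the stationary distribution is taken to be the relative word frequencies. The names `word_chain` and `counts` are hypothetical, and the counts are toy data.

```python
from fractions import Fraction

def word_chain(n):
    """Build a word-to-word transition matrix W and word frequencies f.

    n[j][k] is assumed to be the count of word j in document k, with no
    empty row or column. Exact fractions avoid rounding issues.
    """
    n_words, n_docs = len(n), len(n[0])
    row = [sum(n[j]) for j in range(n_words)]                            # word totals
    col = [sum(n[j][k] for j in range(n_words)) for k in range(n_docs)]  # document totals
    total = sum(row)
    # One step: word j -> document k -> word i.
    W = [[sum(Fraction(n[j][k], row[j]) * Fraction(n[i][k], col[k])
              for k in range(n_docs))
          for i in range(n_words)]
         for j in range(n_words)]
    f = [Fraction(row[j], total) for j in range(n_words)]
    return W, f

counts = [[2, 1, 0],
          [0, 1, 3],
          [1, 0, 1]]
W, f = word_chain(counts)
size = len(counts)

# Each row of W is a probability distribution.
rows_sum_to_one = all(sum(W[j]) == 1 for j in range(size))
# Detailed balance: f_j W_jj' = f_j' W_j'j for every pair of words.
detailed_balance = all(f[j] * W[j][i] == f[i] * W[i][j]
                       for j in range(size) for i in range(size))
# f is left unchanged by one step of the chain.
stationary = all(sum(f[j] * W[j][i] for j in range(size)) == f[i]
                 for i in range(size))

print(rows_sum_to_one, detailed_balance, stationary)
```

Under these assumptions all three checks hold for any count matrix without empty rows or columns, because f_j W_jj' reduces to a sum over documents that is symmetric in j and j'.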

3 Cf. the behaviour of category DETDEMFS in illustration 5 below.

4 A significant exception to this is the case of co-ordination.

5 La Liberté, edited in Fribourg, Switzerland.

6 Key to the abbreviations: PREP = preposition, ADV = adverb, NC(M|F)(S|P) = masculine/feminine singular/plural common noun, ADJ(M|F)(S|P) = masculine/feminine singular/plural adjective, ADJ(S|P)IG = idem, gender-invariant, DET(I|D|DEM|POSS)(M|F)S = indefinite/definite/demonstrative/possessive masculine/feminine singular article, DET(I|D|DEM|POSS)(S|P)IG = idem, singular/plural gender-invariant.

7 Proof:

The associativity

of order r is the ratio of the probability of reaching word j' when starting from word j to the relative frequency of word j'. The walk proceeds as follows: first, draw a document k containing word j and pick another word l in k; then find another document k' containing l and pick another word l' in k'; finally, find another document k'' containing l' and pick (or not) word j' in k''.
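The ratio described in this note can be sketched numerically. The code below is an illustration under stated assumptions, not the article's own procedure: each document draw is treated as one step of a word-to-word Markov chain (so the three-document walk above would correspond to r = 3), draws are made proportionally to counts, and the associativity of order r is taken to be the r-step transition probability divided by the relative frequency of the target word. The names `associativities` and `counts` are hypothetical, and the counts are toy data.

```python
from fractions import Fraction

def associativities(n, r):
    """Return (A, f): the order-r associativity matrix A and word frequencies f.

    n[j][k] is assumed to be the count of word j in document k, with no
    empty row or column; r is a non-negative integer.
    """
    n_words, n_docs = len(n), len(n[0])
    row = [sum(n[j]) for j in range(n_words)]
    col = [sum(n[j][k] for j in range(n_words)) for k in range(n_docs)]
    total = sum(row)
    f = [Fraction(row[j], total) for j in range(n_words)]
    # One-step transition: word j -> document k -> word i.
    W = [[sum(Fraction(n[j][k], row[j]) * Fraction(n[i][k], col[k])
              for k in range(n_docs))
          for i in range(n_words)]
         for j in range(n_words)]
    # r-step transition probabilities by repeated multiplication.
    P = [[Fraction(int(j == i)) for i in range(n_words)] for j in range(n_words)]
    for _ in range(r):
        P = [[sum(P[j][l] * W[l][i] for l in range(n_words))
              for i in range(n_words)]
             for j in range(n_words)]
    # Probability of reaching i from j in r steps, relative to the frequency of i.
    A = [[P[j][i] / f[i] for i in range(n_words)] for j in range(n_words)]
    return A, f

counts = [[2, 1, 0],
          [0, 1, 3],
          [1, 0, 1]]
A, f = associativities(counts, 3)
size = len(counts)

# With a reversible chain, the associativity is symmetric in the two words.
symmetric = all(A[j][i] == A[i][j] for j in range(size) for i in range(size))
# Averaged over target words with weights f, each row equals 1.
normalised = all(sum(f[i] * A[j][i] for i in range(size)) == 1 for j in range(size))

print(symmetric, normalised)
```

A value above 1 in `A` would indicate that word j' is reached from word j more often than its overall frequency suggests, and a value below 1 the opposite. Exact fractions are used so that the symmetry check is not affected by floating-point rounding.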

Singular-plural factor α = 2 opposes cluster 2 ((ADJ|DETPOSS)SIG) to cluster 4 (NC(F|M)P, ADJ(F|M)P, ADJPIG, ADJNUM, DET(D|I|DEM|POSS)PIG). Masculine-feminine factor α = 3 opposes cluster 1 ((NC|ADJ)MS, DET(D|I|DEM|POSS)MS) to cluster 3 ((NC|ADJ)FS, DET(D|I|DEM|POSS)FS).
