Original Articles

Markov Associativities

Pages 123-137 | Published online: 16 Feb 2007
 

Abstract

Quantifying the concepts of co-occurrence and iterated co-occurrence yields indices of similarity between words or between documents. These similarities are associated with a reversible Markov transition matrix, whose formal properties enable us to define Euclidean distances. These distances in turn make it possible to perform word-document correspondence analysis, as well as classifications of words (or documents) at various co-occurrence orders.

Acknowledgements

Thanks to M. Rajman and J.-C. Chappelier for stimulating discussions, and to N. Jufer and S. Durrer for their textual data.

Notes

1 This distribution is unique if n_jk is irreducible, that is, if it does not decompose into two or more components (for instance, one component containing French words that occur only in French documents and another containing German words that occur only in German documents, with no lexical intersection).
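The irreducibility condition in Note 1 can be checked mechanically. The following is a minimal sketch, assuming that n_jk is available as a words x documents count array and that "components" means the connected components of the bipartite graph linking word j to document k whenever n_jk > 0. The function name `count_components` and the toy tables are hypothetical and do not come from the article.

```python
import numpy as np

def count_components(n):
    """Count connected components of the bipartite word-document graph.

    Assumption: n is a (words x documents) array of counts, and word j is
    linked to document k whenever n[j, k] > 0. A result of 1 corresponds to
    an irreducible table in the sense of Note 1.
    """
    n = np.asarray(n)
    J, K = n.shape
    # Symmetric adjacency matrix: nodes 0..J-1 are words, J..J+K-1 documents.
    adj = np.zeros((J + K, J + K), dtype=bool)
    adj[:J, J:] = n > 0
    adj[J:, :J] = (n > 0).T

    seen = np.zeros(J + K, dtype=bool)
    components = 0
    for start in range(J + K):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for neighbour in np.flatnonzero(adj[node] & ~seen):
                seen[neighbour] = True
                stack.append(neighbour)
    return components

# Hypothetical toy tables: two languages with no shared vocabulary,
# then a table in which every word is reachable from every other.
split_table = [[3, 1, 0],
               [2, 0, 0],
               [0, 0, 4]]
linked_table = [[1, 1],
                [0, 1]]
print(count_components(split_table))   # 2
print(count_components(linked_table))  # 1
```

Under this reading, a word or a document with no counts at all forms a component of its own, so empty rows and columns should be dropped before the check.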

2 Reversibility here characterizes the word-word or document-document association; it does not, of course, refer to the sequential ordering of words within documents.

3 Cf. the behaviour of category DETDEMFS in illustration 5 below.

4 A significant exception to this is the case of co-ordination.

5 La Liberté, published in Fribourg, Switzerland.

6 Key to the abbreviations: PREP = preposition, ADV = adverb, NC(M|F)(S|P) = masculine/feminine singular/plural common noun, ADJ(M|F)(S|P) = masculine/feminine singular/plural adjective, ADJ(S|P)IG = idem, gender-invariant, DET(I|D|DEM|POSS)(M|F)S = indefinite/definite/demonstrative/possessive masculine/feminine singular article, DET(I|D|DEM|POSS)(S|P)IG = idem, singular/plural gender-invariant.

7 Proof:

The associativity of order r is the ratio of the probability of reaching word j' starting from word j to the relative frequency of word j'. The probability is obtained as follows: first, draw a document k containing word j and pick another word l in k; then find another document k' containing l and pick another word l' in k'; then find another document k'' containing l' and finally pick (or not) word j' in k''.
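The sampling scheme in Note 7 can be turned into a small numerical sketch. The code below is an illustration under stated assumptions, not the article's own formulation: it assumes a words x documents count table n_jk, reads "draw a document k containing word j" as sampling k with probability n_jk / n_j., and reads "pick another word l in k" as sampling l with probability n_lk / n_.k (with replacement, so the same word may be drawn again). Under that reading, the walk described in the note, which passes through three documents, corresponds to r = 3. The names `word_transition_matrix` and `associativity` and the toy table are hypothetical.

```python
import numpy as np

def word_transition_matrix(n):
    """Word-to-word transition matrix for a (words x documents) count table.

    Assumed reading of Note 7: from word j, draw document k with probability
    n[j, k] / n[j, :].sum(), then draw word l with probability
    n[l, k] / n[:, k].sum(). Every word and document must have a nonzero total.
    """
    n = np.asarray(n, dtype=float)
    word_totals = n.sum(axis=1, keepdims=True)  # n_j.
    doc_totals = n.sum(axis=0, keepdims=True)   # n_.k
    word_to_doc = n / word_totals               # P(document k | word j)
    doc_to_word = (n / doc_totals).T            # P(word l | document k)
    return word_to_doc @ doc_to_word

def associativity(n, r):
    """Order-r associativity: r-step transition probability from j to j',
    divided by the relative frequency of j'."""
    n = np.asarray(n, dtype=float)
    relative_freq = n.sum(axis=1) / n.sum()
    W = word_transition_matrix(n)
    # Broadcasting divides column j' by relative_freq[j'].
    return np.linalg.matrix_power(W, r) / relative_freq

# Hypothetical toy table: 3 words, 2 documents.
counts = np.array([[2, 0],
                   [1, 1],
                   [0, 3]])
A3 = associativity(counts, 3)
print(np.round(A3, 3))
print(np.allclose(A3, A3.T))  # symmetric, as reversibility implies
```

Values above 1 indicate that j' is reached from j more often than its overall frequency would suggest, and values below 1 the opposite. The symmetry check relies on the relative word frequencies being the stationary distribution of a reversible chain, which holds for the transition matrix as constructed here.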

Singular-plural factor α = 2 opposes cluster 2 ((ADJ|DETPOSS)SIG) to cluster 4 (NC(F|M)P, ADJ(F|M)P, ADJPIG, ADJNUM, DET(D|I|DEM|POSS)PIG). Masculine-feminine factor α = 3 opposes cluster 1 ((NC|ADJ)MS, DET(D|I|DEM|POSS)MS) to cluster 3 ((NC|ADJ)FS, DET(D|I|DEM|POSS)FS).
