Original Articles

Sequential Pattern Analysis: A Statistical Investigation of Sequence Length and Support

Pages 1044-1062 | Received 12 Aug 2010, Accepted 22 Dec 2011, Published online: 02 Jan 2013

