Chinese Word Frequency Approximation Based on Multitype Corpora

Pages 142-166 | Published online: 14 May 2010

