Research Article

The Discriminativeness of Internal Syntactic Representations in Automatic Genre Classification
