Communication Methods and Measures Volume 14, 2020 - Issue 4

Research Article

Reproducible Extraction of Cross-lingual Topics (rectr)

Chung-Hong Chan, Mannheimer Zentrum für Europäische Sozialforschung, Universität Mannheim, Mannheim, Germany. Correspondence: [email protected]
https://orcid.org/0000-0002-6232-7530

Jing Zeng, Department of Communication and Media Research, University of Zurich, Zurich, Switzerland
https://orcid.org/0000-0001-5970-7172

Hartmut Wessler, Institute for Media and Communication Studies, Universität Mannheim, Mannheim, Germany

Marc Jungblut, Department of Media and Communication, LMU München, Munich, Germany
https://orcid.org/0000-0002-2677-0738

Kasper Welbers, Department of Communication Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
https://orcid.org/0000-0003-2929-3815

Joseph W Bajjalieh, Cline Center for Advanced Social Research, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

Wouter van Atteveldt, Department of Communication Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
https://orcid.org/0000-0003-1237-538X

Scott L. Althaus, Cline Center for Advanced Social Research, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

Pages 285-305 | Published online: 07 Sep 2020

https://doi.org/10.1080/19312458.2020.1812555

References

Baum, M. A., & Zhukov, Y. M. (2019). Media ownership and news coverage of international conflict. Political Communication, 36(1), 36–63. http://sci-hub.tw/10.1080/10584609.2018.1483606
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84. http://sci-hub.tw/10.1145/2133806.2133826
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5(Dec), 135–146. http://sci-hub.tw/10.1162/tacl_a_00051
Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. http://sci-hub.tw/10.1080/21670811.2015.1096598
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. Advances in Neural Information Processing Systems, Vancouver, Canada, 288–296. https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models
Chmielewski, M., & Kucker, S. C. (2019). An MTurk crisis? Shifts in data quality and the impact on study results. Social Psychological and Personality Science, 1948550619875149. http://sci-hub.tw/10.1177/1948550619875149
Conneau, A., Lample, G., Ranzato, M., Denoyer, L., & Jégou, H. (2017). Word translation without parallel data. arXiv Preprint arXiv:1710.04087. https://arxiv.org/abs/1710.04087
De Vries, E., Schoonvelde, M., & Schumacher, G. (2018). No longer lost in translation: Evidence that Google Translate works for comparative bag-of-words text applications. Political Analysis, 26(4), 417–430. http://sci-hub.tw/10.1017/pan.2018.26
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv Preprint arXiv:1810.04805. https://arxiv.org/abs/1810.04805
Eshima, S., Imai, K., & Sasaki, T. (2020). Keyword assisted topic models. arXiv Preprint arXiv:2004.05964. https://arxiv.org/abs/2004.05964
Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. Studies in Linguistic Analysis. Longman.
Fung, I. C.-H., Zeng, J., Chan, C.-H., Liang, H., Yin, J., Liu, Z., Tse, Z. T. H., & Fu, K.-W. (2018). Twitter and Middle East respiratory syndrome, South Korea, 2015: A multi-lingual study. Infection, Disease & Health, 23(1), 10–16. http://sci-hub.tw/10.1016/j.idh.2017.08.005
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. http://sci-hub.tw/10.1093/pan/mps028
Hatzivassiloglou, V., Klavans, J. L., & Eskin, E. (1999). Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning. In 1999 Joint SIGDAT conference on empirical methods in natural language processing and very large corpora, College Park, MD, USA. https://www.aclweb.org/anthology/W99-0625
Jacobi, C., van Atteveldt, W., & Welbers, K. (2016). Quantitative analysis of large amounts of journalistic texts using topic modelling. Digital Journalism, 4(1), 89–106. http://sci-hub.tw/10.1080/21670811.2015.1093271
Joulin, A., Bojanowski, P., Mikolov, T., Jégou, H., & Grave, E. (2018). Loss in translation: Learning bilingual word mapping with a retrieval criterion. arXiv Preprint arXiv:1804.07745. https://arxiv.org/abs/1804.07745
Katki, H. A., Li, Y., Edelstein, D. W., & Castle, P. E. (2012). Estimating the agreement and diagnostic accuracy of two diagnostic tests when one test is conducted on only a subsample of specimens. Statistics in Medicine, 31(5), 436–448. http://sci-hub.tw/10.1002/sim.4422
Koltsova, O., & Koltcov, S. (2013). Mapping the public agenda with topic modeling: The case of the Russian livejournal. Policy & Internet, 5(2), 207–227. http://sci-hub.tw/10.1002/1944-2866.POI331
Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., & Govaert, G. (2015). Rmixmod: The R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. https://hal.archives-ouvertes.fr/hal-00919486/document
Lind, F., Eberl, J.-M., Heidenreich, T., & Boomgaarden, H. G. (2019b). When the journey is as important as the goal: A roadmap to multilingual dictionary construction. International Journal of Communication, 13(1), 21. https://ijoc.org/index.php/ijoc/article/view/10578
Lind, F., Eisele, O., Heidenreich, T., Galyga, S., Eberl, J.-M., & Boomgaarden, H. G. (2019a). A bridge over the language gap—Employing topic modelling for text analyses across languages for country comparative research. Presented at the POLTEXT Conference, Tokyo.
Livingstone, S. (2003). On the challenges of cross-national comparative media research. European Journal of Communication, 18(4), 477–500. http://sci-hub.tw/10.1177/0267323103184003
Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-assisted text analysis for comparative politics. Political Analysis, 23(2), 254–277. http://sci-hub.tw/10.1093/pan/mpu019
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2–3), 93–118. http://sci-hub.tw/10.1080/19312458.2018.1430754
Mikolov, T., Le, Q. V., & Sutskever, I. (2013a). Exploiting similarities among languages for machine translation. ArXiv Preprint ArXiv:1309.4168. http://arxiv.org/abs/1309.4168
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 3111–3119. https://arxiv.org/abs/1310.4546
Mimno, D., Wallach, H. M., Naradowsky, J., Smith, D. A., & McCallum, A. (2009). Polylingual topic models. In Proceedings of the 2009 Conference on empirical methods in natural language processing (Vol. 2, pp. 880–889). Association for Computational Linguistics.
Pires, T., Schlinger, E., & Garrette, D. (2019). How multilingual is Multilingual BERT? arXiv Preprint arXiv:1906.01502. https://arxiv.org/pdf/1906.01502.pdf
Proksch, S.-O., Lowe, W., Wäckerle, J., & Soroka, S. (2019). Multilingual sentiment analysis: A new approach to measuring conflict in legislative speeches. Legislative Studies Quarterly, 44(1), 97–131. http://sci-hub.tw/10.1111/lsq.12218
Pruss, D., Fujinuma, Y., Daughton, A. R., Paul, M. J., Arnot, B., Szafir, D. A., & Boyd-Graber, J. (2019). Zika discourse in the Americas: A multilingual topic analysis of Twitter. PLoS ONE, 14(5), e0216922. https://doi.org/10.1371/journal.pone.0216922
Reber, U. (2019). Overcoming language barriers: Assessing the potential of machine translation and topic modeling for the comparative analysis of multilingual text corpora. Communication Methods and Measures, 13(2), 102–125. http://sci-hub.tw/10.1080/19312458.2018.1555798
Roberts, M. E., Stewart, B. M., & Tingley, D. (2014). stm: R package for structural topic models. Journal of Statistical Software, 10(2), 1–40. http://sci-hub.tw/10.18637/jss.v091.i02
Rudkowsky, E., Haselmayer, M., Wastian, M., Jenny, M., Emrich, Š., & Sedlmair, M. (2018). More than bags of words: Sentiment analysis with word embeddings. Communication Methods and Measures, 12(2–3), 140–157. http://sci-hub.tw/10.1080/19312458.2018.1455817
Sandve, G. K., Nekrutenko, A., Taylor, J., & Hovig, E. (2013). Ten simple rules for reproducible computational research. PLoS Computational Biology, 9(10), e1003285. http://sci-hub.tw/10.1371/journal.pcbi.1003285
Shireman, E., Steinley, D., & Brusco, M. J. (2017). Examining the effect of initialization strategies on the performance of Gaussian mixture modeling. Behavior Research Methods, 49(1), 282–293. http://sci-hub.tw/10.3758/s13428-015-0697-6
Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast—But is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the conference on empirical methods in natural language processing (pp. 254–263). Association for Computational Linguistics.
Van Atteveldt, W., Althaus, S., & Wessler, H. (2020). The trouble with sharing your privates: Pursuing ethical open science and collaborative research across national jurisdictions using sensitive data. Political Communication, 1–7. http://sci-hub.tw/10.1080/10584609.2020.1744780
Van Atteveldt, W., Strycharz, J., Trilling, D., & Welbers, K. (2019). Toward open computational communication science: A practical road map for reusable data and code. International Journal of Communication, 13(5), 20. https://ijoc.org/index.php/ijoc/article/view/10631
Watanabe, K., & Zhou, Y. (2020). Theory-driven analysis of large corpora: Semisupervised topic classification of the UN speeches. Social Science Computer Review, 0894439320907027. http://sci-hub.tw/10.1177/0894439320907027
Wittgenstein, L. (1953). Philosophical investigations. John Wiley & Sons.
Xie, P., & Xing, E. P. (2013). Integrating document clustering and topic modeling. ArXiv Preprint ArXiv:1309.6874. https://arxiv.org/abs/1309.6874
Yan, X., Guo, J., Lan, Y., & Cheng, X. (2013). A biterm topic model for short texts. In Proceedings of the 22nd international conference on World Wide Web, Rio de Janeiro, Brazil (pp. 1445–1456). Association for Computing Machinery.
Zhang, D., Mei, Q., & Zhai, C. (2010, July). Cross-lingual latent topic extraction. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 1128–1137). Association for Computational Linguistics. http://sci-hub.tw/10.5555/1858681.1858796
