
Inter-annotator Agreement Using the Conversation Analysis Modelling Schema, for Dialogue


References

  • Albert, S., de Ruiter, L. E., & de Ruiter, J. (2015). CABNC: The Jeffersonian transcription of the spoken British National Corpus. https://saulalbert.github.io/CABNC/
  • Allen, J., & Core, M. (1997). Draft of DAMSL: Dialog act markup in several layers (Tech. Rep.).
  • Artstein, R., & Poesio, M. (2005a). Bias decreases in proportion to the number of annotators. In Proceedings of the Conference on Formal Grammar and Mathematics of Language (FG-MoL) (pp. 141–150). CSLI Publications. http://web.stanford.edu/group/cslipublications/cslipublications/FG/2005/artstein.pdf
  • Artstein, R., & Poesio, M. (2005b, September). Kappa³ = Alpha (or Beta) (Tech. Rep.). University of Essex. http://www.cs.pitt.edu/~wiebe/courses/CS3730/Fall08/poesioTechReportKappaCubed.pdf
  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596. https://doi.org/10.1162/coli.07-034-R2
  • Artstein, R. (2018). Inter-annotator agreement. In N. Ide & J. Pustejovsky (Eds.), Handbook of linguistic annotation (pp. 297–313). Springer.
  • Asri, L. E., Schulz, H., Sharma, S., Zumer, J., Harris, J., Fine, E., … Suleman, K. (2017). Frames: A corpus for adding memory to goal-oriented dialogue systems. In Proceedings of the SIGDIAL 2017 Conference (pp. 207–219). Saarbrücken, Germany: Association for Computational Linguistics. http://www.aclweb.org/anthology/W17-5526
  • Aulamo, M., Creutz, M., & Sjöblom, E. (2019). Annotation of subtitle paraphrases using a new web tool. In Proceedings of the 4th Conference of the Association Digital Humanities in the Nordic Countries. CEUR-WS.org. http://urn.fi/urn:nbn:fi:
  • Austin, J. L. (1962). How to do things with words. Oxford University Press. http://pubman.mpdl.mpg.de/pubman/item/escidoc:2271128/component/escidoc:2271430/austin1962how-to-do-things-with-words.pdf
  • Banerjee, M., Capozzoli, M., McSweeney, L., & Sinha, D. (1999). Beyond kappa: A review of interrater agreement measures. Canadian Journal of Statistics, 27(1), 3–23. https://doi.org/10.2307/3315487
  • Bayerl, P. S., & Paul, K. I. (2011). What determines inter-coder agreement in manual annotations? A meta-analytic investigation. Computational Linguistics, 37(4), 699–725. https://doi.org/10.1162/COLI_a_00074
  • Bordes, A., Boureau, Y.-L., & Weston, J. (2017). Learning end-to-end goal-oriented dialog. In ICLR 2017. https://arxiv.org/pdf/1605.07683.pdf
  • Boxman-Shabtai, L. (2020). Meaning multiplicity across communication subfields: Bridging the gaps. Journal of Communication, 70(3), 401–423. https://doi.org/10.1093/joc/jqaa008
  • Boyer, K. E., Ha, E. Y., Phillips, R., Wallis, M. D., Vouk, M. A., & Lester, J. (2009). Inferring tutorial dialogue structure with hidden Markov modeling. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications (EdAppsNLP ’09) (pp. 19–26). Association for Computational Linguistics. https://www.cs.rochester.edu/~tetreaul/bea4/Boyer-BEA4.pdf
  • Boyer, K. E., Ha, E. Y., Phillips, R., Wallis, M. D., Vouk, M. A., & Lester, J. (2010). Dialogue act modeling in a complex task-oriented domain. In Proceedings of SIGDIAL 2010: The 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 297–305). Association for Computational Linguistics.
  • British Standards Institution. (2012). ISO 24617-2: Language resource management - Semantic annotation framework (SemAF) Part 2: Dialogue acts. https://bsol.bsigroup.com
  • Bunt, H. (1978). Conversational principles in question-answer dialogues (pp. 119–142). Tübingen.
  • Bunt, H. (2006). Dimensions in dialogue act annotation. In Proceedings of LREC 2006. European Language Resources Association (ELRA).
  • Bunt, H. (2011). The semantics of dialogue acts. In Proceedings of the International Conference on Computational Semantics (IWCS ’11) (pp. 1–13). Oxford, England: Association for Computational Linguistics. http://www.aclweb.org/anthology/W11-0101
  • Bunt, H. (2017). Guidelines for using ISO standard 24617-2 (Tech. Rep.). Tilburg Center for Cognition and Communication. https://dialogbank.uvt.nl/wp-content/uploads/tdb/2015/12/ISO24617-2_Annotation_Guidelines2017.pdf
  • Bunt, H. (2000, January). Dialogue pragmatics and context specification. In H. Bunt & W. Black (Eds.), Abduction, belief and context in dialogue. Studies in computational pragmatics (pp. 81–149). John Benjamins. https://doi.org/10.1075/nlp.1.03bun.
  • Byrt, T., Bishop, J., & Carlin, J. B. (1993). Bias, prevalence and Kappa. Journal of Clinical Epidemiology, 46(5), 423–429. https://doi.org/10.1016/0895-4356(93)90018-V
  • Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2), 249–254. https://aclanthology.org/J96-2004/
  • Chowdhury, S. A., Stepanov, E. A., & Riccardi, G. (2016). Transfer of corpus-specific dialogue act annotation to ISO standard: Is it worth it? In Proceedings of the International Conference on Language Resources and Evaluation (Vol. 9, pp. 132–135). European Language Resources Association (ELRA). https://aclanthology.org/L16-1020/
  • Clift, R. (2016). Conversation analysis. Cambridge University Press.
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
  • Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. https://doi.org/10.1037/h0026256
  • Collins, H., Leonard-Clarke, W., & O’Mahoney, H. (2019). ‘Um, er’: How meaning varies between speech and its typed transcript. Qualitative Research, 19(6), 653–668. https://doi.org/10.1177/1468794118816615
  • Craggs, R., & Wood, M. M. (2005). Evaluating discourse and dialogue coding schemes. Computational Linguistics, 31(3), 289–295. https://doi.org/10.1162/089120105774321109
  • Cuayáhuitl, H., Yu, S., Williamson, A., & Carse, J. (2016). Deep reinforcement learning for multi-domain dialogue systems. In NIPS Workshop on Deep Reinforcement Learning (pp. 1–9). Barcelona, Spain. https://arxiv.org/pdf/1611.08675.pdf
  • Di Eugenio, B., & Glass, M. (2004). The Kappa statistic: A second look. Computational Linguistics, 30(1), 95–101. https://doi.org/10.1162/089120104773633402
  • Di Eugenio, B. (2000). On the usage of kappa to evaluate agreement on coding tasks. In 2nd International Conference on Language Resources and Evaluation (LREC 2000) (pp. 441–444). Barcelona, Spain: European Language Resources Association (ELRA).
  • Ekman, P., & Scherer, K. (1984). Structures of social action: Studies in conversation analysis (J. Atkinson & J. Heritage, Eds.). Cambridge University Press. http://ebooks.cambridge.org/ref/id/CBO9780511665868
  • Eric, M., & Manning, C. D. (2017). Key-value retrieval networks for task-oriented dialogue. In Proceedings of the 18th Annual SIGDIAL Meeting on Discourse and Dialogue (pp. 37–49). Saarbrücken, Germany: Association for Computational Linguistics. https://nlp.stanford.edu/blog/a-new-multi-turn-multi-
  • Firdaus, M., Golchha, H., Ekbal, A., & Bhattacharyya, P. (2020). A deep multi-task model for dialogue act classification, intent detection and slot filling. Cognitive Computation. https://doi.org/10.1007/s12559-020-09718-4
  • Ge, W., & Xu, B. (2015). Dialogue management based on multi-domain corpus. In Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) (pp. 364–373). Prague, Czech Republic: Association for Computational Linguistics. http://www.sigdial.org/workshops/conference16/proceedings/pdf/SIGDIAL48.pdf
  • Geertzen, J., & Bunt, H. (2010). Measuring annotator agreement in a complex hierarchical dialogue act annotation scheme. In Proceedings of the 7th SIGDIAL Workshop on Discourse and Dialogue (pp. 126–133). Sydney, Australia: Association for Computational Linguistics. http://ls0143.uvt.nl/dit/
  • Geertzen, J., Petukhova, V., & Bunt, H. (2008). Evaluating dialogue act tagging with naive and expert annotators. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1076–1082). Marrakech, Morocco: European Language Resources Association (ELRA).
  • Geiß, S. (2021). Statistical power in content analysis designs: How effect size, sample size and coding accuracy jointly affect hypothesis testing – A Monte Carlo simulation approach. Computational Communication Research, 3(1), 61–89. https://doi.org/10.5117/ccr2021.1.003.geis
  • Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. Academic Press.
  • Green, J., Franquiz, M., & Dixon, C. (1997). The myth of the objective transcript: Transcribing as a situated act. TESOL Quarterly, 31(1), 172. https://doi.org/10.2307/3587984
  • Griol, D., Hurtado, L., Segarra, E., & Sanchis, E. (2008). A statistical approach to spoken dialog systems design and evaluation. Speech Communication, 50(8–9), 666–682. https://doi.org/10.1016/j.specom.2008.04.001
  • Grosz, B. J. (2018). Smart enough to talk with us? Foundations and challenges for dialogue capable AI systems. Computational Linguistics, 44(1), 1–15. https://doi.org/10.1162/COLI_a_00313
  • Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33–64. http://dl.acm.org/citation.cfm?id=972684.972687
  • Hsu, L. M., & Field, R. (2003). Interrater agreement measures: Comments on Kappa_n, Cohen’s Kappa, Scott’s π, and Aickin’s α. Understanding Statistics, 2(3), 205–219. https://doi.org/10.1207/s15328031us0203_03
  • Iseki, Y. (2019). Characteristics of everyday conversation derived from the analysis of dialog act annotation. In 2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) (pp. 1–6). Cebu, Philippines: IEEE.
  • Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual (Tech. Rep.). CU Boulder. ftp://ftp.dcs.shef.ac.uk/share/nlp/amities/bib/ics-tr-97-02.pdf
  • Kazai, G. (2011). In search of quality in crowdsourcing for search engine evaluation. In Proceedings of the 33rd European Conference on Information Retrieval (ECIR) (LNCS Vol. 6611, pp. 165–176). Berlin, Heidelberg: Springer.
  • Keizer, S., & Rieser, V. (2017). Towards learning transferable conversational skills using multi-dimensional dialogue modelling. In SemDial 2017. Saarbrücken, Germany: SEMDIAL.
  • Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage Publications.
  • Kumar, V., Sridhar, R., Narayanan, S., & Bangalore, S. (2008). Enriching spoken language translation with dialog acts. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers (HLT ’08) (p. 225). Columbus, Ohio: Association for Computational Linguistics. http://www.aclweb.org/anthology/P08-2057
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
  • Li, X., Chen, Y.-N., Li, L., Gao, J., & Celikyilmaz, A. (2017). End-to-end task-completion neural dialogue systems. In Proceedings of the 8th International Joint Conference on Natural Language Processing (pp. 733–743). Taipei, Taiwan: AFNLP. http://aclweb.org/anthology/I17-1074
  • Liddicoat, A. J. (2007). An introduction to conversation analysis. Continuum.
  • Macagno, F., & Bigi, S. (2018). Types of dialogue and pragmatic ambiguity. In S. Oswald, T. Herman, & J. Jacquin (Eds.), Argumentation and language: Linguistic, cognitive and discursive explorations (Vol. 32, pp. 191–218). Springer. https://doi.org/10.1007/978-3-319-73972-4_9
  • Mezza, S., Cervone, A., Tortoreto, G., Stepanov, E. A., & Riccardi, G. (2018). ISO-standard domain-independent dialogue act tagging for conversational agents. In COLING 2018 (pp. 3539–3551). Santa Fe, New Mexico: Association for Computational Linguistics. http://arxiv.org/abs/1806.04327
  • Norrick, N. (2004). Saarbrücken Corpus of Spoken English (SCoSE). https://ca.talkbank.org/access/SCoSE.html
  • Nowak, S., & Rüger, S. (2010). How reliable are annotations via crowdsourcing? A study about inter-annotator agreement for multi-label image annotation. In MIR ’10: Proceedings of the International Conference on Multimedia Information Retrieval (p. 557). Philadelphia, Pennsylvania: Association for Computing Machinery. https://dl.acm.org/citation.cfm?id=1743478
  • Oyama, S., Baba, Y., Sakurai, Y., & Kashima, H. (2013). Accurate integration of crowdsourced labels using workers’ self-reported confidence scores. In IJCAI International Joint Conference on Artificial Intelligence (pp. 2554–2560). Beijing, China: AAAI Press.
  • Poesio, M., & Vieira, R. (1998). A corpus-based investigation of definite description use. Computational Linguistics, 24(2), 183–216. https://aclanthology.org/J98-2001/
  • Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(1), 696–735. http://www.jstor.org/stable/412243
  • Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis I. Cambridge University Press.
  • Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. The Public Opinion Quarterly, 19(3), 321–325. https://www.jstor.org/stable/2746450
  • Searle, J. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
  • Shriberg, E., Dhillon, R., Bhagat, S., Ang, J., & Carvey, H. (2004). The ICSI meeting recorder dialog act (MRDA) corpus. In SIGDIAL 2004 (pp. 97–100). http://www.aclweb.org/anthology/W04-2319
  • Sidnell, J. (2010). Conversation analysis: An introduction. Wiley-Blackwell.
  • Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 254–263). Honolulu: Association for Computational Linguistics. http://blog.doloreslabs.com/?p=109
  • Weston, J., Bordes, A., Chopra, S., Rush, A. M., van Merrienboer, B., Joulin, A., & Mikolov, T. (2015). Towards AI-complete question answering: A set of prerequisite toy tasks. ICLR. http://arxiv.org/abs/1502.05698
  • Wiebe, J. M., Bruce, R. F., & O’Hara, T. P. (1999). Development and use of a gold standard data set for subjectivity classifications. In ACL ’99: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (pp. 246–253). College Park, Maryland: ACM. https://doi.org/10.3115/1034678.1034721
  • Williams, J. D., Raux, A., & Henderson, M. (2016). The dialog state tracking challenge series: A review. Dialogue and Discourse, 7(3), 4–33. https://pdfs.semanticscholar.org/4ba3/39bd571585fadb1fb1d14ef902b6784f574f.pdf
  • Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103(3), 374–378. https://doi.org/10.1037/0033-2909.103.3.374