
Inter-annotator Agreement Using the Conversation Analysis Modelling Schema, for Dialogue


References

  • Albert, S., de Ruiter, L. E., & de Ruiter, J. (2015). CABNC: The Jeffersonian transcription of the spoken British National Corpus. https://saulalbert.github.io/CABNC/
  • Allen, J., & Core, M. (1997). Draft of DAMSL: Dialog act markup in several layers (Tech. Rep.).
  • Artstein, R., & Poesio, M. (2005a). Bias decreases in proportion to the number of annotators. In Proceedings of the Conference on Formal Grammar and Mathematics of Language (FG-MoL) (pp. 141–150). CSLI Publications. http://web.stanford.edu/group/cslipublications/cslipublications/FG/2005/artstein.pdf
  • Artstein, R., & Poesio, M. (2005b, September). Kappa³ = Alpha (or Beta) (Tech. Rep.). University of Essex. http://www.cs.pitt.edu/~wiebe/courses/CS3730/Fall08/poesioTechReportKappaCubed.pdf
  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596. https://doi.org/10.1162/coli.07-034-R2
  • Artstein, R. (2018). Inter-annotator agreement. In N. Ide & J. Pustejovsky (Eds.), Handbook of linguistic annotation (pp. 297–313). Springer.
  • Asri, L. E., Schulz, H., Sharma, S., Zumer, J., Harris, J., Fine, E., … Suleman, K. (2017). Frames: A corpus for adding memory to goal-oriented dialogue systems. In Proceedings of the SIGDIAL 2017 Conference (pp. 207–219). Saarbrücken, Germany: Association for Computational Linguistics. http://www.aclweb.org/anthology/W17-5526
  • Aulamo, M., Creutz, M., & Sjöblom, E. (2019). Annotation of subtitle paraphrases using a new web tool. In Proceedings of the 4th Conference of the Association Digital Humanities in the Nordic Countries. CEUR-WS.org. http://urn.fi/urn:nbn:fi:
  • Austin, J. L. (1962). How to do things with words. Oxford University Press. http://pubman.mpdl.mpg.de/pubman/item/escidoc:2271128/component/escidoc:2271430/austin1962how-to-do-things-with-words.pdf
  • Banerjee, M., Capozzoli, M., McSweeney, L., & Sinha, D. (1999). Beyond kappa: A review of interrater agreement measures. Canadian Journal of Statistics, 27(1), 3–23. https://doi.org/10.2307/3315487
  • Bayerl, P. S., & Paul, K. I. (2011). What determines inter-coder agreement in manual annotations? A meta-analytic investigation. Computational Linguistics, 37(4), 699–725. https://doi.org/10.1162/COLI_a_00074
  • Bordes, A., Boureau, Y.-L., & Weston, J. (2017). Learning end-to-end goal-oriented dialog. In ICLR 2017. https://arxiv.org/pdf/1605.07683.pdf
  • Boxman-Shabtai, L. (2020). Meaning multiplicity across communication subfields: Bridging the gaps. Journal of Communication, 70(3), 401–423. https://doi.org/10.1093/joc/jqaa008
  • Boyer, K. E., Ha, E. Y., Phillips, R., Wallis, M. D., Vouk, M. A., & Lester, J. (2009). Inferring tutorial dialogue structure with hidden Markov modeling. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications (EdAppsNLP ’09) (pp. 19–26). Association for Computational Linguistics. https://www.cs.rochester.edu/~tetreaul/bea4/Boyer-BEA4.pdf
  • Boyer, K. E., Ha, E. Y., Phillips, R., Wallis, M. D., Vouk, M. A., & Lester, J. (2010). Dialogue act modeling in a complex task-oriented domain. In Proceedings of SIGDIAL 2010: The 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 297–305). Association for Computational Linguistics.
  • British Standards Institution. (2012). ISO 24617-2: Language resource management - Semantic annotation framework (SemAF) Part 2: Dialogue acts. https://bsol.bsigroup.com
  • Bunt, H. (1978). Conversational principles in question-answer dialogues (pp. 119–142). Tübingen.
  • Bunt, H. (2006). Dimensions in dialogue act annotation. In Proceedings of LREC 2006. European Language Resources Association (ELRA).
  • Bunt, H. (2011). The semantics of dialogue acts. In Proceedings of the International Conference on Computational Semantics (IWCS ’11) (pp. 1–13). Oxford, England: Association for Computational Linguistics. http://www.aclweb.org/anthology/W11-0101
  • Bunt, H. (2017). Guidelines for using ISO standard 24617-2 (Tech. Rep.). Tilburg Center for Cognition and Communication. https://dialogbank.uvt.nl/wp-content/uploads/tdb/2015/12/ISO24617-2_Annotation_Guidelines2017.pdf
  • Bunt, H. (2000, January). Dialogue pragmatics and context specification. In H. Bunt & W. Black (Eds.), Abduction, belief and context in dialogue. Studies in computational pragmatics (pp. 81–149). John Benjamins. https://doi.org/10.1075/nlp.1.03bun.
  • Byrt, T., Bishop, J., & Carlin, J. B. (1993). Bias, prevalence and Kappa. Journal of Clinical Epidemiology, 46(5), 423–429. https://doi.org/10.1016/0895-4356(93)90018-V
  • Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2), 249–254. https://aclanthology.org/J96-2004/
  • Chowdhury, S. A., Stepanov, E. A., & Riccardi, G. (2016). Transfer of corpus-specific dialogue act annotation to ISO standard: Is it worth it? In Proceedings of the International Conference on Language Resources and Evaluation (Vol. 9, pp. 132–135). European Language Resources Association (ELRA). https://aclanthology.org/L16-1020/
  • Clift, R. (2016). Conversation analysis. Cambridge University Press.
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
  • Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. https://doi.org/10.1037/h0026256
  • Collins, H., Leonard-Clarke, W., & O’Mahoney, H. (2019). ‘Um, er’: How meaning varies between speech and its typed transcript. Qualitative Research, 19(6), 653–668. https://doi.org/10.1177/1468794118816615
  • Craggs, R., & Wood, M. M. (2005). Evaluating discourse and dialogue coding schemes. Computational Linguistics, 31(3), 289–295. https://doi.org/10.1162/089120105774321109
  • Cuayáhuitl, H., Yu, S., Williamson, A., & Carse, J. (2016). Deep reinforcement learning for multi-domain dialogue systems. In NIPS Workshop on Deep Reinforcement Learning (pp. 1–9). Barcelona, Spain. https://arxiv.org/pdf/1611.08675.pdf
  • Di Eugenio, B., & Glass, M. (2004). The Kappa statistic: A second look. Computational Linguistics, 30(1), 95–101. https://doi.org/10.1162/089120104773633402
  • Di Eugenio, B. (2000). On the usage of kappa to evaluate agreement on coding tasks. In 2nd International Conference on Language Resources and Evaluation (LREC 2000) (pp. 441–444). Barcelona, Spain: European Language Resources Association (ELRA).
  • Ekman, P., & Scherer, K. (1984). Structures of social action: Studies in conversation analysis (J. Atkinson & J. Heritage, Eds.). Cambridge University Press. http://ebooks.cambridge.org/ref/id/CBO9780511665868
  • Eric, M., & Manning, C. D. (2017). Key-value retrieval networks for task-oriented dialogue. In Proceedings of the 18th Annual SIGDIAL Meeting on Discourse and Dialogue (pp. 37–49). Saarbrücken, Germany: Association for Computational Linguistics. https://nlp.stanford.edu/blog/a-new-multi-turn-multi-
  • Firdaus, M., Golchha, H., Ekbal, A., & Bhattacharyya, P. (2020). A deep multi-task model for dialogue act classification, intent detection and slot filling. Cognitive Computation. https://doi.org/10.1007/s12559-020-09718-4
  • Ge, W., & Xu, B. (2015). Dialogue management based on multi-domain corpus. In Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) (pp. 364–373). Prague, Czech Republic: Association for Computational Linguistics. http://www.sigdial.org/workshops/conference16/proceedings/pdf/SIGDIAL48.pdf
  • Geertzen, J., & Bunt, H. (2010). Measuring annotator agreement in a complex hierarchical dialogue act annotation scheme. In Proceedings of the 7th SIGDIAL Workshop on Discourse and Dialogue (pp. 126–133). Sydney, Australia: Association for Computational Linguistics. http://ls0143.uvt.nl/dit/
  • Geertzen, J., Petukhova, V., & Bunt, H. (2008). Evaluating dialogue act tagging with naive and expert annotators. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1076–1082). Marrakech, Morocco: European Language Resources Association (ELRA).
  • Geiß, S. (2021). Statistical power in content analysis designs: How effect size, sample size and coding accuracy jointly affect hypothesis testing – A Monte Carlo simulation approach. Computational Communication Research, 3(1), 61–89. https://doi.org/10.5117/ccr2021.1.003.geis
  • Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. Academic Press.
  • Green, J., Franquiz, M., & Dixon, C. (1997). The myth of the objective transcript: Transcribing as a situated act. TESOL Quarterly, 31(1), 172. https://doi.org/10.2307/3587984
  • Griol, D., Hurtado, L., Segarra, E., & Sanchis, E. (2008). A statistical approach to spoken dialog systems design and evaluation. Speech Communication, 50(8–9), 666–682. https://doi.org/10.1016/j.specom.2008.04.001
  • Grosz, B. J. (2018). Smart enough to talk with us? Foundations and challenges for dialogue capable AI systems. Computational Linguistics, 44(1), 1–15. https://doi.org/10.1162/COLI_a_00313
  • Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33–64. http://dl.acm.org/citation.cfm?id=972684.972687
  • Hsu, L. M., & Field, R. (2003). Interrater agreement measures: Comments on Kappa_n, Cohen’s Kappa, Scott’s π, and Aickin’s α. Understanding Statistics, 2(3), 205–219. https://doi.org/10.1207/s15328031us0203_03
  • Iseki, Y. (2019). Characteristics of everyday conversation derived from the analysis of dialog act annotation. In 2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) (pp. 1–6). Cebu, Philippines: IEEE.
  • Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual (Tech. Rep.). CU Boulder. ftp://ftp.dcs.shef.ac.uk/share/nlp/amities/bib/ics-tr-97-02.pdf
  • Kazai, G. (2011). In search of quality in crowdsourcing for search engine evaluation. In Proceedings of the 33rd European Conference on Information Retrieval (ECIR) (LNCS Vol. 6611, pp. 165–176). Berlin, Heidelberg: Springer.
  • Keizer, S., & Rieser, V. (2017). Towards learning transferable conversational skills using multi-dimensional dialogue modelling. In SemDial 2017. Saarbrücken, Germany: SEMDIAL.
  • Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage Publications.
  • Kumar, V., Sridhar, R., Narayanan, S., & Bangalore, S. (2008). Enriching spoken language translation with dialog acts. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers (HLT ’08) (p. 225). Columbus, Ohio: Association for Computational Linguistics. http://www.aclweb.org/anthology/P08-2057
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
  • Li, X., Chen, Y.-N., Li, L., Gao, J., & Celikyilmaz, A. (2017). End-to-end task-completion neural dialogue systems. In Proceedings of the 8th International Joint Conference on Natural Language Processing (pp. 733–743). Taipei, Taiwan: AFNLP. http://aclweb.org/anthology/I17-1074
  • Liddicoat, A. J. (2007). An introduction to conversation analysis. Continuum.
  • Macagno, F., & Bigi, S. (2018). Types of dialogue and pragmatic ambiguity. In S. Oswald, T. Herman, & J. Jacquin (Eds.), Argumentation and language: Linguistic, cognitive and discursive explorations (Vol. 32, pp. 191–218). Springer. https://doi.org/10.1007/978-3-319-73972-4_9
  • Mezza, S., Cervone, A., Tortoreto, G., Stepanov, E. A., & Riccardi, G. (2018). ISO-standard domain-independent dialogue act tagging for conversational agents. In COLING 2018 (pp. 3539–3551). Santa Fe, New Mexico: Association for Computational Linguistics. http://arxiv.org/abs/1806.04327
  • Norrick, N. (2004). Saarbrücken Corpus of Spoken English (SCoSE). https://ca.talkbank.org/access/SCoSE.html
  • Nowak, S., & Rüger, S. (2010). How reliable are annotations via crowdsourcing? A study about inter-annotator agreement for multi-label image annotation. In MIR ’10: Proceedings of the International Conference on Multimedia Information Retrieval (p. 557). Philadelphia, Pennsylvania: Association for Computing Machinery. https://dl.acm.org/citation.cfm?id=1743478
  • Oyama, S., Baba, Y., Sakurai, Y., & Kashima, H. (2013). Accurate integration of crowdsourced labels using workers’ self-reported confidence scores. In IJCAI International Joint Conference on Artificial Intelligence (pp. 2554–2560). Beijing, China: AAAI Press.
  • Poesio, M., & Vieira, R. (1998). A corpus-based investigation of definite description use. Computational Linguistics, 24(2), 183–216. https://aclanthology.org/J98-2001/
  • Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(1), 696–735. http://www.jstor.org/stable/412243
  • Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis I. Cambridge University Press.
  • Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. The Public Opinion Quarterly, 19(3), 321–325. https://www.jstor.org/stable/2746450
  • Searle, J. (1969). Speech acts: An essay in the philosophy of language. Cambridge University Press.
  • Shriberg, E., Dhillon, R., Bhagat, S., Ang, J., & Carvey, H. (2004). The ICSI meeting recorder dialog act (MRDA) corpus. In SIGDIAL 2004 (pp. 97–100). http://www.aclweb.org/anthology/W04-2319
  • Sidnell, J. (2010). Conversation analysis: An introduction. Wiley-Blackwell.
  • Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 254–263). Honolulu: Association for Computational Linguistics. http://blog.doloreslabs.com/?p=109
  • Weston, J., Bordes, A., Chopra, S., Rush, A. M., van Merrienboer, B., Joulin, A., & Mikolov, T. (2015). Towards AI-complete question answering: A set of prerequisite toy tasks. ICLR. http://arxiv.org/abs/1502.05698
  • Wiebe, J. M., Bruce, R. F., & O’Hara, T. P. (1999). Development and use of a gold standard data set for subjectivity classifications. In ACL ’99: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (pp. 246–253). College Park, Maryland: ACM. https://doi.org/10.3115/1034678.1034721
  • Williams, J. D., Raux, A., & Henderson, M. (2016). The dialog state tracking challenge series: A review. Dialogue and Discourse, 7(3), 4–33. https://pdfs.semanticscholar.org/4ba3/39bd571585fadb1fb1d14ef902b6784f574f.pdf
  • Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103(3), 374–378. https://doi.org/10.1037/0033-2909.103.3.374