
Inter-rater reliability of query/probe-based techniques for measuring situation awareness

Pages 959-972 | Received 09 Jul 2013, Accepted 21 Mar 2014, Published online: 07 May 2014
 

Abstract

Query- or probe-based situation awareness (SA) measures sometimes rely on process experts to evaluate operator actions and system states when the measures are used in representative settings. This introduces the variability of human judgement into the measurements, which calls for an assessment of inter-rater reliability. However, the literature has largely neglected the inter-rater reliability of query/probe-based SA measures. To investigate the inter-rater reliability of a query-based SA measure, we recruited process experts to provide reference keys for the SA queries in trials of a full-scope nuclear power plant simulator experiment. The measure demonstrated only 'moderate' inter-rater reliability, even though the queries were seemingly direct. The level of agreement also differed significantly across pairs of experts with different levels of exposure to the experiment. The results caution that the inter-rater reliability of query/probe-based SA measurement techniques cannot be assumed in representative settings; knowledge about the experiment, as well as the domain, is critical to forming reliable expert judgements.


Practitioner Summary: When the responses of domain experts are treated as the correct answers to the queries or probes of SA measures used in representative or industrial settings, practitioners should not assume the inter-rater reliability of those measures; it should be assessed.

Acknowledgements

Thanks to Professor Neville Stanton, Justin Hollands, Mark Chignell and Birsen Donmez for providing valuable feedback on this work. We thank Andreas Bye, of the Halden Reactor Project, for his effort in making this study possible.

Notes

1. Indices based on judges assigning similar rank orderings to the targets are referred to as inter-rater reliability, whereas indices based on judges assigning identical rating levels to the targets are referred to as inter-rater agreement. Reliability statistics are typically used for measurements on interval and ratio scales, whereas agreement statistics are typically used for measurements on nominal scales. Common usage often does not distinguish between inter-rater reliability and agreement, but it is important to identify the type of measurement scale in order to select the appropriate statistic, since each statistic assumes certain scale properties (illustrated in the sketch following these notes).

2. Translated from the German title 'Measuring Situation Awareness of Area Controllers within the Context of Automation'.

3. Empirical evidence is also limited for SA measures adopting techniques other than the query/probe-based technique. The authors are aware of only three other empirical studies that include inter-rater reliability indices, each for a different rating scale (Patrick et al. 2006; Waag and Houck 1994; Vidulich and Hughes 1991).
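
As an illustration of the distinction drawn in note 1, the following Python sketch computes Cohen's kappa (a nominal-scale agreement index) and a Pearson correlation (a simple interval-scale consistency index) for two hypothetical raters. The rating data and function names are illustrative assumptions, not drawn from the study.

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Chance-corrected agreement between two raters' nominal codes."""
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Expected agreement if the two raters coded independently.
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        expected = sum(freq_a[c] * freq_b[c]
                       for c in set(freq_a) | set(freq_b)) / n ** 2
        return (observed - expected) / (1 - expected)

    def pearson_r(x, y):
        """Pearson correlation: a simple consistency index for interval-scale ratings."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    # Nominal codes (e.g., expert judgements of query responses as
    # correct/incorrect) call for an agreement statistic such as kappa.
    a = ["correct", "correct", "incorrect", "correct", "incorrect", "correct"]
    b = ["correct", "incorrect", "incorrect", "correct", "correct", "correct"]
    print(f"Cohen's kappa: {cohens_kappa(a, b):.2f}")  # 0.25

    # Interval-scale ratings (e.g., SA scores per trial) call for a
    # reliability statistic such as a correlation or intraclass correlation.
    x = [3.0, 4.5, 2.0, 5.0, 3.5]
    y = [2.5, 4.0, 2.5, 4.5, 3.0]
    print(f"Pearson's r:   {pearson_r(x, y):.2f}")

The sketch only shows why the scale type dictates the choice of statistic; for analyses like those reported here, a weighted kappa or an intraclass correlation may be the more appropriate member of each family.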

Additional information

Funding

This research was supported through a grant from the Natural Sciences and Engineering Research Council of Canada and internal funding of the OECD Halden Reactor Project. We are indebted to Maren H. Rø-Eitrheim for supporting the recruitment of the process experts and the gallery set-up for data collection.
