ABSTRACT
Even when raters demonstrate agreement in their use of a measure, limited score variability or violation of often-ignored statistical assumptions can produce lower reliability estimates than intuition would suggest. This article uses data drawn from two randomized controlled trials of schema therapy and cognitive behavioral therapy, one for major depression and one for binge eating disorder or bulimia nervosa (N = 212), to illustrate the limits of the intraclass correlation coefficient (ICC). Independent observers rated randomly selected therapy sessions for therapeutic alliance quality using the well-validated Vanderbilt Psychotherapy Process Scale and Vanderbilt Therapeutic Alliance Scale. Scores on subscales related to therapist behavior showed restricted range, indicating consistently alliance-supportive therapist actions. However, inter-rater reliability estimates were low despite high agreement between raters. Bland–Altman plots, which visualize both agreement and the spread of the data, are suggested as a useful tool for researchers, consistent with the ideal of examining reliability from multiple perspectives.
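The following Python sketch makes the central point concrete using invented ratings, not data from the trials. Two hypothetical rater pairs each score ten sessions and disagree by one point on exactly the same two sessions. One pair's scores cluster at the top of the scale, while the other pair's scores span it. The sketch assumes the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) described by Shrout and Fleiss; the article may have used a different ICC form. The Bland–Altman quantities (mean difference, or bias, and 95% limits of agreement, bias ± 1.96 SD of the differences) are computed rather than plotted so that the sketch needs no plotting library. All function and variable names are illustrative.

```python
from statistics import mean, stdev

# Hypothetical ratings (NOT study data): identical one-point disagreements on
# sessions 8 and 9, but very different between-session spread.
RESTRICTED_A = [5, 5, 5, 5, 5, 5, 5, 5, 4, 5]  # scores piled at the ceiling
RESTRICTED_B = [5, 5, 5, 5, 5, 5, 5, 4, 5, 5]
WIDE_A = [1, 2, 3, 4, 5, 6, 7, 5, 4, 3]        # scores spread across the scale
WIDE_B = [1, 2, 3, 4, 5, 6, 7, 4, 5, 3]


def icc_2_1(ratings):
    """ICC(2,1) for an n-sessions x k-raters table: two-way random effects,
    absolute agreement, single rater (assumed form, see lead-in)."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]                     # per session
    col_means = [mean(row[j] for row in ratings) for j in range(k)]  # per rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Small between-session variance (ms_rows) drags the ICC down even when
    # the residual disagreement (ms_error) is tiny in absolute terms.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(a, b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def exact_agreement(a, b):
    """Proportion of sessions on which both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    for label, a, b in [
        ("restricted", RESTRICTED_A, RESTRICTED_B),
        ("wide", WIDE_A, WIDE_B),
    ]:
        icc = icc_2_1(list(zip(a, b)))
        bias, lower, upper = bland_altman(a, b)
        print(
            f"{label}: ICC(2,1)={icc:.3f}, "
            f"exact agreement={exact_agreement(a, b):.0%}, "
            f"bias={bias:.2f}, LoA=[{lower:.2f}, {upper:.2f}]"
        )
```

Run as written, the restricted pair yields an ICC(2,1) of −0.125 and the wide pair about 0.97. Both pairs nonetheless show 80% exact agreement and identical limits of agreement (about ±0.92 points). In this toy case, the Bland–Altman summary exposes the close agreement that the ICC obscures when between-session variance is small.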
Acknowledgments
The authors would like to acknowledge study coordinators and data managers Helen Kleindienst, Caroline Bray, Sarah Rowe, Andrea Bartram and Julia Martin, and to thank Dr James McKay for assistance with code development.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
Data have not been made publicly available for ethical and privacy reasons.