Abstract
The distinction between interrater (or interobserver, interjudge, interscorer) "agreement" and "reliability" is discussed. A total of 3 approaches or techniques for the estimation of interrater agreement and reliability are illustrated and compared, using data from a hypothetical study. The 3 approaches are (a) simple percentage of agreement and kappa, (b) simple correlational techniques, and (c) generalizability (G) theory techniques. In discussing the relative advantages and disadvantages of the various approaches, the G theory techniques are emphasized because they are the most comprehensive and flexible, allowing the researcher to isolate multiple sources of measurement error in a single study. Some recommendations regarding the "method of choice" for various situations are offered.
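To make approach (a) concrete, the following is a minimal sketch of simple percentage of agreement and Cohen's kappa for two raters assigning categorical codes. The ratings shown are hypothetical illustration data invented here, not the data from the article's hypothetical study.

```python
# Approach (a): simple percentage of agreement and Cohen's kappa for two raters.
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of subjects on which the two raters assign the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_obs = percent_agreement(r1, r2)
    # Chance agreement from each rater's marginal category proportions.
    m1, m2 = Counter(r1), Counter(r2)
    p_exp = sum(m1[c] * m2[c] for c in set(r1) | set(r2)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings of 10 subjects by two raters on a 3-category scale.
rater1 = [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]
rater2 = [1, 2, 3, 1, 2, 2, 1, 3, 2, 3]

print(f"Percent agreement: {percent_agreement(rater1, rater2):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohens_kappa(rater1, rater2):.2f}")       # ~0.70
```

As the article's comparison suggests, kappa is lower than raw percent agreement because it discounts the agreement expected by chance alone; neither index, however, separates the multiple sources of measurement error that the G theory techniques can isolate.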