Abstract
Background: Generalizability theory (G theory) is a statistical method for analyzing the results of psychometric tests. These include performance tests such as the Objective Structured Clinical Examination, written or computer-based knowledge tests, rating scales, and self-assessment and personality tests. G theory generalizes classical reliability theory, which treats error as a single term and examines the contribution of the primary variable of interest, the performance of subjects, relative to that error variance. G theory goes further by separating out and estimating the various sources of error that contribute to the inaccuracy of measurement. This makes it a valuable tool for judging the methodological quality of an assessment method and for improving its precision.
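To make the ratio of subject variance to error variance concrete, here is a minimal Python sketch of the simplest crossed design, in which every person is scored once by every rater (persons × raters, one facet of generalization). It is an illustration under stated assumptions, not the tooling introduced in this guide. The function names `variance_components` and `g_coefficients` and the scores are hypothetical. Variance components are estimated from the usual two-way ANOVA mean squares, negative estimates are truncated to zero, and the relative (G) and absolute (phi) coefficients are projected for different numbers of raters. That projection is how a generalizability analysis can point to ways of improving precision.

```python
def variance_components(scores):
    """Estimate variance components for a crossed persons x raters design.

    Hypothetical helper: scores[i][j] is the score person i received from
    rater j, with exactly one observation per cell and no missing data.
    """
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    # Sums of squares for a two-way ANOVA without replication
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_res = sum(
        (scores[i][j] - person_means[i] - rater_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for each component;
    # negative estimates are set to zero, a common convention.
    return {
        "person": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": ms_res,  # person x rater interaction confounded with error
    }


def g_coefficients(components, n_raters):
    """Relative (G) and absolute (phi) coefficients for a mean over n_raters."""
    var_p = components["person"]
    relative_error = components["residual"] / n_raters
    absolute_error = (components["rater"] + components["residual"]) / n_raters
    g = var_p / (var_p + relative_error)
    phi = var_p / (var_p + absolute_error)
    return g, phi


# Hypothetical scores: 5 persons (rows) rated by 3 raters (columns)
scores = [
    [7, 8, 6],
    [5, 6, 4],
    [9, 9, 8],
    [4, 5, 3],
    [6, 7, 5],
]

components = variance_components(scores)
for n in (1, 3, 6):
    g, phi = g_coefficients(components, n)
    print(f"{n} rater(s): G = {g:.2f}, phi = {phi:.2f}")
```

In this sketch the relative coefficient counts only the person × rater interaction (confounded with residual error) as error, whereas phi also counts systematic differences in rater stringency. Averaging over more raters shrinks both error terms, so both coefficients rise as `n_raters` grows.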
Aim: Starting from basic statistical principles, we gradually develop and explain the method. We introduce tools for performing generalizability analysis and illustrate their use with a series of common, practical examples from educational practice.
Conclusion: We realize that statistics and mathematics can be either boring or fearsome to many physicians and educators, yet we believe that some foundations are necessary for a better understanding of generalizability analysis. Consequently, we have tried, wherever possible, to keep the use of equations to a minimum and to use a conversational and slightly “off-serious” style.
Notes
1. The authors, both amateur cabinet makers, adhere to the maxim: measure twice, cut once.
2. Cardinet (1975), Tourneur and Allal were the first to point out that the “object of measurement” may change, and can be viewed as one more source of variance.
3. We call this a “three facet” design, referring to the number of facets of generalization.
4. By this rule, a stratification facet is defined as a facet in which the object of measurement (Person) is nested.
5. The term “stratification” is consistent with the terminology of Brennan (2001, Section 5.2).