ABSTRACT
Undergraduate grade point average (GPA) is a commonly employed measure in educational research, serving as a criterion or as a predictor depending on the research question. Over the decades, researchers have used a variety of reliability coefficients to estimate the reliability of undergraduate GPA, which suggests that there has been no consensus on the most appropriate model. This paper reviews the assumptions of different reliability models and examines the effect of violating these assumptions on reliability estimates of GPA. Using longitudinal semester GPA data for 62,122 students from 26 four-year institutions, the reliability estimates for semester, annual, and fourth-year cumulative GPA were .60–.65, .75–.79, and .89–.92, respectively. Depending on the measure, up to eight different reliability coefficients were estimated. In general, the different estimates differed only slightly even when the assumptions of the underlying models were not met; however, larger differences were observed in the fourth-year cumulative GPA analyses.
Notes
1 Beatty et al. (Citation2015, p. 31) noted, “In a meta-analysis of the relationship between GPA and occupational performance (Roth et al., Citation1996), none of the 71 studies collected reported information on the reliability of GPA.”
2 Multi-factor congeneric models (Feldt & Brennan, Citation1989) will not be discussed in this paper. The use of structural equation models to estimate reliability is a growing area of research, but misapplication of the models is a concern (Haertel, Citation2006). IRT and G-theory approaches to estimating reliability are also not covered, as they have not been used in the majority of the studies on the reliability of GPA.
3 For example, using the reliability estimate of .75 for first-year cumulative GPA, the increases from observed correlations of .20 and .50 to their corrected values are .0309 and .0774, respectively. However, the percent change from the observed to the corrected correlation is 15.47% regardless of which observed correlation is used.
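The arithmetic in this note can be sketched as follows, assuming the standard single-variable correction for attenuation, r_corrected = r_observed / √(reliability); this formula is an assumption on our part, as the note does not state which correction was applied:

```python
import math

def disattenuate(r_obs, reliability):
    """Correct an observed correlation for unreliability in one variable:
    r_corrected = r_obs / sqrt(reliability)."""
    return r_obs / math.sqrt(reliability)

reliability = 0.75  # first-year cumulative GPA estimate from the note
for r in (0.20, 0.50):
    r_c = disattenuate(r, reliability)
    print(f"observed {r:.2f} -> corrected {r_c:.4f} (increase {r_c - r:.4f})")

# The proportional change is constant because the correction is multiplicative:
# 1/sqrt(0.75) - 1 = 0.1547, i.e., 15.47% regardless of the observed r.
print(f"percent change: {(1 / math.sqrt(reliability) - 1) * 100:.2f}%")
```

Under this formula, the increases of .0309 and .0774 and the constant 15.47% percent change reported in the note are reproduced exactly.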
4 The increase in the size of the differences between years 1 and 2 occurs because the first-year estimates are based only on two-part coefficients, whereas both two-part and multi-part coefficients were used in the cumulative GPA analyses for years 2, 3, and 4.
5 Feldt and Charter (Citation2006) questioned whether coefficients should be averaged when different reliability estimates were used. They thought the coefficients should not be averaged, but acknowledged that “applied researchers do not work in an ideal world”, and that “Compromises are often necessary when analyzing real-world experiments and data published by others” (p. 225).
6 Mean reliability estimates for the three private institutions were .05 higher than those for the 23 public institutions. The results for the public institutions were nearly identical to the overall results. This difference between the public and private institutions should be interpreted with extreme caution, as it may be due to second-order sampling error. Beatty et al. (Citation2015) found a difference of .03 in the mean ICCs at public and private institutions, but this difference disappeared when the ICCs were stepped up to estimate the reliability of cumulative GPA. Furthermore, they also found differences of only .01 in the mean ICCs and stepped-up reliability estimates across institutional admission selectivity levels.
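The "stepping up" referred to in this note is typically done with the Spearman-Brown prophecy formula, which projects the reliability of a composite of k parallel parts from a single-part reliability. A minimal sketch follows; the input values are illustrative, not figures from the study:

```python
def spearman_brown(r1, k):
    """Spearman-Brown prophecy formula: reliability of a composite of k
    parallel parts, given single-part reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

# Hypothetical example: a single-semester reliability (e.g., an ICC) of .20
# stepped up to an 8-semester cumulative GPA composite.
stepped_up = spearman_brown(0.20, 8)  # 8(.20) / (1 + 7(.20)) = 1.6 / 2.4
print(f"{stepped_up:.3f}")
```

Because the formula compresses differences as k grows, a .03 gap in single-part ICCs can largely vanish in the stepped-up composite reliabilities, consistent with the Beatty et al. (Citation2015) result described above.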