Original Articles

Exploring the Relationship Between Interrater Correlations and Validity of Peer Ratings

Pages 180-197 | Published online: 23 Apr 2008

Abstract

A field study was conducted to investigate the relationship between interrater correlations and validity estimates of peer ratings. Validity coefficients and interrater correlations were calculated for 281 work units in a large law enforcement organization in Israel. The main result was a weak positive linear relationship between these two variables. Furthermore, some of the analyses revealed a nonlinear quadratic component in the relationship between these measures. Validity was low only when interrater correlation was very low (r = .4 or less); above this level, validity was stable and changed very little as interrater correlation increased. This finding, together with other studies (Borman, 1975; Buckner, 1959; Freeberg, 1969; Weekley & Gier, 1989), casts doubt on the assertion that interrater correlation is a proper measure of reliability in the field of performance rating.

Notes

1. Actually, under such conditions, the empirically derived validity cannot exceed the midpoint between the lower and the higher reliability.
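To make the bound concrete, here is a sketch in standard classical test theory notation (our illustration, not the article's derivation): observed validity is attenuated by the square root of the product of the two reliabilities, and that geometric mean never exceeds the arithmetic midpoint:

$$ r_{xy} \,=\, \rho_{xy}\,\sqrt{r_{xx}\, r_{yy}} \,\le\, \sqrt{r_{xx}\, r_{yy}} \,\le\, \frac{r_{xx} + r_{yy}}{2}, $$

where $\rho_{xy}$ is the true validity, $r_{xx}$ and $r_{yy}$ are the predictor and criterion reliabilities, and the last step is the arithmetic-geometric mean inequality.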

2. We ran additional simulations under a wide range of true validity and criterion reliability values and obtained the same functional form in all of them. Admittedly, when true validity and criterion reliability are very low, the function relating validity to reliability is quite flat; in most instances, however, this is not the case. The value of 0.3 for true validity was chosen because the mean empirical validity for the composite criterion was 0.291. Considering that true validity should be equal to or higher than empirical validity, we assumed that the theoretical functions under this condition most closely represent the minimal expected function between validity and reliability in our study.
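A rough sketch of this kind of theoretical exercise, assuming the standard attenuation model in which observed validity equals true validity times the square root of reliability (the authors' exact simulation setup is not specified here, and all names below are ours):

```python
import numpy as np

# Theoretical observed validity as a function of rating reliability,
# under the classical attenuation model: r_obs = rho * sqrt(reliability).
def observed_validity(true_validity, reliability):
    return true_validity * np.sqrt(reliability)

reliabilities = np.linspace(0.05, 1.0, 20)
for rho in (0.1, 0.3, 0.5):  # assumed true validities; 0.3 matches the note
    curve = observed_validity(rho, reliabilities)
    # The curve is concave in reliability, and the whole curve is
    # compressed toward zero (hence flatter) when rho is small.
    print(f"rho={rho:.1f}:", np.round(curve, 3))
```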

3. It is important to differentiate between interrater agreement and interrater reliability, which are two different concepts. Interrater agreement refers to the degree to which the absolute magnitudes of the ratings given by each rater are the same for the given ratees. Interrater reliability refers to the degree to which raters provide similar rank orders of the ratees (Tinsley & Weiss, 1975). Our study deals with interrater reliability.
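The distinction can be made concrete with a toy computation (hypothetical values of ours, not from the study): two raters can agree poorly in absolute terms yet rank the ratees identically.

```python
import numpy as np

# Ratings of the same five ratees by two raters (hypothetical values).
rater_a = np.array([2, 3, 4, 5, 6])
rater_b = np.array([5, 6, 7, 8, 9])  # systematically 3 points higher

# Interrater agreement: closeness of the absolute ratings (poor here).
mean_abs_diff = np.mean(np.abs(rater_a - rater_b))

# Interrater reliability: similarity of the rank orders (perfect here).
interrater_r = np.corrcoef(rater_a, rater_b)[0, 1]

print(mean_abs_diff)  # 3.0 -> low absolute agreement
print(interrater_r)   # 1.0 -> perfect rank-order consistency
```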

4. In most cases, the interrater reliability index used was the interrater correlation.

5. Buckner (1959) uses the term interrater agreement, not interrater reliability, in his article. However, in the Method section (p. 61), he notes that the interrater agreement estimates were made according to the basic equation for the coefficient of reliability. Thus, it seems that Buckner did not differentiate between agreement and reliability and that his interrater agreement measures are essentially interrater reliability measures.
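For reference, the basic classical test theory equation this appears to invoke defines reliability as the ratio of true-score variance to observed-score variance (standard notation, not quoted from Buckner):

$$ r_{xx} \,=\, \frac{\sigma_T^2}{\sigma_X^2} \,=\, \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}. $$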

6. These criteria were necessary because not all members of a group participated in the evaluation process. The evaluation in a work team was held on a certain day at a certain time, and because of work constraints, not all unit members were able to take part (their peers were permitted to evaluate them even if they were not present). In addition, unit members were instructed not to evaluate individuals with whom they were insufficiently acquainted. Criterion measures were not available for all unit members for a variety of reasons related to organizational bureaucracy and rules. In some rare cases, raters did not differentiate at all among their ratees (they gave the same grade to all their coworkers); these raters were counted as team members who did not participate in the evaluation process.

7. In small groups, a minimum of five members was required in addition to the 80% rule for all the criteria listed.

8. To assess the internal consistency of the components of the composite criterion, we calculated several alpha coefficients. First, we checked the consistency of each criterion across years. The alpha coefficients found were 0.78 for supervisor evaluations, 0.57 for absenteeism, and 0.54 for discipline. This level of consistency is reasonable given the small number of years (2 or 3). Alpha coefficients among the three criteria were also calculated, within each year and across years. The correlations between the criteria were weak (0.03–0.14), yielding low consistency coefficients (about 0.24). Low correlations between nonjudgmental and judgmental measures are not rare; Heneman (1986), for example, found a corrected correlation of only 0.27 between supervisory ratings and nonjudgmental measures. Despite this low level of consistency, we decided to use the composite criterion in addition to using its components separately. According to Schmidt and Kaplan (1971), if all criterion elements are correlated with an underlying economic construct, it is possible to weight them into a composite irrespective of their intercorrelations. In our study, all three criteria appear to correlate with what the organization conceives of as work performance (which can be translated into economic value): police officers who show a low level of absenteeism, are evaluated highly by their supervisors, and have no discipline problems are considered high performers, and vice versa.
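A minimal sketch of the consistency check described, using Cronbach's alpha on a matrix of per-officer criterion scores (the data and names below are our own illustration, not the study's):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical standardized scores on the three criteria
# (supervisor evaluation, absenteeism, discipline) for six officers.
rng = np.random.default_rng(0)
scores = rng.standard_normal((6, 3))  # nearly uncorrelated columns

# Weakly correlated criteria yield a low alpha, as reported in the note.
print(round(cronbach_alpha(scores), 2))
```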

9. We calculated work unit norms in the standardization process, rather than overall sample norms, in order to control for possible differences between units. Note that these units were very different: some were field units (patrol, investigation), whereas others were administrative and logistics units. The unit norms were simply the mean and standard deviation of each criterion within each team.
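A minimal sketch of within-unit standardization, assuming a flat table of criterion scores keyed by work unit (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical data: criterion scores for officers in two work units.
df = pd.DataFrame({
    "unit":  ["patrol", "patrol", "patrol", "logistics", "logistics"],
    "score": [10.0, 12.0, 14.0, 30.0, 34.0],
})

# Standardize each score against its own unit's mean and standard
# deviation, so that differences between units are controlled for.
df["z"] = df.groupby("unit")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=1)
)
print(df)
```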

10. The analysis was conducted on the raw validity and reliability indexes calculated for each work team. The major analyses were repeated with Fisher z-transformed values of these indexes; no meaningful differences were detected.
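For completeness, the Fisher transformation applied in that reanalysis is z = atanh(r); a one-line sketch with illustrative coefficients:

```python
import numpy as np

r = np.array([0.10, 0.40, 0.80])  # raw validity/reliability coefficients
z = np.arctanh(r)                 # Fisher z = 0.5 * ln((1 + r) / (1 - r))
r_back = np.tanh(z)               # inverse transform recovers r
print(np.round(z, 3), np.round(r_back, 3))
```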

