Abstract
Judicial performance evaluations (JPEs) are a critical part of selecting judges, especially in states that use merit-based selection systems. This article presents empirical evidence that gender and race bias persist in attorney surveys conducted in accordance with the ABA's Guidelines. This systematic bias reflects a more general problem with the design and implementation of JPE surveys, one that produces predictable threats to the reliability and validity of the information these instruments collect. The result is particularly troubling: many judges are being subjected to state-sponsored evaluations that are systematically biased against women and minorities.
Notes
A 1993 study of the results of the Colorado Judicial Performance Evaluation Commission's lawyer survey showed that male and female lawyers alike rated female judges consistently lower than male judges (Sterling 1993). Colorado has since adjusted its evaluation methods, but no rigorous follow-up studies have been conducted to confirm that the disparities have been resolved.
JPEs are one of the few instances in which surveys are used for formal job performance evaluation. As such, the sampling error problems of survey research compound the already difficult measurement error problems inherent in employee performance appraisal. Addressing the problems that unrepresentative data pose for these surveys is beyond the scope of this article; for a discussion of these issues, see Elek et al. (2012) and Wood and Lazos (2009).
Since the creation of the BARS procedure (Smith and Kendall 1963), a number of similar systems sharing the same basic features have been developed. For a brief description, see Prowse and Prowse (2009). For the purposes of this article, the term BARS is used inclusively to cover these progeny.
Administrative performance scores are available for only 87 judges, yielding 311 observations; the Judging the Judges survey does not collect this information for Nevada Supreme Court Justices.
The precise wording and organization of the questions have changed slightly over time. The Years column in Table 1 indicates the years in which the Judging the Judges survey used each question. The Q (Question) column indicates which historical question formulations make up the score for each question in the analysis. For example, the judge-level scores on Question 1 are the average of two questions in the 1998–2000 data but the result of a single question in the 2002–2008 data.
The principal-factors method yields a single factor with an eigenvalue of 10.570 (N = 311; likelihood-ratio test = 8137.79, p < .001). All twelve questions load onto this factor with factor loadings greater than 0.85.
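For readers who want to replicate this kind of unidimensionality check, a one-factor extraction of this sort can be sketched as follows. This is an illustrative Python example on synthetic data: the simulated "survey" scores, variable names, and parameter values are hypothetical stand-ins, not the Judging the Judges data, and the sketch extracts the first factor from the item correlation matrix by eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: one latent trait ("judicial quality") driving
# twelve survey items, plus item-level noise.
n_judges, n_items = 311, 12
latent = rng.normal(size=(n_judges, 1))
weights = rng.uniform(0.85, 0.95, size=(1, n_items))
X = latent @ weights + 0.3 * rng.normal(size=(n_judges, n_items))

# Correlation matrix of the twelve items.
R = np.corrcoef(X, rowvar=False)

# Eigendecomposition; np.linalg.eigh returns eigenvalues in ascending
# order, so reverse to put the dominant factor first.
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Loadings on the first factor: eigenvector scaled by sqrt(eigenvalue).
loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])

print(eigvals[0])            # dominant eigenvalue
print(np.abs(loadings))      # per-item loadings on the single factor
```

With one strong latent trait, the first eigenvalue dwarfs the rest and every item loads heavily on that single factor, which is the pattern the note reports for the real survey data.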
Table 2 Inter-dimensional Correlation Matrix
Table 3 Intra-dimensional Correlation Matrices
Multicollinearity diagnostics on the set of independent variables show no signs of problematic collinearity: no variance inflation factor exceeds 1.82, and no tolerance falls below 0.62.
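These diagnostics follow the standard definitions VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors, and tolerance = 1 / VIF. The sketch below is an illustrative Python implementation on synthetic predictors; the data, variable names, and number of predictors are hypothetical, not the model estimated in the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: four mildly correlated predictors built from a
# shared component plus independent noise.
n = 311
shared = rng.normal(size=(n, 1))
X = 0.5 * shared + rng.normal(size=(n, 4))

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the others."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + others
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

vifs = vif(X)
tolerances = 1.0 / vifs
print(vifs.round(2))
print(tolerances.round(2))
```

A VIF of 1 means a predictor is uncorrelated with the others; values well below the conventional warning thresholds (5 or 10), like those reported in the note, indicate that collinearity is not distorting the coefficient estimates.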