Statistical Procedures for Assessing the Need for an Affirmative Action Plan: A Reanalysis of Shea v. Kerry

Pages 1-8 | Received 13 Aug 2018, Accepted 08 Nov 2019, Published online: 18 Dec 2019
 

Abstract

In the 1980s, reports from Congress and the Government Accountability Office (GAO) presented statistical evidence showing that employees in the Foreign Service were overwhelmingly White male, especially in the higher positions. To remedy this historical discrimination, the State Department instituted an affirmative action plan during 1990–1992 that allowed females and race-ethnic minorities to apply directly for mid-level positions. A White male employee claimed that he had been disadvantaged by the plan. The appellate court unanimously held that the manifest statistical imbalance supported the Department’s instituting the plan. One judge identified two statistical issues in the analysis of the data that neither party brought up. This article provides an empirical guideline for sample size and a one-sided Hotelling’s T² test to answer these problems. First, an approximate rule is developed for the minimum number of expected minority appointments needed for a meaningful statistical analysis of under-representation. To avoid the multiple comparison issue when several protected groups are involved, a modification of Hotelling’s T² test is developed for testing the null hypothesis of fair representation against a one-sided alternative of under-representation in at least one minority group. The test yields p-values less than 1 in 10,000, indicating that minorities were substantially under-represented. Excluding secretarial and clerical jobs led to even larger disparities.

Supplemental materials for this article are available online.

Supplementary Materials

The supplementary material contains the following: (1) legal background on Shea v. Kerry; (2) SAS code to calculate the test statistic and its asymptotic distribution under the null hypothesis; (3) gender-racial compositions of generalists, specialists, and civil positions in the State Department in 1989 and 1990.

Notes

1 United Steelworkers of America v. Weber, 443 U.S. 193 (1979). The court extended Weber to cover gender-based AAPs in Johnson v. Transportation Agency, Santa Clara County, 480 U.S. 616 (1987).

2 The Johnson decision, ibid., cites Hazelwood School District v. United States, 433 U.S. 299 (1977) (must compare percentage of Blacks in employer’s work force with percentage of qualified Black teachers in the area to determine whether they are underrepresented in teaching positions).

3 In Kohlbek et al. v. City of Omaha, 447 F.3d 552 (8th Cir. 2006), the court noted that a disparity between observed and expected hiring levels does not necessarily demonstrate possible discrimination; rather, the numbers must differ to a statistically significant degree.

4 The appropriate effect size measure and corresponding thresholds should be set by the legal system and may vary with the type of case and number of individuals affected. The authors believe that if the odds of a minority applicant being successful are one half those of a majority member the imbalance should be considered substantial.

5 Although the effect size measures were not in the record, it is noteworthy that, with the exception of Asian Americans, none of the odds ratios are “small” and most are in the medium and large category.

6 Shea v. Kerry, 796 F.3d 42 (D.C. Cir. 2015).

7 796 F.3d 42 at 66 (noting that the study in the record does not describe which, if any, statistical test or standard of significance was used).

8 Ibid. at 66.

9 Ibid. at 66.

10 The number of American-Indian females in the Finance Office Division follows a binomial distribution with parameters 0.002 and 125. The probability of observing no American-Indian females is 0.7786.
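The probability in note 10 can be checked directly from the binomial formula. The following is a minimal sketch that uses only the two parameters stated in the note (125 positions, proportion 0.002); the function name `binomial_pmf` is illustrative and is not taken from the article's SAS code.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Parameters stated in the note: n = 125 positions, p = 0.002.
p_zero = binomial_pmf(0, 125, 0.002)
print(round(p_zero, 4))
```

With k = 0 the formula reduces to (1 − p)^n, so the result depends only on the two stated parameters.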

11 When statistical tests are used to check whether the data satisfy some assumptions, the 0.10 level is often adopted; see Fleiss (1986) and Cheng (2015). Gastwirth et al. (2009) found that the 0.15 level was appropriate when Levene’s test for homogeneity of the variance of several groups was used as a preliminary check. Of course, 0.78 is much larger than these levels of significance.

12 Clearly, courts should not give any weight to a nonsignificant finding in this situation.

13 The courts, not statisticians, should determine the value of the odds ratio or selection rates of the success of the minority and majority applicants that is legally important and the power or probability of obtaining a statistically significant result when the success rates differ by that value.

14 For the binomial model, one standard deviation equals $\sqrt{n\pi(1-\pi)}$. Two-sided tests are used in the jury discrimination context because members of either gender or of any race-ethnic group could be subject to unfair treatment. The court accepted this test in Castaneda v. Partida, 430 U.S. 482 (1977). The analysis of data from this case is described in Zeisel and Kaye (1997) and Finkelstein and Levin (2001).

15 See, for example, Theorem 4.2.1 in Larsen and Marx (2017).

16 The binomial calculation also gives 0.7786.

17 Using binomial with success probability 0.002, the minimum sample size is 1497.
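Note 17 does not restate the criterion behind the figure 1497. One reading that reproduces it, offered here as an assumption rather than the authors' stated rule, is the smallest sample size for which the binomial probability of observing zero minority members falls to 0.05 or below. A minimal sketch follows; `min_sample_size` and the default `alpha` are hypothetical names and values chosen for illustration.

```python
def min_sample_size(p: float, alpha: float = 0.05) -> int:
    """Smallest n such that P(X = 0) = (1 - p)**n is at most alpha.

    Assumption: the 0.05 threshold is inferred, not quoted from the note.
    """
    n = 1
    while (1 - p) ** n > alpha:
        n += 1
    return n

n_min = min_sample_size(0.002)
print(n_min)
```

Under this reading, a sample of 1496 still leaves the probability of a zero count slightly above 0.05, and 1497 is the first size at which it drops to 0.05 or below.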

18 796 F. 3d at 66.

19 The data are on page 219 of Vol. 1 of the Joint Appendix.

20 See and Table A1.

21 To check the accuracy of the Poisson approximation, the exact probability using the binomial distribution was calculated. It equals 0.002978; in very close agreement with the Poisson approximation.

22 In addition to the small expected number of American-Indian female Finance Officers, their expected number of technical employees was 1.31, clearly less than three. They formed 0.3% of the nation’s technical workers and the Department employed 377. See the Joint Appendix, Volume I, page 219.

23 Power is the probability that a statistical test yields a significant result when a difference truly exists.

24 The Joint Appendix, Volume I, p. 219.

25 The calculations are based on the binomial distribution, rather than the Poisson approximation used previously. There is a large literature concerning various approximations to the sampling distribution of a binomial proportion; see Brown, Cai, and Das Gupta (2001) for further discussion.

26 Neither the courts nor Congress have established a specific alternative that is important to detect. Because an odds ratio of 0.5 is considered a medium effect (Olivier and Bell 2013), it may approximate a manifest imbalance. Methods for determining the sample size needed for reliable inference are discussed in Gastwirth and Xu (2014).

27 Because statistical evidence of under-representation in disparate treatment cases just places a burden of explaining how the disparity arose from legitimate considerations on the employer, it seems reasonable for courts to check a system in which minorities have only one-half the odds of success as majority members. A firm might consider taking steps to remedy similar levels of minority under-representation in a location or division.

28 The probability a Poisson random variable with mean 18.2 is less than or equal to 11 is 0.0502 and the probability a Poisson random variable with mean 9.2 is less than or equal to 11 is 0.783.
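The two cumulative probabilities in note 28 can be reproduced by summing Poisson terms. This is a minimal sketch using only the means (18.2 and 9.2) and the cutoff (11) given in the note; the helper name `poisson_cdf` is illustrative.

```python
from math import exp, factorial

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson random variable with the given mean."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))

# Means and cutoff as stated in the note.
p_mean_18_2 = poisson_cdf(11, 18.2)
p_mean_9_2 = poisson_cdf(11, 9.2)
print(round(p_mean_18_2, 4), round(p_mean_9_2, 3))
```

The first value is the chance of 11 or fewer when the mean is 18.2, and the second is the same event's probability when the mean is 9.2.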

29 The number 9100 is obtained by dividing 18.2 by 0.002.

30 The number 364 was chosen as it is in between the number of specialists in level 02 and level 03 in Table A3.

31 When the odds ratio is 0.5, π*=0.02564.
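The conversion in note 31 goes from a proportion to odds, scales the odds by the odds ratio, and converts back. The note does not restate the baseline proportion; the sketch below assumes a baseline of 0.05, which is the value that reproduces 0.02564, and that figure should be read as an inference rather than a quotation. The function name is hypothetical.

```python
def proportion_under_odds_ratio(pi: float, odds_ratio: float) -> float:
    """Proportion whose odds equal odds_ratio times the odds of pi."""
    odds = pi / (1 - pi)              # baseline odds
    new_odds = odds_ratio * odds      # odds after applying the odds ratio
    return new_odds / (1 + new_odds)  # convert odds back to a proportion

# Assumed baseline proportion 0.05 (not stated in the note).
pi_star = proportion_under_odds_ratio(0.05, 0.5)
print(round(pi_star, 5))
```

Halving the odds does not quite halve the proportion: the result is slightly above half of the baseline.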

32 The 1989 data were also analyzed and yielded very similar conclusions.

33 In the present case, no White males were laid off or demoted to create a position for a historically under-represented minority. The plaintiff’s advancement was delayed but he remained eligible for promotion.

34 The opinion in the disparate impact case, Jones v. Boston, 752 F.3d 38 (1st Cir. 2014), noted that statistical significance is well-defined and questioned whether it is possible to have a principled definition of “practical” or “meaningful” difference. In contrast, the court in Waisome v. Port Auth. of N.Y. & N.J., 948 F.2d 1370, 1376 (2d Cir. 1991) found no disparate impact in the passing rates of Black and White applicants where, “though the disparity was found to be statistically significant, it was of limited magnitude”. In that case, however, the difference in the actual promotion rates was not statistically significant. Although the sample sizes in Waisome were much smaller than those in Jones, the odds ratio measures of “effect size” were similar: 0.28 for the promotion rates in Waisome and 0.205 for the rates of a positive drug test for African-American applicants to the Police Academy versus Whites in Jones. Because of the large sample in Jones, the data were statistically significant, even at the 0.001 level. See Gastwirth (2017) for the discussion of statistical and practical significance in the equal employment cases and references to the related literature.

35 See Gastwirth (2017) for cases involving police officers where courts approved employment tests having a disparate impact on a minority group even though the correlation between test score and job performance was low (.21) but statistically significant.

36 In McDonnell Douglas Corporation v. Green, 411 U.S. 792, 804-05, the Court said statistics as to petitioner’s employment policy and practice may be helpful to a determination of whether petitioner’s refusal to rehire respondent in this case conformed to a general pattern of discrimination. The plaintiff survived summary judgment in Boone v. Clinton, 675 F.Supp.2d 137 (D.D.C. 2009), where she provided both statistical and anecdotal evidence, while the plaintiff in Nicholls v. Philips Semiconductor Mfg., 2011 WL 180565 (S.D.N.Y. 2011) did not because no additional evidence was submitted.

37 Furnco Const. Corp. v. Waters, 438 U.S. 567, 579 (1978).

38 The opinions by Judge P.E. Higginbotham in Vuyanich v. Republic National Bank, 505 F. Supp. 224 (N.D. Tex. 1982), vacated and remanded on other grounds, 723 F.2d 1195 (5th Cir. 1984), and Markey v. Tenneco, 707 F.2d 172 (5th Cir. 1983), affirming 32 FEP Cases 705 (E.D. La. 1982), discuss the use of applicant flow data to obtain a geographically weighted labor market area. Gastwirth (1988, p. 172) lists cases accepting a weighted labor market.

39 In very large samples, a small difference between the minority proportion of employees or hires and their proportion in the QUALF that may not have legal significance can be statistically significant. See Gastwirth and Xu (2014).