Making Artificial Intelligence Transparent: Fairness and the Problem of Proxy Variables

Pages 23-39 | Published online: 08 Mar 2021

Abstract

AI-driven decisions can draw data from virtually any area of your life to make a decision about virtually any other area of your life. That creates fairness issues. Effective regulation to ensure fairness requires that AI systems be transparent. That is, regulators must have sufficient access to the factors that explain and justify the decisions. One approach to transparency is to require that systems be explainable, as that concept is understood in computer science. A system is explainable if one can provide a human-understandable explanation of why it makes any particular prediction. Explainability should not be equated with transparency. To address transparency and characterize its relation to explainability, we define transparency for a regulatory purpose. A system is transparent for a regulatory purpose (r-transparent) when and only when regulators have an explanation, adequate for that purpose, of why it yields the predictions it does. Explainability remains relevant to transparency but turns out to be neither necessary nor sufficient for it. The concepts of explainability and r-transparency combine to yield four possibilities: explainable and either r-transparent or not; and not explainable and either r-transparent or not. Combining r-transparency with ideas from the Harvard computer scientist Cynthia Dwork, we propose four requirements on AI systems.

Disclosure Statement: No potential conflict of interest was reported by the author(s).

Notes

1 See Consumer Reports. 2015. “Special Report. Car Insurance Secrets.” July 30. https://www.consumerreports.org/cro/car-insurance/auto-insurance-special-report/index.htm.

2 The Sally example involves the incorrect prediction that Sally is a poor credit risk. Fairness questions arise in cases of correct predictions as well. Imagine an earlier point in Sally’s life. Her daughter has recovered, but Sally has not yet secured the job that pays enough to make her income considerably exceed her expenses. Given all the available evidence, it is not yet clear what path her life will take—continued financial uncertainty, or financial security. In these circumstances, Sally is not (yet) a good credit risk, but is it fair that she labors under a poor credit rating as a result of saving her daughter’s life? The low credit rating means that credit card companies can avoid the risk of extending her credit, but it limits Sally’s ability to manage her expenses at a critical time. A higher credit rating expands her ability to manage her expenses while imposing the risk of default on the credit card companies that extend her credit. Which option is fairer? 

3 The analogy is not perfect. Unlike most gambling, you do have some control over the outcomes, since those depend on your individual attributes, and you have some control over those. However, you typically know neither which attributes will be used, nor which values of those attributes are good and which bad.

4 Finlay, Artificial Intelligence, 5.

5 We envision a regulatory agency enforcing the proposals we make. See Sloan and Warner, “Beyond Bias,” 1, and Sloan and Warner, Privacy Fix. Our discussion does, however, apply equally to legislation. We turn to regulation (or legislation) because we assume market solutions are unlikely. Market solutions would require that consumers have the power to influence businesses by changing what they buy in light of sufficient correct information about purchasing options. We take it to be clear that, in the situations which concern us, consumers lack sufficient power and information.

6 There is no agreement on the precise meaning of predictive analytics and related terms like data mining, machine learning, and artificial intelligence. See, for example, Finlay, Artificial Intelligence, 5–16, 27–59. Finlay distinguishes and discusses relationships among machine learning, predictive analytics, data mining, artificial intelligence, neural nets, and deep learning.

7 In computer science, machine learning is generally regarded as a very important subfield of artificial intelligence, not as its equivalent. Some would instead consider machine learning a distinct field neighboring AI.

8 Finlay, Artificial Intelligence, 6.

9 See Wickens, “Proxy Variables.”

10 See Lauer, Creditworthy, 56. Credit reporting on merchants began in the 1840s. Consumer credit reporting followed and was well established by the end of the 1880s.

11 See Bouk, How Our Days Became Numbered. Bouk notes that “Credit reporters’ archives of individual data, as well as their methods of assessing financial risk and judging individual character, suddenly became essential to life insurance corporations” (66).

12 Lauer, Creditworthy, 206. Numerous studies replicated that result, Lauer points out: “Study after study listed … variables … such as length of time at one’s current job and address, as the most predictive of good and bad loans. The leading determinants of creditworthiness, in other words, were only indirectly or loosely financial. Simply having a telephone in one’s home, a mortgage, and a checking or savings account—evidence of community and institutional connectedness—were among the best predictors of ‘good’ borrowers” (206).

13 “Personal Privacy in an Information Society” (https://epic.org/privacy/ppsc1977report/). Quoted in Lauer, Creditworthy, 243. For more details, see “Overview of the Privacy Act of 1974.” United States Department of Justice. https://www.justice.gov/opcl/role-privacy-protection-study-commission.

14 Stephens-Davidowitz, Everybody Lies, 262.

15 We take a broad enough view of proxy variables to regard almost everything that allocates costs and benefits as a proxy. For example, your credit rating itself almost certainly depends on the fraction of your credit card limits that you’ve currently used, but that’s presumably a proxy for whether you will have enough income and assets to pay.

16 The process is known as supervised learning. You must collect training data and choose a type of prediction function (e.g., clustering, decision trees, or deep neural nets) for your predictive system. Corresponding to each type of prediction function is one or more learning (or training) algorithms that convert the training data into a classification or prediction function. Typically, one runs a training algorithm on a large portion of the training data and sees how well the resulting prediction function predicts outcome data for the remaining training data. If the predictions are not as accurate as desired, one can repeatedly make various adjustments or try other training algorithms. See generally Alpaydin, Machine Learning, and Burkov, Hundred-Page Machine Learning Book. Another important issue, besides the use of proxy variables, is bias in training data. As many have pointed out, bias in the training data will translate into bias in the predictions made when the algorithm is in use. See, e.g., O’Neil, Weapons of Math Destruction, 115–18.
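
A minimal sketch of that train-and-evaluate loop, assuming scikit-learn and invented attribute data (the features, labels, and split are illustrative, not drawn from the article):

```python
# Minimal supervised-learning sketch (illustrative; not the authors' system).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Training data: each row is one individual's attribute values (proxies);
# each label is the observed outcome (e.g., repaid loan = 1, defaulted = 0).
X = [[5, 1, 0], [2, 0, 1], [7, 1, 1], [1, 0, 0], [6, 1, 0], [3, 0, 1]]
y = [1, 0, 1, 0, 1, 0]

# Hold out a portion of the training data to check predictive accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Choose a type of prediction function (here a decision tree) and run
# its learning algorithm on the training portion.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluate on the held-out portion; if accuracy is too low, adjust and retrain.
print(accuracy_score(y_test, model.predict(X_test)))
```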

17 Sunstein, The Cost-Benefit Revolution, 23.

18 Finlay, Predictive Analytics, 4.

19 See RiskIQ. https://www.riskiq.com/.

20 See RiskIQ. “How the Risk Reporting Score Is Calculated.” https://vimeo.com/261343566.

21 For the purposes of this article, we need not have introduced the notation for distance at all. We could have written simply |score(x) – score(y)| every place where we write d(x, y). However, the d(x, y) notation is shorter, and it ties in with some ideas from the computer science research literature on fairness that we will discuss shortly.

22 See Jacobs and Wallach, “Measurement and Fairness.” Jacobs and Wallach note that there are different conceptions of fairness and analyze the impact of the failure to distinguish them on the computer science literature on fairness.

23 Roemer, Equality of Opportunity, 1. We do not offer our version of a level playing field fairness as a comprehensive theory of social justice. We offer it only as a plausible component of social justice in societies in which market economies allocate socio-economic positions based on a person’s attributes such as talent and degree of effort. In particular, we note that level playing field fairness is consistent with requirements for affirmative action as long as one sees affirmative action as necessary to level the playing field.

24 A more sophisticated approach would sort attributes into different types and combinations with different likelihoods of success, as John E. Roemer does in Equality of Opportunity, but we need not do so here.

25 Roemer, Equality of Opportunity, chapter 11.

26 For example, see Wright and Burawoy, How to Be an Anticapitalist, 10.

27 See Himmelstein et al., “Medical Bankruptcy.”

28 The allocation might be probabilistic, so a more precise description would require that the expected value of A(x) – A(y) be at most d(x, y).
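
Rendered symbolically (our rendering of the note’s prose; A(x) is the allocation to x, and the expectation is over the allocation’s randomness):

```latex
\mathbb{E}\left[ A(x) - A(y) \right] \le d(x, y)
```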

29 Requirement (2) is a simplification of a requirement in Dwork et al., “Fairness Through Awareness.” Their Definition 2.1, which handles arbitrary distance metrics that do not require the existence of a score, is designed to implement the “fairness constraint, that similar individuals are treated similarly” (214). Our point in the discussion that follows is that this requirement is not sufficient to ensure level playing field fairness.
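
For reference, the core of that definition can be stated roughly as follows (our paraphrase of Dwork et al.’s Definition 2.1; M maps each individual to a distribution over outcomes, and D is a distance between such distributions):

```latex
D\bigl( M(x),\, M(y) \bigr) \le d(x, y) \quad \text{for all individuals } x, y
```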

30 That is, these are distinct individuals: [Owner: Sally, Bankruptcy: Yes, Home value: 250,000, Age: 38] and [Owner: Sally, Bankruptcy: Yes, Home value: 150,000, Age: 28]. Each complete assignment of values represents a distinct individual.

31 There are 2^50 possible individuals, generating about 2^99 distinct pairs, or about 10^29. If one billion d(x, y) values could be computed each second (i.e., one per nanosecond, which is unreasonably fast), then about 10^20 seconds would be needed, which is over a billion centuries.
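
The arithmetic can be checked directly (our sketch; the 50 binary attributes and the one-per-nanosecond rate are the note’s assumptions):

```python
# Checking the note's arithmetic (illustrative sketch).
individuals = 2 ** 50                          # 50 binary attributes
pairs = individuals * (individuals - 1) // 2   # about 2^99, roughly 10^29
rate = 10 ** 9                                 # one d(x, y) per nanosecond
seconds = pairs / rate                         # about 10^20 seconds
centuries = seconds / (3600 * 24 * 365.25 * 100)
print(f"{pairs:.2e} pairs, {seconds:.2e} s, {centuries:.2e} centuries")
```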

32 It is common to assume that revealing an algorithm’s source code will reveal to regulators what they need to know. See, for example, Citron and Pasquale, “The Scored Society.” Citron and Pasquale insist on the need to know source code. Compare Kroll et al., “Accountable Algorithms.” Kroll et al. discuss the assumption that one needs to know the source code and point out its difficulties. As they note, “The source code of computer systems is illegible to nonexperts” (638). If the code were legible at least to experts, then expert reports could make algorithms consumer-transparent. However, “even experts often struggle to understand what software code will do: inspecting source code is a very limited way of predicting how a computer program will behave” (663). Indeed, some approaches to predictive analytics, including some very popular ones, such as support vector machines and deep learning of neural nets, yield predictive models that are quite difficult for humans to comprehend. See, for example, Finlay, Predictive Analytics, Data Mining and Big Data, 126.

33 See, for example, Core et al., “Building Explainable Artificial Intelligence”; Gunning and Aha, “DARPA’s Explainable Artificial Intelligence”; and Samek, Wiegand, and Müller, “Explainable Artificial Intelligence.”

34 See generally Arrieta et al., “Explainable Artificial Intelligence.”

35 Finlay defines deep learning as “Predictive models based on complex neural networks (or related architectures), containing very large numbers of neurons and many hidden layers and/or complex overlapping networks. These tools are proving successful at complex ‘AI’ type problems such as object recognition and language translation.” Predictive Analytics, 82. In mildly more technical terms, “Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned.” LeCun, Bengio, and Hinton, “Deep Learning,” 436.
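
A bare-bones illustration of that composition of simple non-linear modules (our sketch; the weights are random and nothing is learned):

```python
# Composing simple non-linear modules, per the LeCun et al. description.
# Purely illustrative: weights are random and no training occurs.
import numpy as np

rng = np.random.default_rng(0)

def layer(v, w, b):
    # One module: a linear transformation followed by a ReLU non-linearity.
    return np.maximum(0.0, w @ v + b)

x = rng.normal(size=10)                                        # raw input
h1 = layer(x, rng.normal(size=(8, 10)), rng.normal(size=8))    # level 1
h2 = layer(h1, rng.normal(size=(6, 8)), rng.normal(size=6))    # level 2, more abstract
out = rng.normal(size=6) @ h2                                  # linear read-out
print(out)
```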

36 Finlay defines a support vector machine as “An advanced type of non-linear model. Support vector machines have some similarities with neural networks.” Predictive Analytics, 86.

37 In Finlay’s terms, “[t]he goal of clustering is to identify similarities and/or connections within data such that you can group (cluster) similar cases together. The idea is that because cases in a given cluster have a lot of very similar attributes, then you can treat everyone/everything in that cluster in a similar way.” Artificial Intelligence for Everyone, 82.
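
A minimal clustering sketch in that spirit, assuming scikit-learn’s KMeans (the cases and the cluster count are invented):

```python
# Group similar cases so everyone in a cluster can be treated similarly.
from sklearn.cluster import KMeans

# Each row: one case's attribute values (e.g., [income, age]); made up.
cases = [[30, 25], [32, 27], [85, 50], [90, 55], [31, 26], [88, 52]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(cases)
print(kmeans.labels_)   # cases sharing a label fall in the same cluster
```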

38 Not all scoring systems lack explainability. Some, if not all, of the major credit reporting agencies will offer you explanations of your credit rating that meet our definition of explainability.

39 As Cathy O’Neil explains, “[scores] carry out thousands of ‘people like you’ calculations. And if enough of these ‘similar’ people turn out to be deadbeats or, worse, criminals, that individual will be treated accordingly.” Weapons of Math Destruction, 145.
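
One simple mechanism behind such “people like you” calculations is nearest-neighbor prediction; the following is our illustrative sketch, assuming scikit-learn, not O’Neil’s example:

```python
# "People like you": predict an individual's outcome from the outcomes
# of the k most similar people in the data. Entirely illustrative.
from sklearn.neighbors import KNeighborsClassifier

people = [[5, 1], [2, 0], [7, 1], [1, 0], [6, 1]]   # attribute vectors
outcomes = [1, 0, 1, 0, 1]                          # 1 = repaid, 0 = defaulted

knn = KNeighborsClassifier(n_neighbors=3).fit(people, outcomes)
print(knn.predict([[6, 0]]))   # treated like the 3 most similar people
```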

40 Finlay, Predictive Analytics, Data Mining and Big Data, 124.

41 See, for example, Ryu, Kim, and Lee, “Deep Learning Improves Prediction.”

42 See, for example, the forum on understanding credit scores at “MyFICO® Forums.” https://ficoforums.myfico.com/.

43 Kroll et al., “Accountable Algorithms,” 689.

44 Ibid.

45 Ibid., 638.

46 If the patients in the training data are identified only as “patient 1,” “patient 2,” and so on, one will not be able to tell just by examining the data whether they are diverse with respect to race and gender.

47 A testing process could involve A-B testing. Starting with general criteria that almost everybody meets, it could randomly select a small number of individuals to receive a new treatment, cost, or benefit for purposes of studying it.
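
A minimal sketch of such a random-selection step (the names and the sampling rate are our assumptions):

```python
# Randomly assign a small fraction of eligible individuals to the new
# treatment for study purposes. Illustrative sketch.
import random

random.seed(0)
eligible = [f"person_{i}" for i in range(1000)]  # meet the general criteria
treatment = {p for p in eligible if random.random() < 0.01}  # ~1% sampled
control = [p for p in eligible if p not in treatment]
print(len(treatment), len(control))
```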

48 See Lauer, Creditworthy, 215. Lauer notes that the mid-1960s outrage over data collection for credit rating soon died down. See also Bouk, How Our Days Became Numbered. Bouk remarks that “conflicts over traditional, predictive risk making spur the creation of new forms of risk making supposed to change fates, such that more risks get made from more people in new ways” (240). For insightful studies of consumers’ acceptance of statistical analyses, see Igo, Averaged American, and Muller, Tyranny of Metrics. For studies focused on consumer acceptance in the twenty-first century see Bernard, Triumph of Profiling, and Mau, Metric Society. We discuss the options of resistance, acquiescence, and acceptance in detail in Sloan and Warner, Privacy Fix.

49 Rule, Privacy in Peril, 144.

50 Compare the criminal case of State of Wisconsin v. Eric L. Loomis, 371 Wis.2d 235 (2016), which denies the defendant access to information that our proposal would require for r-transparency, with the civil case of Houston Federation of Teachers v. Houston Independent School District, 251 F.Supp.3d 1168 (2017) (United States District Court, S.D. Texas 2017), which requires access to at least some such information.
