Social Epistemology
A Journal of Knowledge, Culture and Policy
Volume 31, 2017 - Issue 4
Original Articles

Should juries deliberate?

 

Abstract

Trial by jury is a fundamental feature of democratic governance. But what form should jury decision-making take? I argue against the status quo system in which juries are encouraged and even required to engage in group deliberation as a means to reaching a decision. Jury deliberation is problematic for both theoretical and empirical reasons. On the theoretical front, deliberation destroys the independence of jurors’ judgments that is needed for certain attractive theoretical results. On the empirical front, we have evidence from both legal and non-legal contexts that group deliberation often leads to group judgments that are worse in a number of respects than judgments generated by non-interactional methods of judgment aggregation. Finally, I examine some possible alternatives to free-wheeling jury deliberation, including the constrained and structured deliberation embodied in the DELPHI method, voting (without deliberation), and averaging of probabilistic judgments.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1. See Hans (Citation2008) and “The Jury is Out,” The Economist (Citation2009) for discussion of the range of countries that use juries and in what form. As noted, all common law countries use jury trials. Among civil law countries, some employ all-citizen juries (e.g. Spain and Austria) and still others (e.g. Germany, France, and Italy) use “mixed tribunals” or “mixed juries” consisting of both professional judges and lay citizens for more serious offences. While mixed juries raise some special issues, the arguments of this paper will apply to them as well. Asian democracies that make at least some use of jury trials include Japan and South Korea. Democratic countries that do not use jury trials include South Africa and many countries in Latin America. Finally, jury trials are also used in some non-democratic countries, such as China.

2. To be fair, Laudan does acknowledge that if “we were to discover that there is a certain kind of relevant evidence (hearsay, ) whose importance juries are apt to overestimate, then excluding it might be appropriate” (Citation2006, 120). But I am inclined to think that this caveat kicks in more frequently than Laudan suspects and that we do indeed have evidence that jurors are apt to systematically overweight certain kinds of evidence. See Benforado (Citation2015) for extensive evidence and references to relevant studies.

Laudan also notes that there may be non-epistemic reasons for excluding certain types of evidence, such as ethical considerations. But he generally finds these wanting. For instance, against the claim that the inadmissibility of illegally obtained evidence is necessary so as not to incentivize police misconduct, Laudan holds that the evidence should be admissible but that the wrongs of illegal searches should be remedied separately, for instance through civil suits.

3. Another proposal is the use of “virtual trials,” which will be briefly considered in Section 4.4.

4. For example, consider the cases of Australia, Canada, and the Crown Courts of England and Wales. In Australia, the High Court’s ruling in Black v. the Queen (Citation1993) recommends jury instructions that include the following:

You also have a duty to listen carefully and objectively to the views of every one of your fellow jurors. You should calmly weigh up one anothers’ opinions about the evidence and test them by discussion. Calm and objective discussion of the evidence often leads to a better understanding of the differences of opinion which you may have and may convince you that your original opinion was wrong.

The Canadian Judicial Council’s (Citation2016) model jury instructions say:

You should make every reasonable effort, however, to reach a verdict. Consult with one another. Express your own views. Listen to the views of others. Discuss your differences with an open mind. Try your best to decide this case.

Finally, the Crown Court Bench Book (Citation2010) on Directing the Jury includes the following:

Subject to the application of section 17 Juries Act 1974, the jury must return a unanimous verdict. Section 17 enables a majority verdict to be returned after the jury has been deliberating for at least two hours. In practice, the minimum period is 2 h and 10 min. By section 17(4) the trial judge, before considering a majority verdict, should allow such period for deliberation as the nature and complexity of the case requires. In long and complex, and multi-handed, cases it may be appropriate not to consider a majority verdict direction until the jury has been deliberating for well over a day and, perhaps, longer. It is good practice (and good manners) for the trial judge to invite observations from the advocates when a majority verdict direction is under consideration.

5. Cf. Hong and Page (Citation2012) on diversity of backgrounds and its bearing on non-deliberative aggregation methods such as averaging of individuals’ estimates.

6. For deliberation increasing confidence without increasing accuracy, see also Heath and Gonzalez (Citation1995) and Baron et al. (Citation1996).

7. Thanks to an anonymous referee for suggesting this point.

8. Here is a brief overview of the consensus model of Wagner and Lehrer (Citation1981). Suppose that at the outset (round 0), there are $n$ individuals with probabilities $p_1^0, \ldots, p_n^0$ in some proposition. Suppose also that each individual $i$ assigns weights of respect $w_{ij}$ to each individual $j$ in the group (including herself), where the weights are non-negative and sum to one. These weights can be understood, roughly, as representing her assessments of the reliabilities of the various members of the group with respect to the topic at issue. At the first round of iteration (round 1), we take each individual’s new probability to be a weighted linear average of the individuals’ round 0 probabilities, with the weights being her weights of respect for the individuals in the group. So, for each individual $i$,

$$p_i^1 = \sum_{j=1}^{n} w_{ij}\, p_j^0$$

Now, suppose that some individual assigns a positive weight of respect to herself, and each individual can be connected to every other through a chain of assignments of positive weights of respect (they call this communication of respect). Then, iterated updating by weighted linear averaging is guaranteed to result in convergence to a consensus probability assignment.

Does this model point to a way of reaching consensus without deliberation? There are two problems. The first is that it is doubtful whether (as Wagner and Lehrer claim) individuals are rationally obligated to update their credences by taking such weighted linear averages of the group members’ credences, in particular because updating by taking such a weighted linear average sometimes conflicts with the Bayesian norm of updating one’s probabilities by conditionalization (Laddaga and Loewer Citation1985). The second is that it is unclear how to implement the model in practice. In particular, it is unclear where to get the weights of respect from. There are many options. We could ask jurors to rate each other, but it is questionable whether their estimates of each other’s reliability should be taken very seriously. Alternatively, we could impose weights of respect from outside. We might, for instance, impose on them uniform weights of respect (so that each juror’s weights for herself and others are all equal), or use weights that correspond to each juror’s score on some set of test questions (along the lines of Cooke Citation1991). But imposing weights of respect from the outside in this way further undermines the claim that the resulting group-level probability judgment really constitutes any kind of consensus. Nevertheless, there are advocates of using the Wagner and Lehrer consensus model in real-world committee decision-making (Regan, Colyvan, and Markovchick-Nicholls Citation2006), and there may be some justification for attempting to bring the model to bear on juries as well.
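The iterated weighted averaging described in this note can be illustrated with a minimal sketch. The function name, the tolerance, and the three-juror weights and probabilities below are all invented for illustration; they are not taken from Wagner and Lehrer or from any study cited here.

```python
def lehrer_wagner_consensus(weights, probs, tol=1e-12, max_rounds=10_000):
    """Iterate weighted linear averaging until all probabilities agree.

    weights[i][j] is individual i's weight of respect for individual j.
    Returns (final_probabilities, rounds_used).
    """
    n = len(probs)
    for row in weights:
        if len(row) != n or any(w < 0 for w in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must hold n non-negative weights summing to one")
    current = list(probs)
    for round_number in range(1, max_rounds + 1):
        # Each individual's new probability is her weighted average of
        # everyone's probabilities from the previous round.
        current = [sum(w * p for w, p in zip(row, current)) for row in weights]
        if max(current) - min(current) < tol:
            return current, round_number
    raise RuntimeError("no consensus within max_rounds; check the respect conditions")


# Hypothetical three-juror example (all numbers are assumptions).
example_weights = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
]
example_initial = [0.9, 0.6, 0.3]
consensus, rounds = lehrer_wagner_consensus(example_weights, example_initial)
print(consensus, rounds)
```

With uniform weights of respect imposed from outside, the procedure collapses after a single round to the simple average of the initial probabilities, which is one way to see why such imposed weights hardly amount to a consensus reached by the jurors themselves.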

9. See Drabsch (Citation2005, 16–17).

10. In addition to public confidence, it is worth noting that in studies of mock juries, jurors on juries with unanimity decision rules seem to be more satisfied with the verdict and the quality of the deliberation than jurors on mock juries with majority decision rules (Hastie, Penrod, and Pennington Citation1983, 78–79). These effects do matter and must be weighed against other considerations. However, the effects of a given procedure on juror satisfaction pale in significance compared to its effects on accuracy.

11. Somewhat oddly, to my mind at least, Dawkins does not propose doing away with jury deliberation entirely. Instead, his proposed remedy is simply to have two separate deliberating juries, with guilty verdicts from both of them required for conviction.

12. A clarificatory note: while standard presentations of the CJT assume that all individuals have the same competence level, extensions of the CJT weaken this condition. Grofman, Owen, and Feld (Citation1983) show that Condorcet-like results can still obtain if individuals are heterogeneous with respect to competence. Their Theorem VI states that for heterogeneous groups (where individuals need not all have the same competence level), if each individual has a competence level above 0.5, then the larger the group, the greater the probability that a majority judgment is correct. Moreover, their Theorem V allows that some individuals may have a competence level below 0.5. It states that if the distribution of individuals’ competence levels is symmetric, then results analogous to the CJT can be obtained, with the average competence level in place of the competence level that was previously assumed to be the same for everyone.
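To make the heterogeneous case concrete, here is a minimal sketch that computes the exact probability that a majority of independent voters judges correctly when each voter has her own competence level. The function name and the example competence values are assumptions for illustration, and the calculation presupposes the very independence of judgments that deliberation is argued to threaten.

```python
def majority_correct_probability(competences):
    """Probability that a strict majority of independent voters is correct.

    competences[i] is voter i's probability of judging correctly.
    Requires an odd number of voters so that ties cannot occur.
    """
    n = len(competences)
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # dist[k] = probability that exactly k of the voters seen so far are correct
    dist = [1.0]
    for p in competences:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)   # this voter is wrong
            updated[k + 1] += mass * p       # this voter is right
        dist = updated
    return sum(dist[n // 2 + 1:])


# Hypothetical groups: equal competence, then a symmetric mix around 0.6
# in which one voter is below 0.5.
print(majority_correct_probability([0.6, 0.6, 0.6]))
print(majority_correct_probability([0.6] * 5))
print(majority_correct_probability([0.4, 0.6, 0.8]))
```

Comparing the first two calls shows the group-size effect for equally competent voters; the third call is the kind of case that the symmetric-distribution result is meant to cover.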

13. Another obstacle to using the CJT to justify reliance on juries is that the theorem concerns the probability of a majority judging correctly, whereas juries are often subject to unanimity decision rules.

14. Rawls (Citation1999, 315) makes this point in the context of justifying majority rule in political affairs. Another relevant possible cause of non-independence is the presence of opinion leaders. See Grofman, Owen, and Feld (Citation1983) and Estlund (Citation1994) for discussion of opinion leaders and independence in the context of the CJT.

15. For instance, the juror might be convinced by some piece of evidence that was presented but then ruled inadmissible.

16. See Sunstein (Citation2011).

17. Moreover, even without any deliberation, the judgments of participants in a prediction market will not be independent of each other, since they are influenced by signals sent by market prices. Group deliberation is neither necessary nor sufficient for the non-independence of participants’ judgments. Individuals can discuss matters with each other without their judgments becoming probabilistically dependent (e.g. if they are not at all influenced by the discussion), and individuals can have probabilistically dependent judgments even in the absence of group deliberation (e.g. in the prediction market case).

18. See Anderson and Holt (Citation1997).

19. On a more optimistic note, Sommers (Citation2006) found that racially diverse juries do better in various respects than more homogeneous ones. In particular, racially diverse juries exchanged a wider range of information than homogeneous ones, and this wasn’t wholly attributable to the performance of blacks on the juries; white participants on diverse juries also cited more facts, made fewer errors, and were more amenable to discussion of racism than whites on homogeneous juries. Crucially, however, this just provides evidence that diverse deliberating juries do better (in certain respects) than homogeneous deliberating juries. In no way does it suggest that deliberating juries (whether diverse or not) do better in any respects than non-deliberating ones.

20. See MacCoun (Citation2002, 116, 121) for the first point, and Schumann and Thompson (Citation1989) for the second.

21. But see Lewis (Citation1989) for an argument that the concept of a chancy punishment can justify the differing sentences for murder and for attempted murder.

22. Compare Rawls (Citation1999), who writes that “Men could not regulate their actions by means of rules if this precept [that similar cases should be treated similarly] were not followed.”

23. Perhaps the main concern about scaled punishments involves the issue of low-probability verdicts. The worry is that defendants will be subject to punishment under such a system even if the jury reports that the defendant is, say, only 20% likely to be guilty. This also creates a worry about potential abuse, in which prosecutors harass political or other targets, hoping to inflict some measure of punishment on them. Wansley (Citation2013, 353–354) argues that there are a number of obstacles to such abuse, including limited prosecutorial resources, political accountability for district attorneys, the appeals process, and threats of lawsuits for malicious prosecution. Regarding the intuitive repugnance of punishing defendants as a result of low-probability convictions, Wansley argues that the scale of punishments should be sharply non-linear, largely due to the decreasing marginal disutility of prison time and other punishments (one year in prison is far more than half as bad as two years in prison). Therefore, below some fairly high probabilistic threshold, punishments would probably involve no prison time and instead involve some sort of probation or supervision. Other concerns are that this system would increase net incarceration and that it would undermine public confidence in and support for the criminal justice system.

24. Of course, if she thinks that the other jurors will go through the very same reasoning, then this gives her at least some grounds for doubting whether their votes will express their sincere beliefs.

25. See Feddersen and Pesendorfer (Citation1998) and List and Pettit (Citation2011, 114–119) for further discussion.

26. Cf. Galton’s (Citation1907a) argument against using the average of individuals’ estimates to serve as the group estimate: “That conclusion is clearly not the average of all the estimates, which would give a voting power to ‘cranks’ in proportion to their crankiness. One absurdly large or small estimate would leave a greater impress on the result than one of reasonable amount, and the more an estimate diverges from the bulk of the rest, the more influence would it exert.” Galton himself favored taking the median estimate to serve as the group’s estimate. See also Bassett Jr. and Persky (Citation1999) and Levy and Peart (Citation2002). I leave open whether discarding outliers or instead taking the median would be better.
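Galton’s worry can be illustrated with a small sketch comparing the plain average, the median, and an average taken after discarding the extreme estimates. The five probability estimates and the helper name `trimmed_mean` are invented for illustration; the note itself takes no stand on which robust method is preferable.

```python
from statistics import mean, median


def trimmed_mean(values, trim_each_side=1):
    """Average after discarding the lowest and highest trim_each_side values."""
    if len(values) <= 2 * trim_each_side:
        raise ValueError("not enough values left after trimming")
    ordered = sorted(values)
    return mean(ordered[trim_each_side:len(ordered) - trim_each_side])


# Hypothetical juror probabilities of guilt; the last one is the outlier.
estimates = [0.80, 0.85, 0.90, 0.82, 0.05]

print("mean:        ", mean(estimates))
print("median:      ", median(estimates))
print("trimmed mean:", trimmed_mean(estimates))
```

In this made-up case the single outlying estimate pulls the plain average well below every other juror’s estimate, while the median and the trimmed average stay close to the bulk of the judgments.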

27. As noted by Laddaga and Loewer (Citation1985), linear averaging of probabilities does not commute with the Bayesian norm of Conditionalization, which states that one’s probability for H after learning E (and nothing stronger) should equal one’s previous conditional probability for H given E. Suppose that we take the linear average of a set of individuals’ probability functions, and that then they all learn E (and nothing stronger), and then we take the linear average of their post-learning probability functions. The new group-level probability function will not in general equal the old group-level probability function conditionalized on E. How worrying this is depends on why we think Conditionalization is important. Many authors (e.g. Russell, Hawthorne, and Buchak Citation2015) motivate Conditionalization by appeal to a diachronic Dutch Book argument, which shows that violating Conditionalization can lead one to accept a set of bets offered at different times that together guarantee a loss. But this should not be concerning in the jury context, for juries last only a short time and (typically) make just one decision, and anyway they are not vulnerable to exploitation as other groups, like corporations, might be.
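A small numerical sketch can show the failure of commutativity. The two probability functions below are invented for illustration, defined over the four combinations of H and E; the function names are likewise hypothetical.

```python
def conditional(dist, hypothesis, evidence):
    """P(hypothesis | evidence) for a distribution over named worlds."""
    p_evidence = sum(dist[w] for w in evidence)
    return sum(dist[w] for w in hypothesis & evidence) / p_evidence


def linear_average(dists):
    """Equal-weight linear average of several distributions over the same worlds."""
    return {w: sum(d[w] for d in dists) / len(dists) for w in dists[0]}


# Worlds: "HE" = H and E, "H~E" = H and not-E, and so on.
H = {"HE", "H~E"}
E = {"HE", "~HE"}

# Two hypothetical jurors (the numbers are assumptions).
juror_a = {"HE": 0.4, "H~E": 0.1, "~HE": 0.1, "~H~E": 0.4}
juror_b = {"HE": 0.1, "H~E": 0.1, "~HE": 0.6, "~H~E": 0.2}

# Route 1: average first, then conditionalize the group function on E.
average_then_update = conditional(linear_average([juror_a, juror_b]), H, E)

# Route 2: each juror conditionalizes on E, then average the results.
update_then_average = (conditional(juror_a, H, E) + conditional(juror_b, H, E)) / 2

print(average_then_update, update_then_average)
```

The two routes give different group probabilities for H in this example, which is all that the failure to commute amounts to.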

28. No conclusions about the likely rate of false convictions follow if we don’t assume well-calibration. The rate of false convictions also depends on the ratio of truly guilty to truly innocent people who are brought to trial, as well as the probability that the total evidence at trial is misleading (i.e. the probability that the evidence supports guilt, given that the defendant is in fact guilty, and the probability that the evidence supports innocence, given that the defendant is in fact innocent). For instance, even with a 0.95 threshold for conviction, no false convictions will result if no innocent people are brought to trial, or if the evidence always points in favor of innocence (and the jury is able to pick up on this fact) whenever the defendant is in fact innocent. See DeKay (Citation1996) and Laudan (Citation2008) for discussion.

29. See Genest and Zidek (Citation1986) and Cooke (Citation1991) for surveys.

30. A notable variation of the standard DELPHI method allows individuals to discuss their reasons for their initial judgments after being shown a summary of individuals’ initial judgments. This may resemble certain particularly tightly structured jury deliberations in which jurors take periodic secret ballot straw polls with discussion in between. This of course threatens to bring back many of the bad features of group deliberation (though it reduces the influence of social pressures by preserving anonymity and, presumably, not requiring total consensus in the end). This sort of variant, often referred to as Estimate-Talk-Estimate, has been shown to be successful in many contexts, and so is also worthy of consideration, even though it involves an element of group deliberation. See Burgman (Citation2015) for discussion of this sort of variant on DELPHI.

31. In their famous study of American juries, Kalven and Zeisel (Citation1966) found that in nine out of ten trials, the eventual jury verdict matched the pre-deliberation verdict preferences of a majority of jurors. If one concludes on that basis that deliberation is irrelevant (and not often nefarious, as I have suggested), then cost-saving considerations alone support doing away with jury deliberation. However, it is not clear that their finding really shows that deliberation is irrelevant. It could be, for instance, that nine out of ten trials are “easy cases” (so it would be surprising if the ultimate verdict diverged from the pre-deliberation majority preference) while the tenth is a tough case where deliberation plays a major role, for good or for ill.

32. In his contribution “What Should be Done?,” chapter 13 of Sunstein et al. (Citation2002). See also Schkade, Sunstein, and Kahneman (Citation1999).
