Review Article

Elementary probabilistic operations: a framework for probabilistic reasoning

Pages 259-300 | Received 20 Jun 2022, Accepted 22 Jul 2023, Published online: 29 Sep 2023

Abstract

The framework of elementary probabilistic operations (EPO) explains the structure of elementary probabilistic reasoning tasks as well as people’s performance on these tasks. The framework comprises three components: (a) three types of probabilities: joint, marginal, and conditional probabilities; (b) three elementary probabilistic operations: combination, marginalization, and conditioning; and (c) quantitative inference schemas implementing the EPO. The formal part of the EPO framework is a computational level theory that provides a problem space representation and a classification of elementary probabilistic problems based on the computational requirements for solving a problem. According to the EPO framework, current methods for improving probabilistic reasoning are of two kinds: first, reduction of Bayesian problems to a type of probabilistic problem requiring fewer conceptual and procedural competencies; second, enhancement of people’s utilization competence by fostering the application of quantitative inference schemas. The approach suggests new applications, including the teaching of probabilistic reasoning, the use of analogical problem solving in probabilistic reasoning, and new methods for analyzing errors in probabilistic problem solving.

Probabilistic reasoning in general and Bayesian reasoning in particular have been the subject of intensive research in psychology and educational science for many years. In fact, the subject has entered the mathematics curricula of most high schools in Western countries (see, e.g., Chernoff & Sriraman, Citation2014). In the present article, probabilistic reasoning tasks are conceived of as instances of mathematical problem solving as studied by Newell and Simon (Citation1972), who adopted the problem space conception (for a similar view of probabilistic reasoning, see Sirota, Vallée-Tourangeau, et al., Citation2015). We present the framework of elementary probabilistic operations (EPO) to analyze the structure of elementary probabilistic problems, and to explicate empirical findings concerning people’s performance in probabilistic reasoning tasks.

This article is structured as follows: We first present an exposition of the EPO framework. Second, principal applications of the EPO framework are discussed: (a) a problem space representation of elementary probabilistic reasoning is presented; (b) we propose a classification of probabilistic problems on the basis of computational considerations; (c) different methods to improve Bayesian reasoning as well as facilitating effects of numeracy, level of education, and cognitive ability are discussed, employing the EPO approach; and (d) we analyze methods of training Bayesian reasoning. Finally, explanations based on the EPO framework are compared to traditional explanations, and additional applications of the EPO approach are deliberated in the general discussion section.

The framework of elementary probabilistic operations

The conception of probabilistic reasoning as the execution of a set of elementary probabilistic operations dates back at least to Von Mises (Citation1972), who reduced probabilistic reasoning to basic operations. More recently, the problem of the efficient computation of specific elementary probabilistic operations has become relevant in the context of expert systems (e.g., Darwiche, Citation2009; Neapolitan, Citation1990). Shenoy and Shafer (Citation1990) used the framework of elementary probabilistic operations for the description of computations in Bayesian networks and belief function systems (see also Kohlas, Citation2003).

In the following, we first present the different types of probabilities that are involved in elementary probabilistic reasoning. This is followed by an exposition of three elementary probabilistic operations for mapping different types of probabilities. These concepts are introduced informally with the help of the example in Figure 1. A formal presentation of the framework is given in the Appendix. We use capital letters to refer to random variables and lowercase letters to denote the values of variables.

Figure 1. Three-variable diagnosis problem: problem description, relevant probabilities, and probabilistic operations.

Types of probabilities

Let X1, X2, …, Xn represent n discrete random variables, each with a finite number of values (cf. Appendix for a formal definition of the probability space underlying the definition of the random variables). For the sake of concreteness, let us consider a probabilistic problem involving three random variables: the hypothesis H and two pieces of evidence, E1 and E2 (cf. Figure 1, Part A). There exist three basic types of probabilities (or probability distributions): joint, marginal, and conditional probabilities.

The joint (probability) distribution P(X1, X2, …, Xn) assigns to each possible combination of the values of the random variables a real value in the range [0, 1] in such a way that the probabilities of all combinations sum to one. In our example, the joint probability distribution consists of eight point probabilities (cf. Figure 1, Part C), and is denoted by P(H, E1, E2). For example, the point probability P(h, e1, e2) denotes the probability that a disease is present (h) and the two diagnostic procedures indicate the presence of the disease (e1 and e2). We use P(h, e1, e2) as an abbreviation for P(H=h, E1=e1, E2=e2).

The joint probability distribution contains the complete probability information about the random variables in question. This enables one to answer any possible probabilistic question about these variables. In addition, the joint probability distribution is defined on the finest partitioning of the sampling space, consisting of the combinations of the values of the random variables involved. These two characteristics, completeness and the most fine-grained partitioning of the sampling space, render the joint distribution of focal importance in the case of probabilistic reasoning.

The marginal distribution is the probability distribution over a single variable. In our example, P(H) denotes the marginal distribution over variable H, consisting of two point probabilities: P(h) and P(h̄). In addition, probability distributions involving more than a single variable but less than the full set of variables are called marginal joint distributions. In our example, P(h, e1), P(h̄, e1), and P(e1, e2) denote marginal joint probabilities.

The conditional (probability) distribution P(X1, X2, …, Xk | Xk+1, …, Xn) denotes the set of distributions over the variables X1, X2, …, Xk, for each combination of values of the variables Xk+1, Xk+2, …, Xn (assuming P(Xk+1, …, Xn) > 0). In our example, P(H | E1, E2) denotes the distributions over the values of H for each combination of the values of E1 and E2. For example, P(H | e1, ē2) denotes the distribution of the values of variable H in the case of E1 = e1 and E2 = ē2. It consists of two point probabilities: P(h | e1, ē2) and P(h̄ | e1, ē2).

The three different types of probabilities make up the first component of the EPO framework. The second component is given by the elementary operations.

Elementary probabilistic operations (EPO)

There exist three elementary probabilistic operations: (1) combination of information, (2) marginalization, and (3) conditioning. These operations are used to switch between different types of probabilities. The first operation, the combination of probability information, combines probability information of different random variables and consists in the generation of a joint probability distribution by combining conditional and marginal probabilities. In the example in Figure 1, the operation is applied in two different ways: (a) the information about variables H and E1 is combined: P(h, e1) = P(e1 | h) P(h), and (b) this combined information is further combined with information about variable E2: P(h, e1, e2) = P(e2 | h, e1) P(h, e1). These two operations can be integrated into a single expression: P(h, e1, e2) = P(e2 | h, e1) P(e1 | h) P(h). This is an instance of the chain rule used for combining probabilistic information (cf. Appendix).

The second elementary probabilistic operation is called marginalization. This operation can be considered as an operation of focusing information on a subset of the random variables. In the case of discrete variables, marginalization is realized by means of a summation performed on the joint distribution. The result of marginalization is a marginal (joint) distribution. The summation is taken over all combinations of values of those variables that do not make up the resulting marginal distribution. These variables are “summed out” of the distribution. In the example in Figure 1, the operation of marginalization, P(e1, e2) = P(h, e1, e2) + P(h̄, e1, e2), focuses information on variables E1 and E2.

The final probabilistic operation is called conditioning. It consists in a re-evaluation of the probabilistic information within a new context (reference class) given by the values of the conditioning variables, and it is realized by means of division. In our example in Figure 1, the operation P(h | e1, e2) = P(h, e1, e2) / P(e1, e2) results in a new evaluation of hypothesis H in light of the fact that both diagnoses are positive: E1 = e1 and E2 = e2.
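
To illustrate, here is a minimal Python sketch of the three operations applied to a three-variable diagnosis problem of the kind shown in Figure 1. Since the figure is not reproduced here, the base rate and likelihoods below are hypothetical.

```python
from itertools import product

# Hypothetical inputs: base rate P(H) and likelihoods P(e1 | H), P(e2 | H, E1).
p_h = {True: 0.01, False: 0.99}                       # marginal (prior) P(H)
p_e1_given_h = {True: 0.90, False: 0.10}              # P(E1=true | H=h), P(E1=true | H=h-bar)
p_e2_given_h_e1 = {(True, True): 0.85, (True, False): 0.80,
                   (False, True): 0.15, (False, False): 0.05}  # P(E2=true | H, E1)

# Combination (chain rule): P(H, E1, E2) = P(E2 | H, E1) P(E1 | H) P(H).
joint = {}
for h, e1, e2 in product([True, False], repeat=3):
    p1 = p_e1_given_h[h] if e1 else 1 - p_e1_given_h[h]
    p2 = p_e2_given_h_e1[(h, e1)] if e2 else 1 - p_e2_given_h_e1[(h, e1)]
    joint[(h, e1, e2)] = p_h[h] * p1 * p2

# Marginalization: sum H out to obtain P(e1, e2).
p_e1_e2 = joint[(True, True, True)] + joint[(False, True, True)]

# Conditioning: P(h | e1, e2) = P(h, e1, e2) / P(e1, e2).
print(joint[(True, True, True)] / p_e1_e2)   # ~0.34 with these hypothetical numbers
```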

Level of explanation of the EPO framework

The demonstration of the EPO framework in Figure 1 provides a concrete algorithm for solving a Bayesian problem. Referring to the three levels of explanation of an information processing system of Marr (Citation1982), the formal part of the EPO framework should be considered as a computational level theory. It specifies the basic logic of the computations involved in elementary probabilistic reasoning problems as well as the way they can be carried out (cf. Marr, Citation1982, p. 25). The EPO approach is not an algorithmic level theory. It is not concerned with the details of specific algorithms implementing the elementary probabilistic operations or with representations of the probabilistic information. As exhibited in the next section, the operations of the EPO framework also work with other types of quantities like frequencies.Footnote1 In addition, they can be realized in different ways. For instance, the operation of combination may be performed using probability and frequency trees or by means of frequency arrays (see, e.g., Sedlmeier & Gigerenzer, Citation2001), and the operation of marginalization is easily accomplished in the context of contingency tables. Finally, the operations are valid within other probabilistic frameworks, like the belief function framework (Kohlas, Citation2003; Shenoy & Shafer, Citation1990) or mental model theory (Johnson-Laird et al., Citation1999).

Figure 2. Possible goal hierarchy of an expert problem solver and operations applied to achieve the goals for the three-variable diagnosis-problem of Figure 1.

Figure 3. Different formats of the general two-variable diagnosis problem.

Figure 4. Bayesian Problems with different probability formats and types of framing.

As a computational level theory, the EPO approach enables an abstract specification of the problem space underlying elementary probabilistic problems, and a classification of probabilistic problems according to the computational requirements for solving a problem. This distinguishes the EPO approach from traditional explanatory approaches that are located on the algorithmic level (see, e.g., Johnson & Tubau, Citation2015).

Elementary probabilistic operators and quantitative inference schemas

As noted above, the operations of combination, marginalization, and conditioning also work with quantities other than probabilities, like frequencies, length, area, volume, or weight. This is due to the fact that these quantities share important characteristics with probabilities. Specifically, like probabilities, these quantities conform to the axioms of non-negativity and additivity. The axiom of non-negativity states that quantities assigned to real objects are always non-negative, and the axiom of additivity states that the quantity assigned to an object formed by joining two or more non-overlapping objects equals the sum of the quantities assigned to the individual objects.Footnote2 In contrast to probabilities, the norm axiom, which states that probabilities are less than or equal to one, does not apply to general quantities.

People have acquired quantitative inference schemas. These are problem schemas (cf. Hayes, Citation1989, p. 13) that are part of the curriculum of every elementary school. They are implementations of elementary probabilistic operations or combinations thereof, i.e., concrete algorithms that work with specific representations. The following three basic quantitative inference schemas implement the three (probabilistic) operations. They are called the part-whole schema, the concatenation schema, and the (recursive) partitioning/selection schema.

The part-whole schema implements the operation of conditioning: If Object X constitutes a part of Object Y, then the proportion p of the quantity qX, assigned to X, within the whole quantity qY, assigned to Y, is given by the fraction p = qX/qY. For example, 100 apples (quantity qX) out of a total of 300 apples (quantity qY) make up a proportion of p = 100/300 = 1/3.

The concatenation schema accomplishes the operation of marginalization: If Object X is made up exhaustively of disjoint parts X1, X2, …, Xn with quantities q1, q2, …, qn, then the quantity q of X conforms to the sum of the quantities of the disjoint (and exhaustive) parts: q = q1 + q2 + … + qn. For example, putting two sticks of length 3 and 2 meters together (without overlap) results in a stick of length 5 meters.

Finally, the (recursive) partitioning/selection schema implements the operation of combination: Assume that the quantity q is assigned to Object X, and q is partitioned according to the proportions p1, p2, …, pm (p1 + p2 + … + pm = 1). This results in m quantities: q·p1, q·p2, …, q·pm. The partitioning schema can be applied recursively, by further partitioning the quantities resulting from a previous partitioning step. In this way, the chain rule (cf. Appendix) is implemented. An illustration of the application of the quantitative partitioning/selection schema can be found in Gigerenzer and Hoffrage (Citation1995, p. 690), showing the results of cutting a beam recursively. Instead of the recursive partitioning of a quantity, the schema may implement the recursive selection of entities (parts) from a given quantity. We thus call the schema the partitioning/selection schema. For example, assume that there are 9 apples. Alice selects 2/3 of the apples, leaving 1/3 for Bob. Thus, the whole quantity (9) was partitioned into two parts, 6 (Alice) and 3 (Bob). In this way, the information about which person received how many apples was combined with the original quantitative information, the number of apples.

In addition to these quantitative inference schemas, implementing elementary quantitative operations, people also possess schemas that integrate two elementary operations. For example, if Object X is made up exhaustively of the disjoint parts X1, X2, …, Xn with quantities q1, q2, …, qn, then the proportion of the quantity of Object (Part) Xj with respect to the quantity assigned to Object X conforms to the fraction qj/(q1 + q2 + … + qn). This quantitative inference schema integrates the operations of marginalization and conditioning. This combined marginalization-and-conditioning schema will be called the quantitative proportion schema. Referring to the previous example, the number of apples selected by Alice is 6, leaving 3 for Bob. Thus, the proportion of apples possessed by Alice is 6/(6 + 3) = 2/3.
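
A minimal sketch of how these schemas might be written down as executable rules; the function names are ours, and the numbers reproduce the apples example from the text.

```python
def part_whole(q_part, q_whole):
    """Part-whole schema (conditioning): proportion of the part in the whole."""
    return q_part / q_whole

def concatenation(quantities):
    """Concatenation schema (marginalization): join disjoint, exhaustive parts."""
    return sum(quantities)

def partition(q, proportions):
    """Partitioning/selection schema (combination): split q by proportions."""
    assert abs(sum(proportions) - 1.0) < 1e-9   # proportions must sum to one
    return [q * p for p in proportions]

def proportion(q_j, quantities):
    """Quantitative proportion schema (marginalization + conditioning)."""
    return q_j / concatenation(quantities)

# The apples example from the text: 9 apples, Alice selects 2/3, Bob gets 1/3.
alice, bob = partition(9, [2/3, 1/3])        # combination: -> 6.0 and 3.0
total = concatenation([alice, bob])          # marginalization: -> 9.0
print(part_whole(alice, total))              # conditioning: 6/9 = 2/3
print(proportion(alice, [alice, bob]))       # proportion schema: 6/(6+3) = 2/3
```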

A quantitative inference schema can be conceptualized as a production rule that operates on a specific quantity. If the condition for applying an operation is satisfied, the respective operation is executed. The quantitative inference schemas are not represented in the abstract manner described above. Rather, depending on their experience with different quantitative domains, people possess specific versions related to particular quantities, like frequencies or volumes. Consequently, quantitative inference schemas are located at Marr’s (Citation1982) algorithmic level. They implement concrete algorithms that work with specific data structures (quantities). This is in line with the observation of McDermott and Larkin (Citation1978) that problem schemas of novices are tied to concrete aspects of a problem. This concludes our exposition of the EPO framework. We now turn to applications.

Application of the EPO framework

In the following, we apply the EPO framework in four ways: First, a problem space representation of elementary probabilistic reasoning problems will be presented. Second, we provide a classification of elementary probabilistic reasoning tasks according to the computational requirements for solving the task. Third, the framework is used to explicate why and how different methods for improving Bayesian reasoning work. This comprises a discussion of the effects of individual characteristics, like numeracy, level of education, and cognitive ability, on Bayesian problem solving. Fourth, we employ the approach to reveal the common structure underlying different methods of teaching Bayesian reasoning.

A problem space representation of elementary probabilistic reasoning

The problem space concept is well-suited for explaining mathematical problem solving, like elementary probabilistic reasoning. Newell and Simon (Citation1972) conceptualized problem solving activities as a search through a problem space with a path connecting the initial state, given by the problem description, with the goal state that represents the solution. The problem space comprises five elements (cf. Newell & Simon, Citation1972, p. 810): (1) a set of knowledge states; (2) a set of cognitive operators that enable one to move between different knowledge states; (3) an initial state; (4) a goal state, and (5) the total knowledge available.

In the case of elementary probabilistic problems, the problem space consists of all sets of probabilities or frequencies that can be computed from the probabilities given in the problem description, making up the initial state, by applying elementary probabilistic operations together with elementary algebraic operations for manipulating equations. The problem can be solved if the goal state, consisting of the required set of probabilities, can be reached from the initial state by applying elementary probabilistic operations.
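
To make the search concrete, here is a minimal Python sketch (ours, not part of the original framework) in which each state is the set of probabilities derived so far and each move applies an elementary probabilistic operation. The rules are specialized to the standard two-variable diagnosis problem, and the numerical inputs are hypothetical.

```python
# Forward search through the problem space of a two-variable Bayesian problem:
# states are sets of known probabilities, moves are the EPO (plus elementary
# probability knowledge). Each rule: (required keys, produced key, operation).
rules = [
    # combination: P(h,e) = P(e|h) * P(h)
    ({"P(h)", "P(e|h)"},   "P(h,e)",  lambda k: k["P(e|h)"] * k["P(h)"]),
    ({"P(h~)", "P(e|h~)"}, "P(h~,e)", lambda k: k["P(e|h~)"] * k["P(h~)"]),
    # elementary probability knowledge: P(h~) = 1 - P(h)
    ({"P(h)"}, "P(h~)", lambda k: 1 - k["P(h)"]),
    # marginalization: P(e) = P(h,e) + P(h~,e)
    ({"P(h,e)", "P(h~,e)"}, "P(e)", lambda k: k["P(h,e)"] + k["P(h~,e)"]),
    # conditioning: P(h|e) = P(h,e) / P(e)
    ({"P(h,e)", "P(e)"}, "P(h|e)", lambda k: k["P(h,e)"] / k["P(e)"]),
]

def solve(known, goal):
    """Work forward: apply every applicable operation until the goal is known."""
    known, changed = dict(known), True
    while goal not in known and changed:
        changed = False
        for needs, produces, op in rules:
            if needs <= known.keys() and produces not in known:
                known[produces] = op(known)
                changed = True
    return known.get(goal)

# Initial state of the standard diagnosis problem (hypothetical numbers).
initial = {"P(h)": 0.01, "P(e|h)": 0.8, "P(e|h~)": 0.096}
print(solve(initial, "P(h|e)"))   # ~0.078
```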

For other types of problems that can be solved by applying elementary probabilistic operations, the states may be made up of relations between probabilities. For example, if the problem consists in proving a statement concerning the relationship between probabilities, like P(x | y) > P(x | ȳ) ⇔ P(y | x) > P(y | x̄) (cf. Tversky & Kahneman, Citation1982a), the states consist of inequalities between probabilities, with the initial state being made up by the left (or right) inequality of the equivalence statement. In this case too, the EPO together with elementary manipulations of equalities and inequalities enable one to reach the goal state, i.e., the right (left) hand side of the equivalence statement.

As with other problems conforming to a problem space representation, like the Tower-of-Hanoi, the different states of the problem space are not all available to the problem solver at the beginning of the problem-solving process. Rather, they are constructed during the process of problem solving by applying elementary probabilistic operations. Moreover, similarly to the Tower-of-Hanoi problem, solvers of elementary probabilistic problems may commit illegal moves. These consist in the application of operations other than the EPO. For example, about five percent of the participants in Study 1 of Gigerenzer and Hoffrage (Citation1995) used the multiply-all algorithm, multiplying all the given probabilities, P(h) × P(e | h) × P(e | h̄), and two percent applied the subtraction algorithm, P(e | h) − P(e | h̄), where the symbol h denotes the hypothesis and e the evidence (cf. Gigerenzer & Hoffrage, Citation1995, Table 3 on p. 695). In general, these operations do not result in proper probabilities. In fact, the subtraction algorithm may even lead to negative values.

In the case of probabilistic reasoning the total possible available knowledge comprises the following elements: (a) elementary knowledge about probabilities, like the fact that probabilities are in the range between zero and one and that the point probabilities of the different values of a random variable sum up to one; (b) knowledge of the different types of probabilities and their relationship, including the different types of events (or sets) the probabilities refer to; (c) general strategies for solving elementary probabilistic problems; (d) quantitative inference schemas and conditions for their application; (e) subjective theorems; and (f) knowledge concerning elementary algebraic operations.Footnote3 Which of these knowledge elements actually make up the knowledge of the problem solver depends on her expertise. The knowledge elements available to the problem solver determine, together with the problem description, the representation of the problem, and, by consequence, the path taken through the problem space. However, the given list does not include the knowledge required for setting up the problem space, in particular, the knowledge that is necessary to identify the starting and goal state from the given problem description (see, e.g., Johnson & Tubau, Citation2015; Kintsch & Greeno, Citation1985, for a discussion of this aspect).

The sequence of computations in Figure 1 illustrates the strategy of working forward from the given probabilities to the required probability, a mode of problem solving that is typical of experts (Larkin et al., Citation1980). However, we actually solved the problem in a slightly different way. Figure 2 illustrates the goal hierarchy and the operations applied. The path through the problem space and the setting of sub-goals was determined by four pieces of knowledge: (a) knowledge concerning the definition of the required conditional probability (realizing the operation of conditioning): P(h | e1, e2) = P(h, e1, e2)/P(e1, e2); (b) the strategy (goal) to compute the relevant joint probabilities P(h, e1, e2) and P(h̄, e1, e2); (c) procedural knowledge of how to compute these quantities from the given probabilities by means of combining the relevant pieces of information; and (d) knowledge of how to compute the probability P(e1, e2) from the joint probabilities by means of marginalization. Note that the strategy (goal setting) to compute the joint probabilities was prior to the computation of the marginal joint probabilities P(h, e1), P(h, ē1), P(h̄, e1), and P(h̄, ē1) that are required for computing the relevant joint probabilities. Thus, this is not a working-forward strategy in a strict sense. This example illustrates how knowledge about the probabilities and elementary operations, together with the strategy to compute joint probabilities, determines the way through the problem space.

Elementary probabilistic operations and the classification of probabilistic problems

Probabilistic problems can be classified according to the computational requirements for arriving at a solution. The EPO framework provides a computational-level classification of problems by using the set of elementary probabilistic operations that have to be realized for solving a problem as the criterion. This classification imposes restrictions on any solution algorithm in that the latter has to implement the operations required for solving a specific type of probabilistic problem.

In the following, three basic probabilistic problem types will be considered: (1) total evidence problems, (2) conditioning problems, and (3) Bayesian problems. Each of the three problem types has been considered in the literature on probabilistic reasoning, and each of these problem types works with subjective and objective probabilities. We also discuss why the distinction between different types of elementary probabilistic problems is important for psychological research on probabilistic problem solving.

Total evidence problems

Total evidence problems can be solved by applying the elementary operations of combination and marginalization. The conditional probabilities P(X1, X2, …, Xn | Y1, Y2, …, Ym) and marginal probabilities P(Y1, Y2, …, Ym) are provided in the problem description, and the marginal (joint) distribution P(X1, X2, …, Xn) has to be computed. In its simplest form, the target distribution (or target probability) involves a single variable only. Shafer and Tversky (Citation1985) provide a total evidence problem (which they call a total evidence design) involving subjective probabilities (for another example of a total evidence problem, involving objective probabilities, see, e.g., Diaz & Batanero, Citation2009, Item 14). Table 1 exhibits an example with four hypothetical scenarios together with the conditional probabilities of the target event given the different scenarios as well as the marginal probabilities of the scenarios. The required probability concerns the target event r = Recovery of the economy from the Asian disease. The four hypothetical scenarios A-D, resulting from the combination of the values of two binary variables Y and Z, are exclusive and exhaustive. The probability of the target event can be found by combining (multiplying) the conditional and marginal probabilities of each scenario, resulting in the joint probabilities of the scenarios and the target event. The respective joint probabilities are: P(r, A) = P(r, y, z) = 0.36, P(r, B) = P(r, y, z̄) = 0.12, P(r, C) = P(r, ȳ, z) = 0.075, and P(r, D) = P(r, ȳ, z̄) = 0.0125. Applying the process of marginalization by summing up these joint probabilities gives the probability of the target event: P(r) = 0.5675.

Table 1. Example of a total evidence problem (hypothetical scenarios and associated probabilities): four exclusive and exhaustive scenarios resulting from the combinations of the levels of two binary variables Y and Z.
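
The computation can be sketched in a few lines. Since the table itself is not reproduced here, the scenario probabilities and likelihoods below are chosen hypothetically so that the resulting joint probabilities match the values reported in the text.

```python
# Total evidence problem: combination + marginalization, no conditioning.
p_scenario = {"A": 0.40, "B": 0.30, "C": 0.25, "D": 0.05}   # P(scenario), hypothetical
p_r_given  = {"A": 0.90, "B": 0.40, "C": 0.30, "D": 0.25}   # P(r | scenario), hypothetical

# Combination: joint probability of the target event r and each scenario.
joint = {s: p_r_given[s] * p_scenario[s] for s in p_scenario}
# -> {'A': 0.36, 'B': 0.12, 'C': 0.075, 'D': 0.0125}, as in the text

# Marginalization: sum the scenarios out to obtain P(r).
print(sum(joint.values()))   # 0.5675
```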

Conditioning problems

Conditioning problems require marginalization and conditioning. The given information consists in the joint probability distribution P(X1, X2, …, Xn), and the required distribution is the conditional distribution P(X1, …, Xk | Xk+1, …, Xn) of the target variables X1, …, Xk, given the other variables Xk+1, …, Xn. In order to solve this type of problem, the problem solver first has to apply the operation of marginalization by summing over the target variables. Subsequently, the operation of conditioning has to be performed. Typical examples of conditioning problems concern computations within contingency tables. Table 2 exhibits imaginary data concerning the joint distribution of the variables gender (G), extramarital sex (S), and divorce (D). Assume that we want to know the conditional probabilities of a divorce in the case of extramarital sex for men vs. women: P(D=yes | S=yes, G=female) and P(D=yes | S=yes, G=male). These probabilities can be determined by first computing the marginal probabilities P(S=yes, G=female) and P(S=yes, G=male) by marginalizing over variable D (cf. Table 2, last column). The required conditional probabilities are then computed by means of conditioning, that is, dividing the joint probabilities P(D=yes, S=yes, G=female) and P(D=yes, S=yes, G=male) by the respective marginal probabilities, for example: P(D=yes | S=yes, G=female) = P(D=yes, S=yes, G=female)/P(S=yes, G=female) = P(D=yes, S=yes, G=female)/[P(D=yes, S=yes, G=female) + P(D=no, S=yes, G=female)].

Table 2. Example of a conditioning problem: Joint distribution of the variables gender, extramarital sex, and divorce (imaginary data).

The last term on the right-hand side integrates the operations of marginalization and conditioning into a single mathematical expression. Inserting the respective joint and marginal joint probabilities from Table 2 gives: P(D=yes | S=yes, G=female) = 0.07/0.08 = 7/8 and P(D=yes | S=yes, G=male) = 0.02/0.03 = 2/3.
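
A short sketch of the two-step solution. The S=yes cells reproduce the values used in the text; the S=no cells are hypothetical fillers that make the joint distribution sum to one.

```python
# Conditioning problem: the full joint distribution is given; the solution
# requires marginalization followed by conditioning.
joint = {  # keys: (D = divorce, S = extramarital sex, G = gender)
    ("yes", "yes", "female"): 0.07, ("no", "yes", "female"): 0.01,
    ("yes", "yes", "male"):   0.02, ("no", "yes", "male"):   0.01,
    ("yes", "no",  "female"): 0.05, ("no", "no",  "female"): 0.40,  # hypothetical
    ("yes", "no",  "male"):   0.04, ("no", "no",  "male"):   0.40,  # hypothetical
}

def p_divorce_given(sex, gender):
    # Marginalization: sum D out to obtain P(S=sex, G=gender).
    p_sg = sum(p for (d, s, g), p in joint.items() if (s, g) == (sex, gender))
    # Conditioning: divide the joint probability by the marginal.
    return joint[("yes", sex, gender)] / p_sg

print(p_divorce_given("yes", "female"))   # 0.07 / 0.08 = 0.875 = 7/8
print(p_divorce_given("yes", "male"))     # 0.02 / 0.03 ~ 0.667 = 2/3
```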

The class of conditioning problems is one of the most intensely investigated types of probabilistic problems. In experiments on probabilistic reasoning with two binary variables, H (hypothesis) and E (evidence), the joint distribution, that is, the full set of joint point probabilities, P(h, e), P(h̄, e), P(h, ē), and P(h̄, ē), has been provided in different ways, for example, by means of contingency tables of frequencies or graphical information (see, e.g., Binder et al., Citation2015; Böcherer-Linder & Eichler, Citation2019). However, for the most commonly investigated conditioning problem the joint probability information is presented in terms of natural frequencies (Gigerenzer & Hoffrage, Citation1995; Kleiter, Citation1994). Figure 3 presents the commonly used general diagnosis problem with two binary variables in probability and natural frequency (NF) format.

Bayesian problems

Bayesian problems are characterized by requiring the complete set of probabilistic operations to obtain a solution. Thus, each Bayesian solution algorithm has to implement each of the three EPO. Prototypical examples are classical Bayesian problems involving two binary variables, like the general diagnosis problem in probability format (cf. Figure 3). In this case, the conditional probabilities (likelihoods), P(e | h) and P(e | h̄), as well as the marginal (prior) probability P(h) [and, possibly, also P(h̄)] are provided, and the problem solver has to compute the inverse conditional (posterior) probability P(h | e). This constitutes the classical Bayesian updating scenario where prior probabilities are updated using the likelihood information.

There exist, however, different variants of Bayesian problems, depending on which probabilities are given in the problem description. For example, with two binary variables the probabilities P(e | h), P(h), and the joint probability P(e, h̄) may be provided (Huerta, Citation2014). In this case, the solution P(h | e) also requires the execution of each of the elementary probabilistic operations. One important aspect of Bayesian problems consists in the fact that, in the general case, the solution can only be obtained by computing the (relevant) joint probabilities.Footnote4 Consequently, each path in the problem space from the initial state to the goal state leads through a state comprising the relevant joint probabilities that are required to perform the operations of marginalization and conditioning.
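
A sketch of the variant just described, with hypothetical numbers: although the joint P(e, h̄) is given directly, all three operations are still required.

```python
# Variant Bayesian problem (cf. Huerta, 2014): P(e|h), P(h), and P(e, h-bar)
# are given; numbers are hypothetical.
p_e_given_h, p_h, p_e_and_not_h = 0.9, 0.2, 0.16

p_e_and_h   = p_e_given_h * p_h      # combination      -> 0.18
p_e         = p_e_and_h + p_e_and_not_h  # marginalization -> 0.34
p_h_given_e = p_e_and_h / p_e        # conditioning     -> ~0.529
print(p_h_given_e)
```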

Probabilistic problem structures and natural frequency format

The presented classification regards probabilistic problems in NF format as conditioning problems and not as Bayesian problems, since the joint frequencies are presented, making the combination of information needless. There has been a debate about whether solving Bayesian problems with probabilistic information presented in NF format can be regarded as Bayesian reasoning. Howson and Urbach (Citation1993) argued that this is not the case because the usage of natural frequencies makes probabilistic reasoning simpler. On the other hand, it has been argued that the essence of Bayesian reasoning consists in the fact that the probabilistic reasoning process achieves the same function as the application of Bayes’ theorem (Brase, Citation2002). Furthermore, Brase and Hill (Citation2015) claim that considering Bayesian reasoning as an application of Bayes’ theorem is scientifically unproductive, and that their more general specification of Bayesian reasoning, as a process of adaptively updating prior probabilities in the face of new information to arrive at a posterior probability, is more useful from a cognitive perspective.

From the perspective of elementary probabilistic operations, two related aspects have to be considered, a normative and a psychological one. The first one concerns the computational requirements of algorithms implementing the probabilistic operations. As noted above, classical Bayesian updating conceptualizes Bayesian reasoning as the revision of prior belief by incorporating new information. An important step consists in the combination of the prior beliefs with the new evidence. In the case of information presented in NF format this combination is performed in the problem description and not by the solution algorithm. Consequently, it is not sensible to classify solution algorithms for problems with information presented in NF format as Bayesian.

Concerning the psychological point of view, it is important to differentiate between Bayesian and conditioning problems because it is psychologically relevant which mental operations have to be performed in order to arrive at a correct solution (see also Ayal & Beyth-Marom, Citation2014). As detailed below (cf. the section Simplification of the Problem Structure: Reducing Bayesian Problems to Conditioning Problems), problems in NF format require fewer conceptual and procedural competencies. Consequently, problems in NF format should not be considered as Bayesian problems at all.

Bayesian problems and subjective theorems

For most Bayesian problems, like the general diagnosis problem in standard probability format, the solution requires the execution of the EPO. By contrast, for the famous Monty Hall problem (vos Savant, Citation1990), the problem of the three prisoners (Mosteller, Citation1965), and structurally equivalent problems, it is possible to arrive at a correct solution by applying subjective theorems, i.e., simple inferential schemas that can be used for solving the problem (Falk, Citation1992; Shimojo & Ichikawa, Citation1989). This is possible because the probabilities involved are quite simple: The prior distribution is uniform and the conditional probabilities are either 0, 1, or 1/2. By consequence, employing a subjective theorem may result in a correct solution without applying elementary probabilistic operations.

The crux of this type of problem consists in the fact that it evokes subjective theorems that lead to an incorrect solution. Specifically, the subjective theorem constant ratio is evoked (Shimojo & Ichikawa, Citation1989, p. 7): If one alternative (out of three) is eliminated, the ratio of the probabilities of the remaining alternatives is the same as the ratio of their prior probabilities. This results in the incorrect answer of 1/2 instead of 1/3, assuming that the host in the Monty Hall problem or the jailer in the problem of three prisoners has no preference for one of the alternatives (see also Baratgin & Politzer, Citation2010).

By contrast, the subjective theorem irrelevant therefore invariant (cf. Shimojo & Ichikawa, Citation1989, p. 7), which demands that the problem solver ignore the conditioning event, results in a correct solution. Thus, Krauss and Wang (Citation2003) propose to teach the application of this subjective theorem. The fact that the subjective theorem irrelevant therefore invariant (as well as other simple procedures, like repeatedly playing the game or taking the view of the different members in the game; Tubau & Alonso, Citation2003; Tubau et al., Citation2015) leads to a correct solution seems to contradict our previous claim that, in general, a correct solution of Bayesian problems requires the computation of joint probabilities. However, simple modifications of these problems reveal that this is not the case: they lead to a failure of the irrelevant therefore invariant theorem. Assume, for the moment, that in the Monty Hall problem the probabilities of the three Doors A, B, and C hiding the car are 1/4, 1/4, and 1/2 (instead of 1/3 for each door). Then the subjective theorem no longer works, since the correct posterior probability of the car being located behind the chosen Door A, given that the host opens Door C, is not 1/4, as proposed by the theorem, but 1/3. Similarly, if the host in the Monty Hall problem has a preference to open a specific door (i.e., he will always open this door if possible), then the theorem does not work. Thus, in general, the computation of joint probabilities, and, by consequence, the application of the operation of combination, is mandatory for solving Bayesian problems.
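
A small sketch of the modified Monty Hall computation from the text, showing why the combination operation is needed: with priors 1/4, 1/4, 1/2 and a host without door preference, the joint probabilities yield a posterior of 1/3 for the chosen Door A, not the 1/4 predicted by the irrelevant therefore invariant theorem.

```python
from fractions import Fraction as F

# Non-uniform priors from the text; the contestant has picked Door A, and the
# host (with no door preference) opens a goat door other than A.
prior = {"A": F(1, 4), "B": F(1, 4), "C": F(1, 2)}

def p_host_opens(door, car):
    if car == "A":                           # host may open B or C, each with 1/2
        return F(1, 2)
    return F(1) if door != car else F(0)     # host must avoid the car

# Combination: joint probability of the car's location and "host opens C".
joint = {car: prior[car] * p_host_opens("C", car) for car in prior}
# Marginalization: P(host opens C).
p_open_c = sum(joint.values())
# Conditioning: posterior that the car is behind the chosen Door A.
print(joint["A"] / p_open_c)   # 1/3
```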

On the relative difficulty of problem classes

As explicated above, Bayesian problems demand the computation of the joint probabilities (frequencies). By consequence, Bayesian problems may, in general, be considered as more difficult than conditioning problems where the joint probabilities are provided. The facilitating effect of presenting probabilistic information in terms of joint frequencies or in NF format seems to support this conclusion.

However, the difficulty of an elementary probabilistic problem does not only depend on the specific types of elementary probabilistic operations involved but also on how many operations are required and on how familiar participants are with performing an elementary probabilistic operation in a specific way. Consequently, presenting problems that require additional elementary probabilistic operations can render conditioning problems more complex than Bayesian problems (see, e.g., Ayal & Beyth-Marom, Citation2014, and Study 5 of Girotto & Gonzalez, Citation2001). In sum, the different problem types do not allow for a strict ordering with respect to problem difficulty. Following this computational analysis of probabilistic problems, we next apply the EPO framework to explicate why and how specific methods for improving Bayesian problem-solving work.

Elementary probabilistic operators and methods for improving Bayesian reasoning

Due to the difficulty of Bayesian problems for people without training, there have been numerous efforts to improve Bayesian reasoning using different methods. The EPO framework provides new explanations of why and how these methods work. We discuss five attempts to improve Bayesian reasoning: (1) reduction of Bayesian problems to conditioning problems by using NF, (2) problem descriptions fostering the combination of information, (3) methods that facilitate the operations of marginalization and conditioning, (4) the effect of the format of the probabilistic information: frequencies vs. probabilities, and (5) individual characteristics, like numeracy, level of education, and cognitive ability, that may result in improved probabilistic reasoning.

Preliminary considerations: types of competencies in mathematical problem solving and quantitative inference schemas

Greeno et al. (Citation1984) identify three types of competencies that are important in mathematical problem solving: conceptual, procedural and utilizational. These competencies are not independent but may develop in a parallel way such that increasing one type of competence leads to an improvement of the other types of competencies (see, e.g., Rittle-Johnson et al., Citation2001). With respect to elementary probabilistic problem solving, conceptual competence concerns knowledge about different types of probabilities and their relationship, as well as of strategies. Procedural knowledge is concerned with the probabilistic operations and their concrete implementation. Finally, utilizational competence consists in knowledge concerning conditions for the proper application of specific realizations of EPO, in particular of the quantitative inference schemas. According to the EPO framework, successful methods for improving Bayesian reasoning can be classified into two non-exclusive categories: (a) simplification of Bayesian problems such that people with lower conceptual and procedural competencies are able to solve the problem, and (b) methods improving the utilization competence resulting in an increased application of quantitative inference schemas.

Before discussing previous findings, we illustrate how the problem formulation and the probability format may result in an increased use of quantitative inference schemas. Consider the two problem descriptions in Figure 4. Both descriptions are Bayesian problems since, in both cases, the solution requires the application of all three elementary operations. The EPO framework predicts that the first problem description results in a higher solution rate than the second one. This is due to the fact that the first description increases the utilization competence by fostering the application of the proper quantitative inference schemas. This prediction is based on the interplay of four aspects: First, the usage of frequencies: The sample size is given in the first but not in the second description, increasing the availability of acquired quantitative schemas working with frequencies. In addition, the numbers are simple, and, thus, the frequencies are easy to compute. Second, the first problem is framed as a selection problem, that is, subsets are selected from the sets of red and green apples. This should foster the application of the quantitative selection schema since people know how to compute the frequency of the entities within a subset selected from the whole set. By contrast, with diagnostic framing a diagnostic feature is introduced according to which items are selected. This renders the selection of the relevant subsets less transparent. In addition, diagnostic framing increases the likelihood of confusing the sensitivity P(e | h) with the positive predictive value P(h | e). This error is frequently committed in the case of diagnostic framing, with participants expressing high confidence in being correct (Lakhlifi et al., Citation2023). Third, the fact that the selected apples are put into the same basket encourages focusing on the apples in the basket. Fourth, the question in the first description asks for a proportion and not for a probability. The latter two aspects promote the application of the quantitative proportion schema to the apples in the basket.

Following this illustration of how the problem description and probability format can improve Bayesian reasoning by fostering the application of quantitative inference schemas, we discuss results concerning different attempts to enhance Bayesian problem solving. In particular, we show how the different methods foster the application of quantitative inference schemas that implement the relevant EPO. We start with an approach that simplifies the problem structure, thus reducing the demand on conceptual and procedural competencies.

Simplification of the problem structure: reducing Bayesian problems to conditioning problems

One method to improve Bayesian reasoning consists in presenting joint probabilities. This reduces Bayesian problems to conditioning problems since no combination of information is required with this information structure. The most frequently used method for providing joint probability information consists in presenting the probabilistic information in NF format (see Figure 3). In this case, the joint frequencies and the embedding of the joint events, e&h and e&h̄, within the marginal ones, h and h̄, are presented. However, the information concerning the embedding of events revealed by the NF format is irrelevant for solving the problem. A different type of nesting of events is important for performing the operations of marginalization and conditioning (cf. the section Methods Facilitating Marginalization and Conditioning).

The presentation of joint frequencies reduces the demand on procedural competence since the solution algorithms are simplified (see, e.g., Gigerenzer & Hoffrage, Citation1995, Citation2007). This is due to the fact that the presentation of the relevant joint probabilities, P(e&h) and P(e&h̄), renders the combination of probabilistic information superfluous. Only the operations of marginalization, P(e) = P(e&h) + P(e&h̄), and conditioning, P(h | e) = P(e&h)/P(e), have to be performed. The NF format also reduces the demand on conceptual competence since knowledge concerning the importance of joint probabilities for solving Bayesian problems is not required: The problem solver does not need to know that the relevant joint probabilities have to be computed for solving Bayesian problems.
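
In NF format the two remaining operations reduce to a concatenation and a part-whole step, as the following sketch (with hypothetical frequencies) illustrates.

```python
# With natural frequencies, the joint information is given directly, so only
# marginalization and conditioning remain. Frequencies are hypothetical.
n_e_and_h     = 8    # people with the disease and a positive test
n_e_and_not_h = 95   # people without the disease and a positive test

n_e = n_e_and_h + n_e_and_not_h   # marginalization (concatenation schema)
print(n_e_and_h / n_e)            # conditioning (part-whole schema): ~0.078
```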

Improving Bayesian problem solving by fostering the combination of information

The issue of improving participants’ performance on Bayesian problems by fostering the application of the quantitative partitioning/selection schema has, to the best of our knowledge, not been addressed so far. There are, however, studies that may be regarded as instances of promoting the application of the quantitative partitioning/selection schema. In particular, in their first study, Johnson and Tubau (Citation2013) provided a Bayesian problem description similar to the example given above (cf. Figure 4). Specifically, they presented a problem with a selection frame in a frequency and a probability format. With the frequency format, the frequencies of the basic classes (three colors of apples), allowing for simple computations, were provided, and the question asked for frequencies. With the probability format, the probabilities of the three basic classes were presented and the question asked for a probability. The solution rates were 86% with the frequency and 27% with the probability format (cf. Johnson & Tubau, Citation2013, p. 36). This result indicates that, in most cases, the presentation of problems as selection problems is not sufficient to foster a problem schema that permits a correct solution. Note, however, that the solution rate of 27% with probability format is still about as high as the 24% found with problems in natural sampling format (McDowell & Jacobs, Citation2017). By contrast, presenting the problem, in addition, in frequency format (with simple numbers) resulted in an increase in the solution rate of over 50% (compared to probability format), indicating the fostering of adequate problem schemas, including the quantitative selection schema. However, further research is required to understand precisely the conditions under which the presentation of Bayesian problems in a selection/partitioning frame promotes the application of the partitioning/selection schema.

Methods facilitating marginalization and conditioning

Despite the fact that the presentation of joint probabilities increases the rate of correct solutions, the improvements are far from perfect. This is not surprising since the problem solution still requires the application of the operations of marginalization and conditioning. The various attempts to foster these processes are of two types: (a) guiding participants through the process of marginalization and conditioning by posing additional questions, and (b) increasing the saliency of the relevant nested set structure.

The method of asking participants to compute the marginal probability P(e) of the conditioning event e has been applied by Girotto and Gonzalez (Citation2001) as well as by Ottley et al. (Citation2016). After having presented the joint probabilities P(e&h) and P(e&h̄) in terms of natural frequencies, Girotto and Gonzalez (Citation2001) asked participants to first compute the marginal probability P(e) and then the conditional probability P(h | e) by presenting the following question:

In a group of 100 people, one can expect ____ individuals to have a positive reaction, ____ of whom will have the infection.

This forces participants to first perform the operation of marginalization, followed by the operation of conditioning. In particular, asking participants to compute the marginal frequency prompts them to apply the quantitative concatenation schema. The second part of the question promotes the application of the quantitative part-whole schema, implementing the operation of conditioning. The same strategy of first asking for the relevant marginal probability has been applied by Ottley et al. (Citation2016, p. 532). The results of both studies reveal a positive effect of guiding participants through the processes of marginalization and conditioning by means of questioning.

The second approach to promoting marginalization and conditioning consists in making the relevant nested set structure more salient. As noted above, the nested set relationships revealed by the NF format are irrelevant for solving the problem since only the joint probabilities are required. The nesting of sets revealed by the NF format does not provide any indication concerning the further path to the solution state. By contrast, the nested set relationship between the marginal event e and the joint events e&h and e&h̄ is important for performing the operations of marginalization and conditioning. Consequently, various methods have been applied to enhance the saliency of the relevant nesting. These methods may be divided into two types: first, enhancing the nested set relationship by using problem descriptions that make the embedding salient, and, second, using visual aids.

The first approach was taken by Brase et al. (Citation1998). They increased the transparency of the nesting of sets by means of specific problem descriptions. For example, in Experiment 2, in the non-salient condition, the probability P(e) represented the number of candy cane halves, whereas P(e&h) and P(e&h̄) represented, respectively, the number of candy cane halves that taste of peppermint and lemon. In the salient condition, P(e) referred to the number of candy canes, and the joint probabilities, P(e&h) and P(e&h̄), represented the number of peppermint and lemon candy canes. Obviously, the salient condition renders the nesting relationship more transparent. This method and the other ones used by Brase et al. (Citation1998) to increase the saliency of the nested set relationships proved successful. According to the EPO framework, increasing the saliency of the nested set structure fosters the application of the quantitative proportion schema (the combined marginalization-and-conditioning schema): To get the proportion of peppermint candy canes, the number of peppermint candy canes has to be divided by the total number of candy canes. With candy cane halves the nested set relationship is less salient, thus resulting in a reduced application of the schema. In their third experiment, the candy canes of the same flavour were put in the same jar, thus prompting participants to focus on the respective jar. As noted in the example above, this may foster the application of the marginalization and proportion schemas, specifically in case of employing a frequency format.Footnote5

A second method to increase the saliency of the nested set relationship relevant for performing the operation of conditioning consisted in using Venn diagrams.Footnote6 In particular, in the diagrammatic representation of Sloman et al. (Citation2003, p. 298) the event e&h is represented by a circle that is contained within a bigger circle representing event e, thus depicting the nesting of the first set within the second one (see also Barbey & Sloman, Citation2007, p. 249). Whereas Sloman et al. (Citation2003) found a facilitating effect of using Venn diagrams, Brase (Citation2009) and Moro et al. (Citation2011) found no such effect. The differing results of Sloman et al. (Citation2003) and Brase (Citation2009) may be due to the different forms of the Venn diagrams: The Venn diagrams of Brase (Citation2009, p. 374) comprise two overlapping circles, one depicting the sub-population of people having the disease, i.e., representing event h, and the second circle representing the sub-population of persons with a positive diagnosis, i.e., representing event e. Consequently, the embedding of event e&h within event e is less transparent in the diagram of Brase (Citation2009) compared to that of Sloman et al. (Citation2003).

The divergent result of Moro et al. (Citation2011) cannot be explained by differing diagrammatic representations of the sets involved (cf. Moro et al., Citation2011). The major difference from the study of Sloman et al. (Citation2003) concerns the expertise of the participants: The solution rates in Sloman et al. (Citation2003) were considerably higher than those of Moro et al. (Citation2011) for exactly the same problem formulations. For example, in Exp. 1 of Sloman et al. (Citation2003) the solution rates were 20% for Bayesian problems in probability (P) format, and 48% for Bayesian problems in P format with an additional text that should make the nesting of the sets transparent. The respective solution rates in Moro et al. (Citation2011, Exp. 1) were 0% for both versions. Similarly, the solution rate for the problem in P format with an accompanying Venn diagram was 48% in Exp. 2 of Sloman et al. (Citation2003) and 0% in Exp. 2 of Moro et al. (Citation2011). This suggests that the facilitating effect of using Venn diagrams is moderated by participants’ expertise. A similar result was observed concerning the effect of participants’ numeracy on solution rates (cf. the section Numeracy, Level of Education, Cognitive Ability, and Bayesian Reasoning).

Taken together, we discussed three attempts to foster the operations of marginalization and conditioning in the case of conditioning problems: (a) guiding participants through the operations of marginalization and conditioning, (b) using problem descriptions that make the nested-set relationship more salient, and (c) employing Venn diagrams depicting the relationships between the different sets. The first two methods proved successful. The facilitating effect of Venn diagrams seems to be moderated by participants’ expertise with Bayesian problems.

Improved probabilistic problem solving with frequencies

A current debate concerns the issue of whether presenting probabilistic information in terms of frequencies instead of probabilities results in better reasoning performance. The debate comprises two aspects: The first one is concerned with the concrete wording, specifically, whether the usage of the term frequency results in higher rates of correct solutions than the usage of the term chances (see Brase, Citation2008, Citation2014; Girotto & Gonzalez, Citation2001; Sirota, Kostovičová, et al., Citation2015).

The second aspect that is (more) relevant with respect to the EPO framework concerns the issue of whether the presentation of frequencies results in better performance than the usage of probabilities. Despite the fact that some studies found no facilitating effect of presenting absolute frequencies (e.g., Fiedler et al., Citation2000; Macchi, Citation2000), there is overwhelming evidence of the beneficial effects of absolute frequencies, even if these are not natural frequencies (see, e.g., Ayal & Beyth-Marom, Citation2014; Cosmides & Tooby, Citation1996; Johnson & Tubau, Citation2013). In addition, teaching probabilistic reasoning using frequencies is more efficient than using probabilities (see the discussion below).

The EPO approach explains the beneficial effects of employing frequencies instead of probabilities by the higher capacity of the former to foster the application of quantitative inference schemas. This should also be the case with absolute frequencies that are not joint frequencies (see the example presented above) since people have learned to apply these problem schemas to frequencies. This explains why the difference between single-event probabilities and frequencies is of psychological relevance in the case of probabilistic reasoning (Gigerenzer, Citation1994): The efficiency of cognitive algorithms, i.e., quantitative inference schemas, depends on the underlying representation of the probabilistic information.

Numeracy, level of education, cognitive ability, and Bayesian reasoning

Several studies examined the association between numeracy, level of education, and cognitive ability, on the one hand, and Bayesian reasoning with different probability formats, on the other hand. The EPO framework, together with the assumption that people with no training do not understand the importance of the joint distribution for solving Bayesian problems, predicts low solution rates for abstract, unfamiliar Bayesian problems in probability (P) format, like the two-variable diagnosis problem of Figure 3. Furthermore, since the scales for measuring numeracy (Lipkus et al., Citation2001; Schwartz et al., Citation1997) do not measure this type of conceptual knowledge, we expect that numeracy exerts no or only a small effect on solution rates for Bayesian problems in P format. For the same reasons, we also assume no or only small effects of level of education and cognitive ability on solution rates.

By contrast, for problems in NF format or simpler and more familiar problems, we expect an effect of numeracy, level of education, and cognitive ability, for different reasons: With NF format, conceptual knowledge concerning the joint distribution is not required, and for simpler and more familiar problems, there is an increased likelihood that the problem description elicits quantitative inference schemas. This tendency to apply quantitative knowledge is expected to be higher for people high in numeracy, with a higher level of education, and with higher cognitive ability.

These predictions are supported by several studies. First, various studies reveal low performance on Bayesian problems in P format. For example, solution rates of about 2% were obtained by Chapman and Liu (Citation2009). Likewise, Siegrist and Keller (Citation2011) observed one correct solution out of 134 in Exp. 1, and one out of 71 in Exp. 2. Hill and Brase (Citation2012, Exp. 3) and Moro et al. (Citation2011, Exp. 1) found floor effects with zero correct solutions for diagnostic Bayesian problems in P format. Second, a number of studies found no increase in solution rates with numeracy for problems in P format (Bramwell et al., Citation2006; Brase & Hill, Citation2017, Exp. 1; Chapman & Liu, Citation2009; Hill & Brase, Citation2012, Exp. 2 and 3, medical problem; Johnson & Tubau, Citation2013, complex problem formulation). Similarly, Siegrist and Keller (Citation2011, Exp. 1 and 2) observed no increase with level of education for unfamiliar problems, and Stanovich and West (Citation2000) found no effects of cognitive ability for the diagnosis problem of Casscells et al. (Citation1978) and for the cab problem (Tversky & Kahneman, Citation1982b). Third, for simpler and more familiar problems or for problems in NF format, facilitating effects of numeracy (Brase & Hill, Citation2017; Chapman & Liu, Citation2009; Garcia-Retamero & Hoffrage, Citation2013; Hill & Brase, Citation2012, Exp. 3; Johnson & Tubau, Citation2013, Exp. 1, simple problem formulation) and of education (Bramwell et al., Citation2006; Siegrist & Keller, Citation2011, Exp. 1 and 3) have been observed.

However, the results of some studies run counter to this general pattern. For example, Brase and Hill (Citation2017, Exp. 2) and Garcia-Retamero and Hoffrage (Citation2013) found an effect of numeracy also for problems in P format, whereas Micallef et al. (Citation2012) found no relationship between numeracy and errors for problems in NF format. Moreover, Sirota et al. (Citation2014) observed significant correlations between solution rates for problems in P format and various measures of cognitive ability.

The observed discrepancies may be due to a potential confounding of numeracy, level of education, and cognitive ability, on the one hand, with the degree of experience with Bayesian problems, on the other hand. This might also explain the fact that effects of numeracy, level of education, and cognitive ability are generally associated with an overall higher performance on Bayesian reasoning problems in P and NF format. To illustrate, in the study of Garcia-Retamero and Hoffrage (Citation2013), professionals (physicians) with higher numeracy also showed higher solution rates for problems in P format. By contrast, the professionals (obstetricians) in the study of Bramwell et al. (Citation2006) did not perform better than their clients on problems in P format. Notably, the solution rates of the professionals for problems in P format differed substantially between the two studies: about 20% in Garcia-Retamero and Hoffrage (Citation2013) and 5% in Bramwell et al. (Citation2006). This indicates greater experience with Bayesian reasoning problems on the part of the physicians compared to the obstetricians. In fact, the obstetricians of Bramwell et al. (Citation2006) performed about as well as the patients of Garcia-Retamero and Hoffrage (Citation2013).

This sort of confounding might also explain conflicting results concerning facilitating effects of cognitive ability for Bayesian problems in P format. Since the studies of Stanovich and West (Citation2000) were conducted, Bayesian reasoning problems have increasingly entered high school curricula (see, e.g., Chernoff & Sriraman, Citation2014; Gage & Spiegelhalter, Citation2018; Wassner et al., Citation2004; Weber et al., Citation2018), and receiving some training in Bayesian reasoning certainly has a greater impact for high-ability participants.

In sum, the EPO framework predicts a great number of findings concerning the effects of numeracy, level of education, and cognitive ability on Bayesian problem solving with different probabilistic formats. Conflicting results may be due to a confounding of numeracy, level of education, and cognitive ability, on the one hand, with expertise in Bayesian problem solving, on the other hand.

Teaching Bayesian problem solving

According to the present view, except for specifically designed problems like those used in the first study of Johnson and Tubau (Citation2013), people with no training have neither the conceptual nor the procedural and utilization competence to solve Bayesian problems. This means that they neither have a full understanding of the importance of joint probabilities for solving Bayesian problems nor possess detailed knowledge concerning the EPO and their proper application (for a similar view, see Girotto & Pighin, Citation2015).

There have been various attempts to teach Bayesian reasoning. These endeavors focus on teaching how to combine probabilistic information, which corroborates our claim that combining probabilistic information constitutes a central aspect of Bayesian problem solving. In particular, all of these methods teach algorithms implementing the partitioning/selection schema with different types of quantities. The different algorithms and quantities comprise: (a) teaching natural sampling, that is, instructing people how to translate probabilities into natural frequencies (Kurzenhäuser & Hoffrage, Citation2002); (b) constructing frequency and probability trees (see, e.g., Gage & Spiegelhalter, Citation2018; Sedlmeier & Gigerenzer, Citation2001); (c) a combination of natural sampling and frequency trees (Hoffrage et al., Citation2015); (d) filling in frequency grids (Sedlmeier & Gigerenzer, Citation2001); (e) filling in contingency tables with frequencies (Talboy & Schneider, Citation2017); and (f) partitioning a square into a mosaic plot with frequencies as labels (Talboy & Schneider, Citation2017).

Concerning the relative efficiency of the different methods, the following results have been obtained: Frequencies result in superior performance and memory compared to probabilities (Sedlmeier & Gigerenzer, Citation2001). Similarly, filling in contingency tables with frequencies leads, in general, to higher transfer performance than creating mosaic plots (Talboy & Schneider, Citation2017). Finally, presenting natural frequencies and instructing people how to use them to construct a frequency tree and to read the solution from that tree leads to better performance than teaching natural sampling (Hoffrage et al., Citation2015). These results confirm the superiority of frequencies as the quantities on which to perform Bayesian problem solving.
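As a rough illustration of the natural-sampling step that several of these tutorials teach, the following sketch (our own, with hypothetical numbers; the variable names do not come from the cited studies) translates probabilities into expected frequencies and reads the Bayesian solution off the leaves of the resulting frequency tree:

```python
# Natural sampling, sketched with hypothetical numbers: translate
# probabilities into expected frequencies for a reference sample and read
# the posterior off the leaves of the frequency tree.
N = 1000                      # hypothetical reference sample size
n_h = round(N * 0.10)         # 100 people with the condition (base rate 10%)
n_h_e = round(n_h * 0.80)     # 80 of them test positive (hit rate 80%)
n_nh = N - n_h                # 900 people without the condition
n_nh_e = round(n_nh * 0.10)   # 90 false positives (false-alarm rate 10%)

# Posterior from the tree's leaves: positives with the condition
# divided by all positives.
posterior = n_h_e / (n_h_e + n_nh_e)
print(posterior)  # 80 / 170 ≈ 0.47
```

The point of the exercise is that, once the four leaf frequencies are written down, the posterior is a simple part-whole proportion.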

General discussion

The presented EPO framework characterizes elementary probabilistic problems as problems involving a finite sampling space that can be solved by applying the elementary probabilistic operations: combination, marginalization, and conditioning. Consequently, the problem space underlying elementary probabilistic problems, whose states are made up of sets of probabilities, is closed with respect to the proper application of the three elementary probabilistic operations. Importantly, the three elementary operations also work with quantities other than probabilities, and they can be implemented by means of different algorithms. Therefore, the problem states may be made up of sets of quantities other than probabilities. The given characterization of elementary probabilistic problems differentiates them from problems that test for statistical intuitions (see, e.g., Nisbett et al., Citation1983) as well as from statistical problems requiring the computation of statistics or parameters (see, e.g., Catrambone, Citation1998; Falk & Lann, Citation2008). In addition, elementary probabilistic problems differ from more complicated probabilistic problems, like combinatorial problems that require difficult counting procedures, problems demanding more or less sophisticated set operations, or problems requiring limit computations. In the following, we first discuss how explanations based on the EPO framework concerning the effects of methods for improving probabilistic reasoning differ from traditional explanations. This is followed by a discussion of new applications of the EPO approach.

Comparing explanations based on the EPO framework to traditional explanations

According to the EPO framework, methods for improving probabilistic reasoning are of two different types: (a) simplification of problems (reducing Bayesian problems to conditioning problems) and (b) increasing utilization competence by promoting the application of acquired problem schemas. Traditional explanations are concerned primarily with the facilitating effects of natural frequencies (NF). Three types of explanations have been put forward: (a) the reduction of computational complexity, (b) the ecological rationality (ER) account, and (c) the nested-set (N-S) explanation. The reduction of computational complexity account states that for problems in NF format computations are simplified, thus increasing solution rates (see, e.g., Gigerenzer & Hoffrage, Citation1995, Citation2007). This account is in line with the EPO framework. However, the latter provides a precise explanation of why this simplification occurs: with NF, the joint frequencies are presented in the problem formulation, and, consequently, the operation of combination need not be performed. In addition to the lower demand on procedural knowledge, the EPO account identifies a lower requirement with respect to conceptual competencies, since knowledge concerning the importance of the joint distribution for solving Bayesian problems is not required with the NF format.

The ER account claims that people are able to perform Bayesian computations if provided with the correct type of probabilistic information, and that natural frequencies provide this type of information. This explanation is based on evolutionary arguments claiming that our ancestors encountered probabilistic information in natural frequency format; consequently, inferential algorithms are tuned to this format (see, e.g., Cosmides & Tooby, Citation1996; Gigerenzer & Hoffrage, Citation1995). The ER explanation differs from the EPO account in two respects: First, according to the EPO framework, problems presented in NF format are not Bayesian problems at all. Consequently, increased solution rates are no indication of people being able to perform Bayesian reasoning. Second, according to the principles of evolutionary educational psychology (e.g., Geary, Citation2002; Tricot & Sweller, Citation2014), quantitative inference schemas do not constitute biologically primary knowledge. This means that they are the product of cultural learning and are not acquired automatically.

The N-S explanation claims that the NF format reveals the set structure of the problem and makes the relationships between the nested sets salient. This fosters the understanding of the problem (see, e.g., Barbey & Sloman, Citation2007; Evans et al., Citation2000; Mellers & McGraw, Citation1999; Sirota, Kostovičová, et al., Citation2015; Sloman et al., Citation2003). According to the EPO account, the nesting of sets made salient by the NF format (the nesting of $e \& h$ within the event $h$, as well as that of $e \& \bar{h}$ within $\bar{h}$) is not relevant for solving the problem as soon as the joint probabilities, $P(e \& h)$ and $P(e \& \bar{h})$, are provided. Thus, the nested set relationship exhibited by the NF format is of no help for solving the problem. The only nesting that is relevant for solving problems with NF concerns the embedding of the joint events $e \& h$ and $e \& \bar{h}$ within the event $e$ (see the discussion in the section Methods Facilitating Marginalization and Conditioning). However, this type of embedding is not revealed by the NF format.

According to a dual-process version of the N-S account, presenting probabilistic information in NF format activates the rule-based system, leading people to process information about the nested set structure (Barbey & Sloman, Citation2007; see Footnote 7). The EPO approach accords with the account of Barbey and Sloman (Citation2007) in one respect: The former claims that specific problem descriptions can foster the application of quantitative inference schemas, which are part of people’s acquired rule-based reasoning system. This aspect is, however, of less importance with respect to the facilitating effects of the NF format than the accompanying conceptual and procedural simplifications. This is evidenced by the fact that facilitating effects were also observed when presenting joint probabilities instead of natural frequencies (see, e.g., Fiedler et al., Citation2000; Macchi, Citation2000; Mellers & McGraw, Citation1999). In particular, Ottley et al. (Citation2016) found no difference between a problem formulation presenting the complete probabilistic information (i.e., all marginal and joint frequencies) and a structured version of the problem text that makes the embedding of the different sets more salient: solution rates were about 73% for the unstructured and 77% for the structured version. Finally, Moro et al. (Citation2011, Exp. 3) investigated people’s knowledge concerning the relationships between the different sets involved in Bayesian problems. Their results indicate that, in general, people had a good understanding of the relationships between the sets involved in the problems; however, this had no effect on solution rates. Thus, the facilitating effect of the NF format is primarily due to the presentation of the (relevant) joint probabilities and not due to the nesting information.

Further applications of the EPO framework

In this section we discuss three new applications of the EPO framework: Usage of analogies in probabilistic problem solving, teaching probabilistic reasoning, and error analysis.

Analogical probabilistic problem solving

The EPO framework views probabilistic problem solving as analogous to solving other quantitative inference problems that require the application of quantitative operations. According to this conception, probabilities and frequencies are, like length, area, volume, and weight, specific instances of a general quantity construct. Probabilities differ from other types of quantities only by the fact that the maximum possible value is one (see Footnote 8). The fact that quantitative inference schemas are applicable to different types of quantities suggests that, as with other types of problems, analogies may be helpful for solving probabilistic problems (e.g., Gick & Holyoak, Citation1980, Citation1983; Holyoak & Koh, Citation1987). Specifically, presenting analogical problems involving quantities other than probabilities may help people to solve probabilistic problems. Figure 5 exhibits an example of an analogical problem that involves volume as the target quantity and requires the application of quantitative inference schemas. Solving analogical quantitative reasoning problems should increase the availability of the respective quantitative inference schemas and, as a consequence, promote their utilization in subsequent probabilistic reasoning tasks. This requires that people are able to identify the relevant structural similarities between the quantitative source problem and the probabilistic target problem. Quantitative analogies may be most helpful for Bayesian problems that are framed as selection problems rather than diagnostic problems (cf. ). Future research may investigate the facilitating effect of analogies in probabilistic reasoning.

Figure 5. Quantitative analogy problem.

Teaching probabilistic reasoning

The EPO approach may be helpful in teaching elementary probabilistic reasoning (for various approaches to teaching probabilistic reasoning, see, e.g., Chernoff & Sriraman, Citation2014; Gage & Spiegelhalter, Citation2018). This is because the EPO approach is basically a computational level theory, focusing on the computational requirements for solving a specific type of probabilistic problem. In this way, it differs from traditional approaches that are located at the algorithmic level and focus on specific procedures.

In particular, two aspects of the EPO approach seem useful in teaching probabilistic reasoning. The first is the conception of probabilities as specific types of quantities subject to the same elementary operations. Current approaches focus primarily on frequencies (e.g., Gage & Spiegelhalter, Citation2018). However, frequencies have their limitations, like the requirement of huge sample sizes in the case of small probabilities or difficulties associated with incompatible sample sizes (cf. Ayal & Beyth-Marom, Citation2014, Study 3). For other quantities that are not based on whole numbers, the problem of unwieldy numbers does not arise. In addition, the relationship between probabilities and other quantities explicates the usefulness of various graphical displays and the working of sampling devices, like roulette wheels and spinners (see Gage & Spiegelhalter, Citation2018).

The second useful aspect of the EPO framework for teaching probability is that, as a computational level framework concerned with the computational requirements for solving a specific type of probabilistic problem, the approach focuses on general solution strategies and the respective sub-goals rather than on specific solution procedures (cf. Novick, Citation1990). Catrambone and Holyoak (Citation1990) demonstrated that, with respect to transfer, it is more effective to have problem-solving knowledge represented in terms of relatively small sub-goals and methods. The EPO framework addresses this aspect. For example, in Bayesian reasoning the three relevant sub-goals are (in this order): (a) computing the relevant joint probabilities, (b) calculating the probability of the conditioning event, and (c) performing the operation of conditioning (cf. ), as illustrated in the sketch below. Consequently, the EPO approach allows for heuristic training (Schoenfeld, Citation1979) by focusing on relevant sub-goals. This should result in more profound and transferable knowledge that enables students to solve more difficult probabilistic problems, like proving the validity of the independence axioms of probability (Dawid, Citation1979).
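A minimal sketch of these three sub-goals, under hypothetical numbers for a generic diagnosis problem (the variable names and values are ours, chosen for illustration only):

```python
# Three sub-goals of Bayesian reasoning, sketched with hypothetical numbers.
P_h = 0.10               # prior probability of the hypothesis
P_e_given_h = 0.80       # hit rate
P_e_given_nh = 0.10      # false-alarm rate

# Sub-goal (a): combination -- compute the relevant joint probabilities.
P_h_and_e = P_e_given_h * P_h                # P(h & e)
P_nh_and_e = P_e_given_nh * (1.0 - P_h)      # P(not-h & e)

# Sub-goal (b): marginalization -- probability of the conditioning event e.
P_e = P_h_and_e + P_nh_and_e

# Sub-goal (c): conditioning -- divide the joint by the marginal.
P_h_given_e = P_h_and_e / P_e
print(round(P_h_given_e, 3))  # 0.471
```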

Error analysis

Another application of the EPO framework concerns the analysis of errors committed in probabilistic problem solving. Error analysis based on the EPO framework is similar to that of Diaz and Batanero (Citation2009) and Eichler et al. (Citation2020), who analyzed errors in achieving the specific sub-tasks required to arrive at a final solution.

With respect to error analysis, the differentiation between conceptual, procedural, and utilization competence (Greeno et al., Citation1984) is useful. In particular, errors may be committed due to a lack of conceptual competence, like missing knowledge about the different types of probabilities and their relevance, or the inability to identify relevant sub-goals. Moreover, there may be an insufficient understanding of relevant procedures and their application. Participants may be able to identify relevant sub-goals in new problem contexts but fail to apply the appropriate method (Catrambone & Holyoak, Citation1990). Concerning elementary probabilistic problems, people may, for instance, understand that they have to combine the given probabilistic information. They may, however, employ inappropriate procedures, like the multiply-all algorithm, $P(e) \cdot P(h \& e)$, or a wrong algebraic operation (e.g., addition instead of multiplication). In this way, the EPO approach constitutes a framework for identifying the missing competencies required for solving a probabilistic problem.
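To illustrate with the hypothetical numbers used in the sketches above: the correct conditioning step yields $P(h \mid e) = P(h \& e)/P(e) = 0.08/0.17 \approx 0.47$, whereas the multiply-all error computes $P(e) \cdot P(h \& e) = 0.17 \times 0.08 \approx 0.014$, and substituting addition for multiplication in the combination step yields $P(e \mid h) + P(h) = 0.8 + 0.1 = 0.9$ instead of $P(h \& e) = 0.08$.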

Conclusion

The framework of elementary probabilistic operations (EPO) enables a new way of analyzing elementary probabilistic problems and provides new explanations of the functioning of different methods for improving probabilistic reasoning and of the working of training procedures. In particular, as a computational level theory that exhibits the computational requirements underlying elementary probabilistic problems, the EPO framework (a) provides a precise specification of the problem space underlying elementary probabilistic reasoning; (b) enables a classification of elementary probabilistic problems according to computational requirements that each solution algorithm has to fulfill; (c) reveals the relationship of elementary probabilistic problems to problems involving other quantities; (d) exhibits that methods to improve Bayesian reasoning are of two types: first, problem descriptions resulting in a reduction of computational and conceptual requirements, and, second, improvements of utilization competence achieved by fostering the application of acquired quantitative inference schemas; and (e) explains that methods for training Bayesian inference focus on the combination of information using quantities other than probabilities, thus capitalizing on people’s knowledge of different types of quantities. Finally, the EPO framework turns out to be helpful in teaching probabilistic reasoning, reveals how to employ analogical reasoning for solving probabilistic problems, and provides a guide for analyzing errors.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Gage and Spiegelhalter (Citation2018) differentiate between frequency and expected frequency. The former denotes frequencies resulting from random experiments, whereas the latter refers to frequencies based on the "true" probabilities. In line with the usage in the psychological literature, we use the term frequency to refer to expected frequencies.

2 Additivity is, in general, not guaranteed for quantities like lengths and volumes (e.g., Cohn, Citation1980). However, assuming that there are at most countably many objects to which quantities can be assigned (which is the case for real existing things), additivity holds.

3 Algebraic manipulations of equations are required for specific problems. For example, a problem may demand the computation of $P(x, \bar{y})$, given the probabilities $P(x)$ and $P(x, y)$ (see, e.g., Ayal & Beyth-Marom, Citation2014; Girotto & Gonzalez, Citation2001; Huerta, Citation2014). This can be achieved by using the equation underlying the operation of marginalization, $P(x) = P(x, y) + P(x, \bar{y})$, and subtracting the term $P(x, y)$ from both sides, resulting in (after rearranging terms): $P(x, \bar{y}) = P(x) - P(x, y)$.

4 In the context of expert systems, sophisticated methods have been developed to avoid the computation of the full joint probabilities (Lauritzen & Spiegelhalter, Citation1988; Neapolitan, Citation1990). However, these methods work only in the presence of (conditional) independences between different sets of variables involved in the system. In the absence of such independences the full joint distribution has to be computed.

5 Brase et al. (Citation1998) provide a different explanation by assuming that our probability module works correctly with whole objects only. However, more recent studies have revealed that this explanation is problematic (e.g., Falk & Lann, Citation2008; Sirota et al., Citation2014).

6 Sloman et al. (Citation2003) call them Euler circles.

7 The dual-process approach itself is not without criticisms (e.g., Gigerenzer & Regier, Citation1996; Keren, Citation2013; Keren & Schul, Citation2009; Kruglanski, Citation2013; Kruglanski & Gigerenzer, Citation2011; Osman, Citation2013; for a defense of dual-process theories, see Evans & Stanovich, Citation2013a, Citationb).

8 The relationship between the probabilities and other quantities holds only with respect to the mathematical and objective conception of probability but not in the case of the subjective conception (for a discussion of different conceptions of probability, see, e.g., Gillies, Citation2000; Mellor, Citation2005).

References

  • Ayal, S., & Beyth-Marom, R. (2014). The effects of mental steps and compatibility on Bayesian reasoning. Judgment and Decision Making, 9(3), 226–242. https://doi.org/10.1017/S1930297500005775
  • Baratgin, J., & Politzer, G. (2010). Updating: A psychologically basic situation of probability revision. Thinking & Reasoning, 16(4), 253–287. https://doi.org/10.1080/13546783.2010.519564
  • Barbey, A. K., & Sloman, S. A. (2007). Base-rate respect: From ecological rationality to dual processes. The Behavioral and Brain Sciences, 30(3), 241–254. https://doi.org/10.1017/S0140525X07001653
  • Binder, K., Krauss, S., & Bruckmaier, G. (2015). Effects of visualizing statistical information – an empirical study on tree diagrams and 2 × 2 tables. Frontiers in Psychology, 6, 1186. https://doi.org/10.3389/fpsyg.2015.01186
  • Böcherer-Linder, K., & Eichler, A. (2019). How to improve performance in Bayesian inference tasks: A comparison of five visualizations. Frontiers in Psychology, 10, 267. https://doi.org/10.3389/fpsyg.2019.00267
  • Bramwell, R., West, H., & Salmon, P. (2006). Health professionals’ and service users’ interpretation of screening test results: Experimental study. BMJ, 333(7562), 284–289. https://doi.org/10.1136/bmj.38884.663102.AE
  • Brase, G. L. (2002). Ecological and evolutionary validity: Comments on Johnson-Laird, Legrenzi, Girotto, and Caverni’s (1999) mental-model theory of extensional reasoning. Psychological Review, 109(4), 722–728. https://doi.org/10.1037//0033-295X.109.4.722
  • Brase, G. L. (2008). Frequency interpretation of ambiguous statistical information facilitates Bayesian reasoning. Psychonomic Bulletin & Review, 15(2), 284–289. https://doi.org/10.3758/PBR.15.2.284
  • Brase, G. L. (2009). Pictorial representations in statistical reasoning. Applied Cognitive Psychology, 23(3), 369–381. https://doi.org/10.1002/acp.1460
  • Brase, G. L. (2014). The power of representation and interpretation: Doubling statistical reasoning performance with icons and frequentist interpretations of ambiguous numbers. Journal of Cognitive Psychology, 26(1), 81–97. https://doi.org/10.1080/20445911.2013.861840
  • Brase, G. L., Cosmides, L., & Tooby, J. (1998). Individuation, counting, and statistical inference: The role of frequency and whole-object representations in judgments under uncertainty. Journal of Experimental Psychology, 127(1), 3–21. https://doi.org/10.1037/0096-3445.127.1.3
  • Brase, G. L., & Hill, W. T. (2015). Good fences make for good neighbors but bad science: A review of what improves Bayesian reasoning and why. Frontiers in Psychology, 6, 340. https://doi.org/10.3389/fpsyg.2015.00340
  • Brase, G. L., & Hill, W. T. (2017). Adding up to good Bayesian reasoning: Problem format manipulations and individual skill differences. Journal of Experimental Psychology, 146(4), 577–591. https://doi.org/10.1037/xge0000280
  • Casscells, W., Schoenberger, A., & Graboys, T. B. (1978). Interpretation by physicians of clinical laboratory results. The New England Journal of Medicine, 299(18), 999–1001. https://doi.org/10.1056/NEJM197811022991808
  • Catrambone, R. (1998). The subgoal learning model: Creating better examples so that students can solve novel problems. Journal of Experimental Psychology, 127(4), 355–376. https://doi.org/10.1037/0096-3445.127.4.355
  • Catrambone, R., & Holyoak, K. J. (1990). Learning subgoals and methods for solving probability problems. Memory & Cognition, 18(6), 593–603. https://doi.org/10.3758/BF03197102
  • Chapman, G. B., & Liu, J. (2009). Numeracy, frequency, and Bayesian reasoning. Judgment and Decision Making, 4(1), 34–40. https://doi.org/10.1017/S1930297500000681
  • Chernoff, E. J., & Sriraman, B. (2014). Probabilistic thinking: Presenting plural perspectives. Springer.
  • Cohn, D. L. (1980). Measure theory. Birkhäuser.
  • Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58(1), 1–73. https://doi.org/10.1016/0010-0277(95)00664-8
  • Darwiche, A. (2009). Modeling and reasoning with Bayesian networks. Cambridge University Press.
  • Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B, 41(1), 1–15. https://doi.org/10.1111/j.2517-6161.1979.tb01052.x
  • Diaz, C., & Batanero, C. (2009). University students’ knowledge and biases in conditional probability reasoning. International Electronic Journal of Mathematics Education, 4(3), 131–162. https://doi.org/10.29333/iejme/234
  • Eichler, A., Böcherer-Linder, K., & Vogel, M. (2020). Different visualizations cause different strategies when dealing with Bayesian situations. Frontiers in Psychology, 11, 1897. https://doi.org/10.3389/fpsyg.2020.01897
  • Evans, J. S. B. T., Handley, S. J., Perham, N., Over, D. E., & Thompson, V. A. (2000). Frequency versus probability formats in statistical word problems. Cognition, 77(3), 197–213. https://doi.org/10.1016/S0010-0277(00)00098-6
  • Evans, J. S. B. T., & Stanovich, K. E. (2013a). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241. https://doi.org/10.1177/1745691612460685
  • Evans, J. S. B. T., & Stanovich, K. E. (2013b). Theory and metatheory in the study of dual processing: Reply to comments. Perspectives on Psychological Science, 8(3), 263–271. https://doi.org/10.1177/1745691613483774
  • Falk, R. (1992). A closer look at the probabilities of the notorious three prisoners. Cognition, 43(3), 197–223. https://doi.org/10.1016/0010-0277(92)90012-7
  • Falk, R., & Lann, A. (2008). The allure of equality: Uniformity in probabilistic and statistical judgment. Cognitive Psychology, 57(4), 293–334. https://doi.org/10.1016/j.cogpsych.2008.02.002
  • Fiedler, K., Brinkmann, B., Betsch, T., & Wild, B. (2000). A sampling approach to biases in conditional probability judgments: Beyond base rate neglect and statistical format. Journal of Experimental Psychology, 129(3), 399–418. https://doi.org/10.1037//0096-3445.129.3.399
  • Gage, J., & Spiegelhalter, D. (2018). Teaching probability. Cambridge University Press.
  • Garcia-Retamero, R., & Hoffrage, U. (2013). Visual representation of statistical information improves diagnostic inferences in doctors and their patients. Social Science & Medicine, 83(1), 27–33. https://doi.org/10.1016/j.socscimed.2013.01.034
  • Geary, D. C. (2002). Principles of evolutionary educational psychology. Learning and Individual Differences, 12(4), 317–345. https://doi.org/10.1016/S1041-6080(02)00046-8
  • Gick, M. L., & Holyoak, J. K. (1980). Analogical problem solving. Cognitive Psychology, 12(3), 306–355. https://doi.org/10.1016/0010-0285(80)90013-4
  • Gick, M. L., & Holyoak, J. K. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15(1), 1–38. https://doi.org/10.1016/0010-0285(83)90002-6
  • Gigerenzer, G. (1994). Why the distinction between single-event probabilities and frequencies is important for psychology (and vice versa). In G. Wright & P. Ayton (Eds.), Subjective probability (pp. 129–161). Wiley.
  • Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102(4), 684–704. https://doi.org/10.1037/0033-295X.102.4.684
  • Gigerenzer, G., & Hoffrage, U. (2007). The role of representation in Bayesian reasoning: Correcting common misconceptions. Behavioral and Brain Sciences, 30(3), 264–267. https://doi.org/10.1017/S0140525X07001756
  • Gigerenzer, G., & Regier, T. (1996). How do we tell an association from a rule? Comment on Sloman (1996). Psychological Bulletin, 119(1), 23–26. https://doi.org/10.1037/0033-2909.119.1.23
  • Gillies, D. (2000). Philosophical theories of probability. Taylor & Francis.
  • Girotto, V., & Gonzalez, M. (2001). Solving probabilistic and statistical problems: A matter of information structure and question form. Cognition, 78(3), 247–276. https://doi.org/10.1016/S0010-0277(00)00133-5
  • Girotto, V., & Pighin, S. (2015). Basic understanding of posterior probability. Frontiers in Psychology, 6, 680. https://doi.org/10.3389/fpsyg.2015.00680
  • Greeno, J. G., Riley, M. S., & Gelman, R. (1984). Conceptual competence and children’s counting. Cognitive Psychology, 16(1), 94–143. https://doi.org/10.1016/0010-0285(84)90005-7
  • Hayes, J. R. (1989). The complete problem solver (2nd ed.). Erlbaum.
  • Hill, W. T., & Brase, G. L. (2012). When and for whom do frequencies facilitate performance? On the role of numerical literacy. Quarterly Journal of Experimental Psychology, 65(12), 2343–2368. https://doi.org/10.1080/17470218.2012.687004
  • Hoffrage, U., Krauss, S., Martignon, L., & Gigerenzer, G. (2015). Natural frequencies improve Bayesian reasoning in simple and complex tasks. Frontiers in Psychology, 6, 1473. https://doi.org/10.3389/fpsyg.2015.01473
  • Holyoak, K. J., & Koh, K. (1987). Surface and structural similarity in analogical transfer. Memory & Cognition, 15(4), 332–340. https://doi.org/10.3758/BF03197035
  • Howson, C., & Urbach, P. (1993). Scientific reasoning: The Bayesian approach. (2nd ed.). Open Court.
  • Huerta, M. P. (2014). Researching conditional probability problem solving. In E. J. Chernoff & B. Sriraman (Eds.), Probabilistic thinking: Presenting plural perspectives (pp. 613–639). Springer. https://doi.org/10.1007/978-94-007-7155-0_33
  • Johnson, E. D., & Tubau, E. (2013). Words, numbers, & numeracy: Diminishing individual differences in Bayesian reasoning. Learning and Individual Differences, 28, 34–40. https://doi.org/10.1016/j.lindif.2013.09.004
  • Johnson, E. D., & Tubau, E. (2015). Comprehension and computation in Bayesian problem solving. Frontiers in Psychology, 6, 938. https://doi.org/10.3389/fpsyg.2015.00938
  • Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M. S., & Caverni, J.-P. (1999). Naive probability: A mental model theory of extensional reasoning. Psychological Review, 106(1), 62–88. https://doi.org/10.1037/0033-295X.106.1.62
  • Keren, G. (2013). A tale of two systems: A scientific advance or a theoretical stone soup? Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 257–262. https://doi.org/10.1177/1745691613483474
  • Keren, G., & Schul, Y. (2009). Two is not always better than one: A critical evaluation of two-system theories. Perspectives on Psychological Science, 4(6), 533–550. https://doi.org/10.1111/j.1745-6924.2009.01164.x
  • Kintsch, W., & Greeno, J. G. (1985). Understanding and solving word arithmetic problems. Psychological Review, 92(1), 109–129. https://doi.org/10.1037/0033-295X.92.1.109
  • Kleiter, G. (1994). Natural sampling: Rationality without base rates. In G. H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. 375–388). Springer. https://doi.org/10.1007/978-1-4612-4308-3_27
  • Kohlas, J. (2003). Information algebras: Generic structures for inference. Springer.
  • Krauss, S., & Wang, X. T. (2003). The psychology of the Monty Hall problem: Discovering psychological mechanisms for solving a tenacious brain teaser. Journal of Experimental Psychology: General, 132(1), 3–22. https://doi.org/10.1037/0096-3445.132.1.3
  • Kruglanski, A. W. (2013). The default interventionist perspective as a unimodel: Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 242–247. https://doi.org/10.1177/1745691613483477
  • Kruglanski, A. W., & Gigerenzer, G. (2011). Intuitive and deliberate judgments are based on common principles. Psychological Review, 118(1), 97–109. https://doi.org/10.1037/a0020762
  • Kurzenhäuser, S., & Hoffrage, U. (2002). Teaching Bayesian reasoning: An evaluation of a classroom tutorial for medical students. Medical Teacher, 24(5), 516–521. https://doi.org/10.1080/0142159021000012540
  • Lakhlifi, C., Lejeune, F.-X., Rouault, M., Khamassi, M., & Rohaut, B. (2023). Illusion of knowledge in statistics among clinicians: Evaluating the alignment between objective accuracy and subjective confidence, an online survey. Cognitive Research: Principles and Implications, 8(1), 23. https://doi.org/10.1186/s41235-023-00474-1
  • Larkin, J. H., McDermott, J. R., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208(4450), 1335–1342. https://doi.org/10.1126/science.208.4450.1335
  • Lauritzen, S. L., & Spiegelhalter, D. J. (1988). Local computation with probabilities on graphical structures and their applications to expert systems (with discussion). Journal of the Royal Statistical Society, Series B, 50(2), 157–194. https://doi.org/10.1111/j.2517-6161.1988.tb01721.x
  • Lipkus, I. M., Samsa, G., & Rimer, B. K. (2001). General performance on a numeracy scale among highly educated samples. Medical Decision Making, 21(1), 37–44. https://doi.org/10.1177/0272989X0102100105
  • Macchi, L. (2000). Partitive formulation of information in probabilistic problems: Beyond heuristics and frequency format explanations. Organizational Behavior and Human Decision Processes, 82(2), 217–236. https://doi.org/10.1006/obhd.2000.2895
  • Marr, D. (1982). Vision. A computational investigation into the human representation and processing of visual information. Freeman.
  • McDermott, J. R., & Larkin, J. H. (1978). Representing textbook physics problems. Proceedings of the 2nd Conference of the Canadian Society for Computational Studies of Intelligence (pp. 156–164). University of Toronto Press.
  • McDowell, M., & Jacobs, P. (2017). Meta-analysis of the effect of natural frequencies on Bayesian reasoning. Psychological Bulletin, 143(12), 1273–1312. https://doi.org/10.1037/bul0000126
  • Mellers, B. A., & McGraw, P. A. (1999). How to improve Bayesian reasoning: Comment on Gigerenzer and Hoffrage (1995). Psychological Review, 106(2), 417–424. https://doi.org/10.1037/0033-295X.106.2.417
  • Mellor, D. H. (2005). Probability: A philosophical introduction. Routledge.
  • Micallef, L., Dragicevic, P., & Fekete, J. (2012). Assessing the effect of visualizations on Bayesian reasoning through crowdsourcing. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2536–2545. https://doi.org/10.1109/TVCG.2012.199
  • Moro, R., Bodanza, G. A., & Freidin, E. (2011). Sets or frequencies? How to help people solve conditional probability problems. Journal of Cognitive Psychology, 23(7), 843–857. https://doi.org/10.1080/20445911.2011.579072
  • Mosteller, F. (1965). Fifty challenging problems in probability with solutions. Addison Wesley.
  • Neapolitan, R. E. (1990). Probabilistic reasoning in expert systems: Theory and algorithms. Wiley.
  • Newell, A., & Simon, H. A. (1972). Human problem solving. Prentice-Hall.
  • Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical reasoning in everyday inductive reasoning. Psychological Review, 90(4), 339–363. https://doi.org/10.1037/0033-295X.90.4.339
  • Novick, L. R. (1990). Representational transfer in problem solving. Psychological Science, 1(2), 128–132. https://doi.org/10.1111/j.1467-9280.1990.tb00081.x
  • Osman, M. (2013). A case study: Dual-process theories of higher cognition: Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 248–252. https://doi.org/10.1177/1745691613483475
  • Ottley, A., Peck, E. M., Harrison, L. T., Afergan, D., Ziemkiewicz, C., Taylor, H. A., Han, P. K. J., & Chang, R. (2016). Improving Bayesian reasoning: The effects of phrasing, visualization, and spatial ability. IEEE Transactions on Visualization and Computer Graphics, 22(1), 529–538. https://doi.org/10.1109/TVCG.2015.2467758
  • Rittle-Johnson, B., Siegler, R. S., & Alibali, M. W. (2001). Developing conceptual understanding and procedural skill in mathematics: An iterative process. Journal of Educational Psychology, 93(2), 346–362. https://doi.org/10.1037//0022-0663.93.2.346
  • Schoenfeld, A. H. (1979). Explicit heuristic training as a variable in problem-solving performance. Journal for Research in Mathematics Education, 10(3), 173–187. https://doi.org/10.2307/748805
  • Schwartz, L. M., Woloshin, S., Black, W. C., & Welch, H. G. (1997). The role of numeracy in understanding the benefit of screening mammography. Annals of Internal Medicine, 127(11), 966–972. https://doi.org/10.7326/0003-4819-127-11-199712010-00003
  • Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology, 130(3), 380–400. https://doi.org/10.1037/0096-3445.130.3.380
  • Shafer, G., & Tversky, A. (1985). Languages and designs for probability judgment. Cognitive Science, 9(3), 309–339. https://doi.org/10.1207/s15516709cog0903_2
  • Shenoy, P. P., & Shafer, G. (1990). Axioms for probability and belief-function propagation. In G. Shafer & J. Pearl (Eds.), Readings in uncertain reasoning (pp. 575–610). Morgan Kaufmann.
  • Shimojo, S., & Ichikawa, S. (1989). Intuitive reasoning about probability: Theoretical and experimental analyses of the “problem of three prisoners”. Cognition, 32(1), 1–24. https://doi.org/10.1016/0010-0277(89)90012-7
  • Siegrist, M., & Keller, C. (2011). Natural frequencies and Bayesian reasoning: The impact of formal education and problem context. Journal of Risk Research, 14(9), 1039–1055. https://doi.org/10.1080/13669877.2011.571786
  • Sirota, M., Juanchich, M., & Hagmayer, Y. (2014). Ecological rationality or nested sets? Individual differences in cognitive processing predict Bayesian reasoning. Psychonomic Bulletin & Review, 21(1), 198–204. https://doi.org/10.3758/s13423-013-0464-6
  • Sirota, M., Kostovičová, L., & Vallée-Tourangeau, F. (2015). How to train your Bayesian: A problem-representation transfer rather than a format-representation shift explains training effects. Quarterly Journal of Experimental Psychology, 68(1), 1–9. https://doi.org/10.1080/17470218.2014.972420
  • Sirota, M., Vallée-Tourangeau, G., Vallée-Tourangeau, F., & Juanchich, M. (2015). On Bayesian problem-solving: Helping Bayesians solve simple Bayesian word problems. Frontiers in Psychology, 6, 1141. https://doi.org/10.3389/fpsyg.2015.01141
  • Sloman, S. A., Over, D., Slovak, L., & Stibel, J. M. (2003). Frequency illusions and other fallacies. Organizational Behavior and Human Decision Processes, 91(2), 296–309. https://doi.org/10.1016/S0749-5978(03)00021-9
  • Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? The Behavioral and Brain Sciences, 23(5), 645–665. https://doi.org/10.1017/s0140525x00003435
  • Talboy, A. N., & Schneider, S. L. (2017). Improving accuracy on Bayesian inference problems using a brief tutorial. Journal of Behavioral Decision Making, 30(2), 373–388. https://doi.org/10.1002/bdm.1949
  • Tricot, A., & Sweller, J. (2014). Domain-specific knowledge and why teaching generic skills does not work. Educational Psychology Review, 26(2), 265–283. https://doi.org/10.1007/s10648-013-9243-1
  • Tubau, E., Aguilar-Lleyda, D., & Johnson, E. D. (2015). Reasoning and choice in the Monty Hall dilemma (MHD): Implications for improving Bayesian reasoning. Frontiers in Psychology, 6, 353. https://doi.org/10.3389/fpsyg.2015.00353
  • Tubau, E., & Alonso, D. (2003). Overcoming illusory inferences in a probabilistic counterintuitive problem: The role of explicit representations. Memory & Cognition, 31(4), 596–607. https://doi.org/10.3758/BF03196100
  • Tversky, A., & Kahneman, D. (1982a). Causal schemas in judgments under uncertainty. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 117–128). Cambridge University Press.
  • Tversky, A., & Kahneman, D. (1982b). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 153–160). Cambridge University Press.
  • Von Mises, R. (1972). Wahrscheinlichkeit, Statistik und Wahrheit (Probability, statistics, and truth) (4th ed.). Springer.
  • Vos Savant, M. (1990, September 9). Ask Marilyn. Parade Magazine, 15.
  • Wassner, C., Martignon, L., & Biehler, R. (2004). Bayesianisches Denken in der Schule (Bayesian thinking in school). Unterrichtswissenschaft, 32(1), 58–96. https://doi.org/10.25656/01:5808
  • Weber, P., Binder, K., & Krauss, S. (2018). Why can only 24% solve Bayesian reasoning problems in natural frequencies: Frequency phobia in spite of probability blindness. Frontiers in Psychology, 9, 1833. https://doi.org/10.3389/fpsyg.2018.01833

Appendix

Formal specification of the probability spaces and probabilistic operations underlying elementary probabilistic problems

In the following, we provide a formal definition of the probability space and the elementary probabilistic operations underlying elementary probabilistic reasoning problems.

Specification of the Probability Space

Let $\Omega_1, \Omega_2, \ldots, \Omega_n$ represent $n$ finite sets of values from various domains, and let $m_i = |\Omega_i|$ denote the size of set $\Omega_i$, with $2 \le m_i < \infty$ ($i = 1, 2, \ldots, n$). The sample space $\Omega$ underlying elementary probabilistic problems is defined as the Cartesian product of the $n$ sets: $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$. Thus, each element $\omega \in \Omega$ is a tuple $\langle x_1, x_2, \ldots, x_n \rangle$ with $x_i \in \Omega_i$ ($i = 1, 2, \ldots, n$).

A basic probability assignment is a function $\pi: \Omega \to [0, 1]$ that assigns the point probability $\pi(x_1, x_2, \ldots, x_n)$ to each element $\omega = \langle x_1, x_2, \ldots, x_n \rangle$ of the sample space $\Omega$ such that the assigned point probabilities sum to 1: $\sum_{x_1 \in \Omega_1} \sum_{x_2 \in \Omega_2} \cdots \sum_{x_n \in \Omega_n} \pi(x_1, x_2, \ldots, x_n) = 1$.

The probability space underlying elementary probabilistic problems is a triple $\langle \Omega, 2^\Omega, P \rangle$, consisting of the sample space $\Omega$, the power set $2^\Omega$ of $\Omega$, and the probability function $P: 2^\Omega \to [0, 1]$ that assigns to each non-empty set $A \in 2^\Omega$ the probability $P(A) := \sum_{\langle x_1, x_2, \ldots, x_n \rangle \in A} \pi(x_1, x_2, \ldots, x_n)$, i.e., the probability of $A$ is the sum of the probabilities assigned to the elementary events of the sample space that are contained in $A$. In addition, $P(\emptyset) := 0$. The resulting probability measure $P$ satisfies the probability axioms: (a) the assigned probabilities are in the range $[0, 1]$ of real numbers; (b) $P(\Omega) = 1$; and (c) the measure is additive: Let $A_1, A_2, \ldots, A_K \in 2^\Omega$ denote a set of pairwise disjoint sets, i.e., $A_k \cap A_l = \emptyset$ for all $k \ne l$ ($k, l = 1, 2, \ldots, K$); then $P(A_1 \cup A_2 \cup \cdots \cup A_K) = \sum_{k=1}^{K} P(A_k)$.

A random variable $X_i: \Omega \to \Omega_i$ is defined as a projection function that projects an element $\omega = \langle x_1, x_2, \ldots, x_n \rangle$ of $\Omega$ onto its $i$-th component, i.e., $X_i(\omega) = x_i$ ($i = 1, 2, \ldots, n$).
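To fix ideas, here is a minimal sketch (our own illustration, not part of the formal framework) of how such a finite probability space can be represented, with the basic probability assignment stored as a table over the sample space; the variable names and numbers are hypothetical:

```python
from itertools import product

# Two binary variables with hypothetical value sets.
Omega_1 = ("h", "nh")     # hypothesis: present / absent
Omega_2 = ("e", "ne")     # evidence: positive / negative

# Basic probability assignment pi: one point probability per tuple of Omega.
pi = {
    ("h", "e"): 0.08, ("h", "ne"): 0.02,
    ("nh", "e"): 0.09, ("nh", "ne"): 0.81,
}
assert set(pi) == set(product(Omega_1, Omega_2))
assert abs(sum(pi.values()) - 1.0) < 1e-12   # point probabilities sum to 1

# Probability measure P: the probability of an event A (a set of tuples)
# is the sum of the point probabilities of its elements.
def P(A):
    return sum(pi[omega] for omega in A)

# The random variable X1 is the projection on the first component, so the
# event {X1 = "h"} is the set of all tuples whose first entry is "h".
event_h = {omega for omega in pi if omega[0] == "h"}
print(P(event_h))  # ≈ 0.10
```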

Types of Probabilities and Elementary Probabilistic Operations

Elementary probabilistic problems comprise two components: Different types of probabilities and elementary probabilistic operations.

Types of Probabilities

Three types of probabilities (or probability distributions) are involved in elementary probabilistic problems: joint, marginal, and conditional probabilities.

The joint (probability) distribution $P(X_1, X_2, \ldots, X_n)$ conforms to the probability distribution over the elements $\omega \in \Omega$. It may be regarded as a table comprising $m_1 m_2 \cdots m_n$ entries. The single entries of the table represent the joint probability of the combination of values of the $n$ variables and are denoted by $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$, or, more simply, $P(x_1, x_2, \ldots, x_n)$. The joint probability distribution contains the complete probability information about the variables in question, enabling one to answer any possible probabilistic question. In addition, the joint probability distribution is defined on the finest partition of the sampling space, consisting of the $m_1 m_2 \cdots m_n$ combinations of the values of the $n$ variables. These two characteristics, completeness and the most fine-grained partitioning of the sampling space, render the joint distribution of focal importance in the case of probabilistic reasoning. For the elementary probabilistic problems used in psychological and educational studies, the joint distribution is at the heart of the probabilistic reasoning process.

Given the joint distribution $P(X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n)$ over the set of variables $X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n$, the marginal distribution $P(X_1, X_2, \ldots, X_k)$ is the probability distribution over a proper subset $X_1, X_2, \ldots, X_k$ ($k < n$) of the variables.

Given the joint distribution $P(X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n)$ on the set of variables $X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n$, and the marginal distribution $P(X_{k+1}, \ldots, X_n)$, the conditional (probability) distribution $P(X_1, X_2, \ldots, X_k \mid X_{k+1}, \ldots, X_n)$ of the variables $X_1, X_2, \ldots, X_k$, given the variables $X_{k+1}, X_{k+2}, \ldots, X_n$, is defined by the following equation (assuming $P(X_{k+1}, \ldots, X_n) > 0$):

$$P(X_1, X_2, \ldots, X_k \mid X_{k+1}, \ldots, X_n) = \frac{P(X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n)}{P(X_{k+1}, \ldots, X_n)}. \quad (A1)$$

The conditional distribution $P(X_1, X_2, \ldots, X_k \mid X_{k+1}, \ldots, X_n)$ may be conceived of as a table with each entry containing the respective conditional point probability, denoted by $P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k \mid X_{k+1} = x_{k+1}, \ldots, X_n = x_n)$ or $P(x_1, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_n)$.

Elementary Probabilistic Operations (EPO)

The second component of elementary probabilistic problems consists of the elementary probabilistic operations: (1) combination of information, (2) marginalization, and (3) conditioning. These operations are used to switch between the different types of probabilities. The first operation, the combination of probabilistic information, consists in the generation of a joint probability distribution by combining conditional and marginal probabilities. The combination is based on Equation (A1), which specifies the conditional distribution. Reordering terms results in:

$$P(X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n) = P(X_1, X_2, \ldots, X_k \mid X_{k+1}, \ldots, X_n)\, P(X_{k+1}, \ldots, X_n). \quad (A2)$$

Repeated application of Equation (A2) to the right-most factor on the right-hand side of Equation (A2) results in the chain rule for combining probabilistic information:

$$P(X_1, X_2, \ldots, X_n) = P(X_1 \mid X_2, \ldots, X_n)\, P(X_2 \mid X_3, \ldots, X_n) \cdots P(X_{n-1} \mid X_n)\, P(X_n). \quad (A3)$$
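A sketch of the combination operation for the two-variable case, built from Equation (A2) with the same hypothetical numbers as in the earlier sketch:

```python
# Combination (Equation (A2)) for two binary variables: build the joint
# distribution P(E, H) from the conditional P(E | H) and the marginal P(H).
# Numbers are hypothetical.
P_H = {"h": 0.10, "nh": 0.90}
P_E_given_H = {("e", "h"): 0.80, ("ne", "h"): 0.20,
               ("e", "nh"): 0.10, ("ne", "nh"): 0.90}

joint = {(e, h): P_E_given_H[(e, h)] * P_H[h]
         for h in P_H for e in ("e", "ne")}
print(joint[("e", "h")])  # P(e & h) = 0.8 * 0.1 ≈ 0.08
```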

The second elementary probabilistic operation is called marginalization. With discrete variables, it is realized by means of a summation performed on the joint distribution. The result of marginalization is a marginal distribution. The summation is taken over all combinations of values of those variables that do not make up the resulting marginal distribution; these variables are “summed out” of the distribution. Formally, the operation of marginalization can be represented as follows:

$$P(X_1, X_2, \ldots, X_k) = \sum_{x_{k+1} \in \Omega_{k+1}} \sum_{x_{k+2} \in \Omega_{k+2}} \cdots \sum_{x_n \in \Omega_n} P(X_1, X_2, \ldots, X_k, x_{k+1}, x_{k+2}, \ldots, x_n). \quad (A4)$$

The resulting marginal distribution is the (joint) distribution of the variables that have not been summed out. In the case of continuous variables, the operation of summation is replaced by integration.
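A sketch of marginalization in the discrete two-variable case, continuing the hypothetical numbers from the previous sketch:

```python
# Marginalization (Equation (A4)): sum the evidence variable E out of the
# joint distribution to recover the marginal P(H). Hypothetical numbers.
joint = {("e", "h"): 0.08, ("ne", "h"): 0.02,
         ("e", "nh"): 0.09, ("ne", "nh"): 0.81}

P_H = {}
for (e, h), p in joint.items():
    P_H[h] = P_H.get(h, 0.0) + p   # sum over the values of E
print(P_H)  # {'h': ≈0.10, 'nh': ≈0.90}
```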

The final probabilistic operation is called conditioning. It is realized by means of the algebraic operation of division and is based on the definition of conditional probabilities given by Equation (A1). Conditioning on an event $B$ induces a new probability measure. In particular, given the probability space $\langle \Omega, 2^\Omega, P \rangle$, let $B \subseteq \Omega$ be a non-empty subset of $\Omega$ with $P(B) > 0$. Conditioning on $B$ results in the new probability space $\langle \Omega, 2^\Omega, P_B \rangle$, with the induced probability measure $P_B$ given by $P_B(A) := P(A, B)/P(B)$ for every $A \in 2^\Omega$, where $P(A, B)$ is the probability of the joint event $A \& B$.

In the case of the more general Jeffrey conditioning with uncertain conditioning events, the induced probability measure may be defined as follows: Let $B_1, B_2, \ldots, B_K$ be a partitioning of the sample space $\Omega$, with $P(B_k) > 0$ for all $k$ ($k = 1, 2, \ldots, K$), and let $\gamma_1, \gamma_2, \ldots, \gamma_K$ be probability weights, i.e., $\gamma_k \ge 0$ for all $k$ and $\sum_{k=1}^{K} \gamma_k = 1$, representing the posterior probabilities of the conditioning sets $B_1, B_2, \ldots, B_K$. The corresponding probability measure is given by $P_{B_1, B_2, \ldots, B_K}^{\gamma_1, \gamma_2, \ldots, \gamma_K}(A) := \sum_{k=1}^{K} \gamma_k\, P(A, B_k)/P(B_k)$ for every $A \in 2^\Omega$. The probability measure induced by simple conditioning is a special case of the measure resulting from Jeffrey conditioning: $P_B(A) = P_{B, \bar{B}}^{1, 0}(A)$ for every $A \in 2^\Omega$, where $\bar{B}$ denotes the complement of $B$ with respect to $\Omega$ (assuming $P(\bar{B}) > 0$). Thus, conditional probabilities are defined on a different probability space than joint and marginal probabilities. The three elementary probabilistic operations also work with conditional distributions on the new probability space $\langle \Omega, 2^\Omega, P_B \rangle$.
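A sketch of both forms of conditioning on a finite joint distribution (our own illustration; the joint table and all numbers are hypothetical, as in the previous sketches):

```python
# Conditioning and Jeffrey conditioning on a finite joint distribution.
joint = {("e", "h"): 0.08, ("ne", "h"): 0.02,
         ("e", "nh"): 0.09, ("ne", "nh"): 0.81}

def condition(dist, B):
    """Simple conditioning: P_B(A) = P(A, B) / P(B), realized by zeroing
    the mass outside B and renormalizing."""
    P_B = sum(p for omega, p in dist.items() if omega in B)
    return {omega: (p / P_B if omega in B else 0.0)
            for omega, p in dist.items()}

def jeffrey(dist, partition, gammas):
    """Jeffrey conditioning: mix the measures conditioned on the blocks B_k
    of the partition with the posterior weights gamma_k."""
    new = dict.fromkeys(dist, 0.0)
    for B_k, gamma_k in zip(partition, gammas):
        cond = condition(dist, B_k)
        for omega in dist:
            new[omega] += gamma_k * cond[omega]
    return new

B = {omega for omega in joint if omega[0] == "e"}   # the event E = e
B_bar = set(joint) - B

posterior = condition(joint, B)
print(round(posterior[("e", "h")], 3))  # 0.08 / 0.17 ≈ 0.471

# Simple conditioning is Jeffrey conditioning with weights (1, 0).
jc = jeffrey(joint, [B, B_bar], [1.0, 0.0])
assert all(abs(jc[omega] - posterior[omega]) < 1e-12 for omega in joint)
```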