Research Article

Everybody herds, sometimes: cumulative advantage as a product of rational learning

Pages 207-271 | Received 03 Jul 2022, Accepted 29 Mar 2023, Published online: 08 Jun 2023
 

Abstract

We propose a model of cumulative advantage (CA) as an unintended consequence of the choices of a population of individuals. Each individual searches for a high quality object from a set comprising high and low quality objects. Individuals rationally learn from their own experience with objects (reinforcement learning) and from the observation of others’ choices (social learning). We show that CA emerges inexorably as individuals rely more on social learning and as they learn from more rather than fewer others. Our theory argues that CA has social dilemma features: the benefits of CA could be enjoyed with only modest drawbacks if individuals practiced restraint in their social learning. However, when practiced by everyone, such restraint goes against each individual’s self-interest.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 In the behavioral game theory literature (Camerer, 2003) this mode of learning would be called “belief learning.” However, since all our learning is rational and thus based on beliefs, we opt for the term “reinforcement learning,” which conveys that this form of learning is based on an agent’s own experience. As with non-rational forms of reinforcement learning (cf. Flache & Macy, 2002), our rational reinforcement learning implies that the experience of success makes a repeated choice of the same object more likely, whereas failure makes it less likely.

2 For these beliefs to be truly well-defined after all histories, it is required that “zero-probability events” do not occur. In particular, when appropriate we will be assuming that ε₀ < 1, since ε₁ = 0 and ε₀ = 1 together imply that neighbor j always stays (rendering “leave” a zero-probability event).
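
As a generic Bayes-rule illustration (a sketch in standard notation, not a reproduction of the paper’s equations, with H standing for the hypothesis being updated, e.g., that the neighbor’s object is of high quality): updating on an observed action is only defined when that action has positive probability under the observer’s beliefs.

\[
P(H \mid \text{leave}) = \frac{P(\text{leave} \mid H)\,P(H)}{P(\text{leave})},
\qquad \text{undefined if } P(\text{leave}) = 0 .
\]

With ε₁ = 0 and ε₀ = 1 the neighbor stays with probability one, so P(leave) = 0 and the posterior after observing “leave” would be undefined; assuming ε₀ < 1 rules this case out.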

3 When values of ½ are plugged in for both epsilons in Eqs. (4) and (5), the posteriors equal the priors.
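
The following generic Bayesian calculation (a sketch, not a reproduction of Eqs. (4) and (5)) shows why: when the observed action is equally likely under both hypotheses, the likelihoods of ½ cancel and the posterior equals the prior.

\[
P(H \mid \text{stay})
= \frac{P(\text{stay} \mid H)\,P(H)}{P(\text{stay} \mid H)\,P(H) + P(\text{stay} \mid \neg H)\,\bigl(1 - P(H)\bigr)}
= \frac{\tfrac{1}{2}\,P(H)}{\tfrac{1}{2}\,P(H) + \tfrac{1}{2}\,\bigl(1 - P(H)\bigr)}
= P(H).
\]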

4 This operationalization of “optimal learning” resembles a procedure used by Anderson and Holt (1997). These authors conduct an experimental investigation of the model of Bikhchandani et al. (1992), and in their ex post determination of optimal choice behavior (i.e., after having observed participants’ behavior) they estimate error rates (i.e., deviations from Bayesian optimality) for each round and use those estimates to determine optimal behavior in the next. The key difference from our current approach, apart from the fact that our agents are of course simulated, is that our “false stayers” do not deviate from Bayesian optimality, whereas Anderson and Holt’s participants do.

5 Note that the latter does not imply that the difference between the consequences of success and failure is unimportant. We do not model the value of success or failure to the agents, but simply assume ordinal preferences in this respect: agents prefer success to failure.

6 For each model run separately, we generated a random regular graph with the specified neighborhood size using the function sample_k_regular from the igraph package (Csardi & Nepusz, 2006). The support of this function consists of the space of connected k-regular graphs in which there is at most one edge between any pair of nodes (i.e., no multigraphs). Thus, each connected k-regular graph of the specified neighborhood size has a strictly positive probability of being drawn. The function does not, however, guarantee a uniform draw (equiprobability of all connected k-regular graphs). To guarantee uniformity, we would have to employ the function sample_degseq, but this would come at the price of increasing the runtime of our models. Considering the relatively large number of runs we need for our paper (including robustness checks), we opted for using sample_k_regular directly. We believe the lack of uniformity in no way affects our conclusions, because nodes have no properties beyond their labels. In addition, since beliefs about object quality are uniform at the outset, object choice in the first round is entirely random.
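
A minimal R sketch of this per-run graph setup (not the authors’ actual code; the population size is a hypothetical placeholder, and the connectivity check merely mirrors the restriction to connected graphs described above):

    library(igraph)

    n_agents <- 100   # hypothetical population size; not specified in this note
    k        <- 8     # neighborhood size; the paper varies k over 2, 4, ..., 64

    # Draw one random k-regular graph per model run (simple graph, no multi-edges).
    g <- sample_k_regular(n_agents, k)

    # Mirror the restriction to connected k-regular graphs mentioned above.
    stopifnot(is_connected(g))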

7 Note that the model does not harbor any “repeated-game strategic incentives”: each agent simply wants to maximize the probability of choosing a high quality object in each round, and hence has an incentive to learn optimally from the behavior of her neighbors in each round separately.

8 Note that the “false stayer rate” is not information agents would normally have in real-world applications. We included this treatment to investigate a situation with “complete information” that allows agents to “optimally adjust their aim” after each round. Initially, we wanted to use the approach in this treatment to locate a Nash equilibrium of the system (from the potentially very large set of equilibria). To do so, we took the following approach: we ran our current optimal learning model in a series of subsequent “sessions” (each comprising 20 runs), each time feeding the agents in a new session the average learning trajectory (i.e., the profile of average false stayer rates in each round, across the runs) of the previous session. While for neighborhood sizes 2 and 4 this eventually yielded a stable learning trajectory (which would thus be a “fixed point” of the system and hence a Nash equilibrium), the procedure failed to converge for larger neighborhoods. We have therefore chosen to postpone the issue of computationally finding Nash equilibria to future research, and include the current optimal learning treatment as an investigation of what optimal behavior would look like if agents had access to the (informationally aggregated) learning histories of their peers.
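
A hypothetical R sketch of the fixed-point search described in this note; run_session() is a placeholder (not from the paper) for simulating one session of 20 runs, taking the previous session’s trajectory as input and returning the new per-round average false stayer rates:

    # Iterate sessions until the average false-stayer-rate trajectory stabilizes.
    find_stable_trajectory <- function(run_session, n_rounds, tol = 1e-3, max_sessions = 50) {
      trajectory <- rep(0, n_rounds)                # arbitrary starting profile (an assumption of this sketch)
      for (s in seq_len(max_sessions)) {
        new_trajectory <- run_session(trajectory)   # one session of 20 runs, fed the previous trajectory
        if (max(abs(new_trajectory - trajectory)) < tol) {
          return(new_trajectory)                    # stable trajectory: a fixed point, hence a Nash equilibrium candidate
        }
        trajectory <- new_trajectory
      }
      warning("no stable trajectory found within max_sessions (as reported for larger neighborhoods)")
      trajectory
    }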

9 Across all values of ε₀, the percentages of runs ending in such herding are 0 for k equal to 2, 4, and 8, but 2.62%, 7.14%, and 12.38% for k equal to 16, 32, and 64, respectively. The percentages of runs ending in all agents converging on a high quality object are 16.67%, 20.71%, 43.10%, 81.67%, 88.57%, and 85.48% for k equal to 2, 4, 8, 16, 32, and 64, respectively.

10 Leaving out the trivial case of ε = ½, the probabilities of herding are 0 for k equal to 2, 4, and 8, and 0.12, 0.26, and 0.39 for k equal to 16, 32, and 64, respectively. The probabilities of all agents converging on a high-quality object for the same sequence of k are 0, 0, 0.12, 0.59, 0.68, and 0.59, respectively.