Holding back from theory: limits and methodological alternatives of randomized field experiments in development economics. Journal of Economic Methodology, Vol. 27, No. 3

ABSTRACT

In this paper, we critically and constructively examine the methodology of evidence-based development economics, which deploys randomized field experiments (RFEs) as its main tool. We describe the context in which this movement started and illustrate in detail how RFEs are designed and implemented in practice, drawing on a series of experiments by Pascaline Dupas and her colleagues on the use of bednets, saving and governance in Kenya. We show that this line of experiments has evolved to address a limitation of RFEs taken alone: the difficulty of obtaining policy-relevant insights from them, which the literature characterizes as their lack of external validity. After examining the two prominent responses by leading figures of evidence-based development economics, namely machine learning and structured speculation, we propose an alternative methodological strategy that incorporates two sub-fields, experimental economics and behavioral economics, to complement RFEs in investigating the data-generating process underlying their treatment effects. This strategy highlights promising methodological developments in RFEs that neither of the two proposals captures and that methodologists have not recognized, and it also offers guidance on how to combine different sub-fields of economics.

KEYWORDS:

JEL CODES:

Acknowledgments

We thank two anonymous reviewers and Pascaline Dupas for their insightful and detailed comments on earlier versions of this paper. We also benefited from comments by audiences at a TINT brown bag seminar and at the meetings of the following societies: SPSP (Society for Philosophy of Science in Practice), INEM (International Network for Economic Method), and ESHET (European Society for the History of Economic Thought). In particular, we thank Glenn Harrison and María Jiménez Buedo for valuable comments. Sofia Blanco Sequeiros helped the project as a research assistant. Lisa Muszynski improved the language and style of the paper. The usual caveat applies: any shortcomings of the work are our own.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Judith Favereau is an associate professor in philosophy of economics and history of economic thought at the pluridisciplinary laboratory TRIANGLE at the University Lyon 2. She is also affiliated with the Centre for Philosophy of Social Science (TINT). Her topics of interest are development economics, experimental economics, philosophy of economics, and evidence-based policy. Her research focuses on how development economics, experimental economics, and evidence-based policy interact to fight poverty.

Michiru Nagatsu is an associate professor at the Helsinki Institute of Sustainability Science (HELSUS) and Practical Philosophy, University of Helsinki. He runs the Economics and Philosophy Lab and the HELSUS Methodology Lab. His research uses a range of empirical approaches, including experimental philosophy, collaborations with scientists, interviews, and integrated history and philosophy of science, to study conceptual and methodological questions in the philosophy of science.

Notes

1 For a discussion of the ‘gold standard’ status of RFEs, see, for example, Cartwright (2007), Cartwright and Hardie (2012) and Duflo (2004).

2 This follows the same trend that first appeared in medicine with the rise of evidence-based medicine, and then spread more broadly in social science with the evidence-based policy movement. See, for example, Cartwright and Hardie (2012) for an analysis of this movement.

3 For instance, Banerjee (2005, p. 4343) claims:

The fallout of the behavioral economics revolution in economics is that we are no longer particularly sure of what the right theory ought to look like, especially inasmuch as decision problems are concerned. In particular, we are no longer sure in the presumption that utility functions and cost functions are somehow more stable and more universal than behavioral rules.

Duflo (2009) makes a similar claim.

4 Although Sachs and Easterly are the two main figures of this popularized debate, other economists have taken similar positions. For instance, Singer (2009) and Collier (2007) broadly share Sachs's position, while Moyo (2009) sides with Easterly. Since J-PAL's researchers focus on the Sachs-Easterly debate, we do the same in our argument.

5 ‘Whom should we believe? Those who tell us that aid can solve the problem? Or those who say that it makes things worse? The debate cannot be solved in the abstract: We need evidence’ (Banerjee & Duflo, 2011, p. 4).

6 It is instructive to see a similar ‘forget about theory and collect data’ idea resonate in the popular discourse on big data science (e.g. Anderson, 2008). For a critical assessment of this idea in climate science, see Faghmous and Kumar (2014). We thank Miles MacLeod for pointing out this similarity.

7 Another type of answer focuses on the statistical aspects of RFEs in order to unpack the distribution of the treatment effect. See, for example, Athey and Imbens (2017), who advocate RFEs, and Bareinboim, Lee, Honavar, and Pearl (2013), who advocate data fusion. Since our concern about the external validity of RFEs in development economics is more general than a purely statistical one, we do not discuss this type of answer here.

8 We use the term external validity in the remainder of the paper to refer to the validity of inferences, not of experimental design or results. See Jiménez-Buedo (2011) for a review of the range of things external validity can be about in experimental social science.

9 Deaton (2010) takes the example of Keynesian investment. He shows that, in a Keynesian model, investment enters the explanation of both income and consumption, so its own determination would need to be explained theoretically. Keynesian explanations rest on entrepreneurs' optimism or pessimism.

10 Deaton (2019) emphasizes this last point, as well as the ethical concerns that RFEs in development economics raise. Barrett and Carter (2010) also scrutinize this point, referring to a driving licence RFE in India.

11 In 2001, Kenya's health ministry led an important program to fight malaria by distributing 3 million insecticide-treated nets (ITNs) for free. The Global Fund to Fight AIDS, Tuberculosis and Malaria financed this Kenyan program with up to 17 million dollars. After this program, Guyatt, Ochola, and Snow (2002) assessed whether the ITNs reached their destinations and showed that they did. In other words, they detected neither waste nor corruption, which were the main concerns of opponents of massive aid.

12 Dupas (2014) assesses this willingness to pay with a new health product: long-lasting insecticide-treated nets (LLINs), in which the insecticide needs to be restored every four or five years.

13 Kremer et al. (2019) offer a broad overview of this puzzle, extending it to several domains such as savings, health, technology adoption and labor. They refer to this broader puzzle as Euler's puzzle.

14 For instance, Easterly (2006) claims that bednets are often used as fishing nets or wedding veils.

15 The title of Duflo's presentation at the NBER (Duflo, 2018), ‘Machinistas meet randomistas’, is a direct response to Deaton (2010), who, along with Ravallion (2009), coined the term ‘randomistas’ to refer to J-PAL's researchers.

16 This approach also encourages re-randomization. Randomization might not fully balance the sample between the two groups, so re-randomizing can improve that balance. However, re-randomizing can also threaten the internal validity of the experimental result. That is why Banerjee, Chassang, and Snowberg (2017) define a threshold for such re-randomization.
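The general logic of re-randomization described in note 16 can be sketched as follows. This is a minimal, generic illustration, not the procedure or threshold derived by Banerjee, Chassang, and Snowberg (2017): the balance measure (a standardized difference in means of one covariate), the acceptance `threshold`, the cap `max_draws`, and the function names are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, pstdev


def balance_gap(covariate, treated):
    """Standardized difference in covariate means between treatment and
    control (absolute mean gap divided by the pooled population SD).
    Hypothetical balance measure; assumes both groups are non-empty."""
    t = [x for x, d in zip(covariate, treated) if d]
    c = [x for x, d in zip(covariate, treated) if not d]
    sd = pstdev(covariate)
    return abs(mean(t) - mean(c)) / sd if sd > 0 else 0.0


def rerandomize(covariate, threshold=0.1, max_draws=100, seed=0):
    """Draw half/half assignments until the balance gap is at most
    `threshold`, stopping after `max_draws` draws in any case.
    Returns (best assignment seen, its gap, number of draws used).
    Requires len(covariate) >= 2 and max_draws >= 1."""
    rng = random.Random(seed)
    n = len(covariate)
    best, best_gap = None, float("inf")
    for draw in range(1, max_draws + 1):
        treated = [True] * (n // 2) + [False] * (n - n // 2)
        rng.shuffle(treated)  # a fresh complete randomization
        gap = balance_gap(covariate, treated)
        if gap < best_gap:
            best, best_gap = treated, gap
        if gap <= threshold:
            break  # accept the first sufficiently balanced draw
    return best, best_gap, draw


if __name__ == "__main__":
    # Hypothetical baseline incomes for ten households.
    incomes = [12, 15, 9, 20, 18, 11, 14, 16, 10, 19]
    assignment, gap, draws = rerandomize(incomes, threshold=0.1)
    print(sum(assignment), round(gap, 3), draws)
```

Capping the number of draws is one simple way to keep the final assignment close to a plain random draw: the more draws a design allows, the more the accepted assignments depart from simple randomization, which relates to the internal-validity concern raised in the note.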

17 ‘Proposition 3 formalizes the natural intuition that external policy advice is unavoidably subjective. This does not mean that it needs to be informed by experimental evidence, rather, judgment will unavoidably color it’ (Banerjee, Chassang, & Snowberg, 2017, p. 25).

18 Furthermore, Dupas and Robinson (2013b) include another section in which they attempt to ‘rule out alternative explanations’. The three alternative mechanisms all concern the role of Rotating Savings and Credit Associations (ROSCAs). In this peer-to-peer credit system, members make cyclical contributions of the same amount of money to a common fund (called the ‘pot’) at each meeting, and the pot is paid out as a lump sum to one member in each cycle. First, Dupas and Robinson (2013b) suggest that the ROSCA might have been perceived as a reminder to save. Second, seeing people save at the ROSCA might give individuals reasons to save as well. Third, Dupas and Robinson (2013b) question the future of the ROSCA, since their savings device involves less commitment than the ROSCA does. However, the ROSCA also plays a social and even moral role in the community, which remains implicit and is not examined by Dupas and Robinson (2013b).
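The pot mechanism described in note 18 can be illustrated with a short sketch. It assumes a fixed, pre-agreed rotation order, equal contributions and no fees or interest; real ROSCAs may instead allocate the pot by lottery or bidding. The function `rosca_cycle` and its output fields are hypothetical.

```python
def rosca_cycle(members, contribution):
    """Simulate one full ROSCA cycle with a fixed rotation order.

    At each meeting every member pays `contribution` into the pot, and the
    whole pot goes to the next member in `members`. For each member, return
    the meeting at which they receive the pot, their net position right
    after receiving it, and their net position at the end of the cycle."""
    n = len(members)
    pot = n * contribution  # everyone pays in at every meeting
    summary = {}
    for meeting, recipient in enumerate(members, start=1):
        paid_so_far = meeting * contribution  # includes this meeting's payment
        summary[recipient] = {
            "meeting": meeting,
            "net_when_paid_out": pot - paid_so_far,
            "net_end_of_cycle": pot - n * contribution,  # paid in over n meetings
        }
    return summary


if __name__ == "__main__":
    for name, row in rosca_cycle(["A", "B", "C", "D"], 100).items():
        print(name, row["meeting"], row["net_when_paid_out"])
    # A 1 300
    # B 2 200
    # C 3 100
    # D 4 0
```

The arithmetic suggests why a ROSCA can work as a commitment device: every member ends the cycle with a zero net position, but early recipients effectively borrow, while later recipients effectively save.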

19 When Mwai Kibaki was elected president of Kenya in December 2007, his election was strongly contested. From January to February 2008, Kenya experienced widespread violence. As a result, Kenyans' incomes as well as expenditures dropped. As is well discussed in the literature (e.g. Barro, 1991), politically unstable countries tend to grow more slowly than stable ones. The main question, however, is the direction of causality: does slow growth lead to political instability, or vice versa? To answer this question, Dupas and Robinson (2010) use three data sets from three different experiments. The first data set is the one from Dupas and Robinson (2013a). The second comes from an experiment in Kenya conducted by Robinson and Yeh (2009) on transactional sex and risky behavior. The third comes from an experiment on local shop owners in Kenya by Robinson and other J-PAL colleagues: Kremer, Lee, Robinson, and Rostapshova (2016).

20 Note, however, that $z_{p}$ is not necessarily an environment for which the experimenter recommends a policy intervention, as in Banerjee, Chassang, Montero, et al. (2017), but rather an environment to which she applies an inference about a causal mechanism similar to the one at work in the lab. We emphasize this point because recommending a similar policy requires much more information than causal similarity alone.

21 Harrison and List (2004) discuss other types of experiments, such as thought experiments, social experiments and natural experiments, which we do not discuss here. Note also that, logically, more than five types of experiments are possible; being exhaustive in this respect is not necessary for our purposes.

22 Robustness in this experimental/operational sense is related to, but distinct from, the robustness much discussed in the philosophical literature since Wimsatt (2007), which focuses on robust theoretical modeling understood as triangulation through independent theoretical derivations.

23 There is another closely related hypothesis, the spillover effect, which is important in Dupas (2014), but we do not discuss it here to keep the exposition simple.

24 Depending on the experimental design and the specification of the estimation model, the experimenters can also measure non-standard behavior such as loss aversion and present bias, and its importance relative to other, more ‘rational’ behavior. In general, to measure preferences is to identify and estimate the parameters associated with those preferences in a model (or models) of those preferences.
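As an illustration of what estimating a preference parameter can mean for present bias, consider one common parameterization, the quasi-hyperbolic (beta-delta) model, under the simplifying assumption of linear utility. This is a textbook sketch, not the specification used in the experiments discussed here; the amounts $x, y, x', y'$ and delays $s, t$ are hypothetical indifference points from two choice tasks.

```latex
\begin{align*}
\text{Model:}\quad & U_0 = u(c_0) + \beta \sum_{\tau=1}^{T} \delta^{\tau} u(c_\tau),
  \qquad u(c) = c \\
\text{Task 1 (now vs.\ delay } t\text{):}\quad & x = \beta\,\delta^{t}\,y \\
\text{Task 2 (delay } s \text{ vs.\ } s+t,\ s \ge 1\text{):}\quad
  & \beta\,\delta^{s}\,x' = \beta\,\delta^{s+t}\,y'
  \;\Rightarrow\; \delta^{t} = \frac{x'}{y'} \\
\text{Combining:}\quad & \delta = \left(\frac{x'}{y'}\right)^{1/t},
  \qquad \beta = \frac{x}{\delta^{t}\,y} = \frac{x\,y'}{y\,x'}
\end{align*}
```

A value of $\beta < 1$ indicates present bias. For example, hypothetical indifference points $x = 80$, $y = 100$, $x' = 90$, $y' = 100$ give $\delta^{t} = 0.9$ and $\beta = 8/9 \approx 0.89$.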

25 That is, F1-type experiments in on p. 15. Although Cardenas and Carpenter (2008) use the term ‘field lab’ to characterize those experiments, we will stick to the terminology of here to avoid confusion.

26 Cardenas and Carpenter (2008) call them ‘field labs’. Gneezy and Imas (2017) call the use of this type of field experiment the ‘lab-in-the-field’ methodology.

27 The last purpose concerns the validity of different elicitation and other experimental methods. Although this purpose is also relevant to the external validity of experimental results in general, we do not discuss it here, in order to focus on the complementarity of RFEs and LFEs.

28 The first two types of preferences were measured in common choice tasks. The last two were elicited as willingness to participate in, and confidence about, an incentivized knowledge-quiz competition.

29 Male entrepreneurs have become more risk averse and female entrepreneurs have become more confident.

30 This contrast between internal and external constraints does not necessarily imply that women are inherently less willing to compete than men. It is equally possible that this mind-set is an endogenous response to the gender structure of the society.

Additional information

Funding

This work was supported by the Academy of Finland in the context of the project “Model-building across disciplinary boundaries: Economics, ecology, and psychology” (No. 294545).
