Original Article

Reproducible research: a minority opinion

Pages 1-11 | Received 25 Jan 2017, Accepted 30 Jul 2017, Published online: 13 Dec 2017

Abstract

Reproducible research, a growing movement within many scientific fields, including machine learning, would require that the code used to generate the experimental results be published along with any paper. Probably the most compelling argument for this is that it is simply following good scientific practice, established over the years by the greats of science. The implication is that failure to follow such a practice is unscientific, not a label any machine learning researcher would like to carry. It is further claimed that misconduct is causing a growing crisis of confidence in science and that, without this practice being enforced, science would inevitably fall into disrepute. This viewpoint is becoming ubiquitous, but here I offer a differing opinion. I argue that, far from being central to science, what is being promulgated is a narrow interpretation of how science works. I contend that the consequences are somewhat overstated. I would also contend that the effort necessary to meet the movement's aims, and the general attitude it engenders, would not serve any of the research disciplines well, including our own.

1. Introduction

There is now a well-established, yet still growing, movement within a broad range of research communities called Reproducible Research. Many in the machine learning community are clearly supportive (Vanschoren et al., 2012; Sonnenburg et al., 2007). Indeed, one leading conference, the European Conference on Machine Learning (ECML), encourages papers that adhere to its standards. Within the broader AI community, the Journal of Artificial Intelligence Research (JAIR) makes an even stronger commitment. Reproducible Research requires the publication not only of a paper but also of all the computational tools and data used to generate the results reported therein. The code and data would be run together to prove that the results given in the paper could be reproduced. One of the most compelling arguments is that it is just following good scientific practice, established over the years by the greats of science; this movement is simply updating the traditional scientific methodology to bring it into the computational age. This sort of argument, I suspect, has strong resonance with many machine learning researchers. Nearly 30 years ago, Kibler and Langley (1988) argued that machine learning is essentially an experimental science. Reproducible Research tells us how to do this the right way. There is a strong implication that if you do not follow the rules, you are doing something unscientific. A claim, I am sure, that many would take to heart.

One motivation for the movement, and its rapid expansion, is a series of well-known frauds that have occurred relatively recently. The number seems to be growing and has resulted in an increasing number of retractions from our journals, including Science and Nature. Probably the most infamous episode was the Duke University cancer trials. When Anil Potti was found to have inflated his CV (TCL, 2010), it quickly became apparent that there were also major flaws with the data used to support his conclusions. This and other examples of scientific misconduct have been reported internationally in prestigious newspapers (Economist Editorial, 2011, 2013; Guardian, 2011; NYT, 2011, 2012). This raises the concern that, if not addressed, the general public will inevitably lose its trust in science.

Certainly many agree. The Reproducible Research movement is progressively gaining more and more support in such diverse communities as Artificial Intelligence, Biostatistics and Geoscience, to name but a few. Its growing influence can be seen in the increasing number of workshops and conferences on the topic in many areas of research (AAAS, 2011; AMP, 2011; ENAR, 2011; Kazai & Hanbury, 2015; Moska, 2015; NSF, 2010; SIAM-CSE, 2011; SIAM-Geo, 2011). The top two science journals, Science (Jasny, Chin, Chong, & Vignieri, 2011) and Nature (2015), have devoted special issues to this topic. There is already a book (Stodden, Leisch, & Peng, 2014) and there are online courses (Johns Hopkins University, 2015; Swiss Institute of BioInformatics, 2015). In fact, we seem to be well past any discussion of the merits of the idea. What appears now to be of paramount importance is simply to determine how badly each particular field suffers from the problem (Baker, 2015; Camerer et al., 2016).

What is surprising, perhaps, is the lack of any opposition to this movement. One argument for this would be that the problem is clear cut, the answer obvious. However, here I offer a differing opinion. In this paper, I raise some issues which should, at the very least, cause a pause for reflection. I support my opinion not by empirical means but by an appeal to intuition. It would be hard, although not impossible, to generate empirical support. Furthermore, I believe this approach is justified as it is the same one that the Reproducible Research movement used to support its own viewpoint.

Let me begin, in the next section, by making the case for why everyone in the community should be concerned about this issue. I will then make the set of arguments for Reproducible Research explicit. In Section 4, I address each one in turn and offer counterarguments. Section 5 will discuss issues that are often, in my mind, incorrectly conflated with the Reproducible Research movement.

2. Why should you care?

There is a very large effort (see Note 1) being put into addressing the putative issue of Reproducible Research. Even if you are not a direct part of that activity, the outcome will definitely affect what you do: through the reviewing process, when applying for funding and on submitting your own papers. As to reviewing, the Reproducible Research movement would require assessing not only the paper but also the accompanying data and code. It would be the reviewer's responsibility to make sure that the code, run on the data, produced the figures and results in the paper. I am sure most of us already find the reviewing process too time consuming, so increasing that load will clearly make things worse. The alternative is that this verification exercise will take up most of the reviewers' time, leaving little for a more critical, and broader, analysis of the research. Already funding agencies, such as the National Institutes of Health (NIH), are beginning to make Reproducibility a condition for their support (NIH, 2016). The American Statistical Association (ASA) has written an open letter to funding agencies on this topic (Broman et al., 2017). There are already specific grants for research in this area (Iorns, 2013). Undoubtedly there will be more, putting an extra burden on the limited funding available. It will become a requirement for many conferences and journals. Within our own community, some have started by simply encouraging such papers. I believe, though, that if the Reproducible Research movement is successful, encouragement will be replaced by something stronger.

3. The majority opinion

I take the main arguments for Reproducible Research from a paper of the same title, representing the conclusions of the Yale Round-table on Data and Code Sharing in the Computational Sciences (Stodden, 2010). Very similar views are held within the machine learning community. A paper published in JMLR (Sonnenburg et al., 2007), one of our two main machine learning journals, argued that open source software was critical to support reproducibility. This paper had 16 authors, many well known within the machine learning community.

The round-table shows even broader support. It attracted 21 people from a good cross section of disciplines: Law, Medicine, Geoscience, Statistics, Physics, Bio-informatics and others. I believe it represents a broadly held viewpoint as evidenced by the number of researchers involved and the fields they represent. I also chose this paper because of the clarity of its position; clearly time was spent to work out the details and implications of their proposal. Certainly, some researchers have more modest aims and propose relatively low impact ways of improving reproducibility (Sandve, Nekrutenko, Taylor, & Hovig, 2013). However, I would contend it is not their voices that are the loudest or the most influential.

I believe the main arguments for Reproducible Research are the following:

(1) It is, and has always been, an essential part of science; not doing so is simply bad science.

(2) It is an important step in the 'Scientific Method' allowing science to progress by building on previous work; without it progress slows.

(3) It requires the submission of the data and computational tools used to generate the results; without it results cannot be verified and built upon.

(4) It is necessary to prevent scientific misconduct; the increasing number of cases is causing a crisis of confidence in science.

In the rest of the paper, I intend to show that each of these arguments is suspect. My main concern is to address the claim that Reproducible Research, or some simple variant thereof, is the right and only way to do good science; that to achieve it, we must require the submission of code and data along with any paper. I am convinced that this would be a not inconsiderable burden on writer and reviewer alike. If the result were a considerable reduction in scientific misconduct, that might be a justifiable burden. However, I would suggest that rather than reducing problems it may have the opposite effect. Reviewers under time pressure will be less critical of a paper's content and more concerned about the correct form of the data and code received.

4. A minority opinion

In this section, I offer what is clearly a minority opinion. Yet I hope, at least, to increase the size of that minority through this paper. Let me clarify what I claim are the problems with each of the arguments given for Reproducible Research.

Let me sketch my response here:

(1) Reproducibility, at least in the form proposed, is not now, nor has it ever been, an essential part of science.

(2) The idea of a single well-defined scientific method resulting in an incremental, and cumulative, scientific process is, at the very best, moot.

(3) Requiring the submission of data and code will encourage a level of distrust among researchers and promote the acceptance of papers based on narrow technical criteria.

(4) Misconduct has always been part of science, with surprisingly little consequence. The public's distrust likely has more to do with the apparent variability of scientific conclusions.

4.1. Not an essential part of science

The first claim for Reproducible Research is that it is, and has always been, an essential part of science. Let us begin with a couple of quotes that emphasise this point. Sonnenburg et al. (2007) state 'Reproducibility of experimental results is a cornerstone of science'. Wren (2014) concurs: 'Reproducibility – the ability to redo an experiment and get the same results – is a cornerstone of science'. In fact, this particular phrase seems to have moved into the general lexicon; when reproducibility is paired with 'cornerstone of science', Google returns close to 100,000 hits. The repeated use of the term 'cornerstone' clearly indicates there is a widely held belief that Reproducibility is an essential part of science, in all its disparate forms. To make a claim about what is fundamental to science, one must take an historical overview. The round-table makes this argument explicit, claiming that 'Traditionally, papers contained the information needed to effect reproducibility'. Others have argued for the central role of something similar. In a Science editorial, Crocker and Cooper (2011) write 'Replication is considered the scientific gold standard'. Peng (2011) argues 'Replication is the ultimate standard by which scientific claims are judged'. Again, by using phrases like 'gold standard' and 'ultimate standard', a very strong statement is being made about its importance to science in general.

Clearly, there are some terminological differences, as others have pointed out (Kenett & Shmueli, 2015; Nature Editorial, 2016). Here, I make my own attempt to tease the issues apart. To this end, let us look at a few papers in a special issue of the journal Science called Data Replication and Reproducibility (Jasny et al., 2011). Interestingly, the title of this special issue uses both terms. The introduction seems to use them interchangeably. It suggests there is a single notion of replication and what is of interest is how it is achieved across multiple fields. However, Peng (2011), in the same issue, does make a distinction: 'This [Reproducible Research] standard falls short of full replication because the same data are analyzed again'.

Overall, I think, we need to consider three quite separate concepts: Reproducibility, Replicability and what I will call Retestability. These may not be ideal terms, but the first has been co-opted by the movement itself; the second is, I believe, a statistical concept of some standing; and the third I chose to be neutral as to whether the broad approach is one of verification or falsifiability (Popper, 1968).

Reproducibility requires that the experiment originally carried out be duplicated as far as is reasonably possible. The aim is to minimise the difference from the first experiment, including its flaws, to produce independent verification of the result as reported.

Replicability is a statistical concept. It addresses the problem of limited samples, which might have produced an apparently good result by chance. The aim here is also to minimise the difference from the first experiment. The single change is that the data, although drawn from exactly the same source, should come from an independent sample.

Retestability is a more general scientific concept. It addresses the robustness and generalisability of the result. The point is to deliberately increase the difference between the two experiments, to determine if the changes affect the outcome. To put it another way, it is the result that is being replicated, not the experiment.
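
To make the distinctions concrete, here is a minimal sketch in R (one of the open source tools mentioned later in this paper) contrasting the three notions on a toy experiment; the coin-tossing set-up, sample sizes and seed are invented purely for illustration.

    # Original experiment: does a coin come up heads more than half the time?
    set.seed(1)
    original <- rbinom(100, 1, 0.55)   # 100 tosses of a slightly biased coin
    t.test(original, mu = 0.5)         # the result reported in the 'paper'

    # Reproducibility: duplicate the original experiment as closely as possible.
    # With the same seed and code, the data, flaws included, are identical.
    set.seed(1)
    reproduction <- rbinom(100, 1, 0.55)
    identical(original, reproduction)  # TRUE

    # Replicability: the same experiment, but on an independent sample
    # drawn from the same source.
    replication <- rbinom(100, 1, 0.55)
    t.test(replication, mu = 0.5)

    # Retestability: deliberately change the experiment (a larger sample and
    # a different test) and ask whether the result, not the experiment, survives.
    retest <- rbinom(1000, 1, 0.55)
    binom.test(sum(retest), length(retest), p = 0.5)

Only the last two runs tell us anything new about the hypothesis; the first simply confirms that the code does what it did before.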

In the Science special issue, Ryan (2011), in his paper on frog-eating bats, considers the latter two. As far as Replication is concerned, he says 'Was this observation replicated? Yes, we caught several bats feasting on these frogs'. This seems to clearly reflect a statistical concern. Yet he goes on, 'we replicated the same experiment in a flight cage'. Here, deliberate changes were made to effectively support the validity of the claim. Although Ryan called this replication, and the 'same experiment', I consider it more a case of Retestability. I conjecture that he called the latter Replication because he did not consider that the changes should be critical to the outcome. In fact, the intent was to show they were not, so that the robustness of the result could be demonstrated. I would go further and say the more changes made the stronger the support, as I have argued elsewhere (Drummond, 2009). There, I gave an example of measuring the speed of light in two distinct ways: the transit times of the moons of Jupiter, and the parallax of stars as seen from Earth. The two experiments share nothing in common and this is what gives the combination of the two such evidential power.

It seems clear to me that Reproducibility, as proposed by the round table, and by supporters within the machine learning community, has never been a central tenet of science. Nevertheless, like Peng, some might still argue that it is useful as 'a minimum standard' when true statistical Replication is not practical. If that is so, should one consider Replicability the true 'gold standard' or 'cornerstone of science', with Reproducibility acting as an occasional stand-in? It is true that there is an increasing number of research fields where statistics takes on a fundamental role; Biostatistics and Geostatistics are two such examples. Yet, even in these areas, we primarily seek evidence for, or against, a scientific hypothesis, not a statistical one. Replicability, at least in a formal sense, is tied strongly to the idea of statistical hypothesis testing. This idea was introduced by Fisher (1925), refined by Neyman and Pearson (1933) and became an integral part of some, but by no means all, sciences much later. This timeline would not include most of the major events in science, particularly in Physics. Therefore, I would claim even Replicability fails to make the grade. Surely then, only Retestability has any real claim to be a gold standard. Reproducibility is far too weak to be considered even a minimum one.

4.2. No single scientific method

Even if one concedes that Reproducibility has little or nothing to do with the verification, or falsification, of scientific hypotheses, one might still claim that it is an important step in the scientific method. Without such a step, one could go on, it is impossible to build on previous work and scientific progress is slowed. The round-table puts it this way: 'Reproducibility will let each generation of scientists build on the previous generations' achievements'. In a recent Science editorial, Crocker and Cooper (2011) concur: '[it] is the cornerstone of a cumulative science'.

The idea of a clear scientific method that has been, and should be, followed is pervasive and, to many, persuasive. It pervades education at all levels. It is taught to schoolchildren. It is taught to undergraduates. It is even taught to graduate students. One reason for its popularity is that it makes a clear distinction as to what falls under the rubric of science and what does not, separating science from pseudo-science. It also defines some clear and simple steps that, if followed, should produce solid science, a clear pedagogical advantage. The round table supports the idea of such a method with reproducibility as a critical step: 'To adhere to the scientific method we must be able to reproduce computational results'. They go on to argue that following this method should be strongly encouraged through the policies of journals and funding agencies. Crocker and Cooper (2011) criticise journals for putting up barriers that discourage Reproducibility: 'Despite the need for reproducible results findings are almost impossible to publish in top scientific journals'.

This cumulative view of science is not universal by any means. Polanyi (1958) was an early critic of the idea of a single scientific method. His work strongly influenced Kuhn (1962) who, in his famous book The Structure of Scientific Revolutions, contended that science progressed primarily through a series of paradigm shifts, rather than in an incremental manner building on past successes. These shifts were, however, separated by what he termed normal science, a process of gathering facts under a single dominant paradigm. Certainly, many philosophers support some notion of scientific progress, but this is a long way from the narrow sense of incrementalism promoted by the Reproducible Research movement. The philosopher Feyerabend (1970) takes an extreme view: 'The only principle that does not inhibit progress is: anything goes'. Some might feel that we should not worry too much about what philosophers say. As the science historian Holton (1986) puts it, there is 'the perception by the large majority of scientists, right or wrong, that the messages of more recent philosophers may be safely neglected'. But it is not only the philosophers who feel that the idea of a single method is a considerable oversimplification. Polanyi was a scientist in his own right, and Bridgman (1986), in his book Reflections of a Physicist, argued 'there are as many scientific methods as there are individual scientists'. More recently, the physicist and Nobel Laureate Weinberg (1995) wrote 'We do not have a fixed scientific method to rally around and defend'.

It would seem that the claim for a single scientific method is at best debatable and, therefore, any argument for the necessity of particular steps is somewhat suspect. Science does clearly progress by ideas shared throughout a community. However, this is not to commit to a method where progress is achieved by taking the precise results of one paper and working to incrementally improve on them, the process presumably being repeated ad infinitum. I would claim this is a very impoverished view of how science progresses. I have written elsewhere that we should be neither too orthodox nor too anarchistic (Drummond, 2008). We need to find a balance between strongly enforced standards and a free-for-all that makes it difficult to assess another's research. I believe we already have a generally shared sense of what it means to be scientific, but to enforce much narrower standards would seem to be of dubious merit and without historical justification.

4.3. Submission of data and code counterproductive

One might still argue that, as any author should have the code and data readily available, the additional costs of submission would be minimal. Peng (2011), for example, suggests that 'Publishing code is something we can do now for almost no additional cost'. Thus any tangible benefits should come cheaply. I have no wish to argue against the existing practice of sharing software tools within the community. This can introduce a degree of commonality which makes experimental comparisons easier. It can also be an effective way of promoting one's own research, an approach with which this author has some experience (Drummond & Holte, 2006). However, submitting code with every paper, in whatever language, for whatever system, will simply result in an accumulation of questionable software. There may be some cases where people would be able to use it, but I would doubt that they would be frequent. As Hamilton (1990) points out, many papers are uncited and others have only a few citations. It is reasonable to infer, even setting aside any difficulties with execution, that the majority of code would not be used through lack of interest. The round table is clearly concerned about problems arising from non-executable code. To address this, it proposes 'A system with a devoted scientific community that maintains the code and the reproducibility status'. The round table recognises that this would be a not inconsiderable burden to the community, yet contends that the benefits would be large enough to significantly outweigh the costs.

I am less convinced that this trade-off is a good one. In fact, I am concerned that not all the costs have been identified and the apparent benefits might not be realised at all. Firstly, the process will seem to many to be little more than a policing exercise, checking that people have done exactly what they claim. This will undermine the level of trust between researchers that is important to any scientific community. Bissell (2013) confirms that this is already happening. Secondly, even if you feel that the effort needed is not large, as the round table suggests, it will clearly be another stage added to the reviewing process. Already, there is a large, and ever growing, workload for reviewers. The increase is due mainly to what I will term 'paper inflation'. It arises from the increasing pressure on scientists to publish and publish often. More papers can mean greater funding for an established scientist. More papers can mean a better position for a graduate student or postdoctoral fellow. I believe this increasing load is already reducing the time spent on each review. Reviews that would typically have been half a page to a page long are now much shorter. It also makes it more difficult to find an expert on any particular topic who has time for a review. When reviewers are under time pressure, they will tend to make judgements that are easy to appraise and justify, e.g. checking that certain narrow technical criteria have been met. What the round table proposes will increase this load substantially. We should be working to reduce, not increase, reviewer workload. I would claim that careful reviewing by experts is a much better defence against scientific misconduct than any execution of code.

4.4. Misconduct in science is not new

One motivation for Reproducible Research is a putative recent increase in the number of cases of scientific misconduct. The round table states 'Relaxed attitudes about communicating computational experiments details is causing a large and growing credibility gap'. Crocker and Cooper (2011) voice similar worries: 'The costs of the fraud for science, and for public trust in science are devastating'. The round table goes as far as to say there is an 'emerging credibility crisis'.

So, there is clearly a claim that something has recently changed and that this is causing the crisis. Yet misconduct is hardly new in science. Broad and Wade (1984) give a few examples that are some 2000 years old. They list many more recent, yet certainly not modern, cases, some by very eminent scientists. Some of the published results of Mendel, considered the originator of the study of genetics, are somewhat too good to be true. Even Sir Isaac Newton himself is not above reproach.

This is certainly not to condone such behaviour but only to wonder how science has been so successful in the past if the effect of 'relaxed attitudes' is so devastating. One case that often comes to people's minds when this issue is discussed is the announcement of 'Cold Fusion' in March 1989 by Pons and Fleischmann (NYT, 1989). I concede this would seem to be a prototypical case showing the importance of Reproducibility. However, this discovery would have had far greater impact than the vast majority of scientific results. A discovery of fusion taking place at low temperatures would have had considerable consequences for the field of physics and enormous societal impact as a means of producing energy with little environmental cost. Many scientists did attempt to reproduce the result and failed. In the end, the impact of this announcement was short lived. In different fields, there may be rare cases of very pivotal results that would also benefit from reproduction. I would doubt that these are anywhere close to the norm, even in our top journals. Again, researchers in those fields would identify what is important and challenge the appropriate result. There is little need for additional safeguards.

I am also convinced that misconduct is not the main reason some of the general public have little trust in science. I would suggest that the public are mostly concerned about science when it affects them directly. An example is a health issue, such as the question 'is coffee good or bad for me?' Science, as reported in the media, fails to give a clear answer. In the eighties, a link was established with pancreatic cancer (NYT, 1981). In the nineties, coffee was found not to influence the frequency of bladder cancer (CST, 1993). Early in the new millennium, carcinogens were found in coffee (ST, 2006). More recently, coffee has been linked to a reduced risk of brain cancer (USAT, 2010). For scientists, this may seem unproblematic; these cases are not mutually contradictory. Even if they were, the overturning of early results by later studies is normal science. However, many people expect that science should have a single definitive answer, particularly about what they consider to be an important issue.

If I am right, it would seem that any crisis of confidence would best be addressed by better informing the general public of the way science works: that there is consensus on a number of broad theories, whereas other ideas are much more speculative. The idea of scientific consensus has been used to convince the public of the veracity of global warming (Oreskes, 2004). Nevertheless, it is quite common to have experimental results that conflict. That is why meta-analysis is so important in many fields. One concern is that some scientists seem willing to put great faith in results from a single or small number of experiments. This is, at least partially, why they are so upset when misconduct is discovered. Even without misconduct, there are many potential sources of error that can readily creep into experiments. One such source Meehl (1990) humorously called the 'crud factor': uncontrollable errors that creep into all experiments. He goes on to describe many other sources of experimental error. Perhaps another lesson we might take from the Duke University cancer trials fiasco is that we should be less willing to go to clinical trials based on such limited evidence. If we were somewhat more skeptical about the results of scientific experiments, cases of misconduct would probably have much less of an impact.

5. What I am not arguing against

So far in this paper, I have tried, as much as possible, to make clear what I contend are the weaknesses of the arguments made for Reproducible Research. I think it is also important to make clear what I am not arguing against.

For some authors in machine learning, Reproducibility per se is not their primary focus (Sonnenburg et al., 2007; Vanschoren et al., 2012). It is only seen as one of many benefits of their approaches. I contend that there are many ideas that have been too strongly conflated with Reproducible Research. Thus, I hope to disarm some of the arguments that might be levelled against my position by making the separation clearer. Let me list some of these ideas:

5.1. Open science

Many have argued that research should be much more broadly available. Some fifteen years ago, a group of about 15 people formed the Budapest Open Access Initiative (Open Society Institute, 2001). It now has close to 6000 signatories. The main concern is that access to journals, or travelling to international conferences, is priced too high. This excludes researchers from poorer countries or, indeed, other interested parties that lack institutional backing. I certainly support this broad principle. Yet, I am not convinced that it is fundamentally tied to Reproducible Research, as some suggest (Groves & Godlee, 2012). This initiative was formed some time before the latter really got off the ground. Surely, the main barrier to Open Science is the publication process. This is a clear motivator for the increasingly common practice of having open journals on the web. Even the prestigious Royal Society now has one (Sanders, 2014).

5.2. Open data

From a machine learning perspective, one of the greatest benefits of Open Data is the ability to try one's own algorithms on a broad selection of real data-sets. This was certainly one reason for the success of collections of data-sets such as the UCI Machine Learning Repository, one of the earliest of such resources (Asuncion & Newman, 2007). Now, there are many more (Amazon Web Services, 2016). In addition, familiarity with commonly used data-sets can have the advantage of making results in papers easier to understand and compare. Indeed, this practice in other disciplines extends to physical and biological specimens, among other things, and achieves a similar effect. I would stress again that these advantages are independent of its putative use in Reproducible Research. Further, sometimes, such over-familiarity can become a problem (Drummond & Japkowicz, 2010).

5.3. Open source software

There is no doubt in my mind that open source software has made my work much easier, and in some cases possible at all. I use R (R Development Core Team, 2008), OpenModelica (Fritzson et al., 2005), emacs (Stallman, 1981) and many, many more pieces of freely available software. My personal experience is that I obtain most benefit from using established algorithms, that have been around for years if not decades, rather than something freshly minted. Such tools are particularly useful when they come from a field that is only tangentially connected to one's own. I have used computational geometry, methods from linear algebra, integer, linear and quadratic programming, and many more. As these are not my primary research fields, it helps considerably to have well written, trustworthy routines. The main use of machine learning software, I find, is much more pedagogical than practical. Such tools are ideal for introducing someone to a new algorithm but much less useful when developing one's own. Existing code can be useful as a template, indicating things that need to be considered. However, again, this is unlikely, in my experience, to be from very recent research.

5.4. Literate programming

With many open source packages, it is now much easier to tie the writing of a paper to the computational tools that carry out the experiments. Literate programming embeds blocks of code within the text of a paper (Knuth, 1988). This code can be executed to run an experiment and place the graphical output directly into the paper. This has a number of advantages: updating a paper is quick and easy, and the code is better documented by connecting it directly to its broader purpose. Clearly it does improve the ease of sharing code. However, the sort of mandated sharing that Reproducible Research proposes is just a small part of that. So, I see little support for the two being intimately tied, as some claim (Schulte, Davison, Dye, & Dominik, 2012).
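
To make the idea concrete, here is a minimal sketch of what such a document might look like, using the Org-mode environment described by Schulte et al. (2012) with an embedded R code block; the file name, the numbers and the header argument are illustrative only and not taken from any particular paper.

    A fragment of a hypothetical paper, experiment.org:

    * Results
      The mean error rate over the five runs listed below is computed, and its
      value inserted into the document, when the code block is executed.

      #+NAME: mean-error
      #+BEGIN_SRC R :exports both
        errors <- c(0.21, 0.19, 0.23, 0.20, 0.22)  # illustrative numbers only
        round(mean(errors), 3)                     # result is placed below the block
      #+END_SRC

Re-evaluating the block re-runs the computation and places its result, here 0.21, directly beneath it, so the figure quoted in the text cannot drift out of step with the code that produced it.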

To summarise this section, my general point is that all these ideas may, indeed, make Reproducible Research easier. Yet they can all be clearly justified without it. So, I suggest that even a strong supporter of one, or more, of the ideas listed above need not be committed to Reproducible Research.

6. Conclusions

My main aim in this paper was to raise questions as to the benefits accrued through adopting the recommendations of the Reproducible Research movement. My primary concern was the argument linking it to good scientific practice. This went against my intuitions. The more I read about it, the less convinced I became. I hope I have persuaded at least some of you that the connection is much weaker than many assume. Without those scientific credentials, the other arguments for it must be considerably weakened. I am far from convinced that what is left behind is substantive. Perhaps the connection might be restored by replacing the notion of Reproducibility with something of more general scientific meaning. This would certainly do much to address my problem with the first argument and, perhaps, the second. However, there are other weaknesses which I have listed and these would be harder to address. In particular, if Reproducibility in the narrow sense is not what is needed, then I cannot see the necessity of publishing code and data along with any paper. It may be that a more moderate approach would address some of the concerns I have raised. I will leave it to others to make that claim. My concern is mainly the impact it is already having and will increasingly have in the future. I would contend that any move by editors of journals or funding agencies to enforce the ideas put forward would not serve any scientific community well. There are much more important issues to address.

Notes

No potential conflict of interest was reported by the authors.

Note 1. A Google search of the terms 'Reproducible Research' returned over 16 million pages.

References

  • AAAS. (2011, February). AAAS annual meeting: Symposium on the digitization of science: Reproducibility and interdisciplinary knowledge transfer. Washington, DC.
  • AMP. (2011). Applied Mathematics Perspectives workshop on reproducible research: Tools and strategies for scientific computing. Vancouver, BC.
  • Asuncion, A., & Newman, D. J. (2007). UCI machine learning repository (Technical report). Irvine, CA: University of California, School of Information and Computer Science.
  • Baker, M. (2015, August). Over half of psychology studies fail reproducibility test. Nature. doi:10.1038/nature.2015.18248
  • Bissell, M. (2013, November). Reproducibility: The risks of the replication drive. Nature, 503, 333–334.
  • Bridgman, P. (1986). Reflections of a physicist. Oxford: Oxford Science Publications, Clarendon Press.
  • Broad, W., & Wade, N. (1984). Betrayers of the truth. New York: Random House.
  • Broman, K., Cetinkaya-Rundel, M., Nussbaum, A., Paciorek, C., Peng, R., Turek, D., & Wickham, H. (2017, January). Recommendations to funding agencies for supporting reproducible research. American Statistical Association. Retrieved from http://www.amstat.org/asa/files/pdfs/pol-reproducibleresearchrecommendations.pdf
  • Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., ... Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436.
  • Crocker, J., & Cooper, M. L. (2011). Editorial: Addressing scientific fraud. Science, 334, 1182.
  • CST. (1993, June). Coffee ‘not factor’ in bladder cancer. Chicago Sun-Times.
  • Drummond, C. (2008). Finding a balance between anarchy and orthodoxy. In Proceedings of the Twenty-Fifth International Conference on Machine Learning: Workshop on Evaluation Methods for Machine Learning III (4 pages), Helsinki.
  • Drummond, C. (2009). Replicability is not reproducibility: Nor is it good science. In Proceedings of the Twenty-Sixth International Conference on Machine Learning: Workshop on Evaluation Methods for Machine Learning IV (4 pages), Montreal.
  • Drummond, C., & Holte, R. C. (2006). Cost curves: An improved method for visualizing classifier performance. Machine Learning, 65(1), 95–130.
  • Drummond, C., & Japkowicz, N. (2010). Warning: Statistical benchmarking is addictive kicking the habit in machine learning. Journal of Experimental and Theoretical Artificial Intelligence, 22, 67–80.
  • Economist Editorial. (2011, September). An array of errors, investigations into a case of alleged scientific misconduct have revealed numerous holes in the oversight of science and scientific publishing. The Economist. Retrieved from http://www.economist.com/node/21528593.
  • Economist Editorial. (2013, October). Unreliable research: Trouble at the lab. The Economist. Retrieved from https://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
  • ENAR. (2011, March). Research ethics in biostatistics: Invited panel discussion at ENAR 2011 on the biostatistician’s role in reproducible research. Miami, FL.
  • Feyerabend, P. (1970). Against method: Outline of an anarchistic theory of knowledge. Atlantic Highlands, NJ: Humanities Press.
  • Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
  • Fritzson, P., Aronsson, P., Lundvall, H., Nyström, K., Pop, A., Saldamli, L., & Broman, D. (2005). The OpenModelica modeling, simulation, and development environment. Proceedings of the 46th Conference on Simulation and Modelling of the Scandinavian Simulation Society, Trondheim, Norway.
  • Groves, T., & Godlee, F. (2012). Open science and reproducible research [Editorial]. BMJ, 344, e4383. Retrieved from http://www.bmj.com/content/344/bmj.e4383
  • Guardian. (2011, July). Scientific fraud in the UK: The time has come for regulation. Manchester Guardian.
  • Hamilton, D. P. (1990). Publishing by -- and for? -- the numbers. Science, 250, 1331–1332.
  • Holton, G. (1986). The advancement of science, and its burdens: The Jefferson lecture and other essays. Cambridge: Cambridge University.
  • Iorns, E. (2013, October). Reproducibility initiative receives $1.3m grant to validate 50 landmark cancer studies. Online at Center for Open Science. Retrieved from https://cos.io/about/news/reproducibility-initiative-receives
  • Jasny, B. R., Chin, G., Chong, L., & Vignieri, S. (2011). Introduction: Special issue on data replication and reproducibility. Science, 334(6060), 1225.
  • Johns Hopkins University. (2015). Reproducible research. Online Course. Retrieved from https://www.coursera.org/course/repdata
  • Kazai, G., & Hanbury, A. (Eds.). (2015). European Conference on Information Retrieval: Special Track. Vienna, Austria.
  • Kenett, R. S., & Shmueli, G. (2015). Clarifying the terminology that describes scientific reproducibility. Nature Methods, 12, 699.
  • Kibler, D., & Langley, P. (1988). Machine learning as an experimental science. In Proceedings of the Third European Working Session on Learning, 81–92.
  • Knuth, D. (1988). Literate programming. The Computer Journal, 27, 97–111.
  • Kuhn, T. (1962). The structure of scientific revolutions. University of Chicago.
  • Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
  • Moska, S. (Ed.). (2015). Workshop on Reproducible Research at the Computational and Simulation Sciences and eResearch Conference.
  • Nature. (2015, October). Challenges in irreproducible research. Nature Special Issue.
  • Nature Editorial. (2016, May). Reality check on reproducibility. Nature, 533, 437.
  • Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society London, 231, 289–337.
  • NIH. (2016, April). Rigor and reproducibility. Retrieved from https://grants.nih.gov/reproducibility/index.htm
  • NSF. (2010). National science foundation workshop on changing the conduct of science in the information age summary.
  • NYT. (1981, March). Study links coffee use to pancreas cancer. New York Times.
  • NYT. (1989, May). Physicists debunk claim of a new kind of fusion. New York Times.
  • NYT. (2011, July). How bright promise in cancer testing fell apart. New York Times.
  • NYT. (2012, January). University suspects fraud by a researcher who studied red wine. New York Times.
  • Open Society Institute. (2001). Budapest open access initiative.
  • Oreskes, N. (2004). Beyond the ivory tower: The scientific consensus on climate change. Science, 306(5702), 1686.
  • Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226–1227.
  • Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. Routledge.
  • Popper, K. R. (1968). The logic of scientific discovery. New York: Harper & Row.
  • R Development Core Team. (2008). R: A language and environment for statistical computing. Retrieved from http://www.R-project.org
  • Ryan, M. J. (2011). Replication in field biology: The case of the frog-eating bat. Science, 334(6060), 1229–1230.
  • Sanders, J. (Ed.). (2014). Royal society open science. Retrieved from http://rsos.royalsocietypublishing.org/
  • Sandve, G. K., Nekrutenko, A., Taylor, J., & Hovig, E. (2013, October 24). Editorial: Ten simple rules for reproducible computational research. PLOS Computational Biology.
  • Schulte, E., Davison, D., Dye, T., & Dominik, C. (2012). A multi-language computing environment for literate programming and reproducible research. Journal of Statistical Software, 46(3).
  • Amazon Web Services. (2016). AWS public data sets. Retrieved from http://aws.amazon.com/datasets/
  • SIAM-CSE. (2011). SIAM Conference on Computational Science & Engineering workshop on verifiable, reproducible computational science.
  • SIAM-Geo. (2011). SIAM geosciences workshop on reproducible science and open-source software in the geosciences.
  • Sonnenburg, S., Braun, M. L., Ong, C. S., Bengio, S., Bottou, L., Holmes, G., & Williamson, R. (2007). The need for open source software in machine learning. Journal of Machine Learning Research, 8, 2443–2466.
  • ST. (2006). Cancer chemical found in coffee. Sunday Times.
  • Stallman, R. M. (1981). Emacs: The extensible, customizable, self-documenting display editor (Technical Report AIM-519A). Cambridge, MA: MIT Artificial Intelligence Laboratory.
  • Stodden, V., Leisch, F., & Peng, R. D. (Eds.). (2014). Implementing reproducible research. The R Series. Chapman & Hall/CRC.
  • Stodden, V. C. (2010). Reproducible research: Addressing the need for data and code sharing in computational science. Yale Law School Roundtable on Data and Code Sharing. Computing in Science & Engineering, 12(5), 8–13.
  • Swiss Institute of BioInformatics. (2015, January). Reproducible research. Retrieved from http://edu.ch.embnet.org/course/view.php?id=193
  • TCL. (2010, August). Duke finds ‘issues of substantial concern’ and sanctions Potti. The Cancer Letter.
  • USAT. (2010, November). Can coffee, tea lower brain cancer risk? USA Today.
  • Vanschoren, J., Blockeel, H., Pfahringer, B., & Holmes, G. (2012). Experiment databases. A new way to share, organize and learn from experiments. Machine Learning, 87(2), 127–158.
  • Weinberg, S. (1995). The methods of science and those by which we live. Academic Questions, 8(2), 7–13.
  • Wren, K. (2014, May). As concerns about non-reproducible data mount, some solutions take shape. AAAS News.
