The cancer researchers Glenn Begley and Lee Ellis made a rather remarkable claim last year. In a commentary that analyzed the dearth of efficacious novel cancer therapies, they revealed that scientists at the biotechnology company Amgen were unable to replicate the vast majority of published pre-clinical research studies. Only 6 out of 53 landmark cancer studies could be replicated, a dismal success rate of 11%! The Amgen researchers had deliberately chosen highly innovative cancer research papers, hoping that these would form the scientific basis for future cancer therapies that they could develop. It should not come as a surprise that progress in developing new cancer treatments is so sluggish. New clinical treatments are often based on innovative scientific concepts derived from pre-clinical laboratory research. However, if the pre-clinical scientific experiments cannot be replicated, it would be folly to expect that clinical treatments based on these questionable scientific concepts would succeed.
Reproducibility of research findings is the cornerstone of science. Peer-reviewed scientific journals generally require that scientists conduct multiple repeat experiments and report the variability of their findings before publishing them. However, it is not uncommon for researchers to successfully repeat experiments and publish a paper, only to learn that colleagues at other institutions can’t replicate the findings. This does not necessarily indicate foul play. The reasons for the lack of reproducibility include intentional fraud and misconduct, yes, but more often it’s negligence, inadvertent errors, imperfectly designed experiments and the subliminal biases of the researchers or other uncontrollable variables.
Clinical studies, of new drugs, for example, are often plagued by the biological variability found in study participants. A group of patients in a trial may exhibit different responses to a new medication compared to patients enrolled in similar trials at different locations. In addition to genetic differences between patient populations, factors like differences in socioeconomic status, diet, access to healthcare, criteria used by referring physicians, standards of data analysis by researchers or the subjective nature of certain clinical outcomes – as well as many other uncharted variables – might all contribute to different results.
The claims of low reproducibility made by Begley and Ellis, however, did not refer to clinical cancer research but to pre-clinical science. Pre-clinical scientists attempt to reduce the degree of experimental variability by using well-defined animal models and standardized outcomes such as cell division, cell death, cell signaling or tumor growth. Without the variability inherent in patient populations, pre-clinical research variables should in theory be easier to control. The lack of reproducibility in pre-clinical cancer research has a significance that reaches far beyond just cancer research. Similar or comparable molecular and cellular experimental methods are also used in other areas of biological research, such as stem cell biology, neurobiology or cardiovascular biology. If only 11% of published landmark papers in cancer research are reproducible, it raises questions about how published papers in other areas of biological research fare.
Following the publication of Begley and Ellis’ commentary, cancer researchers wanted to know more details. Could they reveal the list of the irreproducible papers? How were the experiments at Amgen conducted to assess reproducibility? What constituted a successful replication? Were certain areas of cancer research or specific journals more prone to publishing irreproducible results? What was the cause of the poor reproducibility? Unfortunately, the Amgen scientists were bound by confidentiality agreements that they had entered into with the scientists whose work they attempted to replicate. They could not reveal which papers were irreproducible or specific details regarding the experiments, thus leaving the cancer research world in a state of uncertainty. If so much published cancer research cannot be replicated, how can the field progress?
The list of scientific journals in which some of the irreproducible papers were published includes the the “elite” of scientific publications: The prestigious Nature tops the list with ten mentions, but one can also find Cancer Research (nine mentions), Cell (six mentions), PNAS (six mentions) and Science (three mentions).
Does this mean that these high-profile journals are the ones most likely to publish irreproducible results? Not necessarily. Researchers typically choose to replicate the work published in high-profile journals and use that as a foundation for new projects. Researchers at MD Anderson Cancer Center may not have been able to reproduce the results of ten cancer research papers published in Nature, but the survey did not provide any information regarding how many cancer research papers in Nature were successfully replicated.
The lack of data on successful replications is a major limitation of this survey. We know that more than half of all scientists responded “Yes” to the rather opaque question “Have you ever tried to reproduce a finding from a published paper and not been able to do so?”, but we do not know how often this occurred. Researchers who successfully replicated nine out of ten papers and researchers who failed to replicate four out of four published papers would have both responded “Yes.” Other limitations of this survey include that it does not list the specific irreproducible papers or clearly define what constitutes reproducibility. Published scientific papers represent years of work and can encompass five, ten or more distinct experiments. Does successful reproducibility require that every single experiment in a paper be replicated or just the major findings? What if similar trends are seen but the magnitude of effects is smaller than what was published in the original paper?
Due to these limitations, the survey cannot provide definitive answers about the magnitude of the reproducibility problem. It only confirms that lack of reproducibility is a potentially important problem in pre-clinical cancer research, and that high-impact peer-reviewed journals are not immune. While Begley and Ellis have focused on questioning the reproducibility of cancer research, it is likely that other areas of biological and medical research are also struggling with the problem of reproducibility. Some of the most highly cited papers in stem cell biology cannot be replicated , and a recent clinical trial using bone marrow cells to regenerate the heart did not succeed in improving heart function after a heart attack despite earlier trials demonstrating benefits.
Does this mean that cancer research is facing a crisis? If only 11% of pre-clinical cancer research is reproducible, as originally proposed by Begley and Ellis, then it might be time to sound the alarm bells. But since we don’t know how exactly reproducibility was assessed, it is impossible to ascertain the extent of the problem. The word “crisis” also has a less sensationalist meaning: the time for a crucial decision. In that sense, cancer research and perhaps much of contemporary biological and medical research needs to face up to the current quality control “crisis.” Scientists need to wholeheartedly acknowledge that reproducibility is a major problem and crucial steps must be taken to track and improve the reproducibility of published scientific work.
First, scientists involved in biological and medical research need to foster a culture that encourages the evaluation of reproducibility and develop the necessary infrastructure. When scientists are unable to replicate results of published papers and contact the authors, the latter need to treat their colleagues with respect and work together to resolve the issue. Many academic psychologists have already recognized the importance of tracking reproducibility and initiated a large-scale collaborative effort to tackle the issue; the Harvard psychologists Joshua Hartshorne and Adena Schachner also recently proposed using a formal approach to track the reproducibility of research. Biological and medical scientists should consider adopting similar infrastructures for their research, because reproducibility is clearly not just a problem for psychology research.
Second, grant-funding agencies should provide adequate research funding for scientists to conduct replication studies. Currently, research grants are awarded to those who propose the most innovative experiments, but few — if any — funds are available for researchers who want to confirm or refute a published scientific paper. While innovation is obviously important, attempts to replicate published findings deserve recognition and funding because new work can only succeed if it is built on solid, reproducible scientific data.
In the U.S., it can take 1-2 years from when researchers submit a grant proposal to when they receive funding to conduct research. Funding agencies could consider an alternate approach, one that allows for rapid approval of small-budget grant proposals so that researchers can immediately start evaluating the reproducibility of recent breakthrough discoveries. Such funding for reproducibility testing could be provided to individual laboratories or teams of scientists such as the Reproducibility Initiative or the recent efforts of chemistry bloggers to document reproducibility.
The U.S.-based NIH (National Institutes of Health) is the largest source of funding for medical research in the world and is now considering the implementation of new reproducibility requirements for scientists who receive funding. However, not even the NIH has a clear plan for how reproducibility testing should be funded.
Lastly, it is also important that scientific journals address the issue of reproducibility. One of the most common and also most heavily criticized metrics for the success of a scientific journal is its “impact factor,” an indicator of how often an average article published in the journal is cited. Even irreproducible scientific papers can be cited thousands of times and boost a journal’s “impact.”
If a system tracked the reproducibility of scientific papers, one could conceivably calculate a reproducibility score for any scientific journal. That way, a journal’s reputation would not only rest on the average number of citations but also on the reliability of the papers it publishes. Scientific journals should also consider supporting reproducibility initiatives by encouraging the publication of papers that attempted to replicate previous papers — as long as the reproducibility was tested in a rigorous fashion and independent of whether or not the replication attempts were successful.
There is no need to publish the 20th replication study that merely confirms what 19 previous studies have previously found, but publication of replication attempts is sorely needed before a consensus is reached regarding a scientific discovery. The journal PLOS One has partnered up with the Reproducibility Initiative to provide a forum for the publication of replication studies, but there is no reason why other journals should not follow.
While PLOS One publishes many excellent papers, current requirements for tenure and promotion at academic centers often require that researchers publish in certain pre-specified scientific journals, including those affiliated with certain professional societies and which carry prestige in a designated field of research. If these journals also encouraged the publication of replication attempts, more researchers would conduct them and contribute to the post-publication quality control of scientific literature.
The recent questions raised about the reproducibility of biological and medical research findings is forcing scientists to embark on a soul-searching mission. It is likely that this journey will shake up many long-held beliefs. But this reappraisal will ultimately lead to a more rigorous and reliable science.
Note: An earlier version of this article was first published on Salon.com.