One of the cornerstones of scientific research is the reproducibility of findings. Novel scientific observations need to be validated by subsequent studies in order to be considered robust. This has proven to be somewhat of a challenge for many biomedical research areas, including high impact studies in cancer research and stem cell research. The fact that an initial scientific finding of a research group cannot be confirmed by other researchers does not mean that the initial finding was wrong or that there was any foul play involved. The most likely explanation in biomedical research is that there is tremendous biological variability. Human subjects and patients examined in one research study may differ substantially from those in follow-up studies. Biological cell lines and tools used in basic science studies can vary widely, depending on so many details such as the medium in which cells are kept in a culture dish. The variability in findings is not a weakness of biomedical research, in fact it is a testimony to the complexity of biological systems. Therefore, initial findings need to always be treated with caution and presented with the inherent uncertainty. Once subsequent studies – often with larger sample sizes – confirm the initial observations, they are then viewed as being more robust and gradually become accepted by the wider scientific community.
Even though most scientists become aware of the scientific uncertainty associated with an initial observation as their career progresses, non-scientists may be puzzled by shifting scientific narratives. People often complain that “scientists cannot make up their minds” – citing examples of newspaper reports such as those which state drinking coffee may be harmful only to be subsequently contradicted by reports which laud the beneficial health effects of coffee drinking. Accurately communicating scientific findings as well as the inherent uncertainty of such initial findings is a hallmark of critical science journalism.
A group of researchers led by Dr. Estelle Dumas-Mallet at the University of Bordeaux recently studied the extent of uncertainty communicated to the public by newspapers when reporting initial medical research findings in their recently published paper “Scientific Uncertainty in the Press: How Newspapers Describe Initial Biomedical Findings“. Dumas-Mallet and her colleagues examined 426 English-language newspaper articles published between 1988 and 2009 which described 40 initial biomedical research studies. They focused on scientific studies in which a new risk factor such as smoking or old age had been newly associated with a disease such as schizophrenia, autism, Alzheimer’s disease or breast cancer (total of 12 diseases). The researchers only included scientific studies which had subsequently been re-evaluated by follow-up research studies and found that less than one third of the scientific studies had been confirmed by subsequent research. Dumas-Mallet and her colleagues were therefore interested in whether the newspaper articles, which were published shortly after the release of the initial research paper, adequately conveyed the uncertainty surrounding the initial findings and thus adequately preparing their readers for subsequent research that may confirm or invalidate the initial work.
The University of Bordeaux researchers specifically examined whether headlines of the newspaper articles were “hyped” or “factual”, whether they mentioned whether or not this was an initial study and clearly indicated they need for replication or validation by subsequent studies. Roughly 35% of the headlines were “hyped”. One example of a “hyped” headline was “Magic key to breast cancer fight” instead of using a more factual headline such as “Scientists pinpoint genes that raise your breast cancer risk“. Dumas-Mallet and her colleagues found that even though 57% of the newspaper articles mentioned that these medical research studies were initial findings, only 21% of newspaper articles included explicit “replication statements” such as “Tests on larger populations of adults must be performed” or “More work is needed to confirm the findings”.
The researchers next examined the key characteristics of the newspaper articles which were more likely to convey the uncertainty or preliminary nature of the initial scientific findings. Newspaper articles with “hyped” headlines were less likely to mention the need for replicating and validating the results in subsequent studies. On the other hand, newspaper articles which included a direct quote from one of the research study authors were three times more likely to include a replication statement. In fact, approximately half of all the replication statements mentioned in the newspaper articles were found in author quotes, suggesting that many scientists who conducted the research readily emphasize the preliminary nature of their work. Another interesting finding was the gradual shift over time in conveying scientific uncertainty. “Hyped” headlines were rare before 2000 (only 15%) and become more frequent during the 2000s (43%). On the other hand, replication statements were more common before 2000 (35%) than after 2000 (16%). This suggests that there was a trend towards conveying less uncertainty after 2000, which is surprising because debate about scientific replicability in the biomedical research community seems to have become much more widespread in the past decade.
As in all scientific studies, we need to be aware of the analysis performed by Dumas-Mallet and her colleagues. They focused on analyzing a very narrow area of biomedical research – newly identified risk factors for selected diseases. It remains to be seen whether other areas of biomedical research such as treatment of diseases or basic science discoveries of new molecular pathways are also reported with “hyped” headlines and without replication statements. In other words – this research on “replication statements” in newspaper articles also needs to be replicated. It is not clear that the worrisome trend of over-selling robustness of initial research findings after the year 2000 still persists since the work by Dumas-Mallet and colleagues stopped analyzing studies published after 2009. One would hope that the recent discussions about replicability issues in science among scientists would reverse this trend. Even though the findings of the University of Bordeaux researchers need to be replicated by others, science journalists and readers of newspapers can glean some important information from this study: One needs to be wary of “hyped” headlines and it can be very useful to interview authors of scientific studies when reporting about new research, especially asking them about the limitations of their work. “Hyped” newspaper headlines and an exaggerated sense of certainty in initial scientific findings may erode the long-term trust of the public in scientific research, especially if subsequent studies fail to replicate the initial results. Critical and comprehensive reporting of biomedical research studies – including their limitations and uncertainty – by science journalists is therefore a very important service to society which contributes to science literacy and science-based decision making.
Reproducibility of findings is a core foundation of science. If scientific results only hold true in some labs but not in others, then how can researchers feel confident about their discoveries? How can society put evidence-based policies into place if the evidence is unreliable?
Recognition of this “crisis” has prompted calls for reform. Researchers are feeling their way, experimenting with different practices meant to help distinguish solid science from irreproducible results. Some people are even starting to reevaluate how choices are made about what research actually gets tackled. Breaking innovative new ground is flashier than revisiting already published research. Does prioritizing novelty naturally lead to this point?
Incentivizing the wrong thing?
One solution to the reproducibility crisis could be simply to conduct lots of replication studies. For instance, the scientific journal eLife is participating in an initiative to validate and reproduce important recent findings in the field of cancer research. The first set of these “rerun” studies was recently released and yielded mixed results. The results of 2 out of 5 research studies were reproducible, one was not and two additional studies did not provide definitive answers.
But there’s at least one major obstacle to investing time and effort in this endeavor: the quest for novelty. The prestige of an academic journal depends at least partly on how often the research articles it publishes are cited. Thus, research journals often want to publish novel scientific findings which are more likely to be cited, not necessarily the results of newly rerun older research.
Genetics researcher Barak Cohen at Washington University in St. Louis recently published a commentary analyzing this growing push for novelty. He suggests that progress in science depends on a delicate balance between novelty and checking the work of other scientists. When rewards such as funding of grants or publication in prestigious journals emphasize novelty at the expense of testing previously published results, science risks developing cracks in its foundation.
One of his main concerns is that scientific papers now inflate their claims in order to emphasize their novelty and the relevance of biomedical research for clinical applications. By exchanging depth of research for breadth of claims, researchers may be at risk of compromising the robustness of the work. By claiming excessive novelty and impact, researchers may undermine its actual significance because they may fail to provide solid evidence for each claim.
Prestigious journals often now demand complete scientific stories, from basic molecular mechanisms to proving their relevance in various animal models. Unexplained results or unanswered questions are seen as weaknesses. Instead of publishing one exciting novel finding that is robust, and which could spawn a new direction of research conducted by other groups, researchers now spend years gathering a whole string of findings with broad claims about novelty and impact.
Balancing fresh findings and robustness
A challenge for editors and reviewers of scientific manuscripts is assessing the novelty and likely long-term impact of the work they’re assessing. The eventual importance of a new, unique scientific idea is sometimes difficult to recognize even by peers who are grounded in existing knowledge. Many basic research studies form the basis of future practical applications. One recent study found that of basic research articles that received at least one citation, 80 percent were eventually cited by a patent application. But it takes time for practical significance to come to light.
A collaborative team of economics researchers recently developed an unusual measure of scientific novelty by carefully studying the references of a paper. They ranked a scientific paper as more novel if it cited a diverse combination of journals. For example, a scientific article citing a botany journal, an economics journal and a physics journal would be considered very novel if no other article had cited this combination of varied references before.
This measure of novelty allowed them to identify papers which were more likely to be cited in the long run. But it took roughly four years for these novel papers to start showing their greater impact. One may disagree with this particular indicator of novelty, but the study makes an important point: It takes time to recognize the full impact of novel findings.
Realizing how difficult it is to assess novelty should give funding agencies, journal editors and scientists pause. Progress in science depends on new discoveries and following unexplored paths – but solid, reproducible research requires an equal emphasis on the robustness of the work. By restoring the balance between demands and rewards for novelty and robustness, science will achieve even greater progress.
Murder your darlings. The British writer Sir Arthur Quiller Crouch shared this piece of writerly wisdom when he gave his inaugural lecture series at Cambridge, asking writers to consider deleting words, phrases or even paragraphs that are especially dear to them. The minute writers fall in love with what they write, they are bound to lose their objectivity and may not be able to judge how their choice of words will be perceived by the reader. But writers aren’t the only ones who can fall prey to the Pygmalion syndrome. Scientists often find themselves in a similar situation when they develop “pet” or “darling” hypotheses.
How do scientists decide when it is time to murder their darling hypotheses? The simple answer is that scientists ought to give up scientific hypotheses once the experimental data is unable to support them, no matter how “darling” they are. However, the problem with scientific hypotheses is that they aren’t just generated based on subjective whims. A scientific hypothesis is usually put forward after analyzing substantial amounts of experimental data. The better a hypothesis is at explaining the existing data, the more “darling” it becomes. Therefore, scientists are reluctant to discard a hypothesis because of just one piece of experimental data that contradicts it.
In addition to experimental data, a number of additional factors can also play a major role in determining whether scientists will either discard or uphold their darling scientific hypotheses. Some scientific careers are built on specific scientific hypotheses which set apart certain scientists from competing rival groups. Research grants, which are essential to the survival of a scientific laboratory by providing salary funds for the senior researchers as well as the junior trainees and research staff, are written in a hypothesis-focused manner, outlining experiments that will lead to the acceptance or rejection of selected scientific hypotheses. Well written research grants always consider the possibility that the core hypothesis may be rejected based on the future experimental data. But if the hypothesis has to be rejected then the scientist has to explain the discrepancies between the preferred hypothesis that is now falling in disrepute and all the preliminary data that had led her to formulate the initial hypothesis. Such discrepancies could endanger the renewal of the grant funding and the future of the laboratory. Last but not least, it is very difficult to publish a scholarly paper describing a rejected scientific hypothesis without providing an in-depth mechanistic explanation for why the hypothesis was wrong and proposing alternate hypotheses.
For example, it is quite reasonable for a cell biologist to formulate the hypothesis that protein A improves the survival of neurons by activating pathway X based on prior scientific studies which have shown that protein A is an activator of pathway X in neurons and other studies which prove that pathway X improves cell survival in skin cells. If the data supports the hypothesis, publishing this result is fairly straightforward because it conforms to the general expectations. However, if the data does not support this hypothesis then the scientist has to explain why. Is it because protein A did not activate pathway X in her experiments? Is it because in pathway X functions differently in neurons than in skin cells? Is it because neurons and skin cells have a different threshold for survival? Experimental results that do not conform to the predictions have the potential to uncover exciting new scientific mechanisms but chasing down these alternate explanations requires a lot of time and resources which are becoming increasingly scarce. Therefore, it shouldn’t come as a surprise that some scientists may consciously or subconsciously ignore selected pieces of experimental data which contradict their darling hypotheses.
Let us move from these hypothetical situations to the real world of laboratories. There is surprisingly little data on how and when scientists reject hypotheses, but John Fugelsang and Kevin Dunbar at Dartmouth conducted a rather unique study “Theory and data interactions of the scientific mind: Evidence from the molecular and the cognitive laboratory” in 2004 in which they researched researchers. They sat in at scientific laboratory meetings of three renowned molecular biology laboratories at carefully recorded how scientists presented their laboratory data and how they would handle results which contradicted their predictions based on their hypotheses and models.
In their final analysis, Fugelsang and Dunbar included 417 scientific results that were presented at the meetings of which roughly half (223 out of 417) were not consistent with the predictions. Only 12% of these inconsistencies lead to change of the scientific model (and thus a revision of hypotheses). In the vast majority of the cases, the laboratories decided to follow up the studies by repeating and modifying the experimental protocols, thinking that the fault did not lie with the hypotheses but instead with the manner how the experiment was conducted. In the follow up experiments, 84 of the inconsistent findings could be replicated and this in turn resulted in a gradual modification of the underlying models and hypotheses in the majority of the cases. However, even when the inconsistent results were replicated, only 61% of the models were revised which means that 39% of the cases did not lead to any significant changes.
The study did not provide much information on the long-term fate of the hypotheses and models and we obviously cannot generalize the results of three molecular biology laboratory meetings at one university to the whole scientific enterprise. Also, Fugelsang and Dunbar’s study did not have a large enough sample size to clearly identify the reasons why some scientists were willing to revise their models and others weren’t. Was it because of varying complexity of experiments and models? Was it because of the approach of the individuals who conducted the experiments or the laboratory heads? I wish there were more studies like this because it would help us understand the scientific process better and maybe improve the quality of scientific research if we learned how different scientists handle inconsistent results.
In my own experience, I have also struggled with results which defied my scientific hypotheses. In 2002, we found that stem cells in human fat tissue could help grow new blood vessels. Yes, you could obtain fat from a liposuction performed by a plastic surgeon and inject these fat-derived stem cells into animal models of low blood flow in the legs. Within a week or two, the injected cells helped restore the blood flow to near normal levels! The simplest hypothesis was that the stem cells converted into endothelial cells, the cell type which forms the lining of blood vessels. However, after several months of experiments, I found no consistent evidence of fat-derived stem cells transforming into endothelial cells. We ended up publishing a paper which proposed an alternative explanation that the stem cells were releasing growth factors that helped grow blood vessels. But this explanation was not as satisfying as I had hoped. It did not account for the fact that the stem cells had aligned themselves alongside blood vessel structures and behaved like blood vessel cells.
Even though I “murdered” my darling hypothesis of fat –derived stem cells converting into blood vessel endothelial cells at the time, I did not “bury” the hypothesis. It kept ruminating in the back of my mind until roughly one decade later when we were again studying how stem cells were improving blood vessel growth. The difference was that this time, I had access to a live-imaging confocal laser microscope which allowed us to take images of cells labeled with red and green fluorescent dyes over long periods of time. Below, you can see a video of human bone marrow mesenchymal stem cells (labeled green) and human endothelial cells (labeled red) observed with the microscope overnight. The short movie compresses images obtained throughout the night and shows that the stem cells indeed do not convert into endothelial cells. Instead, they form a scaffold and guide the endothelial cells (red) by allowing them to move alongside the green scaffold and thus construct their network. This work was published in 2013 in the Journal of Molecular and Cellular Cardiology, roughly a decade after I had been forced to give up on the initial hypothesis. Back in 2002, I had assumed that the stem cells were turning into blood vessel endothelial cells because they aligned themselves in blood vessel like structures. I had never considered the possibility that they were scaffold for the endothelial cells.
This and other similar experiences have lead me to reformulate the “murder your darlings” commandment to “murder your darling hypotheses but do not bury them”. Instead of repeatedly trying to defend scientific hypotheses that cannot be supported by emerging experimental data, it is better to give up on them. But this does not mean that we should forget and bury those initial hypotheses. With newer technologies, resources or collaborations, we may find ways to explain inconsistent results years later that were not previously available to us. This is why I regularly peruse my cemetery of dead hypotheses on my hard drive to see if there are ways of perhaps resurrecting them, not in their original form but in a modification that I am now able to test.
Fugelsang, J., Stein, C., Green, A., & Dunbar, K. (2004). Theory and Data Interactions of the Scientific Mind: Evidence From the Molecular and the Cognitive Laboratory. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 58 (2), 86-95 DOI: 10.1037/h0085799
Fareed Zakaria recently wrote an article in the Washington Post lamenting the loss of liberal arts education in the United States. However, instead of making a case for balanced education, which integrates various forms of creativity and critical thinking promoted by STEM (science, technology, engineering and mathematics) and by a liberal arts education, Zakaria misrepresents STEM education as primarily teaching technical skills and also throws in a few cliches about Asians. You can read my response to his article at 3Quarksdaily.
Research institutions in the life sciences engage in two types of regular scientific meet-ups: scientific seminars and lab meetings. The structure of scientific seminars is fairly standard. Speakers give Powerpoint presentations (typically 45 to 55 minutes long) which provide the necessary scientific background, summarize their group’s recent published scientific work and then (hopefully) present newer, unpublished data. Lab meetings are a rather different affair. The purpose of a lab meeting is to share the scientific work-in-progress with one’s peers within a research group and also to update the laboratory heads. Lab meetings are usually less formal than seminars, and all members of a research group are encouraged to critique the presented scientific data and work-in-progress. There is no need to provide much background information because the audience of peers is already well-acquainted with the subject and it is not uncommon to show raw, unprocessed data and images in order to solicit constructive criticism and guidance from lab members and mentors on how to interpret the data. This enables peer review in real-time, so that, hopefully, major errors and flaws can be averted and newer ideas incorporated into the ongoing experiments.
During the past two decades that I have actively participated in biological, psychological and medical research, I have observed very different styles of lab meetings. Some involve brief 5-10 minute updates from each group member; others develop a rotation system in which one lab member has to present the progress of their ongoing work in a seminar-like, polished format with publication-quality images. Some labs have two hour meetings twice a week, other labs meet only every two weeks for an hour. Some groups bring snacks or coffee to lab meetings, others spend a lot of time discussing logistics such as obtaining and sharing biological reagents or establishing timelines for submitting manuscripts and grants. During the first decade of my work as a researcher, I was a trainee and followed the format of whatever group I belonged to. During the past decade, I have been heading my own research group and it has become my responsibility to structure our lab meetings. I do not know which format works best, so I approach lab meetings like our experiments. Developing a good lab meeting structure is a work-in-progress which requires continuous exploration and testing of new approaches. During the current academic year, I decided to try out a new twist: incorporating literature and philosophy into the weekly lab meetings.
My research group studies stem cells and tissue engineering, cellular metabolism in cancer cells and stem cells and the inflammation of blood vessels. Most of our work focuses on identifying molecular and cellular pathways in cells, and we then test our findings in animal models. Over the years, I have noticed that the increasing complexity of the molecular and cellular signaling pathways and the technologies we employ makes it easy to forget the “big picture” of why we are even conducting the experiments. Determining whether protein A is required for phenomenon X and whether protein B is a necessary co-activator which acts in concert with protein A becomes such a central focus of our work that we may not always remember what it is that compels us to study phenomenon X in the first place. Some of our research has direct medical relevance, but at other times we primarily want to unravel the awe-inspiring complexity of cellular processes. But the question of whether our work is establishing a definitive cause-effect relationship or whether we are uncovering yet another mechanism within an intricate web of causes and effects sometimes falls by the wayside. When asked to explain the purpose or goals of our research, we have become so used to directing a laser pointer onto a slide of a cellular model that it becomes challenging to explain the nature of our work without visual aids.
This fall, I introduced a new component into our weekly lab meetings. After our usual round-up of new experimental data and progress, I suggested that each week one lab member should give a brief 15 minute overview about a book they had recently finished or were still reading. The overview was meant to be a “teaser” without spoilers, explaining why they had started reading the book, what they liked about it, and whether they would recommend it to others. One major condition was to speak about the book without any Powerpoint slides! But there weren’t any major restrictions when it came to the book; it could be fiction or non-fiction and published in any language of the world (but ideally also available in an English translation). If lab members were interested and wanted to talk more about the book, then we would continue to discuss it, otherwise we would disband and return to our usual work. If nobody in my lab wanted to talk about a book then I would give an impromptu mini-talk (without Powerpoint) about a topic relating to the philosophy or culture of science. I use the term “culture of science” broadly to encompass topics such as the peer review process and post-publication peer review, the question of reproducibility of scientific findings, retractions of scientific papers, science communication and science policy – topics which have not been traditionally considered philosophy of science issues but still relate to the process of scientific discovery and the dissemination of scientific findings.
One member of our group introduced us to “For Whom the Bell Tolls” by Ernest Hemingway. He had also recently lived in Spain as a postdoctoral research fellow and shared some of his own personal experiences about how his Spanish friends and colleagues talked about the Spanish Civil War. At another lab meeting, we heard about “Sycamore Row” by John Grisham and the ensuring discussion revolved around race relations in Mississippi. I spoke about “A Tale for a Time Being” by Ruth Ozeki and the difficulties that the book’s protagonist faced as an outsider when her family returned to Japan after living in Silicon Valley. I think that the book which got nearly everyone in the group talking was “Far From the Tree: Parents, Children and the Search for Identity” by Andrew Solomon. The book describes how families grapple with profound physical or cognitive differences between parents and children. The PhD student who discussed the book focused on the “Deafness” chapter of this nearly 1000-page tome but she also placed it in the broader context of parenting, love and the stigma of disability. We stayed in the conference room long after the planned 15 minutes, talking about being “disabled” or being “differently abled” and the challenges that parents and children face.
On the weeks where nobody had a book they wanted to present, we used the time to touch on the cultural and philosophical aspects of science such as Thomas Kuhn’s concept of paradigm shifts in “The Structure of Scientific Revolutions“, Karl Popper’s principles of falsifiability of scientific statements, the challenge of reproducibility of scientific results in stem cell biology and cancer research, or the emergence of Pubpeer as a post-publication peer review website. Some of the lab members had heard of Thomas Kuhn’s or Karl Popper’s ideas before, but by coupling it to a lab meeting, we were able to illustrate these ideas using our own work. A lot of 20th century philosophy of science arose from ideas rooted in physics. When undergraduate or graduate students take courses on philosophy of science, it isn’t always easy for them to apply these abstract principles to their own lab work, especially if they pursue a research career in the life sciences. Thomas Kuhn saw Newtonian and Einsteinian theories as distinct paradigms, but what constitutes a paradigm shift in stem cell biology? Is the ability to generate induced pluripotent stem cells from mature adult cells a paradigm shift or “just” a technological advance?
It is difficult for me to know whether the members of my research group enjoy or benefit from these humanities blurbs at the end of our lab meetings. Perhaps they are just tolerating them as eccentricities of the management and maybe they will tire of them. I personally find these sessions valuable because I believe they help ground us in reality. They remind us that it is important to think and read outside of the box. As scientists, we all read numerous scientific articles every week just to stay up-to-date in our area(s) of expertise, but that does not exempt us from also thinking and reading about important issues facing society and the world we live in. I do not know whether discussing literature and philosophy makes us better scientists but I hope that it makes us better people.
Here is a graphic showing the usage of the words “scientists”, “researchers”, “soldiers” in English-language books published in 1900-2008. The graphic was generated using the Google N-gram Viewer which scours all digitized books in the Google database for selected words and assesses the relative word usage frequencies.
It is depressing that soldiers are mentioned more frequently than scientists or researchers (even when the word frequencies of “scientists” and “researchers” are combined) in English-language books even though the numbers of researchers in the countries which produce most English-language books are comparable or higher than the number of soldiers.
Here are the numbers of researchers (data from the 2010 UNESCO Science report, numbers are reported for the year 2007, PDF) in selected English-language countries and the corresponding numbers of armed forces personnel (data from the World Bank, numbers reported for 2012):
United States: 1.4 million researchers vs. 1.5 million armed forces personnel
United Kingdom: 255,000 researchers vs. 169,000 armed forces personnel
Canada: 139,000 researchers vs. 66,000 armed forces personnel
I find it disturbing that our books – arguably one of our main cultural legacies – give a disproportionately greater space to discussing or describing the military than to our scientific and scholarly endeavors. But I am even more worried about the recent trends. The N-gram Viewer evaluates word usage up until 2008, and “soldiers” has been steadily increasing since the 1990s. The usage of “scientists” and “researchers” has reached a plateau and is now decreasing. I do not want to over-interpret the importance of relative word frequencies as indicators of society’s priorities, but the last two surges of “soldiers” usage occurred during the two World Wars and in 2008, “soldiers” was used as frequently as during the first years of World War II.
It is mind-boggling for us scientists that we have to struggle to get funding for research which has the potential to transform society by providing important new insights into the nature of our universe, life on this planet, our environment and health, whereas the military receives substantially higher amounts of government funding (at least in the USA) for its destructive goals. Perhaps one reason for this discrepancy is that voters hear, see and read much more about wars and soldiers than about science and research. Depictions of heroic soldiers fighting evil make it much easier for voters to go along with allocation of resources to the military. Most of my non-scientist friends can easily name books or movies about soldiers, but they would have a hard time coming up with books and movies about science and scientists. My take-home message from the N-gram Viewer results is that scientists have an obligation to reach out to the public and communicate the importance of science in an understandable manner if they want to avoid the marginalization of science.
We often laud intellectual diversity of a scientific research group because we hope that the multitude of opinions can help point out flaws and improve the quality of research long before it is finalized and written up as a manuscript. The recent events surrounding the research in one of the world’s most famous stem cell research laboratories at Harvard shows us the disastrous effects of suppressing diverse and dissenting opinions.
The infamous “Orlic paper” was a landmark research article published in the prestigious scientific journal Nature in 2001, which showed that stem cells contained in the bone marrow could be converted into functional heart cells. After a heart attack, injections of bone marrow cells reversed much of the heart attack damage by creating new heart cells and restoring heart function. It was called the “Orlic paper” because the first author of the paper was Donald Orlic, but the lead investigator of the study was Piero Anversa, a professor and highly respected scientist at New York Medical College.
Anversa had established himself as one of the world’s leading experts on the survival and death of heart muscle cells in the 1980s and 1990s, but with the start of the new millennium, Anversa shifted his laboratory’s focus towards the emerging field of stem cell biology and its role in cardiovascular regeneration. The Orlic paper was just one of several highly influential stem cell papers to come out of Anversa’s lab at the onset of the new millenium. A 2002 Anversa paper in the New England Journal of Medicine – the world’s most highly cited academic journal –investigated the hearts of human organ transplant recipients. This study showed that up to 10% of the cells in the transplanted heart were derived from the recipient’s own body. The only conceivable explanation was that after a patient received another person’s heart, the recipient’s own cells began maintaining the health of the transplanted organ. The Orlic paper had shown the regenerative power of bone marrow cells in mouse hearts, but this new paper now offered the more tantalizing suggestion that even human hearts could be regenerated by circulating stem cells in their blood stream.
A 2003 publication in Cell by the Anversa group described another ground-breaking discovery, identifying a reservoir of stem cells contained within the heart itself. This latest coup de force found that the newly uncovered heart stem cell population resembled the bone marrow stem cells because both groups of cells bore the same stem cell protein called c-kit and both were able to make new heart muscle cells. According to Anversa, c-kit cells extracted from a heart could be re-injected back into a heart after a heart attack and regenerate more than half of the damaged heart!
These Anversa papers revolutionized cardiovascular research. Prior to 2001, most cardiovascular researchers believed that the cell turnover in the adult mammalian heart was minimal because soon after birth, heart cells stopped dividing. Some organs or tissues such as the skin contained stem cells which could divide and continuously give rise to new cells as needed. When skin is scraped during a fall from a bike, it only takes a few days for new skin cells to coat the area of injury and heal the wound. Unfortunately, the heart was not one of those self-regenerating organs. The number of heart cells was thought to be more or less fixed in adults. If heart cells were damaged by a heart attack, then the affected area was replaced by rigid scar tissue, not new heart muscle cells. If the area of damage was large, then the heart’s pump function was severely compromised and patients developed the chronic and ultimately fatal disease known as “heart failure”.
Anversa’s work challenged this dogma by putting forward a bold new theory: the adult heart was highly regenerative, its regeneration was driven by c-kit stem cells, which could be isolated and used to treat injured hearts. All one had to do was harness the regenerative potential of c-kit cells in the bone marrow and the heart, and millions of patients all over the world suffering from heart failure might be cured. Not only did Anversa publish a slew of supportive papers in highly prestigious scientific journals to challenge the dogma of the quiescent heart, he also happened to publish them at a unique time in history which maximized their impact.
In the year 2001, there were few innovative treatments available to treat patients with heart failure. The standard approach was to use medications that would delay the progression of heart failure. But even the best medications could not prevent the gradual decline of heart function. Organ transplants were a cure, but transplantable hearts were rare and only a small fraction of heart failure patients would be fortunate enough to receive a new heart. Hopes for a definitive heart failure cure were buoyed when researchers isolated human embryonic stem cells in 1998. This discovery paved the way for using highly pliable embryonic stem cells to create new heart muscle cells, which might one day be used to restore the heart’s pump function without resorting to a heart transplant.
The dreams of using embryonic stem cells to regenerate human hearts were soon squashed when the Bush administration banned the generation of new human embryonic stem cells in 2001, citing ethical concerns. These federal regulations and the lobbying of religious and political groups against human embryonic stem cells were a major blow to research on cardiovascular regeneration. Amidst this looming hiatus in cardiovascular regeneration, Anversa’s papers appeared and showed that one could steer clear of the ethical controversies surrounding embryonic stem cells by using an adult patient’s own stem cells. The Anversa group re-energized the field of cardiovascular stem cell research and cleared the path for the first human stem cell treatments in heart disease.
Instead of having to wait for the US government to reverse its restrictive policy on human embryonic stem cells, one could now initiate clinical trials with adult stem cells, treating heart attack patients with their own cells and without having to worry about an ethical quagmire. Heart failure might soon become a disease of the past. The excitement at all major national and international cardiovascular conferences was palpable whenever the Anversa group, their collaborators or other scientists working on bone marrow and cardiac stem cells presented their dizzyingly successful results. Anversa received numerous accolades for his discoveries and research grants from the NIH (National Institutes of Health) to further develop his research program. He was so successful that some researchers believed Anversa might receive the Nobel Prize for his iconoclastic work which had redefined the regenerative potential of the heart. Many of the world’s top universities were vying to recruit Anversa and his group, and he decided to relocate his research group to Harvard Medical School and Brigham and Women’s Hospital 2008.
There were naysayers and skeptics who had resisted the adult stem cell euphoria. Some researchers had spent decades studying the heart and found little to no evidence for regeneration in the adult heart. They were having difficulties reconciling their own results with those of the Anversa group. A number of practicing cardiologists who treated heart failure patients were also skeptical because they did not see the near-miraculous regenerative power of the heart in their patients. One Anversa paper went as far as suggesting that the whole heart would completely regenerate itself roughly every 8-9 years, a claim that was at odds with the clinical experience of practicing cardiologists. Other researchers pointed out serious flaws in the Anversa papers. For example, the 2002 paper on stem cells in human heart transplant patients claimed that the hearts were coated with the recipient’s regenerative cells, including cells which contained the stem cell marker Sca-1. Within days of the paper’s publication, many researchers were puzzled by this finding because Sca-1 was a marker of mouse and rat cells – not human cells! If Anversa’s group was finding rat or mouse proteins in human hearts, it was most likely due to an artifact. And if they had mistakenly found rodent cells in human hearts, so these critics surmised, perhaps other aspects of Anversa’s research were similarly flawed or riddled with artifacts.
At national and international meetings, one could observe heated debates between members of the Anversa camp and their critics. The critics then decided to change their tactics. Instead of just debating Anversa and commenting about errors in the Anversa papers, they invested substantial funds and efforts to replicate Anversa’s findings. One of the most important and rigorous attempts to assess the validity of the Orlic paper was published in 2004, by the research teams of Chuck Murry and Loren Field. Murry and Field found no evidence of bone marrow cells converting into heart muscle cells. This was a major scientific blow to the burgeoning adult stem cell movement, but even this paper could not deter the bone marrow cell champions.
The skeptics who had doubted Anversa’s claims all along may now feel vindicated, but this is not the time to gloat. Instead, the discipline of cardiovascular stem cell biology is now undergoing a process of soul-searching. How was it possible that some of the most widely read and cited papers were based on heavily flawed observations and assumptions? Why did it take more than a decade since the first refutation was published in 2004 for scientists to finally accept that the near-magical regenerative power of the heart turned out to be a pipe dream.
One reason for this lag time is pretty straightforward: It takes a tremendous amount of time to refute papers. Funding to conduct the experiments is difficult to obtain because grant funding agencies are not easily convinced to invest in studies replicating existing research. For a refutation to be accepted by the scientific community, it has to be at least as rigorous as the original, but in practice, refutations are subject to even greater scrutiny. Scientists trying to disprove another group’s claim may be asked to develop even better research tools and technologies so that their results can be seen as more definitive than those of the original group. Instead of relying on antibodies to identify c-kit cells, the 2014 refutation developed a transgenic mouse in which all c-kit cells could be genetically traced to yield more definitive results – but developing new models and tools can take years.
The scientific peer review process by external researchers is a central pillar of the quality control process in modern scientific research, but one has to be cognizant of its limitations. Peer review of a scientific manuscript is routinely performed by experts for all the major academic journals which publish original scientific results. However, peer review only involves a “review”, i.e. a general evaluation of major strengths and flaws, and peer reviewers do not see the original raw data nor are they provided with the resources to replicate the studies and confirm the veracity of the submitted results. Peer reviewers rely on the honor system, assuming that the scientists are submitting accurate representations of their data and that the data has been thoroughly scrutinized and critiqued by all the involved researchers before it is even submitted to a journal for publication. If peer reviewers were asked to actually wade through all the original data generated by the scientists and even perform confirmatory studies, then the peer review of every single manuscript could take years and one would have to find the money to pay for the replication or confirmation experiments conducted by peer reviewers. Publication of experiments would come to a grinding halt because thousands of manuscripts would be stuck in the purgatory of peer review. Relying on the integrity of the scientists submitting the data and their internal review processes may seem naïve, but it has always been the bedrock of scientific peer review. And it is precisely the internal review process which may have gone awry in the Anversa group.
Just like Pygmalion fell in love with Galatea, researchers fall in love with the hypotheses and theories that they have constructed. To minimize the effects of these personal biases, scientists regularly present their results to colleagues within their own groups at internal lab meetings and seminars or at external institutions and conferences long before they submit their data to a peer-reviewed journal. The preliminary presentations are intended to spark discussions, inviting the audience to challenge the veracity of the hypotheses and the data while the work is still in progress. Sometimes fellow group members are truly skeptical of the results, at other times they take on the devil’s advocate role to see if they can find holes in their group’s own research. The larger a group, the greater the chance that one will find colleagues within a group with dissenting views. This type of feedback is a necessary internal review process which provides valuable insights that can steer the direction of the research.
Considering the size of the Anversa group – consisting of 20, 30 or even more PhD students, postdoctoral fellows and senior scientists – it is puzzling why the discussions among the group members did not already internally challenge their hypotheses and findings, especially in light of the fact that they knew extramural scientists were having difficulties replicating the work.
“I think that most scientists, perhaps with the exception of the most lucky or most dishonest, have personal experience with failure in science—experiments that are unreproducible, hypotheses that are fundamentally incorrect. Generally, we sigh, we alter hypotheses, we develop new methods, we move on. It is the data that should guide the science.
In the Anversa group, a model with much less intellectual flexibility was applied. The “Hypothesis” was that c-kit (cd117) positive cells in the heart (or bone marrow if you read their earlier studies) were cardiac progenitors that could: 1) repair a scarred heart post-myocardial infarction, and: 2) supply the cells necessary for cardiomyocyte turnover in the normal heart.
This central theme was that which supplied the lab with upwards of $50 million worth of public funding over a decade, a number which would be much higher if one considers collaborating labs that worked on related subjects.
In theory, this hypothesis would be elegant in its simplicity and amenable to testing in current model systems. In practice, all data that did not point to the “truth” of the hypothesis were considered wrong, and experiments which would definitively show if this hypothesis was incorrect were never performed (lineage tracing e.g.).”
Discarding data that might have challenged the central hypothesis appears to have been a central principle.
According to the whistleblower, Anversa’s group did not just discard undesirable data, they actually punished group members who would question the group’s hypotheses:
“In essence, to Dr. Anversa all investigators who questioned the hypothesis were “morons,” a word he used frequently at lab meetings. For one within the group to dare question the central hypothesis, or the methods used to support it, was a quick ticket to dismissal from your position.“
The group also created an environment of strict information hierarchy and secrecy which is antithetical to the spirit of science:
“The day to day operation of the lab was conducted under a severe information embargo. The lab had Piero Anversa at the head with group leaders Annarosa Leri, Jan Kajstura and Marcello Rota immediately supervising experimentation. Below that was a group of around 25 instructors, research fellows, graduate students and technicians. Information flowed one way, which was up, and conversation between working groups was generally discouraged and often forbidden.
Raw data left one’s hands, went to the immediate superior (one of the three named above) and the next time it was seen would be in a manuscript or grant. What happened to that data in the intervening period is unclear.
A side effect of this information embargo was the limitation of the average worker to determine what was really going on in a research project. It would also effectively limit the ability of an average worker to make allegations regarding specific data/experiments, a requirement for a formal investigation.“
This segregation of information is a powerful method to maintain an authoritarian rule and is more typical for terrorist cells or intelligence agencies than for a scientific lab, but it would definitely explain how the Anversa group was able to mass produce numerous irreproducible papers without any major dissent from within the group.
In addition to the secrecy and segregation of information, the group also created an atmosphere of fear to ensure obedience:
“Although individually-tailored stated and unstated threats were present for lab members, the plight of many of us who were international fellows was especially harrowing. Many were technically and educationally underqualified compared to what might be considered average research fellows in the United States. Many also originated in Italy where Dr. Anversa continues to wield considerable influence over biomedical research.
This combination of being undesirable to many other labs should they leave their position due to lack of experience/training, dependent upon employment for U.S. visa status, and under constant threat of career suicide in your home country should you leave, was enough to make many people play along.
Even so, I witnessed several people question the findings during their time in the lab. These people and working groups were subsequently fired or resigned. I would like to note that this lab is not unique in this type of exploitative practice, but that does not make it ethically sound and certainly does not create an environment for creative, collaborative, or honest science.”
Foreign researchers are particularly dependent on their employment to maintain their visa status and the prospect of being fired from one’s job can be terrifying for anyone.
This is an anonymous account of a whistleblower and as such, it is problematic. The use of anonymous sources in science journalism could open the doors for all sorts of unfounded and malicious accusations, which is why the ethics of using anonymous sources was heavily debated at the recent ScienceOnline conference. But the claims of the whistleblower are not made in a vacuum – they have to be evaluated in the context of known facts. The whistleblower’s claim that the Anversa group and their collaborators received more than $50 million to study bone marrow cell and c-kit cell regeneration of the heart can be easily verified at the public NIH grant funding RePORTer website. The whistleblower’s claim that many of the Anversa group’s findings could not be replicated is also a verifiable fact. It may seem unfair to condemn Anversa and his group for creating an atmosphere of secrecy and obedience which undermined the scientific enterprise, caused torment among trainees and wasted millions of dollars of tax payer money simply based on one whistleblower’s account. However, if one looks at the entire picture of the amazing rise and decline of the Anversa group’s foray into cardiac regeneration, then the whistleblower’s description of the atmosphere of secrecy and hierarchy seems very plausible.
The investigation of Harvard into the Anversa group is not open to the public and therefore it is difficult to know whether the university is primarily investigating scientific errors or whether it is also looking into such claims of egregious scientific misconduct and abuse of scientific trainees. It is unlikely that Anversa’s group is the only group that might have engaged in such forms of misconduct. Threatening dissenting junior researchers with a loss of employment or visa status may be far more common than we think. The gravity of the problem requires that the NIH – the major funding agency for biomedical research in the US – should look into the prevalence of such practices in research labs and develop safeguards to prevent the abuse of science and scientists.
Neutrality is prized by scientists and journalists. Scientists are supposed to report and analyze their scientific research in a neutral fashion. Similarly, journalistic professionalism requires a neutral and objective stance when reporting or analyzing news. Nevertheless, scientists and journalists are also aware of the fact that there is no perfect neutrality. We are all victims of our conscious and unconscious biases and how we report data or events is colored by our biases. Not only is it impossible to be truly “neutral”, but one can even question whether “neutrality” should be a universal mandate. Neutrality can make us passive, especially when we see a clear ethical mandate to take action. Should one report in a neutral manner about genocide instead of becoming an advocate for the victims? Should a scientist who observes a destruction of ecosystems report on this in a neutral manner? Is it acceptable or perhaps even required for such a scientist to abandon neutrality and becoming an advocate to protect the ecosystems?
Science bloggers or science journalists have to struggle to find the right balance between neutrality and advocacy. Political bloggers and journalists who are enthusiastic supporters of a political party will find it difficult to preserve neutrality in their writing, but their target audiences may not necessarily expect them to remain neutral. I am often fascinated and excited by scientific discoveries and concepts that I want to write about, but I also notice how my enthusiasm for science compromises my neutrality. Should science bloggers strive for neutrality and avoid advocacy? Or is it understood that their audiences do not expect neutrality?
One way to increase objectivity and neutrality in science writing is to provide balanced views. When discussing a scientific discovery or concept, one can also cite or reference scientists with opposing views. This underscores that scientific opinion is not a monolith and that most scientific findings can and should be challenged. However, the mandate to provide balance can also lead to “false balance” when two opposing opinions are presented as two equivalent perspectives, even though one of the two sides has little to no scientific evidence to back up its claims. More than 99% of all climatologists agree about the importance of anthropogenic global warming, therefore it would be “false balance” to give equal space to opposing fringe views. Most science bloggers would also avoid “false balance” when it comes to reporting about the scientific value of homeopathy since nearly every scientist in the world agrees that homeopathy has no scientific data to back it up.
But how should science bloggers decide what constitutes “necessary balance” versus “false balance” when writing about areas of research where the scientific evidence is more ambivalent. How about a scientific discovery which 80% of scientists think is a landmark finding and 20% of scientists believe is a fluke? How does one find out about the scientific rigor of the various viewpoints and how should a blog post reflect these differences in opinion? Press releases of universities or research institutions usually only cite the researchers that conducted a scientific study, but how does one find out about other scientists who disagree with the significance of the new study?
3. Anonymous Sources
Most scientific peer review is conducted with anonymous sources. The editors of peer reviewed scientific journals send out newly submitted manuscripts to expert reviewers in the field but they try to make sure that the names of the reviewers remain confidential. This helps ensure that the reviewers can comment freely about any potential flaws in the manuscript without having to fear retaliation from the authors who might be incensed about the critique. Even in the post-publication phase, anonymous commenters can leave critical comments about a published study at the post-publication peer review website PubPeer. The comments made by anonymous as well as identified commenters at PubPeer played an important role in raising questions about recent controversial stem cell papers. On the other hand, anonymous sources may also use their cover to make baseless accusations and malign researchers. In the case of journals, the responsibility lies with the editors to ensure that their anonymous reviewers are indeed behaving in a professional manner and not abusing their anonymity.
Investigative political journalists also often rely on anonymous sources and whistle-blowers to receive critical information that would have otherwise been impossible to obtain. Journalists are also trained to ensure that their anonymous sources are credible and that they are not abusing their anonymity.
Should science bloggers and science journalists also consider using anonymous sources? Would unnamed scientists provide a more thorough critical appraisal of the quality of scientific research or would this open the door to abuse?
I hope that you leave comments on this post, tweet your thoughts using the #scioStandards hashtag and discuss your views at the Science Online conference.
I will be facilitating the discussion at this session, which will take place at noon on Saturday, March 1, just before the final session of the conference. The title of the session is rather vague, and the purpose of the session is for attendees to exchange their views on whether we can agree on certain scientific and journalistic standards for science blogging.
Individual science bloggers have very different professional backgrounds and they also write for a rather diverse audience. Some bloggers are part of larger networks, others host a blog on their own personal website. Some are paid, others write for free. Most bloggers have developed their own personal styles for how they write about scientific studies, the process of scientific discovery, science policy and the lives of people involved in science. Considering the heterogeneity in the science blogging community, is it even feasible to identify “standards” for scientific blogging? Are there some core scientific and journalistic standards that most science bloggers can agree on? Would such “standards” merely serve as informal guidelines or should they be used as measures to assess the quality of science blogging?
These are the kinds of questions that we will try to discuss at the session. I hope that we will have a lively discussion, share our respective viewpoints and see what we can learn from each other. To gauge the interest levels of the attendees, I am going to pitch a few potential discussion topics on this blog and use your feedback to facilitate the discussion. I would welcome all of your responses and comments, independent of whether you intend to attend the conference or the session. I will also post these questions in the Science Online discussion forum.
One of the challenges we face when we blog about specific scientific studies is determining how much background reading is necessary to write a reasonably accurate blog post. Most science bloggers probably read the original research paper they intend to write about, but even this can be challenging at times. Scientific papers aren’t very long. Journals usually restrict the word count of original research papers to somewhere between 2,000 words to 8,000 words (depending on each scientific journal’s policy and whether the study is a published as a short communication or a full-length article). However, original research papers are also accompanied four to eight multi-paneled figures with extensive legends.
Nowadays, research papers frequently include additional figures, data-sets and detailed descriptions of scientific methods that are published online and not subject to the word count limit. A 2,000 word short communication with two data figures in the main manuscript may therefore be accompanied by eight “supplemental” online-only figures and an additional 2,000 words of text describing the methods in detail. A single manuscript usually summarizes the results of multiple years of experimental work, which is why this condensed end-product is quite dense. It can take hours to properly study the published research study and understand the intricate details.
Is it enough to merely read the original research paper in order to blog about it? Scientific papers include a brief introduction section, but these tend to be written for colleagues who are well-acquainted with the background and significance of the research. However, unless one happens to blog about a paper that is directly related to one’s own work, most of us probably need additional background reading to fully understand the significance of a newly published study.
An expert on liver stem cells, for example, who wants blog about the significance of a new paper on lung stem cells will probably need substantial amount of additional background reading. One may have to read at least one or two older research papers by the authors or their scientific colleagues / competitors to grasp what makes the new study so unique. It may also be helpful to read at least one review paper (e.g. a review article summarizing recent lung stem cell discoveries) to understand the “big picture”. Some research papers are accompanied by scientific editorials which can provide important insights into the strengths and limitations of the paper in question.
All of this reading adds up. If it takes a few hours to understand the main paper that one intends to blog about, and an additional 2-3 hours to read other papers or editorials, a science blogger may end up having to invest 4-5 hours of reading before one has even begun to write the intended blog post.
What strategies have science bloggers developed to manage their time efficiently and make sure they can meet (external or self-imposed) deadlines but still complete the necessary background reading?
Should bloggers provide references and links to the additional papers they consulted?
Should bloggers try to focus on a narrow area of expertise so that over time they develop enough of a background in this niche area so that they do not need so much background reading?
Are there major differences in the expectations of how much background reading is necessary? For example, does an area such as stem cell research or nanotechnology require far more background reading because every day numerous new papers are published and it is so difficult to keep up with the pace of the research?
Is it acceptable to take short-cuts? Could one just read the paper that one wants to blog about and forget about additional background reading, hoping that the background provided in the paper is sufficient and balanced?
Can one avoid reading the supplementary figures or texts of a paper and just stick to the main text of a paper, relying on the fact that the peer reviewers of the published paper would have caught any irregularities in the supplementary data?
Is it possible to primarily rely on a press release or an interview with the researchers of the paper and just skim the results of the paper instead of spending a few hours trying to read the original paper?
Or do such short-cuts compromise the scientific and journalistic quality of science blogs?
Would a discussion about expectations, standards and strategies to manage background reading be helpful for participants of the session?
Since Shinya Yamanaka’s landmark discovery that adult skin cells could be reprogrammed into embryonic-like induced pluripotent stem cells (iPSCs) by introducing selected embryonic genes into adult cells, laboratories all over the world have been using modifications of the “Yamanaka method” to create their own stem cell lines. The original Yamanaka method published in 2006 used a virus which integrated into the genome of the adult cell to introduce the necessary genes. Any introduction of genetic material into a cell carries the risk of causing genetic aberrancies that could lead to complications, especially if the newly generated stem cells are intended for therapeutic usage in patients.
Researchers have therefore tried to modify the “Yamanaka method” and reduce the risk of genetic aberrations by either using genetic tools to remove the introduced genes once the cells are fully reprogrammed to a stem cell state, introducing genes without non-integrating viruses or by using complex cocktails of chemicals and growth factors in order to generate stem cells without the introduction of any genes into the adult cells.
The papers by Obokata and colleagues at the RIKEN center in Kobe, Japan use a far more simple method to reprogram adult cells. Instead of introducing foreign genes, they suggest that one can expose adult mouse cells to a severe stress such as an acidic solution. The cells which survive acid-dipping adventure (25 minutes in a solution with pH 5.7) activate their endogenous dormant embryonic genes by an unknown mechanism. The researchers then show that these activated cells take on properties of embryonic stem cells or iPSCs if they are maintained in a stem cell culture medium and treated with the necessary growth factors. Once the cells reach the stem cell state, they can then be converted into cells of any desired tissue, both in a culture dish as well as in a developing mouse embryo. Many of the experiments in the papers were performed by starting out with adult mouse lymphocytes, but the researchers also found that mouse skin fibroblasts and other cells could also be successfully converted into an embryonic-like state using the acid stress.
My first reaction was incredulity. How could such a simple and yet noxious stress such as exposing cells to acid be sufficient to initiate a complex “stemness” program? Research labs have spent years fine-tuning the introduction of the embryonic genes, trying to figure out the optimal combination of genes and timing of when the genes are essential during the reprogramming process. These two papers propose that the whole business of introducing stem cell genes into adult cells was unnecessary – All You Need Is Acid.
This sounds too good to be true. The recent history in stem cell research has taught us that we need to be skeptical. Some of the most widely cited stem cell papers cannot be replicated. This problem is not unique to stem cell research, because other biomedical research areas such as cancer biology are also struggling with issues of replicability, but the high scientific impact of burgeoning stem cell research has forced its replicability issues into the limelight. Nowadays, whenever stem cell researchers hear about a ground-breaking new stem cell discovery, they often tend to respond with some degree of skepticism until multiple independent laboratories can confirm the results.
My second reaction was that I really liked the idea. Maybe we had never tried something as straightforward as an acid stress because we were too narrow-minded, always looking for complex ways to create stem cells instead of trying simple approaches. The stress-induction of stem cell behavior may also represent a regenerative mechanism that has been conserved by evolution. When our amphibian cousins regenerate limbs following an injury, adult tissue cells are also reprogrammed to a premature state by the stress of the injury before they start building a new limb.
The idea of stress-induced reprogramming of adult cells to an embryonic-like state also has a powerful poetic appeal, which inspired me to write the following haiku:
Just because the idea of acid-induced reprogramming is so attractive does not mean that it is scientifically accurate or replicable.
A number of concerns about potential scientific misconduct in the context of the two papers have been raised and it appears that the RIKEN center is investigating these concerns. Specifically, anonymous bloggers have pointed out irregularities in the figures of the papers and that some of the images may be duplicated. We will have to wait for the results of the investigation, but even if image errors or duplications are found, this does not necessarily mean that this was intentional misconduct or fraud. Assembling manuscripts with so many images is no easy task and unintentional errors do occur. These errors are probably far more common than we think. High profile papers undergo much more scrutiny than the average peer-reviewed paper, and this is probably why we tend to uncover them more readily in such papers. For example, image duplication errors were discovered in the 2013 Cell paper on human cloning, but many researchers agreed that the errors in the 2013 Cell paper were likely due to sloppiness during the assembly of the submitted manuscript and did not constitute intentional fraud.
Irrespective of the investigation into the irregularities of figures in the two Nature papers, the key question that stem cell researchers have to now address is whether the core findings of the Obokata papers are replicable. Can adult cells – lymphocytes, skin fibroblasts or other cells – be converted into embryonic-like stem cells by an acid stress? If yes, then this will make stem cell generation far easier and it will open up a whole new field of inquiry, leading to many new exciting questions. Do human cells also respond to acid stress in the same manner as the mouse cells? How does acid stress reprogram the adult cells? Is there an acid-stress signal that directly acts on stem cell transcription factors or does the stress merely activate global epigenetic switches? Are other stressors equally effective? Does this kind of reprogramming occur in our bodies in response to an injury such as low oxygen or inflammation because these kinds of injuries can transiently create an acidic environment in our tissues?
Researchers all around the world are currently attempting to test the effect of acid exposure on the activation of stem cell genes. Paul Knoepfler’s stem cell blog is currently soliciting input from researchers trying to replicate the work. Paul makes it very clear that this is an informal exchange of ideas so that researchers can learn from each other on a “real-time” basis. It is an opportunity to find out about how colleagues are progressing without having to wait for 6-12 months for the next big stem cell meeting or the publication of a paper confirming or denying the replication of acid-induced reprogramming. Posting one’s summary of results on a blog is not as rigorous as publishing a peer-reviewed paper with all the necessary methodological details, but it can at least provide some clues as to whether some or all of the results in the controversial Obokata papers can be replicated.
If the preliminary findings of multiple labs posted on the blog indicate that lymphocytes or skin cells begin to activate their stem cell gene signature after acid stress, then we at least know that this is a project which merits further investigation and researchers will be more willing to invest valuable time and resources to conduct additional replication experiments. On the other hand, if nearly all the researchers post negative results on the blog, then it is probably not a good investment of resources to spend the next year or so trying to replicate the results.
It does not hurt to have one’s paradigms or ideas challenged by new scientific papers as long as we realize that paradigm-challenging papers need to be replicated. The Nature papers must have undergone rigorous peer review before their publication, but scientific peer review does not involve checking replicability of the results. Peer reviewers focus on assessing the internal logic, experimental design, novelty, significance and validity of the conclusions based on the presented data. The crucial step of replicability testing occurs in the post-publication phase. The post-publication exchange of results on scientific blogs by independent research labs is an opportunity to crowd-source replicability testing and thus accelerate the scientific authentication process. Irrespective of whether or not the attempts to replicate acid-induced reprogramming succeed, the willingness of the stem cell community to engage in a dialogue using scientific blogs and evaluate replicability is an important step forward.
Obokata H, Wakayama T, Sasai Y, Kojima K, Vacanti MP, Niwa H, Yamato M, & Vacanti CA (2014). Stimulus-triggered fate conversion of somatic cells into pluripotency. Nature, 505 (7485), 641-7 PMID: 24476887