The Road to Bad Science Is Paved with Obedience and Secrecy

We often laud the intellectual diversity of a scientific research group because we hope that the multitude of opinions can help point out flaws and improve the quality of research long before it is finalized and written up as a manuscript. The recent events surrounding the research in one of the world’s most famous stem cell research laboratories at Harvard show us the disastrous effects of suppressing diverse and dissenting opinions.

Cultured cells via Shutterstock

The infamous “Orlic paper” was a landmark research article published in the prestigious scientific journal Nature in 2001, which showed that stem cells contained in the bone marrow could be converted into functional heart cells. After a heart attack, injections of bone marrow cells reversed much of the heart attack damage by creating new heart cells and restoring heart function. It was called the “Orlic paper” because the first author of the paper was Donald Orlic, but the lead investigator of the study was Piero Anversa, a professor and highly respected scientist at New York Medical College.

Anversa had established himself as one of the world’s leading experts on the survival and death of heart muscle cells in the 1980s and 1990s, but with the start of the new millennium, Anversa shifted his laboratory’s focus towards the emerging field of stem cell biology and its role in cardiovascular regeneration. The Orlic paper was just one of several highly influential stem cell papers to come out of Anversa’s lab at the onset of the new millennium. A 2002 Anversa paper in the New England Journal of Medicine – the world’s most highly cited academic journal – investigated the hearts of human organ transplant recipients. This study showed that up to 10% of the cells in the transplanted heart were derived from the recipient’s own body. The only conceivable explanation was that after a patient received another person’s heart, the recipient’s own cells began maintaining the health of the transplanted organ. The Orlic paper had shown the regenerative power of bone marrow cells in mouse hearts, but this new paper now offered the more tantalizing suggestion that even human hearts could be regenerated by circulating stem cells in their blood stream.

Woman having a heart attack via Shutterstock

A 2003 publication in Cell by the Anversa group described another ground-breaking discovery, identifying a reservoir of stem cells contained within the heart itself. This latest tour de force found that the newly uncovered heart stem cell population resembled the bone marrow stem cells because both groups of cells bore the same stem cell protein called c-kit and both were able to make new heart muscle cells. According to Anversa, c-kit cells extracted from a heart could be re-injected back into a heart after a heart attack and regenerate more than half of the damaged heart!

These Anversa papers revolutionized cardiovascular research. Prior to 2001, most cardiovascular researchers believed that the cell turnover in the adult mammalian heart was minimal because soon after birth, heart cells stopped dividing. Some organs or tissues such as the skin contained stem cells which could divide and continuously give rise to new cells as needed. When skin is scraped during a fall from a bike, it only takes a few days for new skin cells to coat the area of injury and heal the wound. Unfortunately, the heart was not one of those self-regenerating organs. The number of heart cells was thought to be more or less fixed in adults. If heart cells were damaged by a heart attack, then the affected area was replaced by rigid scar tissue, not new heart muscle cells. If the area of damage was large, then the heart’s pump function was severely compromised and patients developed the chronic and ultimately fatal disease known as “heart failure”.

Anversa’s work challenged this dogma by putting forward a bold new theory: the adult heart was highly regenerative, and its regeneration was driven by c-kit stem cells, which could be isolated and used to treat injured hearts. All one had to do was harness the regenerative potential of c-kit cells in the bone marrow and the heart, and millions of patients all over the world suffering from heart failure might be cured. Not only did Anversa publish a slew of supportive papers in highly prestigious scientific journals to challenge the dogma of the quiescent heart, he also happened to publish them at a unique time in history which maximized their impact.

In the year 2001, there were few innovative treatments available to treat patients with heart failure. The standard approach was to use medications that would delay the progression of heart failure. But even the best medications could not prevent the gradual decline of heart function. Organ transplants were a cure, but transplantable hearts were rare and only a small fraction of heart failure patients would be fortunate enough to receive a new heart. Hopes for a definitive heart failure cure were buoyed when researchers isolated human embryonic stem cells in 1998. This discovery paved the way for using highly pliable embryonic stem cells to create new heart muscle cells, which might one day be used to restore the heart’s pump function without  resorting to a heart transplant.

 

Human heart jigsaw puzzle via Shutterstock

The dreams of using embryonic stem cells to regenerate human hearts were soon dashed when the Bush administration restricted federal funding for research on newly derived human embryonic stem cell lines in 2001, citing ethical concerns. These federal regulations and the lobbying of religious and political groups against human embryonic stem cells were a major blow to research on cardiovascular regeneration. Amidst this looming hiatus in cardiovascular regeneration, Anversa’s papers appeared and showed that one could steer clear of the ethical controversies surrounding embryonic stem cells by using an adult patient’s own stem cells. The Anversa group re-energized the field of cardiovascular stem cell research and cleared the path for the first human stem cell treatments in heart disease.

Instead of having to wait for the US government to reverse its restrictive policy on human embryonic stem cells, one could now initiate clinical trials with adult stem cells, treating heart attack patients with their own cells and without having to worry about an ethical quagmire. Heart failure might soon become a disease of the past. The excitement at all major national and international cardiovascular conferences was palpable whenever the Anversa group, their collaborators or other scientists working on bone marrow and cardiac stem cells presented their dizzyingly successful results. Anversa received numerous accolades for his discoveries and research grants from the NIH (National Institutes of Health) to further develop his research program. He was so successful that some researchers believed Anversa might receive the Nobel Prize for his iconoclastic work which had redefined the regenerative potential of the heart. Many of the world’s top universities were vying to recruit Anversa and his group, and he decided to relocate his research group to Harvard Medical School and Brigham and Women’s Hospital in 2008.

There were naysayers and skeptics who had resisted the adult stem cell euphoria. Some researchers had spent decades studying the heart and found little to no evidence for regeneration in the adult heart. They were having difficulties reconciling their own results with those of the Anversa group. A number of practicing cardiologists who treated heart failure patients were also skeptical because they did not see the near-miraculous regenerative power of the heart in their patients. One Anversa paper went as far as suggesting that the whole heart would completely regenerate itself roughly every 8-9 years, a claim that was at odds with the clinical experience of practicing cardiologists.  Other researchers pointed out serious flaws in the Anversa papers. For example, the 2002 paper on stem cells in human heart transplant patients claimed that the hearts were coated with the recipient’s regenerative cells, including cells which contained the stem cell marker Sca-1. Within days of the paper’s publication, many researchers were puzzled by this finding because Sca-1 was a marker of mouse and rat cells – not human cells! If Anversa’s group was finding rat or mouse proteins in human hearts, it was most likely due to an artifact. And if they had mistakenly found rodent cells in human hearts, so these critics surmised, perhaps other aspects of Anversa’s research were similarly flawed or riddled with artifacts.

At national and international meetings, one could observe heated debates between members of the Anversa camp and their critics. The critics then decided to change their tactics. Instead of just debating Anversa and commenting about errors in the Anversa papers, they invested substantial funds and efforts to replicate Anversa’s findings. One of the most important and rigorous attempts to assess the validity of the Orlic paper was published in 2004, by the research teams of Chuck Murry and Loren Field. Murry and Field found no evidence of bone marrow cells converting into heart muscle cells. This was a major scientific blow to the burgeoning adult stem cell movement, but even this paper could not deter the bone marrow cell champions.

Despite the fact that the refutation of the Orlic paper was published in 2004, the Orlic paper continues to carry the dubious distinction of being one of the most cited papers in the history of stem cell research. At first, Anversa and his colleagues would shrug off their critics’ findings or publish refutations of refutations – but over time, an increasing number of research groups all over the world began to realize that many of the central tenets of Anversa’s work could not be replicated and the number of critics and skeptics increased. As the signs of irreplicability and other concerns about Anversa’s work mounted, Harvard and Brigham and Women’s Hospital were forced to initiate an internal investigation which resulted in the retraction of one Anversa paper and an expression of concern about another major paper. Finally, a research group published a paper in May 2014 using mice in which c-kit cells were genetically labeled so that one could track their fate and found that c-kit cells have a minimal – if any – contribution to the formation of new heart cells: a fraction of a percent!

The skeptics who had doubted Anversa’s claims all along may now feel vindicated, but this is not the time to gloat. Instead, the discipline of cardiovascular stem cell biology is now undergoing a process of soul-searching. How was it possible that some of the most widely read and cited papers were based on heavily flawed observations and assumptions? Why did it take more than a decade after the first refutation was published in 2004 for scientists to finally accept that the near-magical regenerative power of the heart was a pipe dream?

One reason for this lag time is pretty straightforward: It takes a tremendous amount of time to refute papers. Funding to conduct the experiments is difficult to obtain because grant funding agencies are not easily convinced to invest in studies replicating existing research. For a refutation to be accepted by the scientific community, it has to be at least as rigorous as the original, but in practice, refutations are subject to even greater scrutiny. Scientists trying to disprove another group’s claim may be asked to develop even better research tools and technologies so that their results can be seen as more definitive than those of the original group. Instead of relying on antibodies to identify c-kit cells, the 2014 refutation developed a transgenic mouse in which all c-kit cells could be genetically traced to yield more definitive results – but developing new models and tools can take years.

The scientific peer review process by external researchers is a central pillar of the quality control process in modern scientific research, but one has to be cognizant of its limitations. Peer review of a scientific manuscript is routinely performed by experts for all the major academic journals which publish original scientific results. However, peer review only involves a “review”, i.e. a general evaluation of major strengths and flaws, and peer reviewers do not see the original raw data nor are they provided with the resources to replicate the studies and confirm the veracity of the submitted results. Peer reviewers rely on the honor system, assuming that the scientists are submitting accurate representations of their data and that the data has been thoroughly scrutinized and critiqued by all the involved researchers before it is even submitted to a journal for publication. If peer reviewers were asked to actually wade through all the original data generated by the scientists and even perform confirmatory studies, then the peer review of every single manuscript could take years and one would have to find the money to pay for the replication or confirmation experiments conducted by peer reviewers. Publication of experiments would come to a grinding halt because thousands of manuscripts would be stuck in the purgatory of peer review. Relying on the integrity of the scientists submitting the data and their internal review processes may seem naïve, but it has always been the bedrock of scientific peer review. And it is precisely the internal review process which may have gone awry in the Anversa group.

Just as Pygmalion fell in love with Galatea, researchers fall in love with the hypotheses and theories that they have constructed. To minimize the effects of these personal biases, scientists regularly present their results to colleagues within their own groups at internal lab meetings and seminars or at external institutions and conferences long before they submit their data to a peer-reviewed journal. The preliminary presentations are intended to spark discussions, inviting the audience to challenge the veracity of the hypotheses and the data while the work is still in progress. Sometimes fellow group members are truly skeptical of the results; at other times they take on the devil’s advocate role to see if they can find holes in their group’s own research. The larger a group, the greater the chance that one will find colleagues within it who hold dissenting views. This type of feedback is a necessary internal review process which provides valuable insights that can steer the direction of the research.

Considering the size of the Anversa group – consisting of 20, 30 or even more PhD students, postdoctoral fellows and senior scientists – it is puzzling that discussions among the group members did not challenge their hypotheses and findings internally, especially since they knew that extramural scientists were having difficulties replicating the work.

Retraction Watch is one of the most widely read scientific watchdogs which tracks scientific misconduct and retractions of published scientific papers. Recently, Retraction Watch published the account of an anonymous whistleblower who had worked as a research fellow in Anversa’s group and provided some unprecedented insights into the inner workings of the group, which explain why the internal review process had failed:

“I think that most scientists, perhaps with the exception of the most lucky or most dishonest, have personal experience with failure in science—experiments that are unreproducible, hypotheses that are fundamentally incorrect. Generally, we sigh, we alter hypotheses, we develop new methods, we move on. It is the data that should guide the science.

 In the Anversa group, a model with much less intellectual flexibility was applied. The “Hypothesis” was that c-kit (cd117) positive cells in the heart (or bone marrow if you read their earlier studies) were cardiac progenitors that could: 1) repair a scarred heart post-myocardial infarction, and: 2) supply the cells necessary for cardiomyocyte turnover in the normal heart.

 This central theme was that which supplied the lab with upwards of $50 million worth of public funding over a decade, a number which would be much higher if one considers collaborating labs that worked on related subjects.

 In theory, this hypothesis would be elegant in its simplicity and amenable to testing in current model systems. In practice, all data that did not point to the “truth” of the hypothesis were considered wrong, and experiments which would definitively show if this hypothesis was incorrect were never performed (lineage tracing e.g.).”

Discarding data that might have challenged the central hypothesis appears to have been a central principle.

 

Hood over screen - via Shutterstock

According to the whistleblower, Anversa’s group did not just discard undesirable data; it actually punished group members who questioned the group’s hypotheses:

“In essence, to Dr. Anversa all investigators who questioned the hypothesis were “morons,” a word he used frequently at lab meetings. For one within the group to dare question the central hypothesis, or the methods used to support it, was a quick ticket to dismissal from your position.”

The group also created an environment of strict information hierarchy and secrecy which is antithetical to the spirit of science:

“The day to day operation of the lab was conducted under a severe information embargo. The lab had Piero Anversa at the head with group leaders Annarosa Leri, Jan Kajstura and Marcello Rota immediately supervising experimentation. Below that was a group of around 25 instructors, research fellows, graduate students and technicians. Information flowed one way, which was up, and conversation between working groups was generally discouraged and often forbidden.

 Raw data left one’s hands, went to the immediate superior (one of the three named above) and the next time it was seen would be in a manuscript or grant. What happened to that data in the intervening period is unclear.

 A side effect of this information embargo was the limitation of the average worker to determine what was really going on in a research project. It would also effectively limit the ability of an average worker to make allegations regarding specific data/experiments, a requirement for a formal investigation.”

This segregation of information is a powerful method to maintain an authoritarian rule and is more typical for terrorist cells or intelligence agencies than for a scientific lab, but it would definitely explain how the Anversa group was able to mass produce numerous irreproducible papers without any major dissent from within the group.

In addition to the secrecy and segregation of information, the group also created an atmosphere of fear to ensure obedience:

“Although individually-tailored stated and unstated threats were present for lab members, the plight of many of us who were international fellows was especially harrowing. Many were technically and educationally underqualified compared to what might be considered average research fellows in the United States. Many also originated in Italy where Dr. Anversa continues to wield considerable influence over biomedical research.

 This combination of being undesirable to many other labs should they leave their position due to lack of experience/training, dependent upon employment for U.S. visa status, and under constant threat of career suicide in your home country should you leave, was enough to make many people play along.

 Even so, I witnessed several people question the findings during their time in the lab. These people and working groups were subsequently fired or resigned. I would like to note that this lab is not unique in this type of exploitative practice, but that does not make it ethically sound and certainly does not create an environment for creative, collaborative, or honest science.”

Foreign researchers are particularly dependent on their employment to maintain their visa status and the prospect of being fired from one’s job can be terrifying for anyone.

This is an anonymous account of a whistleblower and as such, it is problematic. The use of anonymous sources in science journalism could open the doors for all sorts of unfounded and malicious accusations, which is why the ethics of using anonymous sources was heavily debated at the recent ScienceOnline conference. But the claims of the whistleblower are not made in a vacuum – they have to be evaluated in the context of known facts. The whistleblower’s claim that the Anversa group and their collaborators received more than $50 million to study bone marrow cell and c-kit cell regeneration of the heart can be easily verified at the public NIH grant funding RePORTer website. The whistleblower’s claim that many of the Anversa group’s findings could not be replicated is also a verifiable fact. It may seem unfair to condemn Anversa and his group for creating an atmosphere of secrecy and obedience which undermined the scientific enterprise, caused torment among trainees and wasted millions of dollars of taxpayer money simply based on one whistleblower’s account. However, if one looks at the entire picture of the amazing rise and decline of the Anversa group’s foray into cardiac regeneration, then the whistleblower’s description of the atmosphere of secrecy and hierarchy seems very plausible.

Harvard’s investigation into the Anversa group is not open to the public and therefore it is difficult to know whether the university is primarily investigating scientific errors or whether it is also looking into such claims of egregious scientific misconduct and abuse of scientific trainees. It is unlikely that Anversa’s group is the only group that might have engaged in such forms of misconduct. Threatening dissenting junior researchers with a loss of employment or visa status may be far more common than we think. The gravity of the problem requires that the NIH – the major funding agency for biomedical research in the US – look into the prevalence of such practices in research labs and develop safeguards to prevent the abuse of science and scientists.

 

Note: An earlier version of this article was first published on 3quarksdaily.com.

Some Highlights of the Live Chat: “Are We Doing Science the Right Way?”

On February 7, 2013, ScienceNOW organized a Live Chat with the microbiologists Ferric Fang and Arturo Casadevall that was moderated by the Science staff writer Jennifer Couzin-Frankel and discussed a very broad range of topics related to how we currently conduct science. For those who could not participate in the Live Chat, I will summarize some key comments made by Fang and Casadevall, Couzin-Frankel or other commenters.

 

I have grouped the comments into key themes and also added some of my own thoughts.

 

1. Introduction to the goals of the Live Chat:

Jennifer Couzin-Frankel: …For several years (at least) researchers have worried about where their profession is heading. As much as most of them love working in the lab, they’re also facing sometimes extreme pressure to land grants and publish hot papers. And surveys have shown that a subset are even bending or breaking the rules to accomplish that. …With us today are two guests who are studying the “science of science” together, and considering how to nurture discovery and reduce misconduct…

 

Pressure to publish, the difficulty of obtaining grant funding, scientific misconduct – these are all topics that should be of interest to all of us who are actively engaged in science.

 

2. Science funding:

Ferric Fang: …the way in which science is funded has a profound effect on how and what science is done. Paula Stephan has recently written an excellent book on this subject called “How Economics Shapes Science.”

Ferric Fang: Many are understandably reluctant to ask for more funding given the global recession and halting recovery. But I believe a persuasive economic case can be made for greater investment in R&D paying off in the long run. Paula Stephan notes that the U.S. spends twice as much on beer as on science each year.

 

These are great points. I often get the sense that federal funding for science and education is portrayed as an unnecessary luxury, charity or a form of waste. We have to remind people that spending on science and education is a very important investment with long-term returns.

 

3. Reproducibility and the self-correcting nature of science:

Arturo Casadevall: Is science self-correcting? Yes and No. In areas where there is a lot of interest in a subject experiments will be repeated and bad science will be ferreted out. However, until someone sets out to repeat an experiment we do not know whether it is reproducible. We do not know what percentage of the literature is right because no one has ever done a systematic study to see what fraction is reproducible.

 

I think that the reproducibility crisis is one of the biggest challenges for contemporary science. Thousands of scientific papers are published every day, and only a tiny fraction of them will ever be tested for reproducibility. There is minimal funding for attempts to replicate published data and also very little incentive for scientists, because even if they are able to replicate the published work, they will have a hard time publishing a confirmatory study. The lack of attempts to replicate scientific data creates a lot of uncertainty, because we do not really know how much of the published data is truly valid.

 

Comment From David R Van Houten: …The absence of these weekly [lab] meetings was the single biggest factor allowing for the data fabrication and falsification that I observed 20 years ago as a PhD student. I pushed to get these meetings organized, and when they did occur, it made it easier to get the offender to stop, and easier to “salvage” original data…

 

I agree that regular lab meetings and more supervision by senior researchers and principal investigators can help contain and prevent data fabrication and falsification. However, overt data fabrication and fraud are probably not as common as “data fudging”, where experiments or data points are conveniently ignored because they do not fit the desired model. This kind of “data fudging” is not just a problem of junior scientists, but also occurs with senior scientists.

 

Ferric Fang: Peer review plays an important role in self-correction of science but as nearly everyone recognizes, it is not perfect. Mechanisms of post-publication review to address the problems are very important– these include errata, retractions, correspondences, follow up publications, and nowadays, public discussion on blogs and other websites.

 

I am glad that Fang (who is an editor-in-chief of an academic journal) recognizes the importance of post-publication review, and mentions blog discussions as one such form of post-publication review.

 

4. Are salaries of scientists too low?

Comment From Shabbir: When an hedge fund manager makes 100 times more than a theoretical physicist, how can we expect the bright minds to go to science?

 

I agree that academic salaries for scientists are on the lower side, especially when compared with the salary that one can make in private industry. However, I do not think that obscene salaries of hedge fund managers are the correct comparison. If the US wants to attract and retain excellent scientists, raising their salaries is definitely important. Scientists are routinely over-worked, balancing their research work, teaching, mentoring and administrative duties, and they receive inadequate compensation. I have also observed a near-cynical attitude at many elite universities, which try to portray working as a scientist as an “honor” that should not require much compensation. This kind of abuse really needs to end.

 

5. Communicating science to the public

Arturo Casadevall: … Many scientists cannot explain their work at a dinner party and keep the other guests interested. We are passionate about what we do but we are often terrible in communicating the excitement that we feel. I think this is one area where perhaps better public communicating skills are needed and maybe some attention should be given to mastering these arts in training.

 

I could not agree more. Training in science communication should be part of every PhD program and postdoctoral fellowship, and it should remain an ongoing effort once a scientist becomes an independent principal investigator.

 

6. Are we focusing on quantity rather than quality in science?

Ferric Fang: …There are now in excess of 50,000,000 scientific publications according to one estimate, and we are in danger of creating a Library of Babel in which it is impossible to find the truth buried amidst poor quality or unimportant publications. This is in part a consequence of the “publish or perish” mentality in academia. A focus on quality rather than quantity in promotion decisions might help.

 

It is correct that the amount of scientific data being generated is overwhelming, but I am not sure that there is an easy way to find the “truth”. Scientific “truth” is very dynamic and I think it is becoming more and more difficult to publish in the high-impact journals. A typical paper in a high-impact journal now has anywhere between 5 and 20 supplemental figures and tables, and that same paper could have been published as two or three separate papers just a few decades ago. We now have many more active scientists all over the world who have begun publishing in English, and we all have tools that generate huge amounts of data in a matter of weeks (such as microarrays, proteomics and metabolomics). It is likely that the number of publications will continue to rise in the coming years and we need to come up with an innovative system to manage scientific information. Hopefully, scientists will realize that managing and evaluating existing scientific information is just as valuable as generating new scientific datasets.

 

This was a great and inspiring discussion and I look forward to other such Live Chat events.