The Road to Bad Science Is Paved with Obedience and Secrecy

We often laud the intellectual diversity of a scientific research group because we hope that the multitude of opinions can help point out flaws and improve the quality of research long before it is finalized and written up as a manuscript. The recent events surrounding the research at one of the world’s most famous stem cell research laboratories at Harvard show us the disastrous effects of suppressing diverse and dissenting opinions.


The infamous “Orlic paper” was a landmark research article published in the prestigious scientific journal Nature in 2001, which showed that stem cells contained in the bone marrow could be converted into functional heart cells. After a heart attack, injections of bone marrow cells reversed much of the heart attack damage by creating new heart cells and restoring heart function. It was called the “Orlic paper” because the first author of the paper was Donald Orlic, but the lead investigator of the study was Piero Anversa, a professor and highly respected scientist at New York Medical College.

Anversa had established himself as one of the world’s leading experts on the survival and death of heart muscle cells in the 1980s and 1990s, but with the start of the new millennium, he shifted his laboratory’s focus towards the emerging field of stem cell biology and its role in cardiovascular regeneration. The Orlic paper was just one of several highly influential stem cell papers to come out of Anversa’s lab at the onset of the new millennium. A 2002 Anversa paper in the New England Journal of Medicine – the world’s most highly cited academic journal – investigated the hearts of human organ transplant recipients. This study showed that up to 10% of the cells in the transplanted heart were derived from the recipient’s own body. The only conceivable explanation was that after a patient received another person’s heart, the recipient’s own cells began maintaining the health of the transplanted organ. The Orlic paper had shown the regenerative power of bone marrow cells in mouse hearts, but this new paper offered the more tantalizing suggestion that even human hearts could be regenerated by circulating stem cells in the bloodstream.


A 2003 publication in Cell by the Anversa group described another ground-breaking discovery, identifying a reservoir of stem cells contained within the heart itself. This latest coup de force found that the newly uncovered heart stem cell population resembled bone marrow stem cells because both groups of cells bore the same stem cell protein called c-kit and both were able to make new heart muscle cells. According to Anversa, c-kit cells extracted from a heart could be re-injected after a heart attack and would regenerate more than half of the damaged heart!

These Anversa papers revolutionized cardiovascular research. Prior to 2001, most cardiovascular researchers believed that the cell turnover in the adult mammalian heart was minimal because soon after birth, heart cells stopped dividing. Some organs or tissues such as the skin contained stem cells which could divide and continuously give rise to new cells as needed. When skin is scraped during a fall from a bike, it only takes a few days for new skin cells to coat the area of injury and heal the wound. Unfortunately, the heart was not one of those self-regenerating organs. The number of heart cells was thought to be more or less fixed in adults. If heart cells were damaged by a heart attack, then the affected area was replaced by rigid scar tissue, not new heart muscle cells. If the area of damage was large, then the heart’s pump function was severely compromised and patients developed the chronic and ultimately fatal disease known as “heart failure”.

Anversa’s work challenged this dogma by putting forward a bold new theory: the adult heart was highly regenerative, and its regeneration was driven by c-kit stem cells that could be isolated and used to treat injured hearts. All one had to do was harness the regenerative potential of c-kit cells in the bone marrow and the heart, and millions of patients all over the world suffering from heart failure might be cured. Not only did Anversa publish a slew of supportive papers in highly prestigious scientific journals to challenge the dogma of the quiescent heart, he also happened to publish them at a unique time in history, which maximized their impact.

In 2001, there were few innovative treatments available for patients with heart failure. The standard approach was to use medications that would delay the progression of heart failure, but even the best medications could not prevent the gradual decline of heart function. Organ transplants were a cure, but transplantable hearts were rare and only a small fraction of heart failure patients would be fortunate enough to receive a new heart. Hopes for a definitive heart failure cure were buoyed when researchers isolated human embryonic stem cells in 1998. This discovery paved the way for using highly pliable embryonic stem cells to create new heart muscle cells, which might one day be used to restore the heart’s pump function without resorting to a heart transplant.


The dreams of using embryonic stem cells to regenerate human hearts were soon quashed when, in 2001, the Bush administration restricted federal funding to already existing human embryonic stem cell lines, citing ethical concerns. These federal restrictions and the lobbying of religious and political groups against human embryonic stem cell research were a major blow to research on cardiovascular regeneration. Amidst this looming hiatus in cardiovascular regeneration research, Anversa’s papers appeared and showed that one could steer clear of the ethical controversies surrounding embryonic stem cells by using an adult patient’s own stem cells. The Anversa group re-energized the field of cardiovascular stem cell research and cleared the path for the first human stem cell treatments in heart disease.

Instead of having to wait for the US government to reverse its restrictive policy on human embryonic stem cells, one could now initiate clinical trials with adult stem cells, treating heart attack patients with their own cells and without having to worry about an ethical quagmire. Heart failure might soon become a disease of the past. The excitement at major national and international cardiovascular conferences was palpable whenever the Anversa group, their collaborators or other scientists working on bone marrow and cardiac stem cells presented their dizzyingly successful results. Anversa received numerous accolades for his discoveries and research grants from the NIH (National Institutes of Health) to further develop his research program. He was so successful that some researchers believed Anversa might receive the Nobel Prize for his iconoclastic work, which had redefined the regenerative potential of the heart. Many of the world’s top universities were vying to recruit Anversa and his group, and in 2008 he relocated his research group to Harvard Medical School and Brigham and Women’s Hospital.

There were naysayers and skeptics who had resisted the adult stem cell euphoria. Some researchers had spent decades studying the heart and found little to no evidence for regeneration in the adult heart, and they had difficulty reconciling their own results with those of the Anversa group. A number of practicing cardiologists who treated heart failure patients were also skeptical because they did not see the near-miraculous regenerative power of the heart in their patients. One Anversa paper went as far as suggesting that the whole heart would completely regenerate itself roughly every 8-9 years, a claim that was at odds with the clinical experience of practicing cardiologists. Other researchers pointed out serious flaws in the Anversa papers. For example, the 2002 paper on stem cells in human heart transplant patients claimed that the hearts had been repopulated by the recipient’s regenerative cells, including cells which carried the stem cell marker Sca-1. Within days of the paper’s publication, many researchers were puzzled by this finding because Sca-1 is a marker of mouse and rat cells – not human cells! If Anversa’s group was finding rat or mouse proteins in human hearts, the result was most likely an artifact. And if they had mistakenly found rodent proteins in human hearts, so these critics surmised, perhaps other aspects of Anversa’s research were similarly flawed or riddled with artifacts.

At national and international meetings, one could observe heated debates between members of the Anversa camp and their critics. The critics eventually decided to change their tactics. Instead of merely debating Anversa and pointing out errors in his papers, they invested substantial funds and effort to replicate Anversa’s findings. One of the most important and rigorous attempts to assess the validity of the Orlic paper was published in 2004 by the research teams of Chuck Murry and Loren Field. Murry and Field found no evidence of bone marrow cells converting into heart muscle cells. This was a major scientific blow to the burgeoning adult stem cell movement, but even this paper could not deter the bone marrow cell champions.

Even though the refutation of the Orlic paper was published in 2004, the Orlic paper continues to carry the dubious distinction of being one of the most cited papers in the history of stem cell research. At first, Anversa and his colleagues would shrug off their critics’ findings or publish refutations of refutations, but over time an increasing number of research groups all over the world began to realize that many of the central tenets of Anversa’s work could not be replicated, and the number of critics and skeptics grew. As the signs of irreproducibility and other concerns about Anversa’s work mounted, Harvard and Brigham and Women’s Hospital were forced to initiate an internal investigation, which resulted in the retraction of one Anversa paper and an expression of concern about another major paper. Finally, in May 2014, a research group published a paper using mice in which c-kit cells were genetically labeled so that their fate could be tracked, and found that c-kit cells make a minimal – if any – contribution to the formation of new heart cells: a fraction of a percent!

The skeptics who had doubted Anversa’s claims all along may now feel vindicated, but this is not the time to gloat. Instead, the discipline of cardiovascular stem cell biology is now undergoing a process of soul-searching. How was it possible that some of the most widely read and cited papers were based on heavily flawed observations and assumptions? Why did it take a full decade after the first refutation was published in 2004 for scientists to finally accept that the near-magical regenerative power of the heart was a pipe dream?

One reason for this lag time is pretty straightforward: it takes a tremendous amount of time to refute papers. Funding to conduct the experiments is difficult to obtain because grant funding agencies are not easily convinced to invest in studies replicating existing research. For a refutation to be accepted by the scientific community, it has to be at least as rigorous as the original, but in practice, refutations are subject to even greater scrutiny. Scientists trying to disprove another group’s claim may be asked to develop even better research tools and technologies so that their results can be seen as more definitive than those of the original group. Instead of relying on antibodies to identify c-kit cells, the researchers behind the 2014 refutation developed a transgenic mouse in which all c-kit cells could be genetically traced, yielding more definitive results – but developing new models and tools can take years.

The scientific peer review process by external researchers is a central pillar of the quality control process in modern scientific research, but one has to be cognizant of its limitations. Peer review of a scientific manuscript is routinely performed by experts for all the major academic journals which publish original scientific results. However, peer review only involves a “review”, i.e. a general evaluation of major strengths and flaws; peer reviewers do not see the original raw data, nor are they provided with the resources to replicate the studies and confirm the veracity of the submitted results. Peer reviewers rely on the honor system, assuming that the scientists are submitting accurate representations of their data and that the data has been thoroughly scrutinized and critiqued by all the involved researchers before it is even submitted to a journal for publication. If peer reviewers were asked to actually wade through all the original data generated by the scientists and even perform confirmatory studies, then the peer review of every single manuscript could take years, and someone would have to find the money to pay for the replication or confirmation experiments conducted by peer reviewers. Publication of experiments would come to a grinding halt because thousands of manuscripts would be stuck in the purgatory of peer review. Relying on the integrity of the scientists submitting the data and on their internal review processes may seem naïve, but it has always been the bedrock of scientific peer review. And it is precisely the internal review process which may have gone awry in the Anversa group.

Just as Pygmalion fell in love with Galatea, researchers fall in love with the hypotheses and theories that they have constructed. To minimize the effects of these personal biases, scientists regularly present their results to colleagues within their own groups at internal lab meetings and seminars, or at external institutions and conferences, long before they submit their data to a peer-reviewed journal. These preliminary presentations are intended to spark discussions, inviting the audience to challenge the veracity of the hypotheses and the data while the work is still in progress. Sometimes fellow group members are truly skeptical of the results; at other times they take on the devil’s advocate role to see if they can find holes in their group’s own research. The larger the group, the greater the chance of finding colleagues with dissenting views. This type of feedback is a necessary internal review process which provides valuable insights that can steer the direction of the research.

Considering the size of the Anversa group – consisting of 20, 30 or even more PhD students, postdoctoral fellows and senior scientists – it is puzzling that internal discussions among the group members did not challenge the group’s hypotheses and findings, especially since they knew that extramural scientists were having difficulties replicating the work.

Retraction Watch is one of the most widely read scientific watchdog sites, tracking scientific misconduct and retractions of published scientific papers. Recently, Retraction Watch published the account of an anonymous whistleblower who had worked as a research fellow in Anversa’s group. The account provides some unprecedented insights into the inner workings of the group, which help explain why the internal review process failed:

“I think that most scientists, perhaps with the exception of the most lucky or most dishonest, have personal experience with failure in science—experiments that are unreproducible, hypotheses that are fundamentally incorrect. Generally, we sigh, we alter hypotheses, we develop new methods, we move on. It is the data that should guide the science.

 In the Anversa group, a model with much less intellectual flexibility was applied. The “Hypothesis” was that c-kit (cd117) positive cells in the heart (or bone marrow if you read their earlier studies) were cardiac progenitors that could: 1) repair a scarred heart post-myocardial infarction, and: 2) supply the cells necessary for cardiomyocyte turnover in the normal heart.

 This central theme was that which supplied the lab with upwards of $50 million worth of public funding over a decade, a number which would be much higher if one considers collaborating labs that worked on related subjects.

 In theory, this hypothesis would be elegant in its simplicity and amenable to testing in current model systems. In practice, all data that did not point to the “truth” of the hypothesis were considered wrong, and experiments which would definitively show if this hypothesis was incorrect were never performed (lineage tracing e.g.).”

Discarding data that might have challenged the central hypothesis appears to have been a guiding principle in the lab.


According to the whistleblower, Anversa’s group did not just discard undesirable data, they actually punished group members who would question the group’s hypotheses:

In essence, to Dr. Anversa all investigators who questioned the hypothesis were “morons,” a word he used frequently at lab meetings. For one within the group to dare question the central hypothesis, or the methods used to support it, was a quick ticket to dismissal from your position.

The group also created an environment of strict information hierarchy and secrecy which is antithetical to the spirit of science:

“The day to day operation of the lab was conducted under a severe information embargo. The lab had Piero Anversa at the head with group leaders Annarosa Leri, Jan Kajstura and Marcello Rota immediately supervising experimentation. Below that was a group of around 25 instructors, research fellows, graduate students and technicians. Information flowed one way, which was up, and conversation between working groups was generally discouraged and often forbidden.

 Raw data left one’s hands, went to the immediate superior (one of the three named above) and the next time it was seen would be in a manuscript or grant. What happened to that data in the intervening period is unclear.

 A side effect of this information embargo was the limitation of the average worker to determine what was really going on in a research project. It would also effectively limit the ability of an average worker to make allegations regarding specific data/experiments, a requirement for a formal investigation.

This segregation of information is a powerful method of maintaining authoritarian rule and is more typical of terrorist cells or intelligence agencies than of a scientific laboratory, but it would explain how the Anversa group was able to mass-produce numerous irreproducible papers without any major dissent from within the group.

In addition to the secrecy and segregation of information, the group also created an atmosphere of fear to ensure obedience:

“Although individually-tailored stated and unstated threats were present for lab members, the plight of many of us who were international fellows was especially harrowing. Many were technically and educationally underqualified compared to what might be considered average research fellows in the United States. Many also originated in Italy where Dr. Anversa continues to wield considerable influence over biomedical research.

 This combination of being undesirable to many other labs should they leave their position due to lack of experience/training, dependent upon employment for U.S. visa status, and under constant threat of career suicide in your home country should you leave, was enough to make many people play along.

 Even so, I witnessed several people question the findings during their time in the lab. These people and working groups were subsequently fired or resigned. I would like to note that this lab is not unique in this type of exploitative practice, but that does not make it ethically sound and certainly does not create an environment for creative, collaborative, or honest science.”

Foreign researchers are particularly dependent on their employment to maintain their visa status and the prospect of being fired from one’s job can be terrifying for anyone.

This is the account of an anonymous whistleblower, and as such it is problematic. The use of anonymous sources in science journalism could open the doors for all sorts of unfounded and malicious accusations, which is why the ethics of using anonymous sources was heavily debated at the recent ScienceOnline conference. But the claims of the whistleblower are not made in a vacuum – they have to be evaluated in the context of known facts. The whistleblower’s claim that the Anversa group and their collaborators received more than $50 million to study bone marrow cell and c-kit cell regeneration of the heart can be easily verified at the public NIH RePORTER grant funding website. The whistleblower’s claim that many of the Anversa group’s findings could not be replicated is also verifiable. It may seem unfair to condemn Anversa and his group for creating an atmosphere of secrecy and obedience which undermined the scientific enterprise, tormented trainees and wasted millions of dollars of taxpayer money solely on the basis of one whistleblower’s account. However, if one looks at the entire picture of the amazing rise and decline of the Anversa group’s foray into cardiac regeneration, then the whistleblower’s description of the atmosphere of secrecy and hierarchy seems very plausible.

Harvard’s investigation into the Anversa group is not open to the public, and it is therefore difficult to know whether the university is primarily investigating scientific errors or whether it is also looking into such claims of egregious scientific misconduct and abuse of scientific trainees. It is unlikely that Anversa’s group is the only group that might have engaged in such forms of misconduct. Threatening dissenting junior researchers with loss of employment or visa status may be far more common than we think. The gravity of the problem requires that the NIH – the major funding agency for biomedical research in the US – look into the prevalence of such practices in research labs and develop safeguards to prevent the abuse of science and scientists.

Note: An earlier version of this article was first published on 3quarksdaily.com.

To Err Is Human, To Study Errors Is Science

The cholesterol-lowering drugs known as ‘statins’ are among the most widely prescribed medications for patients with cardiovascular disease. Large-scale clinical studies have repeatedly shown that statins can significantly lower cholesterol levels and the risk of future heart attacks, especially in patients who have already been diagnosed with cardiovascular disease. A more contentious issue is the use of statins in individuals who have no history of heart attacks, strokes or blockages in their blood vessels. Instead of waiting for the first major manifestation of cardiovascular disease, should one start statin therapy early on to prevent cardiovascular disease?

If statins were free of charge and had no side effects whatsoever, the answer would be rather straightforward: Go ahead and use them as soon as possible. However, like all medications, statins come at a price. There is the financial cost to the patient or their insurance to pay for the medications, and there is a health cost to the patients who experience potential side effects. The Guideline Panel of the American College of Cardiology (ACC) and the American Heart Association (AHA) therefore recently recommended that the preventive use of statins in individuals without known cardiovascular disease should be based on personalized risk calculations. If the risk of developing disease within the next 10 years is greater than 7.5%, then the benefits of statin therapy outweigh its risks and the treatment should be initiated. The panel also indicated that if the 10-year risk of cardiovascular disease is greater than 5%, then physicians should consider prescribing statins, but should bear in mind that the scientific evidence for this recommendation was not as strong as that for higher-risk individuals.
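To make the tiered recommendation concrete, here is a minimal sketch of the decision logic in Python. It only encodes the two thresholds quoted above; the function name and the simplified risk input are my own illustrative choices, and this is not the actual ACC/AHA risk calculator, which derives the 10-year risk from age, cholesterol levels, blood pressure and other factors.

```python
def statin_recommendation(ten_year_risk: float) -> str:
    """Illustrative sketch of the tiered thresholds described above.

    `ten_year_risk` is the estimated 10-year risk of cardiovascular disease,
    expressed as a fraction (e.g. 0.075 for 7.5%). Hypothetical helper, not
    the guideline's own risk calculator.
    """
    if ten_year_risk > 0.075:
        return "statin therapy recommended (benefits judged to outweigh risks)"
    elif ten_year_risk > 0.05:
        return "consider statin therapy (weaker supporting evidence)"
    else:
        return "statin therapy not routinely recommended for prevention"

print(statin_recommendation(0.09))  # above 7.5% -> recommended
print(statin_recommendation(0.06))  # between 5% and 7.5% -> consider
```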

 


Using statins in low-risk patients

The recommendation that individuals with a comparatively low risk of developing future cardiovascular disease (a 10-year risk lower than 10%) would benefit from statins was met with skepticism by some medical experts. In October 2013, the British Medical Journal (BMJ) published a paper by John Abramson, a lecturer at Harvard Medical School, and his colleagues which re-evaluated the data from a prior study on statin benefits in patients with less than a 10% cardiovascular disease risk over 10 years. Abramson and colleagues concluded that the statin benefits were overstated and that statin therapy should not be expanded to include this group of individuals. To further bolster their case, Abramson and colleagues also cited a 2013 study by Huabing Zhang and colleagues in the Annals of Internal Medicine which (according to Abramson et al.) had reported that 18% of patients discontinued statins due to side effects. Abramson even highlighted the finding from the Zhang study by including it as one of four bullet points summarizing the key take-home messages of his article.

The problem with this characterization of the Zhang study is that it ignored all the caveats that Zhang and colleagues had mentioned when discussing their findings. The Zhang study was based on a retrospective review of patient charts and did not establish a true cause-and-effect relationship between the discontinuation of statins and actual side effects of statins. Patients may stop taking medications for many reasons, but this does not necessarily mean that they do so because of side effects. According to the Zhang paper, 17.4% of patients in their observational retrospective study had reported a “statin-related incident”, and of those only 59% had stopped the medication. The fraction of patients discontinuing statins due to suspected side effects was therefore at most 9-10%, not the 18% cited by Abramson. And as Zhang and colleagues pointed out, their study did not include a placebo control group. Trials with placebo groups document similar rates of “side effects” in patients taking statins and those taking placebos, suggesting that only a small minority of perceived side effects are truly caused by the chemical compounds in statin drugs.
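The arithmetic behind the roughly 10% figure is worth spelling out. A back-of-the-envelope calculation using the two numbers quoted from the Zhang paper:

```python
# Back-of-the-envelope check of the discontinuation figure discussed above.
reported_incident = 0.174       # fraction of patients with a "statin-related incident"
stopped_given_incident = 0.59   # fraction of those patients who then stopped the statin

discontinued_due_to_suspected_side_effects = reported_incident * stopped_given_incident
print(f"{discontinued_due_to_suspected_side_effects:.1%}")  # ~10.3%, not 18%
```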

 

Admitting errors is only the first step

Whether 18%, 9% or a far smaller proportion of patients experience significant medication side effects is no small matter, because the analysis could affect millions of patients currently being treated with statins. A gross overestimation of statin side effects could prompt physicians to prematurely discontinue medications that have been shown to significantly reduce the risk of heart attacks in a wide range of patients. On the other hand, severely underestimating statin side effects could result in important symptoms being discounted and patients suffering needlessly. Abramson’s misinterpretation of the statin side effect data was pointed out by readers of the BMJ soon after the article was published, and it prompted an inquiry by the journal. After re-evaluating the data and discussing the issue with Abramson and colleagues, the journal issued a correction in which it clarified the misrepresentation of the Zhang paper.

Fiona Godlee, the editor-in-chief of the BMJ, also wrote an editorial explaining the decision to issue a correction regarding the question of side effects, and why there was not sufficient cause to retract the whole paper, since the other point made by Abramson and colleagues – the lack of benefit in low-risk patients – might still hold true. At the same time, Godlee recognized the inherent bias of a journal’s editor when it comes to deciding whether or not to retract a paper. Every retraction of a peer-reviewed scholarly paper is somewhat of an embarrassment to the authors of the paper as well as the journal, because it suggests that the peer review process failed to identify one or more major flaws. In a commendable move, the journal appointed a multidisciplinary review panel which includes leading cardiovascular epidemiologists. This panel will review the Abramson paper as well as another BMJ paper which had also cited the inaccurately high frequency of statin side effects, investigate the peer review process that failed to identify the erroneous claims, and provide recommendations regarding the ultimate fate of the papers.

 

Reviewing peer review

Why didn’t the peer reviewers who evaluated Abramson’s article catch the error prior to its publication? We can only speculate as to why such a major error was not identified by the peer reviewers. One has to bear in mind that “peer review” for academic research journals is just that – a review. In most cases, peer reviewers do not have access to the original data and cannot check the veracity or replicability of analyses and experiments. For most journals, peer review is conducted on a voluntary (unpaid) basis by two to four expert reviewers who routinely spend multiple hours analyzing the appropriateness of the experimental design, methods, presentation of results and conclusions of a submitted manuscript. The reviewers operate under the assumption that the authors of the manuscript are professional and honest in terms of how they present the data and describe their scientific methodology.

In the case of Abramson and colleagues, the correction issued by the BMJ refers not to Abramson’s own analysis but to the misreading of another group’s research. Biomedical research papers often cite 30 or 40 studies, and it is unrealistic to expect that peer reviewers read all the cited papers and ensure that they are being properly cited and interpreted. If this were the expectation, few peer reviewers would agree to serve as volunteer reviewers since they would have hardly any time left to conduct their own research. However, in this particular case, most peer reviewers familiar with statins and the controversies surrounding their side effects should have expressed concerns regarding the extraordinarily high figure of 18% cited by Abramson and colleagues. Hopefully, the review panel will identify the reasons for the failure of BMJ’s peer review system and point out ways to improve it.

 

To err is human, to study errors is science

All researchers make mistakes, simply because they are human. It is impossible to eliminate all errors in any endeavor that involves humans, but we can construct safeguards that help us reduce the occurrence and magnitude of our errors. Overt fraud and misconduct are rare causes of errors in research, but their effects on any given research field can be devastating. One of the most notorious occurrences of research fraud is the case of the Dutch psychologist Diederik Stapel, who published numerous papers based on blatant fabrication of data – showing ‘results’ of experiments on non-existent study subjects. The field of cell therapy in cardiovascular disease recently experienced a major setback when a university review of studies headed by the German cardiologist Bodo Strauer found evidence of scientific misconduct. The significant discrepancies and irregularities in Strauer’s studies have now led to wide-ranging skepticism about the efficacy of using bone marrow cell infusions to treat heart disease.

 

It is difficult to obtain precise numbers to quantify the actual extent of severe research misconduct and fraud since it may go undetected. Even when such cases are brought to the attention of the academic leadership, the involved committees and administrators may decide to keep their findings confidential and not disclose them to the public. However, most researchers working in academic research environments would probably agree that these are rare occurrences. A far more likely source of errors in research is the cognitive bias of the researchers. Researchers who believe in certain hypotheses and ideas are prone to interpreting data in a manner most likely to support their preconceived notions. For example, it is likely that a researcher opposed to statin usage will interpret data on side effects of statins differently than a researcher who supports statin usage. While Abramson may have been biased in the interpretation of the data generated by Zhang and colleagues, the field of cardiovascular regeneration is currently grappling with what appears to be a case of biased interpretation of one’s own data. An institutional review by Harvard Medical School and Brigham and Women’s Hospital recently determined that the work of Piero Anversa, one of the world’s most widely cited stem cell researchers, was significantly compromised and warranted a retraction. His group had reported that the adult human heart exhibits an amazing regenerative potential, suggesting that roughly every 8 to 9 years the adult human heart replaces its entire complement of beating heart cells (a 7% – 19% yearly turnover of beating heart cells). These findings were in sharp contrast to a prior study which had found only minimal turnover of beating heart cells (1% or less per year) in adult humans. Anversa’s finding was also at odds with the observations of clinical cardiologists, who rarely observe a near-miraculous recovery of heart function in patients with severe heart disease. One possible explanation for the huge discrepancy between the prior research and Anversa’s studies is that Anversa and his colleagues had not taken into account the possibility of contamination that could have falsely elevated the cell regeneration counts.
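A rough calculation shows how an annual turnover rate translates into a whole-heart replacement time. This is a deliberate simplification that assumes a constant rate and ignores the fact that some cells may be replaced more than once, but it illustrates why the two sets of numbers tell such different stories:

```python
# Naive conversion: at a constant annual turnover rate r, replacing a number of
# cells equal to the whole heart takes roughly 1/r years (simplifying assumption).
for annual_turnover in (0.01, 0.07, 0.125, 0.19):
    years_for_full_replacement = 1.0 / annual_turnover
    print(f"{annual_turnover:.1%} per year -> ~{years_for_full_replacement:.0f} years")

# ~1% per year (the prior study) implies on the order of a century, whereas
# 7-19% per year implies roughly 5-14 years, consistent with the
# "every 8 to 9 years" claim made by Anversa's group.
```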

 

Improving the quality of research: peer review and more

The fact that researchers are prone to make errors due to inherent biases does not mean we should simply throw our hands up in the air, say “Mistakes happen!” and let matters rest. High-quality science is characterized by its willingness to correct itself, and this includes improving methods to detect and correct scientific errors early on so that we can limit their detrimental impact. The realization that the lack of reproducibility of peer-reviewed scientific papers is becoming a major problem for many areas of research, such as psychology, stem cell research and cancer biology, has prompted calls for better ways to track reproducibility and errors in science.

One important new paradigm that is being discussed as a way to improve the quality of scholarly papers is post-publication peer evaluation. Instead of viewing the publication of a peer-reviewed research paper as an endpoint, post-publication peer evaluation invites fellow scientists to continue commenting on the quality and accuracy of the published research even after its publication and to engage the authors in this process. Traditional peer review relies on just a handful of reviewers who decide about the fate of a manuscript, but post-publication peer evaluation opens up the debate to hundreds or even thousands of readers who may be able to detect errors that could not be identified by the small number of traditional peer reviewers prior to publication. It is also becoming apparent that science journalists and science writers can play an important role in the post-publication evaluation of published research papers by investigating and communicating research flaws identified in research papers. In addition to helping dismantle the Science Mystique, critical science journalism can help ensure that corrections, retractions or other major concerns about the validity of scientific findings are communicated to a broad non-specialist audience.

In addition to these ongoing efforts to reduce errors in science by improving the evaluation of scientific papers, it may also be useful to consider new proactive initiatives which focus on how researchers perform and design experiments. As the head of a research group at an American university, I have to take mandatory courses (in some cases on an annual basis) informing me about laboratory hazards, the ethics of animal experimentation or the ethics of how to conduct human studies. However, there are no mandatory courses helping us identify our own research biases or minimize their impact on the interpretation of our data. There is an underlying assumption that if you are no longer a trainee, you probably know how to perform and interpret scientific experiments. I would argue that it does not hurt to remind scientists regularly – no matter how junior or senior – that they can become victims of their biases. We have to learn to continuously re-evaluate how we conduct science and to be humble enough to listen to our colleagues, especially when they disagree with us.

 

Note: A shorter version of this article was first published at The Conversation with excellent editorial input provided by Jo Adetunji.

 

Abramson, J., Rosenberg, H., Jewell, N., & Wright, J. (2013). Should people at low risk of cardiovascular disease take a statin? BMJ, 347: f6123. DOI: 10.1136/bmj.f6123

Critical Science Writing: A Checklist for the Life Sciences

One major obstacle in the “infotainment versus critical science writing” debate is that there is no universal definition of what constitutes “critical analysis” in science writing. How can we decide whether or not critical science writing is adequately represented in contemporary science writing or science journalism if we do not have a standardized method of assessing it? For this purpose, I would like to propose the following checklist of points that can be addressed in news articles or blog-posts which focus on the critical analysis of published scientific research. This checklist is intended for the life sciences – biological and medical research – but it can easily be modified and applied to critical science writing in other areas of research. Each category contains examples of questions which science writers can direct towards members of the scientific research team or institutional representatives, or can answer by performing an independent review of the published scientific data. These questions will have to be modified according to the specific context of a research study.

 

1. Novelty of the scientific research:

Most researchers routinely claim that their findings are novel, but are the claims of novelty appropriate? Is the research pointing towards a fundamentally new biological mechanism or introducing a completely new scientific tool? Or does it just represent a minor incremental growth in our understanding of a biological problem?

 

2. Significance of the research:

How does the significance of the research compare to the significance of other studies in the field? A biological study might uncover new regulators of cell death or cell growth, but how many other such regulators have been discovered in recent years? How does the magnitude of the effect in the study compare to the magnitude of effects in other research studies? Suppressing a gene might prolong the survival of a cell or increase the regeneration of an organ, but have research groups published similar effects in studies which target other genes? Some research studies report effects that are statistically significant, but are they also biologically significant?

 

3. Replicability:

Have the findings of the scientific study been replicated by other research groups? Does the research study attempt to partially or fully replicate prior research? If the discussed study has not yet been replicated, is there any information available on the general replicability success rate in this area of research?

 

4. Experimental design:

Did the researchers use an appropriate experimental design for the current study by ensuring that they included adequate control groups and addressed potential confounding factors? Were the experimental models appropriate for the questions they asked and for the conclusions they are drawing? Did the researchers study the effects they observed at multiple time points or just at one single time point? Did they report the results of all the time points or did they just pick the time points they were interested in?

Examples of issues: 1) Stem cell studies in which human stem cells are transplanted into injured or diseased mice are often conducted with immune deficient mice to avoid rejection of the human cells. Some studies do not assess whether the immune deficiency itself impacted the injury or disease, which could be a confounding factor when interpreting the results. 2) Studies which investigate the impact of the 24-hour internal biological clock on the expression of genes sometimes perform the studies in humans and animals who maintain a regular sleep-wake schedule. This obscures the cause-effect relationship because one is unable to ascertain whether the observed effects are truly regulated by an internal biological clock or whether they merely reflect changes associated with being awake versus asleep.

 

5. Experimental methods:

Are the methods used in the research study accepted by other researchers? If the methods are completely novel, have they been appropriately validated? Are there any potential artifacts that could explain the findings? How did the findings in a dish (“in vitro“) compare to the findings in an animal experiment (“in vivo“)? If new genes were introduced into cells or into animals, was the level of activity comparable to levels found in nature or were the gene expression levels 10-, 100- or even 1000-fold higher than physiologic levels?

Examples of issues: In stem cell research, a major problem faced by researchers is how stem cells are defined, what constitutes cell differentiation and how the fate of stem cells is tracked. One common problem that has plagued peer-reviewed studies published in high-profile journals is the inadequate characterization of stem cells and function of mature cells derived from the stem cells. Another problem in the stem cell literature is the fact that stem cells are routinely labeled with fluorescent markers to help track their fate, but it is increasingly becoming apparent that unlabeled cells (i.e. non-stem cells) can emit a non-specific fluorescence that is quite similar to that of the labeled stem cells. If a study does not address such problems, some of its key conclusions may be flawed.

 

6. Statistical analysis:

Did the researchers use the appropriate statistical tests to assess the validity of their results? Were the experiments adequately powered (i.e. did they have a sufficient sample size) to draw valid conclusions? Did the researchers pre-specify the number of repeat experiments, animals or humans in their experimental groups prior to conducting the studies? Did they modify the number of animals or human subjects in the experimental groups during the course of the study?
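One concrete way a writer can sanity-check whether a study was adequately powered is to run a standard power calculation with off-the-shelf statistical software. The sketch below uses the statsmodels library for a simple two-group comparison; the effect size, power and significance level are illustrative assumptions, not values taken from any particular study.

```python
# Minimal power-analysis sketch (illustrative values only): how many subjects
# per group are needed to detect a "medium" effect (Cohen's d = 0.5) with 80%
# power at a significance level of 0.05 in a two-sample t-test?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"~{n_per_group:.0f} subjects per group needed")  # roughly 64

# Conversely, the power actually achieved by a reported sample size can be checked:
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"power with n=20 per group: {achieved_power:.2f}")  # well below 0.8
```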

 

7. Consensus or dissent among scientists:

What do other scientists think about the published research? Do they agree with the novelty, significance and validity of the scientific findings as claimed by the authors of the published paper or do they have specific concerns in this regard?

 

8. Peer review process:

What were the major issues raised during the peer review process? How did the researchers address the concerns of the reviewers? Did any journals previously reject the study before it was accepted for publication?

 

9. Financial interests:

How was the study funded? Did the organization or corporation which funded the study have any say in how the study was designed, how the data was analyzed and what data was included in the publication? Do the researchers hold any relevant patents, own stock or receive other financial incentives from institutions or corporations that could benefit from this research?

 

10. Scientific misconduct, fraud or breach of ethics

Are there any allegations or concerns about scientific misconduct, fraud or breach of ethics in the context of the research study? If such concerns exist, what are the specific measures taken by the researchers, institutions or scientific journals to resolve the issues? Have members of the research team been previously investigated for scientific misconduct or fraud? Are there concerns about how informed consent was obtained from the human subjects?

 

This is just a preliminary list, and I would welcome any feedback on how to improve it in order to develop tools for assessing the critical analysis content in science writing. It may not always be possible to obtain the pertinent information. For example, since the peer review process is usually anonymous, it may be impossible for a science writer to find out details about what occurred during the peer review process if the researchers themselves refuse to comment on it.

One could assign a point value to each of the categories in this checklist and then score individual science news articles or science blog-posts that discuss specific research studies. A greater in-depth discussion of any issue should result in a greater point score for that category.

Points would not only be based on the number of issues raised but also on the quality of analysis provided in each category. Listing all the funding sources is not as helpful as providing an analysis of how the funding could have impacted the data interpretation. Similarly, if the science writer notices errors in the experimental design, it would be very helpful for the readers to understand whether these errors invalidate all major conclusions of the study or just some of its conclusions. Adding up all the points would then generate a comprehensive score that could become a quantifiable indicator of the degree of critical analysis contained in a science news article or blog-post.
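As a minimal illustration of how such a scoring scheme might be implemented, here is a sketch in Python. The category names mirror the checklist above, but the 0-3 scale and equal weighting are hypothetical choices of mine, not a validated rubric:

```python
# Hypothetical scoring sketch for the checklist above: each category receives a
# score from 0 (not addressed) to 3 (in-depth analysis), and the total serves as
# a rough indicator of critical-analysis content. Scale and weights are illustrative.
CATEGORIES = [
    "novelty", "significance", "replicability", "experimental design",
    "experimental methods", "statistical analysis", "consensus or dissent",
    "peer review process", "financial interests", "misconduct or ethics",
]

def critical_analysis_score(scores: dict) -> int:
    """Sum the per-category scores; missing categories count as 0."""
    return sum(scores.get(category, 0) for category in CATEGORIES)

example_article = {"novelty": 2, "replicability": 3, "financial interests": 1}
print(critical_analysis_score(example_article), "out of", 3 * len(CATEGORIES))
```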

 

********************

EDIT: The checklist now includes a new category – scientific misconduct, fraud or breach of ethics.

Some Highlights of the Live Chat: “Are We Doing Science the Right Way?”

On February 7, 2013, ScienceNOW organized a Live Chat with the microbiologists Ferric Fang and Arturo Casadevall, moderated by the Science staff writer Jennifer Couzin-Frankel, which covered a very broad range of topics related to how we currently conduct science. For those who could not participate in the Live Chat, I will summarize some key comments made by Fang, Casadevall, Couzin-Frankel and other commenters.

 

I have grouped the comments into key themes and also added some of my own thoughts.

 

1. Introduction to the goals of the Live Chat:

Jennifer Couzin-Frankel: …..For several years (at least) researchers have worried about where their profession is heading. As much as most of them love working in the lab, they’re also facing sometimes extreme pressure to land grants and publish hot papers. And surveys have shown that a subset are even bending or breaking the rules to accomplish that.….With us today are two guests who are studying the “science of science” together, and considering how to nurture discovery and reduce misconduct…

 

Pressure to publish, the difficulty of obtaining grant funding, scientific misconduct – these are all topics that should be of interest to all of us who are actively engaged in science.

 

2. Science funding:

Ferric Fang: ….the way in which science is funded has a profound effect on how and what science is done. Paula Stephan has recently written an excellent book on this subject called “How Economics Shapes Science.”

Ferric Fang: Many are understandably reluctant to ask for more funding given the global recession and halting recovery. But I believe a persuasive economic case can be made for greater investment in R&D paying off in the long run. Paula Stephan notes that the U.S. spends twice as much on beer as on science each year.

 

These are great points. I often get the sense that federal funding for science and education is portrayed as an unnecessary luxury, charity or a form of waste. We have to remind people that spending on science and education is an important investment with long-term returns.

 

3. Reproducibility and the self-correcting nature of science:

Arturo Casadevall: Is science self-correcting? Yes and No. In areas where there is a lot of interest in a subject experiments will be repeated and bad science will be ferreted out. However, until someone sets out to repeat an experiment we do not know whether it is reproducible. We do not know what percentage of the literature is right because no one has ever done a systematic study to see what fraction is reproducible.

 

I think that the reproducibility crisis is one of the biggest challenges for contemporary science. Thousands of scientific papers are published every day, and only a tiny fraction of them will ever be tested for reproducibility. There is minimal funding for attempting to replicate published data and also very little incentive for scientists, because even if they are able to replicate the published work, they will have a hard time publishing a confirmatory study. The lack of attempts to replicate scientific data creates a lot of uncertainty, because we do not really know how much of the published data is truly valid.

 

Comment From David R Van Houten: …The absence of these weekly [lab] meetings was the single biggest factor allowing for the data fabrication and falsification that I observed 20 years ago as a PhD student. I pushed to get these meetings organized, and when they did occur, it made it easier to get the offender to stop, and easier to “salvage” original data…

 

I agree that regular lab meetings and more supervision by senior researchers and principal investigators can help contain and prevent data fabrication and falsification. However, overt data fabrication and fraud are probably not as common as “data fudging”, where experiments or data points are conveniently ignored because they do not fit the desired model. This kind of “data fudging” is not just a problem of junior scientists, but also occurs with senior scientists.

 

Ferric Fang: Peer review plays an important role in self-correction of science but as nearly everyone recognizes, it is not perfect. Mechanisms of post-publication review to address the problems are very important– these include errata, retractions, correspondences, follow up publications, and nowadays, public discussion on blogs and other websites.

 

I am glad that Fang (who is an editor-in-chief of an academic journal) recognizes the importance of post-publication review, and mentions blog discussions as one such form of post publication review.

 

4. Are salaries of scientists too low?

Comment From Shabbir: When an hedge fund manager makes 100 times more than a theoretical physicist, how can we expect the bright minds to go to science?

 

I agree that academic salaries for scientists are on the lower side, especially when compared with the salary that one can make in private industry. However, I do not think that the obscene salaries of hedge fund managers are the correct comparison. If the US wants to attract and retain excellent scientists, raising their salaries is definitely important. Scientists are routinely overworked, balancing their research, teaching, mentoring and administrative duties, and they receive inadequate compensation. I have also observed a near-cynical attitude at many elite universities, which try to portray working as a scientist as an “honor” that should not require much compensation. This kind of abuse really needs to end.

 

5. Communicating science to the public

Arturo Casadevall: … Many scientists cannot explain their work at a dinner party and keep the other guests interested. We are passionate about what we do but we are often terrible in communicating the excitement that we feel. I think this is one area where perhaps better public communicating skills are needed and maybe some attention should be given to mastering these arts in training.

 

I could not agree more. Communicating science should be part of every PhD program and of postdoctoral training, and it should remain an ongoing effort once a scientist becomes an independent principal investigator.

 

6. Are we focusing on quantity rather than quality in science?

Ferric Fang: …. There are now in excess of 50,000,000 scientific publications according to one estimate, and we are in danger of creating a Library of Babel in which it is impossible to find the truth buried amidst poor quality or unimportant publications. This is in part a consequence of the “publish or perish” mentality in academia. A focus on quality rather than quantity in promotion decisions might help.

 

It is correct that the amount of scientific data being generated is overwhelming, but I am not sure that there is an easy way to find the “truth”. Scientific “truth” is very dynamic, and I think it is becoming more and more difficult to publish in high-impact journals. A typical paper in a high-impact journal now has anywhere between 5 and 20 supplemental figures and tables, and that same paper could have been published as two or three separate papers just a few decades ago. We now have many more active scientists all over the world who have begun publishing in English, and we all have tools that generate huge amounts of data in a matter of weeks (such as microarrays, proteomics and metabolomics). It is likely that the number of publications will continue to rise in the coming years, and we need to come up with an innovative system to manage scientific information. Hopefully, scientists will realize that managing and evaluating existing scientific information is just as valuable as generating new scientific datasets.

 

This was a great and inspiring discussion and I look forward to other such Live Chat events.

 

Science Journalism and the Inner Swine Dog

A search of the PubMed database, which indexes scholarly biomedical articles, reveals that 997,508 articles were published in the year 2011, which amounts to roughly 2,700 articles per day. Since the database does not include all published biomedical research articles, the actual number of published biomedical papers is probably even higher. Most biomedical researchers work in defined research areas, so perhaps only 1% of the published articles may be relevant for their research. As an example, the major focus of my research is the biology of stem cells, so I narrowed down the PubMed search to articles containing the expression “stem cells”. I found that 14,291 “stem cells” articles were published in 2011, which translates to an average of 39 articles per day (assuming that one also reads scientific papers on weekends and during vacations, which is probably true for most scientists). Many researchers also tend to have two or three areas of interest, which further increases the number of articles one needs to read.
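For readers who want to reproduce such counts, the NCBI E-utilities web service can return the number of PubMed records matching a query for a given publication year. The sketch below shows one way to do this; note that the returned count will not exactly match the 2011 figures quoted above, since PubMed is continuously updated and re-indexed.

```python
# Sketch: query the NCBI E-utilities "esearch" endpoint for the number of PubMed
# records matching a search term within a publication-date range.
import requests

def pubmed_count(term: str, year: int) -> int:
    response = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={
            "db": "pubmed",
            "term": term,
            "datetype": "pdat",   # filter by publication date
            "mindate": str(year),
            "maxdate": str(year),
            "retmode": "json",
        },
        timeout=30,
    )
    response.raise_for_status()
    return int(response.json()["esearchresult"]["count"])

count = pubmed_count('"stem cells"', 2011)
print(count, "articles;", round(count / 365), "per day on average")
```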


Needless to say, it has become impossible for researchers to read all the articles published in their fields of interest, because if they did, they would not have any time left to conduct experiments of their own. To avoid drowning in the information overload, researchers have developed multiple strategies to survive and navigate their way through all this published data. These strategies include relying on recommendations of colleagues, focusing on articles published in high-impact journals, only perusing articles that are directly related to one’s own work, or only reading articles that have been cited or featured in major review articles, editorials or commentaries. As a stem cell researcher, I can use the above-mentioned strategies to narrow down the stem cell articles that I ought to read to the manageable number of about three or four articles a day. However, scientific innovation is fueled by the cross-fertilization of ideas, and the most creative ideas are often derived from combining seemingly unrelated research questions. Therefore, the challenge for me is not only to stay informed about important developments in my own areas of interest; I also need to know about major developments in other scientific domains, such as network theory, botany or neuroscience, because discoveries in such “distant” fields could inspire me to develop innovative approaches in my own work.
In order to keep up with scientific developments outside of my area of expertise, I have begun to rely on high-quality science journalism, which can be found in selected print and online publications or in science blogs. Good science journalists accurately convey complex scientific concepts in simple language, without oversimplifying the actual science. This is easier said than done, because it requires a solid understanding of the science as well as excellent communication skills. Most scientists are not trained to communicate to the general audience and most journalists have had very limited exposure to actual scientific work. To become good science journalists, either scientists have to be trained in the art of communicating results to non-specialists or journalists have to acquire the scientific knowledge pertinent to the topics they want to write about. The training of science journalists requires time, resources and good mentors.
Once they have completed their training and start working as science journalists, they still need adequate time, resources and mentors. When writing about an important new scientific development, good science journalists do not just repeat the information provided by the researchers or contained in the press release of the university where the research was conducted. Instead, science journalists perform the necessary fact-checking to ensure that the provided information is indeed correct. They also consult the scientific literature as well as other scientific experts to place the new development in the context of the existing research. Importantly, science journalists then analyze the new scientific development, separating the actual scientific data from speculation and pointing out the limitations and implications of the work. Science journalists also write for a very broad audience, and this, too, poses a challenge. Their readership includes members of the general public interested in new scientific findings, politicians and members of private industry who may base political and economic decisions on scientific findings, patients and physicians who want to stay informed about innovative new treatments and, as mentioned above, scientists who want to know about new scientific research outside of their area of expertise.
Unfortunately, I do not think that it is widely appreciated how important high-quality science journalism is and how much effort it requires. Limited resources, constraints on a journalist’s time and the pressure to publish sensationalist articles that exaggerate or oversimplify the science in order to attract a larger readership can all compromise the quality of the work. Two recent examples illustrate this: the so-called Jonah Lehrer controversy, in which the highly respected and popular science journalist Jonah Lehrer was found to have fabricated quotes, plagiarized and oversimplified research, and the more recent case in which the Japanese newspaper Yomiuri Shimbun ran a story about the use of induced pluripotent stem cells to treat patients with heart disease that turned out to be based on a researcher’s fraudulent claims. The case of Jonah Lehrer was a big shock for me. I had enjoyed reading a number of his articles and blog posts and, at first, it was difficult for me to accept that his work contained so many errors and so much evidence of misconduct. Boris Kachka has recently written a very profound analysis of the Jonah Lehrer controversy in New York Magazine:

Lehrer was the first of the Millennials to follow his elders into the dubious promised land of the convention hall, where the book, blog, TED talk, and article are merely delivery systems for a core commodity, the Insight.

The Insight is less of an idea than a conceit, a bit of alchemy that transforms minor studies into news, data into magic. Once the Insight is in place—Blink, Nudge, Free, The World Is Flat—the data becomes scaffolding. It can go in the book, along with any caveats, but it’s secondary. The purpose is not to substantiate but to enchant.

Kachka’s expression “Insight” describes our desire to believe in simple narratives. Any active scientist knows that scientific findings tend to be more complex and difficult to interpret than we anticipate. There are few simple truths or “Insights” in science, even though part of us wants to seek out these elusive simple truths.

The metaphor that comes to mind is the German expression “der innere Schweinehund”. This literally translates to “the inner swine dog”. The expression may evoke the image of a chimeric pig-dog beast created by a mad German scientist in a Hollywood World War II movie, but in Germany this expression is actually used to describe a metaphorical inner creature that wants us to be lazy, seek out convenience and avoid challenges. In my view, scientific work is an ongoing battle with our “inner swine dog”. We start experiments with simple hypotheses and models, and we are usually quite pleased with results that confirm these anticipated findings because they allow us to be intellectually lazy. However, good scientists know that more often than not, scientific truths are complex and we need to force ourselves to continuously challenge our own scientific concepts. Usually this involves performing more experiments, analyzing more data and trying to interpret data from many different perspectives. Overcoming this intellectual laziness requires work, but most of us who are passionate about science enjoy these challenges and seek out opportunities to battle our “inner swine dog” instead of succumbing to a state of perpetual intellectual laziness.
When I read Kachka’s description of why Lehrer was able to get away with his fabrications and over-simplifications, I realized that it was probably because Lehrer gave us the narratives we wanted to believe. He provided “Insight” – portraying scientific research in a false shroud of certainty and simplicity. Even though many of us look forward to overcoming intellectual laziness in our own work, we may not be used to challenging our “inner swine dog” when we learn about scientific topics outside of our own areas of expertise. This is precisely why we need good science journalists, who challenge us intellectually by avoiding over-simplifications.

A different but equally instructive case of poor science journalism occurred when the widely circulated Japanese newspaper Yomiuri Shimbun reported in early October of 2012 that the Japanese researcher Hisashi Moriguchi had transplanted induced pluripotent stem cells into patients with heart disease. This was quite a sensation, because it would have been the first transplantation of this kind of stem cell into human patients. For those of us in the field of stem cell research, the report came as a big surprise and did not sound very believable: the story suggested that the work had been performed in the United States, and most of us knew that obtaining approval to use such stem cells in clinical studies would have been very challenging. However, many people who were not acquainted with the complexities of using stem cells in patients may well have believed the story. Within days, it became apparent that the researcher’s claims were fraudulent. He had said that he had conducted the studies at Harvard, but Harvard stated that he was not currently affiliated with the university and that there was no evidence of any such studies ever being conducted there. His claims about how he had derived the cells and how little time the experiments had supposedly taken were also debunked.
This was not the first incident of scientific fraud in the world of stem cell research, and it unfortunately will not be the last. What makes this incident noteworthy is how the newspaper Yomiuri Shimbun responded to its reporting of these fraudulent claims. The newspaper removed the original story from its website and issued public apologies for the poor reporting. The English-language version of the newspaper listed the mistakes in an article entitled “iPS REPORTS–WHAT WENT WRONG / Moriguchi reporting left questions unanswered”. These problems included inadequate fact-checking of the researcher’s claims and affiliations by the reporter and a failure to consult other scientists as to whether the findings sounded plausible. Interestingly, the reporter had identified some red flags and concerns:

–Moriguchi had not published any research on animal experiments.
–The reporter had not been able to contact people who could confirm the iPS cell clinical applications.
–Moriguchi’s affiliation with Harvard University could not be confirmed online.
–It was possible that different cells, instead of iPS cells, had been effective in the treatments.
–It was odd that what appeared to be major world news was appearing only in the form of a poster at a science conference.
–The reporter wondered if it was really possible that transplant operations using iPS cells had been approved at Harvard.
The reporter sent an e-mail listing these concerns to three others, including another news editor in charge of medical science, on the same day, and the reporter’s regular updates on the topic were shared among them.
The science reporter said he felt “at ease” after informing the editors about such dubious points. After receiving explanations from Moriguchi, along with a video clip and other materials, the reporter sought the opinion of only one expert and came to believe the doubts had been resolved.

In spite of these red flags, the reporter and the editors decided to run the story. They gave in to their intellectual laziness and to the desire to run a sensational story instead of painstakingly following up on all the red flags. They had a story about a Japanese researcher making a ground-breaking discovery in a very competitive area of stem cell research, and this was a story their readers would probably love. This unprofessional conduct is why the reporter and the editors received reprimands and penalties. Another article in the newspaper summarizes the punitive measures:

Effective as of next Thursday, The Yomiuri Shimbun will take disciplinary action against the following officials and employees:
–Yoshimitsu Ohashi, senior managing director and managing editor of the company, and Takeshi Mizoguchi, corporate officer and senior deputy managing editor, will each return 30 percent of their remuneration and salary for two months.
–Fumitaka Shibata, a deputy managing editor and editor of the Science News Department, will be replaced and his salary will be reduced.
–Another deputy managing editor in charge of editorial work for the Oct. 11 edition will receive an official reprimand.
–The salaries of two deputy editors of the Science News Department will be cut.
–A reporter in charge of the Oct. 11 series will receive an official reprimand.

I have mixed feelings about these punitive actions. I think it is commendable that the newspaper apologized without reservations or excuses and listed its mistakes. The reprimands and penalties also highlight that the newspaper takes its science journalism very seriously and recognizes the importance of high professional standards. The penalties were more severe for the editors than for the reporter, which may reflect the fact that the reporter did consult with the editors, and they decided to run the story even though the red flags had been pointed out to them. My concern is that punitive actions alone will not solve the problem, and they leave a lot of questions unanswered. Did the newspaper evaluate whether its science journalists and editors had been appropriately trained? Did the science journalist have the time and resources to conduct his or her research in a conscientious manner? Importantly, will science journalists be given appropriate resources and be protected from the pressures and constraints that encourage unprofessional science journalism? We do not know the answers to these questions, but providing the infrastructure for high-quality science journalism is probably going to be more useful than mere punitive actions. We can also hope that media organizations all over the world learn from this incident, recognize the importance of science journalism and put mechanisms in place to ensure its quality.

Image via Wikimedia Commons/ Norbert Schnitzler: Statue “Mein Innerer Schweinhund” in Bonn