Lingulodinium polyedrum is a unicellular marine organism which belongs to the dinoflagellate group of algae. Its genome is among the largest found in any species on this planet, estimated to contain around 165 billion DNA base pairs – roughly fifty times the size of the human genome. Encased in magnificent polyhedral shells, these bioluminescent algae became important organisms for studying biological rhythms. Each Lingulodinium polyedrum cell contains not one but at least two internal clocks which keep track of time by oscillating with a period of approximately 24 hours. Algae maintained in continuous light for weeks continue to emit a bluish-green glow at what they perceive as night-time and swim up to the water surface during day-time hours – despite the absence of any external time cues. When I began studying how nutrients affect the circadian rhythms of these algae as a student at the University of Munich, I marveled at the intricacy and beauty of these time-keeping mechanisms that had evolved over hundreds of millions of years.
I was prompted to revisit the role of Beauty in biology while reading a masterpiece of scientific writing, “Dreams of a Final Theory” by the Nobel laureate Steven Weinberg, in which he describes how the quest for Beauty has guided him and many fellow theoretical physicists in their search for an ultimate theory of the fundamental forces of nature. Weinberg explains that it is quite difficult to precisely define what constitutes Beauty in physics, but a physicist would nevertheless recognize it when she sees it.
Over the course of a quarter of a century, I have worked in a variety of biological fields, from these initial experiments in marine algae to how stem cells help build human blood vessels and how mitochondria in a cell fragment and reconnect as cells divide. Each project required its own set of research methods and techniques, and each came with its own failures and successes. But with each project, my sense of awe for the beauty of nature has grown. Evolution has endowed this planet with an amazing diversity of life-forms and biological mechanisms, allowing organisms to cope with the unique challenges that they face in their respective habitats. But only recently have I realized that my sense of biological beauty was a post hoc phenomenon: Beauty was what I perceived after reviewing the experimental findings; I was not guided by a quest for beauty while designing experiments. In fact, I would have been worried that such an approach might bias the design and interpretation of experiments. Might a desire to see Beauty in cell biology lead one to consciously or subconsciously discard results that seem too messy?
For Weinberg, one key characteristic of a beautiful scientific theory is the simplicity of its underlying concepts. According to Weinberg, Einstein’s theory of gravitation is described in fourteen equations whereas Newton’s theory can be expressed in three. Despite the appearance of greater complexity in Einstein’s theory, Weinberg finds it more beautiful than Newton’s theory because the Einsteinian approach rests on one elegant central principle – the equivalence of gravitation and inertia. Weinberg’s second characteristic of beautiful scientific theories is their inevitability. Every major aspect of the theory seems so perfect that it cannot be tweaked or improved on. Any attempt to significantly modify Einstein’s theory of general relativity would undermine its fundamental concepts, just as moving around parts of Raphael’s Holy Family would weaken the whole painting.
Can similar principles be applied to biology? I realized that when I give examples of beauty in biology, I focus on the complexity and diversity of life, not its simplicity or inevitability. Perhaps this is because Weinberg was describing the search for fundamental laws of physics, laws which would explain the basis of all matter and energy – our universe. As cell biologists, we work several orders of magnitude removed from these fundamental laws. Our building blocks are organic molecules such as proteins and sugars. We find little evidence of inevitability in the molecular pathways we study – cells have an extraordinary ability to adapt. Mutations in genes or derangements in molecular signaling can often be compensated for by alternate cellular pathways.
This also points to a fundamental difference in our approaches to the world. Physicists searching for the fundamental laws of nature balance the development of fundamental theories with experimental testing, whereas biology in its current form has become primarily an experimental discipline. The latest technological developments in DNA and RNA sequencing, genome editing, optogenetics and high-resolution imaging are allowing us to amass unimaginable quantities of experimental data. In fact, the development of technologies often drives the design of experiments. The availability of a genetically engineered mouse model that allows us to track the fate of individual cells expressing fluorescent proteins, for example, will give rise to numerous experiments on cell fate in various disease models and organs. Much of the current biomedical research funding focuses on organisms that provide technical convenience, such as genetically engineered mice, or that fulfill a societal goal such as curing human disease.
Uncovering fundamental concepts in biology requires comparative studies across many species and substantial investments in research involving organisms other than the usual laboratory models. In 1990, the National Institutes of Health (NIH – the primary government funding source for biomedical research in the United States) designated a handful of species as model organisms to study human disease, including mice, rats, zebrafish and fruit flies. A recent analysis of the species studied in scientific publications showed that in 1960, roughly half the papers studied what would subsequently be classified as model organisms whereas the other half studied additional species. By 2010, over 80% of scientific papers were being published on model organisms and only 20% were devoted to other species, marking a significant narrowing of broader research goals in biology. More importantly, even among the model organisms, research priorities have clearly been culled, with a disproportionately large growth in funding and publications for studies using mice. Thousands of scientific papers are published every month on cell signaling pathways and molecular biology in mouse and human cells, whereas only a minuscule fraction of research resources is devoted to studying signaling pathways in algae.
The question of whether biologists should be guided by conceptual Beauty leads us to the even more pressing question of whether we need to broaden biological research. If we want to mirror the dizzying success of fundamental physics during the past century and similarly advance fundamental biology, then we need to substantially step up investments in fundamental biological research that is not constrained by medical goals.
We often laud the intellectual diversity of a scientific research group because we hope that the multitude of opinions can help point out flaws and improve the quality of research long before it is finalized and written up as a manuscript. The recent events surrounding the research in one of the world’s most famous stem cell research laboratories at Harvard show us the disastrous effects of suppressing diverse and dissenting opinions.
The infamous “Orlic paper” was a landmark research article published in the prestigious scientific journal Nature in 2001, which showed that stem cells contained in the bone marrow could be converted into functional heart cells. In mice, injections of bone marrow cells after a heart attack reversed much of the damage by creating new heart cells and restoring heart function. The paper was called the “Orlic paper” because its first author was Donald Orlic, but the lead investigator of the study was Piero Anversa, a professor and highly respected scientist at New York Medical College.
Anversa had established himself as one of the world’s leading experts on the survival and death of heart muscle cells in the 1980s and 1990s, but with the start of the new millennium, Anversa shifted his laboratory’s focus towards the emerging field of stem cell biology and its role in cardiovascular regeneration. The Orlic paper was just one of several highly influential stem cell papers to come out of Anversa’s lab at the onset of the new millennium. A 2002 Anversa paper in the New England Journal of Medicine – the world’s most highly cited academic journal – investigated the hearts of human organ transplant recipients. This study showed that up to 10% of the cells in the transplanted heart were derived from the recipient’s own body. The only conceivable explanation was that after a patient received another person’s heart, the recipient’s own cells began maintaining the health of the transplanted organ. The Orlic paper had shown the regenerative power of bone marrow cells in mouse hearts, but this new paper now offered the more tantalizing suggestion that even human hearts could be regenerated by circulating stem cells in the bloodstream.
A 2003 publication in Cell by the Anversa group described another ground-breaking discovery: a reservoir of stem cells contained within the heart itself. This latest tour de force found that the newly uncovered heart stem cell population resembled the bone marrow stem cells because both groups of cells bore the same stem cell protein, c-kit, and both were able to make new heart muscle cells. According to Anversa, c-kit cells extracted from a heart could be re-injected back into a heart after a heart attack and regenerate more than half of the damaged heart!
These Anversa papers revolutionized cardiovascular research. Prior to 2001, most cardiovascular researchers believed that the cell turnover in the adult mammalian heart was minimal because soon after birth, heart cells stopped dividing. Some organs or tissues such as the skin contained stem cells which could divide and continuously give rise to new cells as needed. When skin is scraped during a fall from a bike, it only takes a few days for new skin cells to coat the area of injury and heal the wound. Unfortunately, the heart was not one of those self-regenerating organs. The number of heart cells was thought to be more or less fixed in adults. If heart cells were damaged by a heart attack, then the affected area was replaced by rigid scar tissue, not new heart muscle cells. If the area of damage was large, then the heart’s pump function was severely compromised and patients developed the chronic and ultimately fatal disease known as “heart failure”.
Anversa’s work challenged this dogma by putting forward a bold new theory: the adult heart was highly regenerative, its regeneration was driven by c-kit stem cells, and these cells could be isolated and used to treat injured hearts. All one had to do was harness the regenerative potential of c-kit cells in the bone marrow and the heart, and millions of patients all over the world suffering from heart failure might be cured. Not only did Anversa publish a slew of supportive papers in highly prestigious scientific journals to challenge the dogma of the quiescent heart, he also happened to publish them at a unique time in history which maximized their impact.
In the year 2001, there were few innovative treatments available to treat patients with heart failure. The standard approach was to use medications that would delay the progression of heart failure. But even the best medications could not prevent the gradual decline of heart function. Organ transplants were a cure, but transplantable hearts were rare and only a small fraction of heart failure patients would be fortunate enough to receive a new heart. Hopes for a definitive heart failure cure were buoyed when researchers isolated human embryonic stem cells in 1998. This discovery paved the way for using highly pliable embryonic stem cells to create new heart muscle cells, which might one day be used to restore the heart’s pump function without resorting to a heart transplant.
The dreams of using embryonic stem cells to regenerate human hearts were soon dashed when the Bush administration, citing ethical concerns, restricted federal funding in 2001 to research on human embryonic stem cell lines that had already been derived. These federal regulations and the lobbying of religious and political groups against human embryonic stem cells were a major blow to research on cardiovascular regeneration. Amidst this looming hiatus in cardiovascular regeneration, Anversa’s papers appeared and showed that one could steer clear of the ethical controversies surrounding embryonic stem cells by using an adult patient’s own stem cells. The Anversa group re-energized the field of cardiovascular stem cell research and cleared the path for the first human stem cell treatments in heart disease.
Instead of having to wait for the US government to reverse its restrictive policy on human embryonic stem cells, one could now initiate clinical trials with adult stem cells, treating heart attack patients with their own cells and without having to worry about an ethical quagmire. Heart failure might soon become a disease of the past. The excitement at all major national and international cardiovascular conferences was palpable whenever the Anversa group, their collaborators or other scientists working on bone marrow and cardiac stem cells presented their dizzyingly successful results. Anversa received numerous accolades for his discoveries and research grants from the NIH (National Institutes of Health) to further develop his research program. He was so successful that some researchers believed Anversa might receive the Nobel Prize for his iconoclastic work which had redefined the regenerative potential of the heart. Many of the world’s top universities were vying to recruit Anversa and his group, and he decided to relocate his research group to Harvard Medical School and Brigham and Women’s Hospital in 2008.
There were naysayers and skeptics who had resisted the adult stem cell euphoria. Some researchers had spent decades studying the heart and found little to no evidence for regeneration in the adult heart. They were having difficulties reconciling their own results with those of the Anversa group. A number of practicing cardiologists who treated heart failure patients were also skeptical because they did not see the near-miraculous regenerative power of the heart in their patients. One Anversa paper went as far as suggesting that the whole heart would completely regenerate itself roughly every 8-9 years, a claim that was at odds with the clinical experience of practicing cardiologists. Other researchers pointed out serious flaws in the Anversa papers. For example, the 2002 paper on stem cells in human heart transplant patients claimed that the hearts were coated with the recipient’s regenerative cells, including cells which contained the stem cell marker Sca-1. Within days of the paper’s publication, many researchers were puzzled by this finding because Sca-1 was a marker of mouse and rat cells – not human cells! If Anversa’s group was finding rat or mouse proteins in human hearts, it was most likely due to an artifact. And if they had mistakenly found rodent cells in human hearts, so these critics surmised, perhaps other aspects of Anversa’s research were similarly flawed or riddled with artifacts.
At national and international meetings, one could observe heated debates between members of the Anversa camp and their critics. The critics then decided to change their tactics. Instead of just debating Anversa and commenting on errors in the Anversa papers, they invested substantial funds and efforts to replicate Anversa’s findings. One of the most important and rigorous attempts to assess the validity of the Orlic paper was published in 2004 by the research teams of Chuck Murry and Loren Field. Murry and Field found no evidence of bone marrow cells converting into heart muscle cells. This was a major scientific blow to the burgeoning adult stem cell movement, but even this paper could not deter the bone marrow cell champions.
The skeptics who had doubted Anversa’s claims all along may now feel vindicated, but this is not the time to gloat. Instead, the discipline of cardiovascular stem cell biology is now undergoing a process of soul-searching. How was it possible that some of the most widely read and cited papers were based on heavily flawed observations and assumptions? Why did it take more than a decade after the first refutation was published in 2004 for scientists to finally accept that the near-magical regenerative power of the heart was a pipe dream?
One reason for this lag time is pretty straightforward: It takes a tremendous amount of time to refute papers. Funding to conduct the experiments is difficult to obtain because grant funding agencies are not easily convinced to invest in studies replicating existing research. For a refutation to be accepted by the scientific community, it has to be at least as rigorous as the original, but in practice, refutations are subject to even greater scrutiny. Scientists trying to disprove another group’s claim may be asked to develop even better research tools and technologies so that their results can be seen as more definitive than those of the original group. Instead of relying on antibodies to identify c-kit cells, a 2014 refutation developed a transgenic mouse in which all c-kit cells could be genetically traced to yield more definitive results – but developing new models and tools can take years.
The scientific peer review process by external researchers is a central pillar of the quality control process in modern scientific research, but one has to be cognizant of its limitations. Peer review of a scientific manuscript is routinely performed by experts for all the major academic journals which publish original scientific results. However, peer review only involves a “review”, i.e. a general evaluation of major strengths and flaws; peer reviewers neither see the original raw data nor receive the resources to replicate the studies and confirm the veracity of the submitted results. Peer reviewers rely on the honor system, assuming that the scientists are submitting accurate representations of their data and that the data has been thoroughly scrutinized and critiqued by all the involved researchers before it is even submitted to a journal for publication. If peer reviewers were asked to actually wade through all the original data generated by the scientists and even perform confirmatory studies, then the peer review of every single manuscript could take years, and one would have to find the money to pay for the replication or confirmation experiments conducted by peer reviewers. Publication of experiments would come to a grinding halt because thousands of manuscripts would be stuck in the purgatory of peer review. Relying on the integrity of the scientists submitting the data and their internal review processes may seem naïve, but it has always been the bedrock of scientific peer review. And it is precisely the internal review process which may have gone awry in the Anversa group.
Just as Pygmalion fell in love with Galatea, researchers fall in love with the hypotheses and theories that they have constructed. To minimize the effects of these personal biases, scientists regularly present their results to colleagues within their own groups at internal lab meetings and seminars or at external institutions and conferences long before they submit their data to a peer-reviewed journal. The preliminary presentations are intended to spark discussions, inviting the audience to challenge the veracity of the hypotheses and the data while the work is still in progress. Sometimes fellow group members are truly skeptical of the results; at other times they take on the devil’s advocate role to see if they can find holes in their group’s own research. The larger a group, the greater the chance that one will find colleagues within it who hold dissenting views. This type of feedback is a necessary internal review process which provides valuable insights that can steer the direction of the research.
Considering the size of the Anversa group – consisting of 20, 30 or even more PhD students, postdoctoral fellows and senior scientists – it is puzzling that the group members did not challenge their hypotheses and findings internally, especially since they knew that extramural scientists were having difficulties replicating the work.
An anonymous whistleblower who worked in the Anversa lab offers a glimpse into why this did not happen:
“I think that most scientists, perhaps with the exception of the most lucky or most dishonest, have personal experience with failure in science—experiments that are unreproducible, hypotheses that are fundamentally incorrect. Generally, we sigh, we alter hypotheses, we develop new methods, we move on. It is the data that should guide the science.
In the Anversa group, a model with much less intellectual flexibility was applied. The “Hypothesis” was that c-kit (cd117) positive cells in the heart (or bone marrow if you read their earlier studies) were cardiac progenitors that could: 1) repair a scarred heart post-myocardial infarction, and: 2) supply the cells necessary for cardiomyocyte turnover in the normal heart.
This central theme was that which supplied the lab with upwards of $50 million worth of public funding over a decade, a number which would be much higher if one considers collaborating labs that worked on related subjects.
In theory, this hypothesis would be elegant in its simplicity and amenable to testing in current model systems. In practice, all data that did not point to the “truth” of the hypothesis were considered wrong, and experiments which would definitively show if this hypothesis was incorrect were never performed (lineage tracing e.g.).”
Discarding data that might have challenged the central hypothesis appears to have been a guiding principle.
According to the whistleblower, Anversa’s group did not just discard undesirable data; they actually punished group members who questioned the group’s hypotheses:
“In essence, to Dr. Anversa all investigators who questioned the hypothesis were “morons,” a word he used frequently at lab meetings. For one within the group to dare question the central hypothesis, or the methods used to support it, was a quick ticket to dismissal from your position.”
The group also created an environment of strict information hierarchy and secrecy which is antithetical to the spirit of science:
“The day to day operation of the lab was conducted under a severe information embargo. The lab had Piero Anversa at the head with group leaders Annarosa Leri, Jan Kajstura and Marcello Rota immediately supervising experimentation. Below that was a group of around 25 instructors, research fellows, graduate students and technicians. Information flowed one way, which was up, and conversation between working groups was generally discouraged and often forbidden.
Raw data left one’s hands, went to the immediate superior (one of the three named above) and the next time it was seen would be in a manuscript or grant. What happened to that data in the intervening period is unclear.
A side effect of this information embargo was the limitation of the average worker to determine what was really going on in a research project. It would also effectively limit the ability of an average worker to make allegations regarding specific data/experiments, a requirement for a formal investigation.”
This segregation of information is a powerful method to maintain authoritarian rule and is more typical of terrorist cells or intelligence agencies than of a scientific lab, but it would explain how the Anversa group was able to mass-produce numerous irreproducible papers without any major dissent from within the group.
In addition to the secrecy and segregation of information, the group also created an atmosphere of fear to ensure obedience:
“Although individually-tailored stated and unstated threats were present for lab members, the plight of many of us who were international fellows was especially harrowing. Many were technically and educationally underqualified compared to what might be considered average research fellows in the United States. Many also originated in Italy where Dr. Anversa continues to wield considerable influence over biomedical research.
This combination of being undesirable to many other labs should they leave their position due to lack of experience/training, dependent upon employment for U.S. visa status, and under constant threat of career suicide in your home country should you leave, was enough to make many people play along.
Even so, I witnessed several people question the findings during their time in the lab. These people and working groups were subsequently fired or resigned. I would like to note that this lab is not unique in this type of exploitative practice, but that does not make it ethically sound and certainly does not create an environment for creative, collaborative, or honest science.”
Foreign researchers are particularly dependent on their employment to maintain their visa status, and the prospect of being fired can be terrifying for anyone.
This is an anonymous account of a whistleblower and as such, it is problematic. The use of anonymous sources in science journalism could open the doors for all sorts of unfounded and malicious accusations, which is why the ethics of using anonymous sources was heavily debated at the recent ScienceOnline conference. But the claims of the whistleblower are not made in a vacuum – they have to be evaluated in the context of known facts. The whistleblower’s claim that the Anversa group and their collaborators received more than $50 million to study bone marrow cell and c-kit cell regeneration of the heart can be easily verified at the public NIH grant funding RePORTer website. The whistleblower’s claim that many of the Anversa group’s findings could not be replicated is also a verifiable fact. It may seem unfair to condemn Anversa and his group for creating an atmosphere of secrecy and obedience which undermined the scientific enterprise, caused torment among trainees and wasted millions of dollars of taxpayer money simply based on one whistleblower’s account. However, if one looks at the entire picture of the amazing rise and decline of the Anversa group’s foray into cardiac regeneration, then the whistleblower’s description of the atmosphere of secrecy and hierarchy seems very plausible.
Harvard’s investigation into the Anversa group is not open to the public, and it is therefore difficult to know whether the university is primarily investigating scientific errors or whether it is also looking into such claims of egregious scientific misconduct and abuse of scientific trainees. It is unlikely that Anversa’s group is the only group that might have engaged in such forms of misconduct. Threatening dissenting junior researchers with a loss of employment or visa status may be far more common than we think. The gravity of the problem requires that the NIH – the major funding agency for biomedical research in the US – look into the prevalence of such practices in research labs and develop safeguards to prevent the abuse of science and scientists.
The family of cholesterol-lowering drugs known as ‘statins’ is among the most widely prescribed medications for patients with cardiovascular disease. Large-scale clinical studies have repeatedly shown that statins can significantly lower cholesterol levels and the risk of future heart attacks, especially in patients who have already been diagnosed with cardiovascular disease. A more contentious issue is the use of statins in individuals who have no history of heart attacks, strokes or blockages in their blood vessels. Instead of waiting for the first major manifestation of cardiovascular disease, should one start statin therapy early on to prevent cardiovascular disease?
If statins were free of charge and had no side effects whatsoever, the answer would be rather straightforward: Go ahead and use them as soon as possible. However, like all medications, statins come at a price. There is the financial cost to the patient or their insurance to pay for the medications, and there is a health cost to the patients who experience potential side effects. The Guideline Panel of the American College of Cardiology (ACC) and the American Heart Association (AHA) therefore recently recommended that the preventive use of statins in individuals without known cardiovascular disease should be based on personalized risk calculations. If the risk of developing disease within the next 10 years is greater than 7.5%, then the benefits of statin therapy outweigh its risks and the treatment should be initiated. The panel also indicated that if the 10-year risk of cardiovascular disease is greater than 5%, then physicians should consider prescribing statins, but should bear in mind that the scientific evidence for this recommendation was not as strong as that for higher-risk individuals.
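The decision rule described in this paragraph amounts to two thresholds applied to a personalized 10-year risk estimate. As a toy sketch, assuming the risk has already been computed by a separate risk calculator (not shown here), the tiers could be expressed as below. The function name statin_tier and the tier labels are hypothetical illustrations, not part of the ACC/AHA guideline, and the strict "greater than" comparisons simply follow the wording above.

```python
def statin_tier(ten_year_risk_percent: float) -> str:
    """Map a precomputed 10-year cardiovascular risk (in percent) to the
    recommendation tiers as paraphrased in the text.

    Hypothetical helper: it does not estimate the risk itself, and the
    strict '>' comparisons follow the wording "greater than"; boundary
    handling in the actual guideline may differ.
    """
    if not 0.0 <= ten_year_risk_percent <= 100.0:
        raise ValueError("risk must be a percentage between 0 and 100")
    if ten_year_risk_percent > 7.5:
        return "initiate"       # benefits judged to outweigh risks
    if ten_year_risk_percent > 5.0:
        return "consider"       # weaker supporting evidence
    return "not addressed"      # tier not covered by the paragraph


for risk in (3.0, 6.2, 12.0):
    print(risk, statin_tier(risk))
```

Keeping the risk estimate outside the function keeps the sketch focused on the thresholds themselves; any real risk model would be a separate, much more involved piece.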
Using statins in low-risk patients
The recommendation that individuals with a comparatively low risk of developing future cardiovascular disease (10-year risk lower than 10%) would benefit from statins was met with skepticism by some medical experts. In October 2013, the British Medical Journal (BMJ) published a paper by John Abramson, a lecturer at Harvard Medical School, and his colleagues which re-evaluated the data from a prior study on statin benefits in patients with less than 10% cardiovascular disease risk over 10 years. Abramson and colleagues concluded that the statin benefits were overstated and that statin therapy should not be expanded to include this group of individuals. To further bolster their case, Abramson and colleagues also cited a 2013 study by Huabing Zhang and colleagues in the Annals of Internal Medicine which (according to Abramson et al.) had reported that 18% of patients discontinued statins due to side effects. Abramson even highlighted the finding from the Zhang study by including it as one of four bullet points summarizing the key take-home messages of his article.
The problem with this characterization of the Zhang study is that it ignored all the caveats that Zhang and colleagues had mentioned when discussing their findings. The Zhang study was based on a retrospective review of patient charts and did not establish a true cause-and-effect relationship between the discontinuation of statins and actual side effects of statins. Patients may stop taking medications for many reasons, and discontinuation does not necessarily mean that the medication caused side effects. According to the Zhang paper, 17.4% of patients in their observational retrospective study had reported a “statin related incident”, and of those only 59% had stopped the medication. The fraction of patients discontinuing statins due to suspected side effects was therefore at most about 10% (59% of 17.4%) instead of the 18% cited by Abramson. But as Zhang pointed out, their study did not include a placebo control group. Trials with placebo groups document similar rates of “side effects” in patients taking statins and those taking placebos, suggesting that only a small minority of perceived side effects are truly caused by the chemical compounds in statin drugs.
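The arithmetic behind the corrected figure is simple enough to check directly. The sketch below multiplies the two proportions quoted from the Zhang paper; the function name discontinuation_upper_bound is a hypothetical label, and treating the product as an upper bound reflects the argument above that, without a placebo group, not every discontinuation after a reported incident can be attributed to the drug.

```python
def discontinuation_upper_bound(incident_rate: float, stopped_fraction: float) -> float:
    """Share of all patients who both reported a statin-related incident and
    stopped the drug. Without a placebo group, this is only an upper bound on
    discontinuations truly caused by statin side effects."""
    for p in (incident_rate, stopped_fraction):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    return incident_rate * stopped_fraction


# Proportions as quoted from the Zhang paper: 17.4% reported an incident,
# and 59% of those stopped the medication.
share = discontinuation_upper_bound(0.174, 0.59)
print(f"{share:.1%}")  # prints 10.3%
```

Set against the 18% figure, this roughly 10% ceiling makes clear why the distinction between reporting an incident and stopping the drug matters.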
Admitting errors is only the first step
Whether 18%, 10% or a far smaller proportion of patients experience significant medication side effects is no small matter, because the analysis could affect millions of patients currently being treated with statins. A gross overestimation of statin side effects could prompt physicians to prematurely discontinue medications that have been shown to significantly reduce the risk of heart attacks in a wide range of patients. On the other hand, severely underestimating statin side effects could result in the discounting of important symptoms and the suffering of patients. Abramson’s misinterpretation of statin side effect data was pointed out by readers of the BMJ soon after the article was published, and it prompted an inquiry by the journal. After re-evaluating the data and discussing the issue with Abramson and colleagues, the journal issued a correction in which it clarified the misrepresentation of the Zhang paper.
Fiona Godlee, the editor-in-chief of the BMJ, also wrote an editorial explaining the decision to issue a correction regarding the question of side effects. She explained that there was not sufficient cause to retract the whole paper, since the other points made by Abramson and colleagues (the lack of benefit in low-risk patients) might still hold true. At the same time, Godlee acknowledged the inherent bias of a journal’s editor when it comes to deciding whether or not to retract a paper. Every retraction of a peer-reviewed scholarly paper is somewhat of an embarrassment to the authors of the paper as well as to the journal, because it suggests that the peer review process failed to identify one or more major flaws. In a commendable move, the journal appointed a multidisciplinary review panel that includes leading cardiovascular epidemiologists. This panel will review the Abramson paper as well as another BMJ paper that had also cited the inaccurately high frequency of statin side effects, investigate the peer review process that failed to identify the erroneous claims and provide recommendations regarding the ultimate fate of the papers.
Reviewing peer review
Why didn’t the peer reviewers who evaluated Abramson’s article catch the error prior to its publication? We can only speculate. One has to bear in mind that “peer review” for academic research journals is just that – a review. In most cases, peer reviewers do not have access to the original data and cannot check the veracity or replicability of analyses and experiments. For most journals, peer review is conducted on a voluntary (unpaid) basis by two to four expert reviewers who routinely spend multiple hours analyzing the appropriateness of the experimental design, methods, presentation of results and conclusions of a submitted manuscript. The reviewers operate under the assumption that the authors of the manuscript are professional and honest in terms of how they present the data and describe their scientific methodology.
In the case of Abramson and colleagues, the correction issued by the BMJ refers not to Abramson’s own analysis but to the misreading of another group’s research. Biomedical research papers often cite 30 or 40 studies, and it is unrealistic to expect that peer reviewers read all the cited papers and ensure that they are being properly cited and interpreted. If this were the expectation, few peer reviewers would agree to serve as volunteer reviewers since they would have hardly any time left to conduct their own research. However, in this particular case, most peer reviewers familiar with statins and the controversies surrounding their side effects should have expressed concerns regarding the extraordinarily high figure of 18% cited by Abramson and colleagues. Hopefully, the review panel will identify the reasons for the failure of BMJ’s peer review system and point out ways to improve it.
It is difficult to obtain precise numbers quantifying the actual extent of severe research misconduct and fraud, since it may go undetected. Even when such cases are brought to the attention of the academic leadership, the involved committees and administrators may decide to keep their findings confidential and not disclose them to the public. However, most researchers working in academic research environments would probably agree that these are rare occurrences. A far more likely source of errors in research is the cognitive bias of the researchers. Researchers who believe in certain hypotheses and ideas are prone to interpreting data in the manner most likely to support their preconceived notions. For example, a researcher opposed to statin usage is likely to interpret data on statin side effects differently than a researcher who supports statin usage.
While Abramson may have been biased in the interpretation of the data generated by Zhang and colleagues, the field of cardiovascular regeneration is currently grappling with what appears to be a case of biased interpretation of one’s own data. An institutional review by Harvard Medical School and Brigham and Women’s Hospital recently determined that the work of Piero Anversa, one of the world’s most widely cited stem cell researchers, was significantly compromised and warranted a retraction. His group had reported that the adult human heart exhibited an amazing regenerative potential, suggesting that roughly every 8 to 9 years the adult human heart replaces its entire population of beating heart cells (a 7% – 19% yearly turnover of beating heart cells). These findings were in sharp contrast to a prior study which had found only a minimal turnover of beating heart cells (1% or less per year) in adult humans. Anversa’s finding was also at odds with the observations of clinical cardiologists, who rarely observe a near-miraculous recovery of heart function in patients with severe heart disease.
One possible explanation for the huge discrepancy between the prior research and Anversa’s studies was that Anversa and his colleagues had not taken into account the possibility of contamination that could have falsely elevated the cell regeneration counts.
Improving the quality of research: peer review and more
The fact that researchers are prone to errors caused by inherent biases does not mean we should simply throw our hands up in the air, say “Mistakes happen!” and let matters rest. High-quality science is characterized by its willingness to correct itself, and this includes improving methods to detect and correct scientific errors early on so that we can limit their detrimental impact. The realization that the lack of reproducibility of peer-reviewed scientific papers is becoming a major problem for many areas of research, such as psychology, stem cell research and cancer biology, has prompted calls for better ways to track reproducibility and errors in science.
One important new paradigm that is being discussed to improve the quality of scholarly papers is post-publication peer evaluation. Instead of viewing the publication of a peer-reviewed research paper as an endpoint, post-publication peer evaluation invites fellow scientists to continue commenting on the quality and accuracy of the published research and to engage the authors in this process. Traditional peer review relies on just a handful of reviewers who decide the fate of a manuscript, but post-publication peer evaluation opens up the debate to hundreds or even thousands of readers, who may be able to detect errors that the small number of traditional peer reviewers could not identify prior to publication. It is also becoming apparent that science journalists and science writers can play an important role in the post-publication evaluation of research papers by investigating and communicating the flaws identified in them. In addition to helping dismantle the Science Mystique, critical science journalism can help ensure that corrections, retractions or other major concerns about the validity of scientific findings are communicated to a broad non-specialist audience.
In addition to these ongoing efforts to reduce errors in science by improving the evaluation of scientific papers, it may also be useful to consider new proactive initiatives that focus on how researchers design and perform experiments. As the head of a research group at an American university, I have to take mandatory courses (in some cases on an annual basis) informing me about laboratory hazards, the ethics of animal experimentation or the ethics of conducting human studies. However, there are no mandatory courses helping us identify our own research biases or minimize their impact on the interpretation of our data. There is an underlying assumption that if you are no longer a trainee, you probably know how to perform and interpret scientific experiments. I would argue that it does not hurt to remind scientists regularly, no matter how junior or senior, that they can become victims of their biases. We have to learn to continuously re-evaluate how we conduct science and to be humble enough to listen to our colleagues, especially when they disagree with us.
The patient has verified his or her identity, the surgical site, the type of procedure, and his or her consent. Check.
The surgical site is marked on a patient if such marking is appropriate for the procedure. Check.
The probe measuring blood oxygen content has been placed on the patient and is functioning. Check.
All members of the surgical and anesthesia team are aware of whether the patient has a known allergy. Check.
These were the first items on a nineteen-point World Health Organization (WHO) surgical safety checklist from an international research study to evaluate the impact of routinely using checklists in operating rooms. The research involved over 7,500 patients undergoing surgery in eight hospitals (Toronto, Canada; New Delhi, India; Amman, Jordan; Auckland, New Zealand; Manila, Philippines; Ifakara, Tanzania; London, England; and Seattle, WA) and was published in the New England Journal of Medicine in 2009.
Some of the items on the checklist were already part of standard care at many of the enrolled hospitals, such as the use of oxygen monitoring probes. Other items, such as ensuring that there was a contingency plan for major blood loss prior to each surgical procedure, were not part of routine surgical practice. The impact of checklist implementation was quite impressive, showing that this simple safety measure nearly halved the rate of death in surgical patients from 1.6% to 0.8%. The infection rate at the site of the surgical procedure also decreased from 6.2% in the months preceding the checklist introduction to a mere 3.4%.
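Whether a change counts as “halving” depends on whether one looks at relative or absolute risk. The short Python sketch below separates the two, using only the before-and-after rates quoted above for the 2009 study. The helper name is hypothetical, and the number needed to treat (NNT) is a standard derived measure that this post itself does not report; it assumes, as a before-and-after comparison cannot prove, that the whole difference is caused by the checklist.

```python
def risk_reduction(before: float, after: float) -> tuple[float, float, float]:
    """Return (absolute reduction, relative reduction, number needed to treat).

    Rates are fractions (0.016 means 1.6%). NNT = 1 / absolute reduction: roughly
    how many patients would need the intervention to prevent one event, assuming
    the difference is entirely caused by it.
    """
    absolute = before - after
    relative = absolute / before
    return absolute, relative, 1 / absolute


# Event rates before and after checklist introduction, as quoted for the 2009 study.
outcomes = {
    "death": (0.016, 0.008),
    "surgical-site infection": (0.062, 0.034),
}

for name, (before, after) in outcomes.items():
    arr, rrr, nnt = risk_reduction(before, after)
    print(f"{name}: absolute {arr:.1%}, relative {rrr:.0%}, NNT about {nnt:.0f}")
```

The relative figure is what makes the result sound dramatic; the absolute figure (less than one percentage point for deaths) is what determines how many individual patients actually benefit.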
Checklists as a Panacea?
The remarkable results of the 2009 study were met with widespread enthusiasm. This low-cost measure could be easily implemented in hospitals all over the world and could potentially lead to major improvements in patient outcomes. It also made intuitive sense that encouraging communication between surgical team members via checklists would reduce complications after surgery.
A few weeks after the study’s publication, the National Patient Safety Agency (NPSA) in the United Kingdom issued a patient safety alert, requiring National Health Service (NHS) organizations to use the WHO Surgical Safety Checklist for all patients undergoing surgical procedures. In 2010, Canada followed suit and also introduced regulations requiring the use of surgical safety checklists. However, the data for the efficacy of such lists had only been obtained in observational research studies conducted in selected hospitals. Would widespread mandatory implementation of such a system in “real world” community hospitals also lead to similar benefits?
A recently published study in the New England Journal of Medicine led by Dr. David Urbach at the University of Toronto has now reviewed the surgery outcomes of hospitals in Ontario, Canada, comparing the rate of surgical complications during three-month periods before and after the implementation of the now mandatory checklists. Nearly all the hospitals reported that they were adhering to the checklist requirements, and the vast majority used either a checklist developed by the Canadian Patient Safety Institute, which is even more comprehensive than the WHO checklist, or other similar checklists. After analyzing the results of more than 200,000 procedures at 101 hospitals, Urbach and colleagues found no significant change in the rate of death after surgery following the introduction of the checklists (0.71% versus 0.65%, not statistically significant). Nor did the overall complication rates or the infection rates in the Ontario hospitals change significantly after surgical teams were required to complete the checklists.
Check the Checklist
The discrepancy in the results between the two studies is striking. How can one study demonstrate such a profound benefit of introducing checklists while a second study shows no significant impact at all? The differences between the two studies may hold some important clues. The 2009 study had a pre-checklist death rate of 1.6%, which is more than double the pre-checklist death rate in the more recent Ontario study. This may reflect the nature and complexity of the surgeries surveyed in the first study and also the socioeconomic differences. A substantial proportion of the patients in the international study were enrolled in low-income or middle-income countries. The introduction of a checklist may have been of much greater benefit to patients and hospitals that were already struggling with higher complication rates.
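The point about baseline death rates can be made concrete with a little arithmetic. The Python sketch below uses only the pre- and post-checklist death rates quoted in this post; the names are illustrative. The final step is purely hypothetical: it asks what absolute benefit Ontario would have seen had the checklist produced the same relative reduction as in the 2009 study. The observed Ontario change was not statistically significant, so its relative value is shown only for comparison, not as evidence of an effect.

```python
# Death rates (before, after) checklist introduction, as fractions.
WHO_2009 = (0.016, 0.008)
ONTARIO = (0.0071, 0.0065)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop in the event rate relative to the baseline rate."""
    return (before - after) / before


baseline_ratio = WHO_2009[0] / ONTARIO[0]
print(f"2009 study baseline is {baseline_ratio:.1f} times the Ontario baseline")

ontario_observed = relative_reduction(*ONTARIO)
print(f"Ontario observed relative drop: {ontario_observed:.0%} (not significant)")

# Hypothetical: apply the 2009 study's relative reduction to Ontario's lower baseline.
who_relative = relative_reduction(*WHO_2009)
who_absolute = WHO_2009[0] - WHO_2009[1]
hypothetical_absolute = ONTARIO[0] * who_relative
print(f"Absolute drop, 2009 study: {who_absolute:.2%}")
print(f"Absolute drop, Ontario if equally effective: {hypothetical_absolute:.2%}")
```

Even with an identical relative effect, hospitals starting from a lower death rate have fewer deaths to prevent, which makes any benefit of the checklist harder to detect.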
Furthermore, as the accompanying editorial by Dr. Lucian Leape in the New England Journal of Medicine points out, assessment of checklist implementation in the recent study by Urbach and colleagues was based on a retrospective analysis of self-reports by surgical teams and hospitals. Items may have been marked as “checked” in an effort to rush through the list and start the surgical procedures without the necessary diligence and time required to carefully go through every single item on the checklist. In the 2009 WHO study, on the other hand, surgical teams were aware of the fact that they were actively participating in a research study and the participating surgeons may have therefore been more motivated to meticulously implement all the steps on a checklist.
One of the key benefits of checklists is that they introduce a systematic and standardized approach to patient care and improve communication between team members. It is possible that the awareness of surgical teams in the Ontario hospitals regarding patient safety and the need for systematic communication had already been raised to a higher level before the introduction of the mandatory checklists, so that the mandate may have had less of an impact.
The study by Urbach and colleagues does not prove that safety checklists are without benefit, but it does highlight how little scientific data supports the use of mandatory checklists. Since the study could not obtain any data on how well the checklists were implemented in each hospital, it is possible that checklists are more effective when team members buy into their value and do not just view them as another piece of mandatory, bureaucratic paperwork.
Instead of mandating checklists, authorities should consider the benefits of allowing surgical teams to develop their own measures that improve patient safety and team communication. The safety measures will likely contain some form of physical or verbal checklists. By encouraging surgical teams to get involved in the development process and tailor the checklists according to the needs of individual patients, surgical teams and hospitals, they may be far more motivated to truly implement them.
Optimizing such tailored checklists, understanding why some studies indicate benefits of checklists whereas others do not and re-evaluating the efficacy of checklists in the non-academic setting will all require a substantial amount of future research before one can draw definitive conclusions about the efficacy of checklists. Regulatory agencies in Canada and the United Kingdom should reconsider their current mandates. Perhaps an even more important lesson to be learned is that health regulatory agencies should not rush to enforce new mandates based on limited scientific data.
Urbach DR, Govindarajan A, Saskin R, Wilton AS, & Baxter NN (2014). Introduction of surgical safety checklists in Ontario, Canada. The New England Journal of Medicine, 370 (11), 1029-38 PMID: 24620866
Since Shinya Yamanaka’s landmark discovery that adult skin cells could be reprogrammed into embryonic-like induced pluripotent stem cells (iPSCs) by introducing selected embryonic genes into adult cells, laboratories all over the world have been using modifications of the “Yamanaka method” to create their own stem cell lines. The original Yamanaka method published in 2006 used a virus which integrated into the genome of the adult cell to introduce the necessary genes. Any introduction of genetic material into a cell carries the risk of causing genetic aberrancies that could lead to complications, especially if the newly generated stem cells are intended for therapeutic usage in patients.
Researchers have therefore tried to modify the “Yamanaka method” and reduce the risk of genetic aberrations, either by using genetic tools to remove the introduced genes once the cells are fully reprogrammed to a stem cell state, by introducing genes with non-integrating viruses, or by using complex cocktails of chemicals and growth factors to generate stem cells without introducing any genes into the adult cells.
The papers by Obokata and colleagues at the RIKEN center in Kobe, Japan, use a far simpler method to reprogram adult cells. Instead of introducing foreign genes, they suggest that one can expose adult mouse cells to a severe stress such as an acidic solution. The cells that survive this acid-dipping adventure (25 minutes in a solution with pH 5.7) activate their dormant endogenous embryonic genes by an unknown mechanism. The researchers then show that these activated cells take on properties of embryonic stem cells or iPSCs if they are maintained in a stem cell culture medium and treated with the necessary growth factors. Once the cells reach the stem cell state, they can be converted into cells of any desired tissue, both in a culture dish and in a developing mouse embryo. Many of the experiments in the papers were performed with adult mouse lymphocytes, but the researchers found that mouse skin fibroblasts and other cells could also be successfully converted into an embryonic-like state using the acid stress.
My first reaction was incredulity. How could such a simple and yet noxious stress such as exposing cells to acid be sufficient to initiate a complex “stemness” program? Research labs have spent years fine-tuning the introduction of the embryonic genes, trying to figure out the optimal combination of genes and timing of when the genes are essential during the reprogramming process. These two papers propose that the whole business of introducing stem cell genes into adult cells was unnecessary – All You Need Is Acid.
This sounds too good to be true. The recent history in stem cell research has taught us that we need to be skeptical. Some of the most widely cited stem cell papers cannot be replicated. This problem is not unique to stem cell research, because other biomedical research areas such as cancer biology are also struggling with issues of replicability, but the high scientific impact of burgeoning stem cell research has forced its replicability issues into the limelight. Nowadays, whenever stem cell researchers hear about a ground-breaking new stem cell discovery, they often tend to respond with some degree of skepticism until multiple independent laboratories can confirm the results.
My second reaction was that I really liked the idea. Maybe we had never tried something as straightforward as an acid stress because we were too narrow-minded, always looking for complex ways to create stem cells instead of trying simple approaches. The stress-induction of stem cell behavior may also represent a regenerative mechanism that has been conserved by evolution. When our amphibian cousins regenerate limbs following an injury, adult tissue cells are also reprogrammed to an immature state by the stress of the injury before they start building a new limb.
The idea of stress-induced reprogramming of adult cells to an embryonic-like state also has a powerful poetic appeal, which inspired me to write the following haiku:
Just because the idea of acid-induced reprogramming is so attractive does not mean that it is scientifically accurate or replicable.
A number of concerns about potential scientific misconduct in the context of the two papers have been raised, and it appears that the RIKEN center is investigating these concerns. Specifically, anonymous bloggers have pointed out irregularities in the figures of the papers, noting that some of the images may be duplicated. We will have to wait for the results of the investigation, but even if image errors or duplications are found, this does not necessarily mean that they reflect intentional misconduct or fraud. Assembling manuscripts with so many images is no easy task, and unintentional errors do occur. These errors are probably far more common than we think. High-profile papers undergo much more scrutiny than the average peer-reviewed paper, which is probably why we tend to uncover such errors more readily in them. For example, image duplication errors were discovered in the 2013 Cell paper on human cloning, but many researchers agreed that these errors were likely due to sloppiness during the assembly of the submitted manuscript and did not constitute intentional fraud.
Irrespective of the investigation into the irregularities of figures in the two Nature papers, the key question that stem cell researchers now have to address is whether the core findings of the Obokata papers are replicable. Can adult cells – lymphocytes, skin fibroblasts or other cells – be converted into embryonic-like stem cells by an acid stress? If yes, then this will make stem cell generation far easier and it will open up a whole new field of inquiry, leading to many new exciting questions. Do human cells also respond to acid stress in the same manner as the mouse cells? How does acid stress reprogram the adult cells? Is there an acid-stress signal that directly acts on stem cell transcription factors or does the stress merely activate global epigenetic switches? Are other stressors equally effective? Since injuries such as low oxygen or inflammation can transiently create an acidic environment in our tissues, does this kind of reprogramming also occur in our bodies after such injuries?
Researchers all around the world are currently attempting to test the effect of acid exposure on the activation of stem cell genes. Paul Knoepfler’s stem cell blog is currently soliciting input from researchers trying to replicate the work. Paul makes it very clear that this is an informal exchange of ideas so that researchers can learn from each other on a “real-time” basis. It is an opportunity to find out how colleagues are progressing without having to wait 6-12 months for the next big stem cell meeting or for the publication of a paper confirming or failing to confirm acid-induced reprogramming. Posting a summary of one’s results on a blog is not as rigorous as publishing a peer-reviewed paper with all the necessary methodological details, but it can at least provide some clues as to whether some or all of the results in the controversial Obokata papers can be replicated.
If the preliminary findings of multiple labs posted on the blog indicate that lymphocytes or skin cells begin to activate their stem cell gene signature after acid stress, then we at least know that this is a project which merits further investigation and researchers will be more willing to invest valuable time and resources to conduct additional replication experiments. On the other hand, if nearly all the researchers post negative results on the blog, then it is probably not a good investment of resources to spend the next year or so trying to replicate the results.
It does not hurt to have one’s paradigms or ideas challenged by new scientific papers as long as we realize that paradigm-challenging papers need to be replicated. The Nature papers must have undergone rigorous peer review before their publication, but scientific peer review does not involve checking replicability of the results. Peer reviewers focus on assessing the internal logic, experimental design, novelty, significance and validity of the conclusions based on the presented data. The crucial step of replicability testing occurs in the post-publication phase. The post-publication exchange of results on scientific blogs by independent research labs is an opportunity to crowd-source replicability testing and thus accelerate the scientific authentication process. Irrespective of whether or not the attempts to replicate acid-induced reprogramming succeed, the willingness of the stem cell community to engage in a dialogue using scientific blogs and evaluate replicability is an important step forward.
Obokata H, Wakayama T, Sasai Y, Kojima K, Vacanti MP, Niwa H, Yamato M, & Vacanti CA (2014). Stimulus-triggered fate conversion of somatic cells into pluripotency. Nature, 505 (7485), 641-7 PMID: 24476887
A few months ago, we discussed on this blog the replicability issues associated with high-impact publications in stem cell research. Some of the most exciting and most widely cited papers using adult stem cells could not be replicated in subsequent studies, resulting in a lot of conflicting data and frustration among stem cell scientists. However, this affliction is not unique to stem cell research; it also affects many other areas of research, such as cancer biology.
The cancer researchers Glenn Begley and Lee Ellis made a rather remarkable claim last year. In a commentary that analyzed the dearth of efficacious novel cancer therapies, they revealed that scientists at the biotechnology company Amgen were unable to replicate the vast majority of published pre-clinical research studies. Only 6 out of 53 landmark cancer studies could be replicated, a dismal success rate of 11%! The Amgen researchers had deliberately chosen highly innovative cancer research papers, hoping that these would form the scientific basis for future cancer therapies that they could develop. It should not come as a surprise that progress in developing new cancer treatments is so sluggish. New clinical treatments are often based on innovative scientific concepts derived from pre-clinical laboratory research. However, if the pre-clinical scientific experiments cannot be replicated, it would be folly to expect that clinical treatments based on these questionable scientific concepts would succeed.
Reproducibility of research findings is the cornerstone of science. Peer-reviewed scientific journals generally require that scientists conduct multiple repeat experiments and report the variability of their findings before publishing them. However, it is not uncommon for researchers to successfully repeat experiments and publish a paper, only to learn that colleagues at other institutions can’t replicate the findings. This does not necessarily indicate foul play. The reasons for the lack of reproducibility include intentional fraud and misconduct, yes, but more often it’s negligence, inadvertent errors, imperfectly designed experiments and the subliminal biases of the researchers or other uncontrollable variables.
Clinical studies of new drugs, for example, are often plagued by the biological variability found in study participants. A group of patients in a trial may exhibit different responses to a new medication compared to patients enrolled in similar trials at different locations. In addition to genetic differences between patient populations, factors like differences in socioeconomic status, diet, access to healthcare, criteria used by referring physicians, standards of data analysis by researchers or the subjective nature of certain clinical outcomes – as well as many other uncharted variables – might all contribute to different results.
The complete article is available at Salon.com. It is important to note that the replicability issues were identified in pre-clinical research, i.e., laboratory bench studies, and were not the result of the varying responses between patients that often plague clinical research. Pre-clinical cancer researchers use molecular and cellular techniques that are commonly used in neuroscience, immunology, stem cell biology and many other areas of the life sciences. Therefore, all of these areas of biological research may have similarly poor rates of replicability.
Human fallibility not only affects how scientists interpret and present their data, but can also have a far-reaching impact on which scientific projects receive research funding and which scientific results get published. When manuscripts are submitted to scientific journals or grant proposals are submitted to funding agencies, they usually undergo review by a panel of scientists who work in the same field and who can ultimately decide whether a paper should be published or a grant funded. One would hope that these decisions are primarily based on the scientific merit of the manuscripts or the grant proposals, but anyone who has been involved in these forms of peer review knows that, unfortunately, personal connections or personal grudges can often be decisive factors.
Lack of scientific replicability, the uncertainties that come with new scientific knowledge, fraud and fudging, biases during peer review – these are all just some of the reasons why scientists rarely believe in the mystique of science. When I discuss this with acquaintances who are non-scientists, they sometimes ask me how I can love science if I have encountered these “ugly” aspects of science. My response is that I love science despite this “ugliness”, and perhaps even because of its “ugliness”. The fact that scientific knowledge is dynamic and ephemeral, the fact that we do not need to feel embarrassed about our ignorance and uncertainties, the fact that science is conducted by humans and is infused with human failings, these are all reasons to love science. When I think of science, I am reminded of the painting “Basket of Fruit” by Caravaggio, a still-life of a fruit bowl; unlike other still-life paintings of fruit, Caravaggio showed discolored and decaying leaves and fruit. The beauty and ingenuity of Caravaggio’s painting lies in its ability to show fruit as it really is, not the idealized fruit baskets that other painters so often depicted.
I recently used the Web of Science database to generate a list of the most highly cited papers in stem cell research. As of July 2013, the search for original research articles which use the key word “stem cells” resulted in the following list of the ten most widely cited papers to date:
1. Pittenger M et al. (1999) Multilineage potential of adult human mesenchymal stem cells. Science 284(5411): 143-147
2. Thomson JA et al. (1998) Embryonic stem cell lines derived from human blastocysts. Science 282(5391): 1145-1147
3. Takahashi K and Yamanaka S (2006) Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126(4): 663-676
4. Takahashi K et al. (2007) Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131(5): 861-872
5. Donehower LA et al. (1992) Mice deficient for p53 are developmentally normal but susceptible to spontaneous tumours. Nature 356(6366): 215-221
6. Al-Hajj M et al. (2003) Prospective identification of tumorigenic breast cancer cells. Proceedings of the National Academy of Sciences 100(7): 3983-3988
7. Yu J et al. (2007) Induced pluripotent stem cell lines derived from human somatic cells. Science 318(5858): 1917-1920
8. Jiang YH et al. (2002) Pluripotency of mesenchymal stem cells derived from adult marrow. Nature 418(6893): 41-49
9. Orlic D et al. (2001) Bone marrow cells regenerate infarcted myocardium. Nature 410(6829): 701-705
10. Lu J et al. (2005) MicroRNA expression profiles classify human cancers. Nature 435(7043): 834-838
Three of the articles (Donehower et al., Al-Hajj et al. and Lu et al.) in this “top ten list” do not focus on stem cells but are actually cancer research papers. They were probably identified by the search because the authors made comparisons to stem cells or used stem cells as tools. The remaining seven articles are indeed widely known in the stem cell field.
The Science paper by Pittenger and colleagues in 1999 provided a very comprehensive description of mesenchymal stem cells (MSCs), a type of adult stem cell which is found in the bone marrow alongside hematopoietic stem cells (HSCs). Despite the fact that MSCs and HSCs are both adult stem cells in the bone marrow, they have very different functions. HSCs give rise to circulating blood cells, whereas MSCs primarily form bone, fat and cartilage as was nicely demonstrated by Pittenger and colleagues.
The article by Thomson and colleagues, published in 1998 in the journal Science, described the derivation of human embryonic stem cells (ESCs) and revolutionized the field of stem cell research. While adult stem cells have a very limited capacity in terms of the lineages they can turn into, ESCs are derived from the early blastocyst stage of embryonic development (within the first 1-2 weeks following fertilization) and thus retain the capacity to turn into a very wide range of tissues, such as neurons, heart cells, blood vessel cells or liver cells. This paper described not only the methods for isolating human ESCs, but also how to keep them in culture and expand them as undifferentiated stem cells.
The Cell paper by Takahashi and Yamanaka in 2006 represented another major advancement in the field of stem cell biology, because it showed for the first time that a mouse adult skin cell (fibroblast) could be reprogrammed and converted into a truly pluripotent stem cell (an induced pluripotent stem cell or iPSC) which exhibited all the major characteristics of an embryonic stem cell (ESC). It was as if the adult skin cell was traveling back in time, erasing its identity as a skin cell and returning to a primordial, embryonic-like stem cell state. Only one year later, Dr. Yamanaka’s group was able to demonstrate the same phenomenon for adult human skin cells in the 2007 Cell paper (Takahashi et al.), and in the same year a different group independently confirmed that adult human cells could be reprogrammed to the iPSC state (Science paper by Yu et al. in 2007). The generation of iPSCs described in these three papers is probably the most remarkable discovery in stem cell biology during the past decade. It is no wonder that each of these three papers has been cited several thousand times even though they were published only six or seven years ago, and that Dr. Yamanaka was awarded the 2012 Nobel Prize for this pioneering work.
All five of the above-mentioned stem cell papers have one thing in common: the results have been repeated and confirmed by numerous independent laboratories all over the world. However, this does not necessarily hold true for the other two highly cited stem cell papers on this list.
The 2002 Nature paper by Jiang and colleagues from Dr. Verfaillie’s laboratory at the University of Minnesota proposed that the bone marrow contained a rather special subset of adult MSCs which had a much broader differentiation potential than had been previously recognized. While adult MSCs were thought to primarily turn into bone, cartilage or fat when given the appropriate cues, this rare new cell type – referred to as MAPCs (multipotent adult progenitor cells) – appeared to differentiate into a much broader range of tissues. The paper even showed data from an experiment in which these adult mouse bone marrow stem cells were combined with embryonic cells and gave rise to a chimeric mouse, i.e. a mouse in which the tissues were in part derived from standard embryonic cells and in part from the newly discovered adult MAPCs. Such chimerism suggested that the MAPCs were embryonic-like, contributing to the formation of all the tissues in the mice. At the time of its publication, this paper was met with great enthusiasm because it seemed to prove that the adult body contained embryonic-like cells, hidden away in the bone marrow, and that these MAPCs could be used to regenerate ailing organs and tissues without having to use ethically problematic human embryonic stem cells.
There was just one major catch. Many laboratories around the world tried to replicate the results but were unable to identify the MAPCs, and even when they found cells that resembled MAPCs, they were unable to confirm the embryonic-like nature of the cells. In a remarkable example of investigative journalism, the science journalists Peter Aldhous and Eugenie Reich identified multiple irregularities in the publications involving MAPCs and documented the inability of researchers to replicate the findings, publishing the results of their investigation in New Scientist.
The second high-profile stem cell paper which was also plagued by an inability to replicate the results was the 2001 Nature paper by Orlic and colleagues. In this paper from Dr. Anversa’s laboratory, the authors suggested that adult hematopoietic (blood-forming) stem cells from the bone marrow could regenerate an infarcted heart by becoming heart cells (cardiomyocytes). It was a rather bold claim, because simply injecting these blood-forming stem cells into the heart seemed to be sufficient to redirect their fate. Instead of giving rise to red and white blood cells, these bone marrow cells were generating functional heart cells. If this were the case, then every patient could potentially be treated with their own bone marrow and grow back damaged heart tissue after a heart attack. Unfortunately, it was too good to be true. Two leading stem cell laboratories partnered up to test the results, but even after years of experiments, they were unable to find any evidence of adult bone marrow stem cells converting into functional heart cells. They published their findings three years later, also in the journal Nature:
Murry CE et al. (2004) Haematopoietic stem cells do not transdifferentiate into cardiac myocytes in myocardial infarcts. Nature 428(6983): 664-668
Interestingly, the original paper which had made the claim that bone marrow cells can become functional heart cells has been cited nearly 3,000 times, whereas the refutation by Murry and colleagues, published in the same high-profile journal, has been cited only 1,150 times. The vast majority of the nearly 3,000 citations of the 2001 paper by Orlic and colleagues occurred after it had been refuted in 2004! The 2001 Orlic et al. paper has even been used to justify clinical trials in which bone marrow was obtained from heart attack patients and injected into their hearts. As expected after the refutation by Murry and colleagues, the success of these clinical trials was rather limited. One of the largest bone marrow infusion trials in heart attack patients was recently published, showing no benefit of the therapy.
The claims of these two papers (Orlic et al. and Jiang et al.) were quite innovative and exciting, and they were also published in a high-profile, peer-reviewed journal, just like the other five stem cell papers. The crucial difference was that their findings could not be replicated by other laboratories. Despite their lack of replicability, both papers had an enormous impact on the field of stem cell research. Senior scientists, postdocs and graduate students may have devoted a substantial amount of time and resources to developing projects that built on the findings of these two papers, only to find out that they could not be replicated. If there is a lesson to be learned, it is that we need to be rather cautious in our enthusiasm for new claims in stem cell biology until they have been appropriately confirmed by other researchers. Furthermore, we need to streamline the replicability testing process so that we do not have to wait years before we find out that one of the most highly prized discoveries cannot be independently confirmed.
Update 7/24/2013: Peter Aldhous reminded me that the superb job of investigative journalism into the question of MAPCs was performed in partnership with the science writer Eugenie Reich, the author of a book on scientific fraud. I have updated the blog post to reflect this.
“…I can live with doubt and uncertainty and not knowing. I think it’s much more interesting to live not knowing than to have answers which might be wrong.”
Richard P Feynman, “The Pleasure of Finding Things Out”
It may be useful for non-specialists who are not actively involved in the scientific peer review process to get some insight into what constitutes “scientific peer review”. This blog post will give an overview of the peer review process, primarily based on my own scientific peer review experiences in biological and medical research.
The contemporary peer review process for manuscripts submitted to scientific journals usually involves the following stages:
1. Submission: The authors submit their scientific manuscript to a journal.
2. Assignment: Once the manuscript passes an initial quality control process, it is assigned to an editor, associate editor or academic editor with expertise that is at least broadly related to the focus of the manuscript.
3. Initial editorial decision: The assigned editor decides whether the manuscript is generally appropriate for the journal and category of article that it was submitted to. If the manuscript is deemed appropriate, it is sent out for review to experts in the field. If not, the manuscript is sent back to the authors and they are asked to submit it to a more appropriate journal. Occasionally, some editors may ask the authors to substantially revise the manuscript before it can even be deemed appropriate for peer review.
Many of the high-profile journals, such as Nature, Cell or Science, use this stage to eliminate the bulk of submitted manuscripts. The criteria for rejecting manuscripts at this preliminary stage are rather vague and have more to do with the general goals of the journal (i.e. publishing high-impact papers that garner citations by other scientists and frequent mentions in the news media) than with the scientific rigor and quality of the scientific work. Sometimes an informal nod to the editors from a renowned leader in the field via a brief email or phone call can also do the trick and help increase the likelihood of a manuscript being sent out for review.
High-profile journals also encourage a pre-submission inquiry in which authors can submit a brief description of the work instead of the complete scientific manuscript. This allows the editors to screen the potential submissions and only invite a select group of authors for a complete manuscript submission, if they are reasonably sure that the manuscript merits a full review. The precise percentage of manuscripts that are rejected at the initial stage prior to the formal review, or the comparative fate of manuscripts that are submitted after a pre-submission inquiry versus those submitted directly without an inquiry, are very difficult to obtain.
When manuscripts are rejected prior to a review, authors usually do not receive any specific comments other than something along the lines of “your work is not appropriate for our readership” or “please consider submitting it to a subspecialty journal”. The peer review process would be far more transparent if journals were obligated to provide accurate statistics on what percentage of manuscripts are sent out for review, what specific factors resulted in the decision on whether or not to formally review a manuscript, and the acceptance rates for manuscripts that use a pre-submission inquiry versus those which do not.
4. Selection of peer reviewers: Once a manuscript has cleared the initial editorial hurdle, it is sent out for review by scientists who are chosen based on their areas of expertise. Most journals that I have reviewed for require between two and four reviewers, although I have occasionally seen cases in which seven reviewers were asked to comment on a scientific manuscript. As science is becoming more interdisciplinary, scientific manuscripts increasingly span multiple areas of research and may thus require more reviewers with complementary areas of expertise.
Many journals allow the authors to suggest names for potential peer reviewers. It is generally understood that authors will not suggest active or recent collaborators as reviewers, but instead choose scientists who are most qualified. Authors are likely to suggest reviewers who will have a favorable opinion on the importance or significance of the research, but it is the editor’s decision on whether or not to accept the author suggestions or whether to select (additional) independent reviewers. A number of journals allow requests to exclude specific scientific reviewers in case the authors feel that these scientists may harbor strong biases against the submitted scientific work or against the authors.
5. Peer review reports: Once the experts accept the invitation to review a manuscript, they are usually granted 10-14 days to submit a completed report as to the significance, novelty and scientific rigor of the manuscript. This time frame is necessary because performing an in-depth peer review requires a substantial amount of time. Since peer review is conducted by scientific experts on a voluntary, unpaid basis outside of the normal work demands, peer reviewers often devote many hours of their evenings or weekends to pore over the submitted manuscript, the figures, data tables and other supplementary data provided by the authors. A typical manuscript in the life sciences contains 4 to 8 multi-panel figures, each consisting of multiple graphs and other images. While short papers published in journals such as Nature or Science have strict limits on word counts and also restrict the number of figures in a paper to four, there are few limits on the supplemental data. I once reviewed a paper which had six multi-panel figures as part of the main manuscript, but also contained an additional ten multi-panel figures and three tables in an online “supplementary data” section!
A conscientious peer reviewer not only reads the submitted manuscript, but also tries to envision how the experiments were designed and conducted, and tries to put the findings into the context of the existing literature. A six-thousand-word manuscript with six figures and a couple of tables represents a distillate of two or three years of research, often conducted by a team of scientists. The authors have had months or years to familiarize themselves with the experimental design, results and interpretation of the data, and some manuscripts are written in such a manner that it is not easy to divine the actual intentions of the researchers. A well-written manuscript is much easier to review.
Most editors ask the reviewers to rank or grade the significance of the work and the novelty of the research, as well as the experimental design or approach. In addition to providing these grades or ranks, the reviewers also offer specific comments about the strengths and weaknesses of a paper. Many of the comments relate to the adequacy of the experimental design, the consistency of the results, whether the interpretations match the presented data and how these new findings relate to previously published work. These comments vary substantially from reviewer to reviewer. Some just write a handful of sentences, others write two pages of comments.
Finally, reviewers provide the editors with a confidential overall recommendation, such as reject the manuscript, return the manuscript to the authors for major revisions, return the manuscript to the authors for minor revisions, or accept the manuscript as is.
The opinions of reviewers can vary substantially, because reviewers differ in terms of their priorities for what constitutes significant research, their analytical skills, their personal biases and their threshold for what is acceptable for publication. It is not uncommon to have one reviewer outright reject a manuscript (because the reviewer feels that the manuscript would not improve much with revision) whereas another reviewer wants to accept the manuscript pending minor revisions.
6. Editorial decision: The editor receives the reviewer reports and has to decide upon the overall verdict on the manuscript. If the reviews are too disparate, the editor may solicit the opinion of additional peer reviewers to help with a final decision. The final decision letter contains editorial comments as well as the comments provided by the reviewers to explain why a certain decision was reached. Many journals send a copy of the decision letter to the reviewers, not only to maintain transparency in the decision-making process but also to allow peer reviewers to read the reports or comments of the other anonymous reviewers. This serves as feedback and as a learning opportunity, enabling peer reviewers to assess whether their fellow reviewers picked up on strengths or weaknesses of the manuscript which they may have missed. If the decision is made to ask the authors for revisions, the authors are usually given a time frame during which to make the revisions, and the revised manuscript then again undergoes the review process. The editor may decide to use the same reviewers or choose additional peer reviewers.
I chose to elucidate the peer review process in such detail because I want to highlight what “peer review” does and does not entail. At its core, it is just a review of the submitted data, not a validation of the data. Peer reviewers do not perform any experiments to check the accuracy or replicability of the data contained in the submitted manuscript. The assessment of the validity of the results does not occur during the peer review process, but months or years later when other scientists attempt to replicate the published findings.
In most cases, peer reviewers do not even have access to the raw data. It is therefore very difficult for a peer reviewer to discern whether or not scientists have chosen to submit truly representative data, or whether they chose the “best” data which optimally supports the conclusions of the manuscript. Much of scientific peer review is based on the honor system. If researchers claim that they have performed an experiment five times or that the results are statistically significant, the peer reviewers take the word of the researchers and base the review of the scientific results on this assumption.
Peer-reviewed research is generally more rigorous than research which has not undergone any review process, but the review process is quite limited in its scope and prone to errors due to the subjective priorities and biases of editors and reviewers. Validation of the research occurs when independent scientists are able to replicate the published findings. Replication of scientific results is the gulf which separates peer review from peer validation.
The hypothalamus is located at the base of the brain and in adult humans, it has a volume of only 4 cm³, less than half a percent of the total adult human brain volume. Despite its small size, the hypothalamus is one of the most important control centers in our brain because it functions as the major interface between two regulatory systems in our body: the nervous system and the endocrine (hormonal) system. It consists of many subunits (nuclei) which continuously sense inputs and then respond to these inputs by releasing neurotransmitters or hormones that regulate a broad range of vital functions, such as our metabolism, appetite, thirst, reproduction, temperature and even our internal timing system, the circadian clock. As if this huge workload weren’t enough, researchers have now uncovered an additional role for the hypothalamus: regulating lifespan.
The recent paper “Hypothalamic programming of systemic ageing involving IKK-β, NF-κB and GnRH”, published in the journal Nature (online May 1, 2013) by Guo Zhang and colleagues at the Albert Einstein College of Medicine in New York, used elegant genetic mouse models to either continuously activate or continuously suppress the function of the NF-κB protein in the hypothalamus. This protein is a key transcription factor which is found in most organs and tissues and turns on genes in response to an inflammatory stimulus. The researchers were thus able to artificially create an internal scenario in which the hypothalamus was receiving a continuous “inflammation on” or “inflammation off” input without having to provide any external infectious or inflammatory agents. The results were quite striking. Continuous activation of the inflammatory NF-κB pathway in the hypothalamus not only reduced the overall lifespan of the mice, but also resulted in a loss of muscle mass, bone mass, and cognitive function – the mice showed signs of accelerated aging. An even more remarkable finding was that continuous suppression of the inflammatory pathway extended the lifespan of the mice when compared to their littermates that did not undergo any genetic modifications. Not only did these mice live longer (median lifespan increased by 23%), but they also exhibited significantly less physical and cognitive decline than regular mice!
To investigate the mechanism by which the suppression of inflammatory signals could result in such a profound increase in longevity and functional capacity, the researchers studied gonadotropin-releasing hormone (GnRH), one of the major hormones released by the hypothalamus, which in turn regulates the release of reproductive hormones. They found that aging or inflammatory activation indeed suppressed GnRH release, whereas inhibition of the inflammatory signaling was able to restore GnRH levels. More importantly, simply injecting the mice with GnRH was able to prevent the physical and cognitive decline in the aging mice. How the injections of GnRH were able to restore muscle mass and even cognitive function was not evaluated in the study, but the researchers did observe that the brain showed increased evidence of neuron growth, which could explain the anti-aging effects of GnRH.
This paper is not the first to link inflammation to aging, but it is the first to show that localized inflammation signals in the hypothalamus can have such a profound effect on the lifespan of mice, and it is also the first to propose that suppression of GnRH may be the reason for this inflammation-aging link. As with all important scientific papers, this study raises more questions than it answers. Is GnRH more than just a regulator of sex hormones, and does it also exert effects on neurons and muscle cells that are independent of its role as a regulator of reproductive hormones? The mice with prolonged lifespans were all studied in a laboratory setting and thus not exposed to infectious agents that mice (or humans, for that matter) living in the wild commonly encounter. Would suppression of the NF-κB pathway in the hypothalamus possibly compromise their ability to fend off infections or other natural forms of inflammation? It is also not clear whether the GnRH link would apply to all mammals such as humans, since aging female primates have higher (not lower!) GnRH levels. These are all questions that lie beyond the scope of this paper and they need to be addressed in future studies.
However, there are some major limitations of this study and the proposed new hypothalamus-inflammation-GnRH-aging model. First, there is one rather obvious experiment that is missing. The researchers showed that manipulating NF-κB in the hypothalamus can have a major effect on the lifespan and the cognitive as well as physical function, but for some reason the researchers did not show the results from a rather simple experiment: Does GnRH alone extend the lifespan? If GnRH were really the main pathway by which the hypothalamus regulates aging, then giving GnRH ought to have extended the lifespan of the mice.
A second limitation of the paper is that it does not distinguish between general functional decline and decreased regeneration. Biological aging is characterized by a gradual functional decline over time, but this is due to a combination of at least two parallel processes: existing cells and tissues accumulate damage and become dysfunctional, and regenerative stem cells or progenitor cells become exhausted and cannot keep up with the repair. This study does not assess whether increased NF-κB activation in the hypothalamus causes more cellular dysfunction, whether it merely inhibits the regenerative repair process or whether it affects both. The researchers did not perform assessments of cellular aging, such as measuring the expression levels of the cellular aging regulator p16 or quantifying oxidative stress. Therefore, it is unclear whether NF-κB activation in the hypothalamus had any impact on the cellular aging (senescence) program in the brain, muscles or elsewhere in the body.
Another key limitation is that the hypothalamus has many functions other than GnRH release, which could all contribute to aging and changes in the lifespan of the mice. The authors themselves have previously published that NF-κB in the hypothalamus regulates the link between obesity and high blood pressure, and multiple other groups have already shown that the hypothalamus may affect aging via its role in metabolic regulation. Unfortunately, the current study glosses over the potential role of metabolism and high blood pressure, which could explain the observed longevity effects, and instead just focuses on the more provocative but less substantiated idea of GnRH as the aging regulator.
Due to these limitations, we still have to await additional studies that confirm the role of GnRH as the target for NF-κB activation in the hypothalamus and this link between inflammation, aging and the hypothalamus.
We should also remember that biological aging is just one aspect of aging. As André Maurois once wrote, “Old age is far more than white hair, wrinkles, the feeling that it is too late and the game finished, that the stage belongs to the rising generations. The true evil is not the weakening of the body, but the indifference of the soul.”
Zhang G, Li J, Purkayastha S, Tang Y, Zhang H, Yin Y, Li B, Liu G, & Cai D (2013). Hypothalamic programming of systemic ageing involving IKK-β, NF-κB and GnRH. Nature, 497 (7448), 211-216 PMID: 23636330