Less than one fifth of PhD students in the United States will be able to pursue tenure-track academic faculty careers once they graduate from their program. Reduced federal funding for research and dwindling institutional support for tenure-track faculty are among the major reasons for the imbalance between the large number of PhD graduates and the limited availability of academic positions. Upon completing the program, PhD graduates have to consider non-academic job opportunities in industry, government agencies and non-profit foundations, but not every doctoral program is equally well-suited to prepare its graduates for such alternate careers. It is therefore essential for prospective students to carefully assess the doctoral program they want to enroll in and the primary mentor they would work with. The best approach is to proactively contact prospective mentors, meet with them and learn about the research opportunities in their group, but also discuss how completing the doctoral program would prepare them for their future careers.
The vast majority of professors will gladly meet a prospective graduate student and discuss research opportunities as well as long-term career options, especially if the student requesting the meeting clarifies the goal of the meeting. However, there are cases when students wait in vain for a response. Is it because their email never reached the professor because it got lost in the internet ether or a spam folder? Was the professor simply too busy to respond? A research study headed by Katherine Milkman from the University of Pennsylvania suggests that the lack of response from the professor may in part be influenced by the perceived race or gender of the student.
Milkman and her colleagues conducted a field experiment in which 6,548 professors at the leading US academic institutions (covering 89 disciplines) were contacted via email to meet with a prospective graduate student. Here is the text of the email that was sent to each professor.
Subject Line: Prospective Doctoral Student (On Campus Next
Dear Professor [surname of professor inserted here],
I am writing you because I am a prospective doctoral student with considerable interest in your research. My plan is to apply to doctoral programs this coming Fall, and I am eager to learn as much as I can about research opportunities in the meantime.
I will be on campus next Monday, and although I know it is short notice, I was wondering if you might have 10 minutes when you would be willing to meet with me to briefly talk about your work and any possible opportunities for me to get involved in your research. Any time that would be convenient for you would be fine with me, as meeting with you is my first priority during this campus visit.
Thank you in advance for your consideration.
[Student’s full name inserted here]
As a professor who frequently receives emails from people who want to work in my laboratory, I feel that the email used in the research study was extremely well-crafted. The student only wants a brief meeting to explore potential opportunities without trying to extract any specific commitment from the professor. The email clearly states the long-term goal – applying to doctoral programs. The tone is also very polite and the student expresses a willingness to adapt to the professor’s schedule. Each email was also personally addressed with the name of the contacted faculty member.
Milkman’s research team then assessed whether the willingness of the professors to respond depended on the gender or ethnicity of the prospective student. Since this was an experiment, the emails and student names were all fictional but the researchers generated names which most readers would clearly associate with a specific gender and ethnicity.
Here is a list of the names they used:
White male names: Brad Anderson, Steven Smith
White female names: Meredith Roberts, Claire Smith
The researchers assessed whether the professors responded at all (either by agreeing to meet or by providing a reason why they could not meet) or simply ignored the email, and whether the rate of response depended on the ethnicity or gender of the student.
The overall response rate of the professors ranged from about 60% to 80%, depending on the research discipline as well as the perceived ethnicity and gender of the prospective student. When the emails were signed with names suggesting a white male background of the student, professors were far less likely to ignore the email when compared to those signed with female names or names indicating an ethnic minority background. Professors in the business sciences showed the strongest discrimination in their response rates. They ignored only 18% of emails when it appeared that they had been written by a white male and ignored 38% of the emails if they were signed with names indicating a female gender or ethnic minority background. Professors in the education disciplines ignored 21% of emails with white male names versus 35% with female or minority names. The discrimination gaps in the health sciences (33% vs 43%) and life sciences (32% vs 39%) were smaller but still significant, whereas there was no statistical difference in the humanities professor response rates. Doctoral programs in the fine arts were an interesting exception where emails from apparent white male students were more likely to be ignored (26%) than those of female or minority candidates (only 10%).
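To make the size of these gaps easier to compare across disciplines, here is a minimal sketch that computes the difference in ignored emails in percentage points. It uses only the percentages quoted above; the variable and function names are my own, and the calculation says nothing about statistical significance, which the researchers assessed separately.

```python
# Share of emails ignored (%), as quoted in the paragraph above:
# (white male names, female or minority names)
ignored = {
    "business": (18, 38),
    "education": (21, 35),
    "health sciences": (33, 43),
    "life sciences": (32, 39),
    "fine arts": (26, 10),
}


def gap_points(rates):
    """Return the gap in percentage points (positive = bias against female/minority names)."""
    white_male, female_or_minority = rates
    return female_or_minority - white_male


gaps = {field: gap_points(rates) for field, rates in ignored.items()}

# Print the disciplines from the largest gap to the smallest.
for field, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{field:16s} {gap:+d} percentage points")
```

A negative value, as in the fine arts, means that emails signed with white male names were ignored more often.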
The discrimination primarily occurred at the initial response stage. When professors did respond, there was no difference in terms of whether they were able to make time for the student. The researchers also noted that responsiveness discrimination in any discipline was not restricted to one gender or ethnicity. In business doctoral programs, for example, professors were most likely to ignore emails with black female names and Indian male names. Significant discrimination against white female names (when compared to white male names) predicted an increase in discrimination against other ethnic minorities. Surprisingly, the researchers found that having higher representation of female and minority faculty at an institution did not necessarily improve the responsiveness towards requests from potential female or minority students.
This carefully designed study with a large sample size of over 6,500 professors reveals the prevalence of bias against women and ethnic minorities at the top US institutions. This bias may be so entrenched and subconscious that it cannot be remedied by simply increasing the percentage of female or ethnic minority professors in academia. Instead, it is important that professors understand that they may be victims of these biases even if they do not know it. Something as simple as deleting an email from a prospective student because we think that we are too busy to respond may be indicative of an insidious gender or racial bias that we need to understand and confront. Increased awareness and introspection as well as targeted measures by institutions are the important first steps to ensure that students receive the guidance and mentorship they need, independent of their gender or ethnic background.
Milkman KL, Akinola M, & Chugh D (2015). What happens before? A field experiment exploring how pay and representation differentially shape bias on the pathway into organizations. The Journal of applied psychology, 100 (6), 1678-712 PMID: 25867167
Universities and the scientific infrastructures in Muslim-majority countries need to undergo radical reforms if they want to avoid falling by the wayside in a world characterized by major scientific and technological innovations. This is the conclusion reached by Nidhal Guessoum and Athar Osama in their recent commentary “Institutions: Revive universities of the Muslim world”, published in the scientific journal Nature. The physics and astronomy professor Guessoum (American University of Sharjah, United Arab Emirates) and Osama, who is the founder of the Muslim World Science Initiative, use the commentary to summarize the key findings of the report “Science at Universities of the Muslim World”, which was released in October 2015 by a task force of policymakers, academic vice-chancellors, deans, professors and science communicators. This report is one of the most comprehensive analyses of the state of scientific education and research in the 57 countries with a Muslim-majority population, which are members of the Organisation of Islamic Cooperation (OIC).
Here are some of the key findings:
1. Lower scientific productivity in the Muslim world: The 57 Muslim-majority countries constitute 25% of the world’s population, yet they only generate 6% of the world’s scientific publications and 1.6% of the world’s patents.
2. Lower scientific impact of papers published in the OIC countries: Not only are Muslim-majority countries severely under-represented in terms of the numbers of publications, the papers which do get published are cited far less than the papers stemming from non-Muslim countries. One illustrative example is that of Iran and Switzerland. In the 2014 SCImago ranking of publications by country, Iran was the highest-ranked Muslim-majority country with nearly 40,000 publications, just slightly ahead of Switzerland with 38,000 publications – even though Iran’s population of 77 million is nearly ten times larger than that of Switzerland. However, the average Swiss publication was more than twice as likely to garner a citation by scientific colleagues than an Iranian publication, thus indicating that the actual scientific impact of research in Switzerland was far greater than that of Iran.
To correct for economic differences between countries that may account for the quality or impact of the scientific work, the analysis also compared selected OIC countries to matched non-Muslim countries with similar per capita Gross Domestic Product (GDP) values (PDF). The per capita GDP in 2010 was $10,136 for Turkey, $8,754 for Malaysia and only $7,390 for South Africa. However, South Africa still outperformed both Turkey and Malaysia in terms of average citations per scientific paper in the years 2006-2015 (Turkey: 5.6; Malaysia: 5.0; South Africa: 9.7).
3. Muslim-majority countries make minimal investments in research and development: The world average for investing in research and development is roughly 1.8% of the GDP. Advanced developed countries invest up to 2-3 percent of their GDP, whereas the average for the OIC countries is only 0.5%, less than a third of the world average! One could perhaps understand why poverty-stricken Muslim countries such as Pakistan do not have the funds to invest in research because their more immediate concerns are to provide basic necessities to the population. However, one of the most dismaying findings of the report is the dismally low rate of research investments made by the members of the Gulf Cooperation Council (GCC, the economic union of six oil-rich gulf countries Saudi Arabia, Kuwait, Bahrain, Oman, United Arab Emirates and Qatar with a mean per capita GDP of over $30,000 which is comparable to that of the European Union). Saudi Arabia and Kuwait, for example, invest less than 0.1% of their GDP in research and development, far lower than the OIC average of 0.5%.
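The ratios behind these three findings can be checked with a few lines of arithmetic. This is only a rough sketch using the rounded figures quoted above. The Swiss population of 8.2 million is my own assumption, since the text only says that Iran’s population is nearly ten times larger, and all names in the code are illustrative.

```python
def per_million(publications, population_millions):
    """Publications per million inhabitants."""
    return publications / population_millions


# Finding 1: 6% of the world's publications from 25% of the world's population.
publication_share_ratio = 6 / 25

# Finding 2: 2014 publication counts, rounded as in the text.
iran = per_million(40_000, 77)
switzerland = per_million(38_000, 8.2)  # ASSUMPTION: 8.2 million inhabitants
per_capita_ratio = switzerland / iran

# Finding 3: research and development spending as a share of GDP.
oic_vs_world = 0.5 / 1.8

print(f"Publication share relative to population share: {publication_share_ratio:.2f}")
print(f"Iran: {iran:.0f} publications per million inhabitants")
print(f"Switzerland: {switzerland:.0f} publications per million inhabitants")
print(f"Swiss per capita output relative to Iran: {per_capita_ratio:.1f}x")
print(f"OIC R&D investment relative to the world average: {oic_vs_world:.2f}")
```

Under these assumptions, the OIC average for research investment comes out at a little over a quarter of the world average, consistent with the “less than a third” stated above.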
So how does one go about fixing this dire state of science in the Muslim world? Some fixes are rather obvious, such as increasing the investment in scientific research and education, especially in the OIC countries which have the financial means and are currently lagging far behind in terms of how much funding is made available to improve the scientific infrastructures. Guessoum and Osama also highlight the importance of introducing key metrics to assess scientific productivity and the quality of science education. It is not easy to objectively measure scientific and educational impact, and one can argue about the significance or reliability of any given metric. But without any metrics, it will become very difficult for OIC universities to identify problems and weaknesses, build new research and educational programs and reward excellence in research and teaching. There is also a need for reforming the curriculum so that it shifts its focus from lecture-based teaching, which is so prevalent in OIC universities, to inquiry-based teaching in which students learn science hands-on by experimentally testing hypotheses and are encouraged to ask questions.
In addition to these commonsense suggestions, the task force also put forward a rather intriguing proposition to strengthen scientific research and education: place a stronger emphasis on basic liberal arts in science education. I could not agree more because I strongly believe that exposing science students to the arts and humanities plays a key role in fostering the creativity and curiosity required for scientific excellence. Science is a multi-disciplinary enterprise, and scientists can benefit greatly from studying philosophy, history or literature. A course in philosophy, for example, can teach science students to question their basic assumptions about reality and objectivity, encourage them to examine their own biases, challenge authority and understand the importance of doubt and uncertainty, all of which will likely help them become critical thinkers and better scientists.
However, the specific examples provided by Guessoum and Osama do not necessarily indicate support for this kind of a broad liberal arts education. They mention the example of the newly founded private Habib University in Karachi which mandates that all science and engineering students also take classes in the humanities, including a two-semester course in “hikma” or “traditional wisdom”. Upon reviewing the details of this philosophy course on the university’s website, it seems that the course is a history of Islamic philosophy focused on antiquity and pre-modern texts which date back to the “Golden Age” of Islam. The task force also specifically applauds an online course developed by Ahmed Djebbar, an emeritus science historian at the University of Lille in France. The course attempts to stimulate scientific curiosity in young pre-university students by relating scientific concepts to great discoveries from the Islamic “Golden Age”. My concern is that this is a rather Islamocentric form of liberal arts education. Do students who have spent all their lives growing up in a Muslim society really need to revel in the glories of a bygone era in order to get excited about science? Does the Habib University philosophy course focus on Islamic philosophy because the university feels that students should be more aware of their cultural heritage or are there concerns that exposing students to non-Islamic ideas could cause problems with students, parents, university administrators or other members of society who could perceive this as an attack on Islamic values? If the true purpose of liberal arts education is to expand the minds of students by exposing them to new ideas, wouldn’t it make more sense to focus on non-Islamic philosophy? It is definitely not a good idea to coddle Muslim students by adulating the “Golden Age” of Islam or using kid gloves when discussing philosophy in order to avoid offending them.
This leads us to a question that is not directly addressed by Guessoum and Osama: How “liberal” is a liberal arts education in countries with governments and societies that curtail the free expression of ideas? The Saudi blogger Raif Badawi was sentenced to 1,000 lashes and 10 years in prison because of his liberal views that were perceived as an attack on religion. Faculty members at universities in Saudi Arabia who teach liberal arts courses are probably very aware of these occupational hazards. At first glance, professors who teach in the sciences may not seem to be as susceptible to the wrath of religious zealots and authoritarian governments. However, the above-mentioned interdisciplinary nature of science could easily spell trouble for free-thinking professors or students. Comments about evolutionary biology, the ethics of genome editing or discussing research on sexuality could all be construed as a violation of societal and religious norms.
The 2010 study Faculty perceptions of academic freedom at a GCC university surveyed professors at an anonymous GCC university (most likely Qatar University since roughly 25% of the faculty members were Qatari nationals and the authors of the study were based in Qatar) regarding their views of academic freedom. The vast majority of faculty members (Arab and non-Arab) felt that academic freedom was important to them and that their university upheld academic freedom. However, in interviews with individual faculty members, the researchers found that the professors were engaging in self-censorship in order to avoid untoward repercussions. Here are some examples of the comments from the faculty at this GCC University:
“I am fully aware of our culture. So, when I suggest any topic in class, I don’t need external censorship except mine.”
“Yes. I avoid subjects that are culturally inappropriate.”
“Yes, all the time. I avoid all references to Israel or the Jewish people despite their contributions to world culture. I also avoid any kind of questioning of their religious tradition. I do this out of respect.”
This latter comment is especially painful for me because one of my heroes who inspired me to become a cell biologist was the Italian Jewish scientist Rita Levi-Montalcini. She revolutionized our understanding of how cells communicate with each other using growth factors. She was also forced to secretly conduct her experiments in her bedroom because the Fascists banned all “non-Aryans” from going to the university laboratory. Would faculty members who teach the discovery of growth factors at this GCC University downplay the role of the Nobel laureate Levi-Montalcini because she was Jewish? We do not know how prevalent this form of self-censorship is in other OIC countries because the research on academic freedom in Muslim-majority countries is understandably scant. Few faculty members would be willing to voice their concerns about government or university censorship and admitting to self-censorship is also not easy.
The task force report on science in the universities of Muslim-majority countries is an important first step towards reforming scientific research and education in the Muslim world. Increasing investments in research and development, using and appropriately acting on carefully selected metrics as well as introducing a core liberal arts curriculum for science students will probably all significantly improve the dire state of science in the Muslim world. However, the reform of the research and education programs needs to also include discussions about the importance of academic freedom. If Muslim societies are serious about nurturing scientific innovation, then they will need to also ensure that scientists, educators and students will be provided with the intellectual freedom that is the cornerstone of scientific creativity.
Murder your darlings. The British writer Sir Arthur Quiller-Couch shared this piece of writerly wisdom when he gave his inaugural lecture series at Cambridge, asking writers to consider deleting words, phrases or even paragraphs that are especially dear to them. The minute writers fall in love with what they write, they are bound to lose their objectivity and may not be able to judge how their choice of words will be perceived by the reader. But writers aren’t the only ones who can fall prey to the Pygmalion syndrome. Scientists often find themselves in a similar situation when they develop “pet” or “darling” hypotheses.
How do scientists decide when it is time to murder their darling hypotheses? The simple answer is that scientists ought to give up scientific hypotheses once the experimental data is unable to support them, no matter how “darling” they are. However, the problem with scientific hypotheses is that they aren’t just generated based on subjective whims. A scientific hypothesis is usually put forward after analyzing substantial amounts of experimental data. The better a hypothesis is at explaining the existing data, the more “darling” it becomes. Therefore, scientists are reluctant to discard a hypothesis because of just one piece of experimental data that contradicts it.
In addition to experimental data, a number of additional factors can also play a major role in determining whether scientists will either discard or uphold their darling scientific hypotheses. Some scientific careers are built on specific scientific hypotheses which set apart certain scientists from competing rival groups. Research grants, which are essential to the survival of a scientific laboratory by providing salary funds for the senior researchers as well as the junior trainees and research staff, are written in a hypothesis-focused manner, outlining experiments that will lead to the acceptance or rejection of selected scientific hypotheses. Well written research grants always consider the possibility that the core hypothesis may be rejected based on the future experimental data. But if the hypothesis has to be rejected then the scientist has to explain the discrepancies between the preferred hypothesis that is now falling in disrepute and all the preliminary data that had led her to formulate the initial hypothesis. Such discrepancies could endanger the renewal of the grant funding and the future of the laboratory. Last but not least, it is very difficult to publish a scholarly paper describing a rejected scientific hypothesis without providing an in-depth mechanistic explanation for why the hypothesis was wrong and proposing alternate hypotheses.
For example, it is quite reasonable for a cell biologist to formulate the hypothesis that protein A improves the survival of neurons by activating pathway X based on prior scientific studies which have shown that protein A is an activator of pathway X in neurons and other studies which prove that pathway X improves cell survival in skin cells. If the data supports the hypothesis, publishing this result is fairly straightforward because it conforms to the general expectations. However, if the data does not support this hypothesis then the scientist has to explain why. Is it because protein A did not activate pathway X in her experiments? Is it because pathway X functions differently in neurons than in skin cells? Is it because neurons and skin cells have a different threshold for survival? Experimental results that do not conform to the predictions have the potential to uncover exciting new scientific mechanisms, but chasing down these alternate explanations requires a lot of time and resources, which are becoming increasingly scarce. Therefore, it shouldn’t come as a surprise that some scientists may consciously or subconsciously ignore selected pieces of experimental data which contradict their darling hypotheses.
Let us move from these hypothetical situations to the real world of laboratories. There is surprisingly little data on how and when scientists reject hypotheses, but John Fugelsang and Kevin Dunbar at Dartmouth conducted a rather unique study, “Theory and data interactions of the scientific mind: Evidence from the molecular and the cognitive laboratory”, in 2004 in which they researched researchers. They sat in on the laboratory meetings of three renowned molecular biology laboratories and carefully recorded how scientists presented their laboratory data and how they would handle results which contradicted the predictions based on their hypotheses and models.
In their final analysis, Fugelsang and Dunbar included 417 scientific results that were presented at the meetings, of which roughly half (223 out of 417) were not consistent with the predictions. Only 12% of these inconsistencies led to a change of the scientific model (and thus a revision of hypotheses). In the vast majority of the cases, the laboratories decided to follow up the studies by repeating and modifying the experimental protocols, thinking that the fault did not lie with the hypotheses but instead with the manner in which the experiment was conducted. In the follow-up experiments, 84 of the inconsistent findings could be replicated and this in turn resulted in a gradual modification of the underlying models and hypotheses in the majority of the cases. However, even when the inconsistent results were replicated, only 61% of the models were revised, which means that 39% of the cases did not lead to any significant changes.
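To get a feel for the absolute numbers behind these percentages, here is a small back-of-the-envelope sketch. The counts 417, 223 and 84 and the percentages 12% and 61% are taken from the paragraph above; the derived counts are my own rounded estimates rather than figures reported in the study, and I am assuming that the 61% refers to the 84 replicated findings.

```python
total_results = 417   # results presented at the laboratory meetings
inconsistent = 223    # results that contradicted the predictions
replicated = 84       # inconsistent results confirmed in follow-up experiments

# Fraction of all presented results that were inconsistent ("roughly half").
share_inconsistent = inconsistent / total_results

# Approximate counts derived from the quoted percentages (rounded estimates).
immediate_changes = round(0.12 * inconsistent)
revised_after_replication = round(0.61 * replicated)
unchanged_after_replication = replicated - revised_after_replication

print(f"Inconsistent results: {share_inconsistent:.1%} of all results")
print(f"Immediate model changes: about {immediate_changes}")
print(f"Models revised after replication: about {revised_after_replication}")
print(f"Models unchanged despite replication: about {unchanged_after_replication}")
```

By this estimate, roughly 33 replicated inconsistencies did not lead to any significant revision of the underlying model.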
The study did not provide much information on the long-term fate of the hypotheses and models and we obviously cannot generalize the results of three molecular biology laboratory meetings at one university to the whole scientific enterprise. Also, Fugelsang and Dunbar’s study did not have a large enough sample size to clearly identify the reasons why some scientists were willing to revise their models and others weren’t. Was it because of varying complexity of experiments and models? Was it because of the approach of the individuals who conducted the experiments or the laboratory heads? I wish there were more studies like this because it would help us understand the scientific process better and maybe improve the quality of scientific research if we learned how different scientists handle inconsistent results.
In my own experience, I have also struggled with results which defied my scientific hypotheses. In 2002, we found that stem cells in human fat tissue could help grow new blood vessels. Yes, you could obtain fat from a liposuction performed by a plastic surgeon and inject these fat-derived stem cells into animal models of low blood flow in the legs. Within a week or two, the injected cells helped restore the blood flow to near normal levels! The simplest hypothesis was that the stem cells converted into endothelial cells, the cell type which forms the lining of blood vessels. However, after several months of experiments, I found no consistent evidence of fat-derived stem cells transforming into endothelial cells. We ended up publishing a paper which proposed an alternative explanation that the stem cells were releasing growth factors that helped grow blood vessels. But this explanation was not as satisfying as I had hoped. It did not account for the fact that the stem cells had aligned themselves alongside blood vessel structures and behaved like blood vessel cells.
Even though I “murdered” my darling hypothesis of fat-derived stem cells converting into blood vessel endothelial cells at the time, I did not “bury” the hypothesis. It kept ruminating in the back of my mind until roughly one decade later when we were again studying how stem cells were improving blood vessel growth. The difference was that this time, I had access to a live-imaging confocal laser microscope which allowed us to take images of cells labeled with red and green fluorescent dyes over long periods of time. Below, you can see a video of human bone marrow mesenchymal stem cells (labeled green) and human endothelial cells (labeled red) observed with the microscope overnight. The short movie compresses images obtained throughout the night and shows that the stem cells indeed do not convert into endothelial cells. Instead, they form a scaffold and guide the endothelial cells (red) by allowing them to move alongside the green scaffold and thus construct their network. This work was published in 2013 in the Journal of Molecular and Cellular Cardiology, roughly a decade after I had been forced to give up on the initial hypothesis. Back in 2002, I had assumed that the stem cells were turning into blood vessel endothelial cells because they aligned themselves in blood vessel-like structures. I had never considered the possibility that they were a scaffold for the endothelial cells.
This and other similar experiences have led me to reformulate the “murder your darlings” commandment to “murder your darling hypotheses but do not bury them”. Instead of repeatedly trying to defend scientific hypotheses that cannot be supported by emerging experimental data, it is better to give up on them. But this does not mean that we should forget and bury those initial hypotheses. With newer technologies, resources or collaborations that were not previously available to us, we may find ways to explain inconsistent results years later. This is why I regularly peruse my cemetery of dead hypotheses on my hard drive to see if there are ways of perhaps resurrecting them, not in their original form but in a modification that I am now able to test.
Fugelsang, J., Stein, C., Green, A., & Dunbar, K. (2004). Theory and Data Interactions of the Scientific Mind: Evidence From the Molecular and the Cognitive Laboratory. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 58 (2), 86-95 DOI: 10.1037/h0085799
We often laud the intellectual diversity of a scientific research group because we hope that the multitude of opinions can help point out flaws and improve the quality of research long before it is finalized and written up as a manuscript. The recent events surrounding the research in one of the world’s most famous stem cell research laboratories at Harvard show us the disastrous effects of suppressing diverse and dissenting opinions.
The infamous “Orlic paper” was a landmark research article published in the prestigious scientific journal Nature in 2001, which showed that stem cells contained in the bone marrow could be converted into functional heart cells. After a heart attack, injections of bone marrow cells reversed much of the heart attack damage by creating new heart cells and restoring heart function. It was called the “Orlic paper” because the first author of the paper was Donald Orlic, but the lead investigator of the study was Piero Anversa, a professor and highly respected scientist at New York Medical College.
Anversa had established himself as one of the world’s leading experts on the survival and death of heart muscle cells in the 1980s and 1990s, but with the start of the new millennium, Anversa shifted his laboratory’s focus towards the emerging field of stem cell biology and its role in cardiovascular regeneration. The Orlic paper was just one of several highly influential stem cell papers to come out of Anversa’s lab at the onset of the new millennium. A 2002 Anversa paper in the New England Journal of Medicine – the world’s most highly cited academic journal – investigated the hearts of human organ transplant recipients. This study showed that up to 10% of the cells in the transplanted heart were derived from the recipient’s own body. The only conceivable explanation was that after a patient received another person’s heart, the recipient’s own cells began maintaining the health of the transplanted organ. The Orlic paper had shown the regenerative power of bone marrow cells in mouse hearts, but this new paper now offered the more tantalizing suggestion that even human hearts could be regenerated by circulating stem cells in their blood stream.
A 2003 publication in Cell by the Anversa group described another ground-breaking discovery, identifying a reservoir of stem cells contained within the heart itself. This latest tour de force found that the newly uncovered heart stem cell population resembled the bone marrow stem cells because both groups of cells bore the same stem cell protein called c-kit and both were able to make new heart muscle cells. According to Anversa, c-kit cells extracted from a heart could be re-injected back into a heart after a heart attack and regenerate more than half of the damaged heart!
These Anversa papers revolutionized cardiovascular research. Prior to 2001, most cardiovascular researchers believed that the cell turnover in the adult mammalian heart was minimal because soon after birth, heart cells stopped dividing. Some organs or tissues such as the skin contained stem cells which could divide and continuously give rise to new cells as needed. When skin is scraped during a fall from a bike, it only takes a few days for new skin cells to coat the area of injury and heal the wound. Unfortunately, the heart was not one of those self-regenerating organs. The number of heart cells was thought to be more or less fixed in adults. If heart cells were damaged by a heart attack, then the affected area was replaced by rigid scar tissue, not new heart muscle cells. If the area of damage was large, then the heart’s pump function was severely compromised and patients developed the chronic and ultimately fatal disease known as “heart failure”.
Anversa’s work challenged this dogma by putting forward a bold new theory: the adult heart was highly regenerative, its regeneration was driven by c-kit stem cells, which could be isolated and used to treat injured hearts. All one had to do was harness the regenerative potential of c-kit cells in the bone marrow and the heart, and millions of patients all over the world suffering from heart failure might be cured. Not only did Anversa publish a slew of supportive papers in highly prestigious scientific journals to challenge the dogma of the quiescent heart, he also happened to publish them at a unique time in history which maximized their impact.
In the year 2001, there were few innovative treatments available to treat patients with heart failure. The standard approach was to use medications that would delay the progression of heart failure. But even the best medications could not prevent the gradual decline of heart function. Organ transplants were a cure, but transplantable hearts were rare and only a small fraction of heart failure patients would be fortunate enough to receive a new heart. Hopes for a definitive heart failure cure were buoyed when researchers isolated human embryonic stem cells in 1998. This discovery paved the way for using highly pliable embryonic stem cells to create new heart muscle cells, which might one day be used to restore the heart’s pump function without resorting to a heart transplant.
The dreams of using embryonic stem cells to regenerate human hearts were soon dashed when the Bush administration restricted federal funding for research on newly generated human embryonic stem cell lines in 2001, citing ethical concerns. These federal regulations and the lobbying of religious and political groups against human embryonic stem cells were a major blow to research on cardiovascular regeneration. Amidst this looming hiatus in cardiovascular regeneration, Anversa’s papers appeared and showed that one could steer clear of the ethical controversies surrounding embryonic stem cells by using an adult patient’s own stem cells. The Anversa group re-energized the field of cardiovascular stem cell research and cleared the path for the first human stem cell treatments in heart disease.
Instead of having to wait for the US government to reverse its restrictive policy on human embryonic stem cells, one could now initiate clinical trials with adult stem cells, treating heart attack patients with their own cells and without having to worry about an ethical quagmire. Heart failure might soon become a disease of the past. The excitement at all major national and international cardiovascular conferences was palpable whenever the Anversa group, their collaborators or other scientists working on bone marrow and cardiac stem cells presented their dizzyingly successful results. Anversa received numerous accolades for his discoveries and research grants from the NIH (National Institutes of Health) to further develop his research program. He was so successful that some researchers believed Anversa might receive the Nobel Prize for his iconoclastic work which had redefined the regenerative potential of the heart. Many of the world’s top universities were vying to recruit Anversa and his group, and he decided to relocate his research group to Harvard Medical School and Brigham and Women’s Hospital in 2008.
There were naysayers and skeptics who had resisted the adult stem cell euphoria. Some researchers had spent decades studying the heart and found little to no evidence for regeneration in the adult heart. They were having difficulties reconciling their own results with those of the Anversa group. A number of practicing cardiologists who treated heart failure patients were also skeptical because they did not see the near-miraculous regenerative power of the heart in their patients. One Anversa paper went as far as suggesting that the whole heart would completely regenerate itself roughly every 8-9 years, a claim that was at odds with the clinical experience of practicing cardiologists. Other researchers pointed out serious flaws in the Anversa papers. For example, the 2002 paper on stem cells in human heart transplant patients claimed that the hearts were coated with the recipient’s regenerative cells, including cells which contained the stem cell marker Sca-1. Within days of the paper’s publication, many researchers were puzzled by this finding because Sca-1 was a marker of mouse and rat cells – not human cells! If Anversa’s group was finding rat or mouse proteins in human hearts, it was most likely due to an artifact. And if they had mistakenly found rodent cells in human hearts, so these critics surmised, perhaps other aspects of Anversa’s research were similarly flawed or riddled with artifacts.
At national and international meetings, one could observe heated debates between members of the Anversa camp and their critics. The critics then decided to change their tactics. Instead of just debating Anversa and commenting about errors in the Anversa papers, they invested substantial funds and efforts to replicate Anversa’s findings. One of the most important and rigorous attempts to assess the validity of the Orlic paper was published in 2004, by the research teams of Chuck Murry and Loren Field. Murry and Field found no evidence of bone marrow cells converting into heart muscle cells. This was a major scientific blow to the burgeoning adult stem cell movement, but even this paper could not deter the bone marrow cell champions.
The skeptics who had doubted Anversa’s claims all along may now feel vindicated, but this is not the time to gloat. Instead, the discipline of cardiovascular stem cell biology is now undergoing a process of soul-searching. How was it possible that some of the most widely read and cited papers were based on heavily flawed observations and assumptions? Why did it take more than a decade after the first refutation was published in 2004 for scientists to finally accept that the near-magical regenerative power of the heart was a pipe dream?
One reason for this lag time is pretty straightforward: It takes a tremendous amount of time to refute papers. Funding to conduct the experiments is difficult to obtain because grant funding agencies are not easily convinced to invest in studies replicating existing research. For a refutation to be accepted by the scientific community, it has to be at least as rigorous as the original, but in practice, refutations are subject to even greater scrutiny. Scientists trying to disprove another group’s claim may be asked to develop even better research tools and technologies so that their results can be seen as more definitive than those of the original group. Instead of relying on antibodies to identify c-kit cells, the 2014 refutation developed a transgenic mouse in which all c-kit cells could be genetically traced to yield more definitive results – but developing new models and tools can take years.
The scientific peer review process by external researchers is a central pillar of the quality control process in modern scientific research, but one has to be cognizant of its limitations. Peer review of a scientific manuscript is routinely performed by experts for all the major academic journals which publish original scientific results. However, peer review only involves a “review”, i.e. a general evaluation of major strengths and flaws, and peer reviewers do not see the original raw data nor are they provided with the resources to replicate the studies and confirm the veracity of the submitted results. Peer reviewers rely on the honor system, assuming that the scientists are submitting accurate representations of their data and that the data has been thoroughly scrutinized and critiqued by all the involved researchers before it is even submitted to a journal for publication. If peer reviewers were asked to actually wade through all the original data generated by the scientists and even perform confirmatory studies, then the peer review of every single manuscript could take years and one would have to find the money to pay for the replication or confirmation experiments conducted by peer reviewers. Publication of experiments would come to a grinding halt because thousands of manuscripts would be stuck in the purgatory of peer review. Relying on the integrity of the scientists submitting the data and their internal review processes may seem naïve, but it has always been the bedrock of scientific peer review. And it is precisely the internal review process which may have gone awry in the Anversa group.
Just as Pygmalion fell in love with Galatea, researchers fall in love with the hypotheses and theories that they have constructed. To minimize the effects of these personal biases, scientists regularly present their results to colleagues within their own groups at internal lab meetings and seminars or at external institutions and conferences long before they submit their data to a peer-reviewed journal. The preliminary presentations are intended to spark discussions, inviting the audience to challenge the veracity of the hypotheses and the data while the work is still in progress. Sometimes fellow group members are truly skeptical of the results; at other times they take on the devil’s advocate role to see if they can find holes in their group’s own research. The larger a group, the greater the chance that one will find colleagues within it who hold dissenting views. This type of feedback is a necessary internal review process which provides valuable insights that can steer the direction of the research.
Considering the size of the Anversa group – consisting of 20, 30 or even more PhD students, postdoctoral fellows and senior scientists – it is puzzling why the discussions among the group members did not already internally challenge their hypotheses and findings, especially in light of the fact that they knew extramural scientists were having difficulties replicating the work.
“I think that most scientists, perhaps with the exception of the most lucky or most dishonest, have personal experience with failure in science—experiments that are unreproducible, hypotheses that are fundamentally incorrect. Generally, we sigh, we alter hypotheses, we develop new methods, we move on. It is the data that should guide the science.
In the Anversa group, a model with much less intellectual flexibility was applied. The “Hypothesis” was that c-kit (cd117) positive cells in the heart (or bone marrow if you read their earlier studies) were cardiac progenitors that could: 1) repair a scarred heart post-myocardial infarction, and: 2) supply the cells necessary for cardiomyocyte turnover in the normal heart.
This central theme was that which supplied the lab with upwards of $50 million worth of public funding over a decade, a number which would be much higher if one considers collaborating labs that worked on related subjects.
In theory, this hypothesis would be elegant in its simplicity and amenable to testing in current model systems. In practice, all data that did not point to the “truth” of the hypothesis were considered wrong, and experiments which would definitively show if this hypothesis was incorrect were never performed (lineage tracing e.g.).”
Discarding data that might have challenged the central hypothesis appears to have been a central principle.
According to the whistleblower, Anversa’s group did not just discard undesirable data, they actually punished group members who would question the group’s hypotheses:
“In essence, to Dr. Anversa all investigators who questioned the hypothesis were “morons,” a word he used frequently at lab meetings. For one within the group to dare question the central hypothesis, or the methods used to support it, was a quick ticket to dismissal from your position.”
The group also created an environment of strict information hierarchy and secrecy which is antithetical to the spirit of science:
“The day to day operation of the lab was conducted under a severe information embargo. The lab had Piero Anversa at the head with group leaders Annarosa Leri, Jan Kajstura and Marcello Rota immediately supervising experimentation. Below that was a group of around 25 instructors, research fellows, graduate students and technicians. Information flowed one way, which was up, and conversation between working groups was generally discouraged and often forbidden.
Raw data left one’s hands, went to the immediate superior (one of the three named above) and the next time it was seen would be in a manuscript or grant. What happened to that data in the intervening period is unclear.
A side effect of this information embargo was the limitation of the average worker to determine what was really going on in a research project. It would also effectively limit the ability of an average worker to make allegations regarding specific data/experiments, a requirement for a formal investigation.”
This segregation of information is a powerful method to maintain an authoritarian rule and is more typical for terrorist cells or intelligence agencies than for a scientific lab, but it would definitely explain how the Anversa group was able to mass produce numerous irreproducible papers without any major dissent from within the group.
In addition to the secrecy and segregation of information, the group also created an atmosphere of fear to ensure obedience:
“Although individually-tailored stated and unstated threats were present for lab members, the plight of many of us who were international fellows was especially harrowing. Many were technically and educationally underqualified compared to what might be considered average research fellows in the United States. Many also originated in Italy where Dr. Anversa continues to wield considerable influence over biomedical research.
This combination of being undesirable to many other labs should they leave their position due to lack of experience/training, dependent upon employment for U.S. visa status, and under constant threat of career suicide in your home country should you leave, was enough to make many people play along.
Even so, I witnessed several people question the findings during their time in the lab. These people and working groups were subsequently fired or resigned. I would like to note that this lab is not unique in this type of exploitative practice, but that does not make it ethically sound and certainly does not create an environment for creative, collaborative, or honest science.”
Foreign researchers are particularly dependent on their employment to maintain their visa status and the prospect of being fired from one’s job can be terrifying for anyone.
This is an anonymous account of a whistleblower and as such, it is problematic. The use of anonymous sources in science journalism could open the doors for all sorts of unfounded and malicious accusations, which is why the ethics of using anonymous sources was heavily debated at the recent ScienceOnline conference. But the claims of the whistleblower are not made in a vacuum – they have to be evaluated in the context of known facts. The whistleblower’s claim that the Anversa group and their collaborators received more than $50 million to study bone marrow cell and c-kit cell regeneration of the heart can be easily verified at the public NIH grant funding RePORTer website. The whistleblower’s claim that many of the Anversa group’s findings could not be replicated is also a verifiable fact. It may seem unfair to condemn Anversa and his group for creating an atmosphere of secrecy and obedience which undermined the scientific enterprise, caused torment among trainees and wasted millions of dollars of taxpayer money simply based on one whistleblower’s account. However, if one looks at the entire picture of the amazing rise and decline of the Anversa group’s foray into cardiac regeneration, then the whistleblower’s description of the atmosphere of secrecy and hierarchy seems very plausible.
The investigation of Harvard into the Anversa group is not open to the public and therefore it is difficult to know whether the university is primarily investigating scientific errors or whether it is also looking into such claims of egregious scientific misconduct and abuse of scientific trainees. It is unlikely that Anversa’s group is the only group that might have engaged in such forms of misconduct. Threatening dissenting junior researchers with a loss of employment or visa status may be far more common than we think. The gravity of the problem requires that the NIH – the major funding agency for biomedical research in the US – should look into the prevalence of such practices in research labs and develop safeguards to prevent the abuse of science and scientists.
Neutrality is prized by scientists and journalists. Scientists are supposed to report and analyze their scientific research in a neutral fashion. Similarly, journalistic professionalism requires a neutral and objective stance when reporting or analyzing news. Nevertheless, scientists and journalists are also aware of the fact that there is no perfect neutrality. We are all victims of our conscious and unconscious biases and how we report data or events is colored by our biases. Not only is it impossible to be truly “neutral”, but one can even question whether “neutrality” should be a universal mandate. Neutrality can make us passive, especially when we see a clear ethical mandate to take action. Should one report in a neutral manner about genocide instead of becoming an advocate for the victims? Should a scientist who observes a destruction of ecosystems report on this in a neutral manner? Is it acceptable or perhaps even required for such a scientist to abandon neutrality and become an advocate to protect the ecosystems?
Science bloggers or science journalists have to struggle to find the right balance between neutrality and advocacy. Political bloggers and journalists who are enthusiastic supporters of a political party will find it difficult to preserve neutrality in their writing, but their target audiences may not necessarily expect them to remain neutral. I am often fascinated and excited by scientific discoveries and concepts that I want to write about, but I also notice how my enthusiasm for science compromises my neutrality. Should science bloggers strive for neutrality and avoid advocacy? Or is it understood that their audiences do not expect neutrality?
One way to increase objectivity and neutrality in science writing is to provide balanced views. When discussing a scientific discovery or concept, one can also cite or reference scientists with opposing views. This underscores that scientific opinion is not a monolith and that most scientific findings can and should be challenged. However, the mandate to provide balance can also lead to “false balance” when two opposing opinions are presented as two equivalent perspectives, even though one of the two sides has little to no scientific evidence to back up its claims. More than 99% of all climatologists agree about the importance of anthropogenic global warming; therefore it would be “false balance” to give equal space to opposing fringe views. Most science bloggers would also avoid “false balance” when it comes to reporting about the scientific value of homeopathy since nearly every scientist in the world agrees that homeopathy has no scientific data to back it up.
But how should science bloggers decide what constitutes “necessary balance” versus “false balance” when writing about areas of research where the scientific evidence is more ambivalent? How about a scientific discovery which 80% of scientists think is a landmark finding and 20% of scientists believe is a fluke? How does one find out about the scientific rigor of the various viewpoints and how should a blog post reflect these differences in opinion? Press releases of universities or research institutions usually only cite the researchers that conducted a scientific study, but how does one find out about other scientists who disagree with the significance of the new study?
3. Anonymous Sources
Most scientific peer review is conducted with anonymous sources. The editors of peer reviewed scientific journals send out newly submitted manuscripts to expert reviewers in the field but they try to make sure that the names of the reviewers remain confidential. This helps ensure that the reviewers can comment freely about any potential flaws in the manuscript without having to fear retaliation from the authors who might be incensed about the critique. Even in the post-publication phase, anonymous commenters can leave critical comments about a published study at the post-publication peer review website PubPeer. The comments made by anonymous as well as identified commenters at PubPeer played an important role in raising questions about recent controversial stem cell papers. On the other hand, anonymous sources may also use their cover to make baseless accusations and malign researchers. In the case of journals, the responsibility lies with the editors to ensure that their anonymous reviewers are indeed behaving in a professional manner and not abusing their anonymity.
Investigative political journalists also often rely on anonymous sources and whistle-blowers to receive critical information that would have otherwise been impossible to obtain. Journalists are also trained to ensure that their anonymous sources are credible and that they are not abusing their anonymity.
Should science bloggers and science journalists also consider using anonymous sources? Would unnamed scientists provide a more thorough critical appraisal of the quality of scientific research or would this open the door to abuse?
I hope that you leave comments on this post, tweet your thoughts using the #scioStandards hashtag and discuss your views at the Science Online conference.
I will be facilitating the discussion at this session, which will take place at noon on Saturday, March 1, just before the final session of the conference. The title of the session is rather vague, and the purpose of the session is for attendees to exchange their views on whether we can agree on certain scientific and journalistic standards for science blogging.
Individual science bloggers have very different professional backgrounds and they also write for a rather diverse audience. Some bloggers are part of larger networks, others host a blog on their own personal website. Some are paid, others write for free. Most bloggers have developed their own personal styles for how they write about scientific studies, the process of scientific discovery, science policy and the lives of people involved in science. Considering the heterogeneity in the science blogging community, is it even feasible to identify “standards” for scientific blogging? Are there some core scientific and journalistic standards that most science bloggers can agree on? Would such “standards” merely serve as informal guidelines or should they be used as measures to assess the quality of science blogging?
These are the kinds of questions that we will try to discuss at the session. I hope that we will have a lively discussion, share our respective viewpoints and see what we can learn from each other. To gauge the interest levels of the attendees, I am going to pitch a few potential discussion topics on this blog and use your feedback to facilitate the discussion. I would welcome all of your responses and comments, independent of whether you intend to attend the conference or the session. I will also post these questions in the Science Online discussion forum.
One of the challenges we face when we blog about specific scientific studies is determining how much background reading is necessary to write a reasonably accurate blog post. Most science bloggers probably read the original research paper they intend to write about, but even this can be challenging at times. Scientific papers aren’t very long. Journals usually restrict the word count of original research papers to somewhere between 2,000 and 8,000 words (depending on each scientific journal’s policy and whether the study is published as a short communication or a full-length article). However, original research papers are also accompanied by four to eight multi-paneled figures with extensive legends.
Nowadays, research papers frequently include additional figures, data-sets and detailed descriptions of scientific methods that are published online and not subject to the word count limit. A 2,000-word short communication with two data figures in the main manuscript may therefore be accompanied by eight “supplemental” online-only figures and an additional 2,000 words of text describing the methods in detail. A single manuscript usually summarizes the results of multiple years of experimental work, which is why this condensed end-product is quite dense. It can take hours to properly study the published research paper and understand the intricate details.
Is it enough to merely read the original research paper in order to blog about it? Scientific papers include a brief introduction section, but these tend to be written for colleagues who are well-acquainted with the background and significance of the research. However, unless one happens to blog about a paper that is directly related to one’s own work, most of us probably need additional background reading to fully understand the significance of a newly published study.
An expert on liver stem cells, for example, who wants to blog about the significance of a new paper on lung stem cells will probably need a substantial amount of additional background reading. One may have to read at least one or two older research papers by the authors or their scientific colleagues / competitors to grasp what makes the new study so unique. It may also be helpful to read at least one review paper (e.g. a review article summarizing recent lung stem cell discoveries) to understand the “big picture”. Some research papers are accompanied by scientific editorials which can provide important insights into the strengths and limitations of the paper in question.
All of this reading adds up. If it takes a few hours to understand the main paper that one intends to blog about, and an additional 2-3 hours to read other papers or editorials, a science blogger may end up having to invest 4-5 hours of reading before one has even begun to write the intended blog post.
What strategies have science bloggers developed to manage their time efficiently and make sure they can meet (external or self-imposed) deadlines but still complete the necessary background reading?
Should bloggers provide references and links to the additional papers they consulted?
Should bloggers try to focus on a narrow area of expertise so that, over time, they develop enough of a background in this niche area to need less background reading?
Are there major differences in the expectations of how much background reading is necessary? For example, does an area such as stem cell research or nanotechnology require far more background reading because every day numerous new papers are published and it is so difficult to keep up with the pace of the research?
Is it acceptable to take short-cuts? Could one just read the paper that one wants to blog about and forget about additional background reading, hoping that the background provided in the paper is sufficient and balanced?
Can one avoid reading the supplementary figures or texts of a paper and just stick to the main text of a paper, relying on the fact that the peer reviewers of the published paper would have caught any irregularities in the supplementary data?
Is it possible to primarily rely on a press release or an interview with the researchers of the paper and just skim the results of the paper instead of spending a few hours trying to read the original paper?
Or do such short-cuts compromise the scientific and journalistic quality of science blogs?
Would a discussion about expectations, standards and strategies to manage background reading be helpful for participants of the session?
Here is an excerpt from my latest post on the 3Quarksdaily blog:
Beware of what you share. Employers now routinely utilize internet search engines or social network searches to obtain information about job applicants. A survey of 2,184 hiring managers and human resource professionals conducted by the online employment website CareerBuilder.com revealed that 39% use social networking sites to research job candidates. Of the group who used social networks to evaluate job applicants, 43% found content on a social networking site that caused them to not hire a candidate, whereas only 19% found information that caused them to hire a candidate. The top reasons for rejecting a candidate based on information gleaned from social networking sites were provocative or inappropriate photos/information, including information about the job applicants’ history of substance abuse. This should not come as a surprise to job applicants in the US. After all, it is not uncommon for employers to invade the privacy of job applicants by conducting extensive background searches, ranging from the applicant’s employment history and credit rating to checking up on any history of lawsuits or run-ins with law enforcement agencies. Some employers also require drug testing of job applicants. The internet and social networking websites merely offer employers an additional array of tools to scrutinize their applicants. But how do we feel about digital sleuthing when it comes to a relationship that is very different from the employer-applicant relationship – one which is characterized by profound trust, intimacy and respect, such as the relationship between healthcare providers and their patients?
The Hastings Center Report is a peer-reviewed academic bioethics journal which discusses the ethics of “Googling a Patient” in its most recent issue. It first describes a specific case of a twenty-six-year-old patient who sees a surgeon and requests a prophylactic mastectomy of both breasts. She says that she does not have breast cancer yet, but that her family is at very high risk for cancer. Her mother, sister, aunts, and a cousin have all had breast cancer; a teenage cousin had ovarian cancer at the age of nineteen; and her brother was treated for esophageal cancer at the age of fifteen. She also says that she herself has suffered from a form of skin cancer (melanoma) at the age of twenty-five and that she wants to undergo the removal of her breasts without further workup because she wants to avoid developing breast cancer. She says that her prior mammogram had already shown abnormalities and she had been told by another surgeon that she needed the mastectomy.
Such prophylactic mastectomies, i.e. removal of both breasts, are indeed performed if young women are considered to be at very high risk for breast cancer based on their genetic profile and family history. The patient’s family history – her mother, sister and aunts being diagnosed with breast cancer – is indicative of a very high risk, but other aspects of the history such as her brother developing esophageal cancer at the age of fifteen are rather unusual. The surgeon confers with the patient’s primary care physician prior to performing the mastectomy and is puzzled by the fact that the primary care physician cannot confirm many of the claims made by the patient regarding her prior medical history or her family history. The physicians find no evidence of the patient ever having been diagnosed with a melanoma and they also cannot find documentation of the prior workup. The surgeon then asks a genetic counselor to meet with the patient and help resolve the discrepancies. During the evaluation process, the genetic counselor decides to ‘google’ the patient.
The genetic counselor finds two Facebook pages that are linked to the patient. One page appears to be a personal profile of the patient, stating that in addition to battling stage four melanoma (a very advanced stage of skin cancer with very low survival rates), she has recently been diagnosed with breast cancer. She also provides a link to a website soliciting donations to attend a summit for young cancer patients. The other Facebook page shows multiple pictures of the patient with a bald head, suggesting that she is undergoing chemotherapy, which is obviously not true according to what the genetic counselor and the surgeon have observed. Once this information is forwarded to the surgeon, he decides to cancel the planned surgery. It is not clear why the patient was intent on having the mastectomy and what she would gain from it, but the obtained information from the Facebook pages and the previously noted discrepancies are reason enough for the surgeon to rebuff the patient’s request for the surgery.
If you want to learn more about how ethics experts analyzed the situation and how common it is for psychologists enrolled in doctoral programs to use search engines or social networking sites in order to obtain more information about their patients/clients, please read the complete article at 3Quarksdaily.com.
There is a fundamental asymmetry that exists in contemporary peer review of scientific papers. Most scientific journals do not hide the identity of the authors of a submitted manuscript. The scientific reviewers, on the other hand, remain anonymous. Their identities are only known to the editors, who use the assessments of these scientific reviewers to help decide whether or not to accept a scientific manuscript. Even though the comments of the reviewers are usually passed along to the authors of the manuscript, the names of the reviewers are not. There is a good reason for that. Critical comments of peer reviewers can lead to a rejection of a manuscript, or cause substantial delays in its publication, sometimes requiring many months of additional work that needs to be performed by the scientists who authored the manuscript. Scientists who receive such criticisms are understandably disappointed, but in some cases this disappointment can turn into anger and could potentially even lead to retributions against the peer reviewers, if their identities were ever disclosed. The cloak of anonymity thus makes it much easier for peer reviewers to offer honest and critical assessments of the submitted manuscript.
Unfortunately, this asymmetry – the peer reviewers knowing the names of the authors but the authors not knowing the names of the peer reviewers – can create problems. Some peer reviewers may be biased either against or in favor of a manuscript merely because they recognize the names of the authors or the institutions at which the authors work. There is an expectation that peer reviewers judge a paper only based on its scientific merit, but knowledge of the authors could still consciously or subconsciously impact the assessments made by the peer reviewers. Scientific peer reviewers may be much more lenient towards manuscripts of colleagues whom they have known for many years and consider to be their friends. The reviewers may be more critical of manuscripts submitted by rival groups with whom they have had hostile exchanges in the past or by institutions that they do not trust. A recent study observed that scientists who review applications of students exhibit a subtle gender bias that favors male students, and it is possible that a similar gender bias exists in the peer review evaluation of manuscripts.
The journals Nature Geoscience and Nature Climate Change of the Nature Publishing Group have recently announced a new “Double-blind peer review” approach to correct this asymmetry. The journals will allow authors to remain anonymous during the peer review process. The hope is that hiding the identities of the authors could reduce bias among peer reviewers. The journals decided to implement this approach on a trial basis following a survey, in which three-quarters of respondents were supportive of a double-blind peer review. As the announcement correctly points out, this will only work if the authors are willing to phrase their paper in a manner that does not give away their identity. Instead of writing “as we have previously described”, authors write “as has been previously described” when citing prior publications.
The editors of Nature Geoscience state:
From our experience, authors who try to guess the identity of a referee are very often wrong. It seems unlikely that referees will be any more successful when guessing the identity of authors.
I respectfully disagree with this statement. Reviewers can remain anonymous because they rarely make direct references to their own work in the review process. Authors of a scientific manuscript, on the other hand, often publish a paper in the context of their own prior work. Even if the names and addresses of the authors were hidden on the title page and even if the usage of first-person pronouns in the context of prior publications was omitted, the manuscript would likely still contain multiple references to a group’s prior work. These references as well as any mentions of an institution’s facilities or administrative committees that approve animal and human studies could potentially give away the identity of the authors. It would be much easier for reviewers to guess the identity of some of the authors than for authors to guess the identity of the reviewers.
But even if referees correctly identify the research group that a paper is coming from, they are much less likely to guess who the first author is. One of our motivations for setting up a double-blind trial is the possibility that female authors are subjected to tougher peer review than their male colleagues — a distinct possibility in view of evidence that subtle gender biases affect assessments of competence, appropriate salaries and other aspects of academic life (Proc. Natl Acad. Sci. USA 109, 16474–16479; 2012). If the first author is unknown, this bias will be largely removed.
The double-blind peer review system would definitely make it harder to guess the identity of the first author and would remove biases of reviewers associated with knowing the identity of first authors. The references to prior work would enable a reviewer to infer that the submitted manuscript was authored by the research group of the senior scientist X at the University Y, but it would be nearly impossible for the reviewer to ascertain the identity of the first authors (often postdoctoral fellows, graduate students or junior faculty members). However, based on my discussions with fellow peer reviewers, I think that it is rather rare for reviewers to have a strong bias against or in favor of first authors. The biases are usually associated with knowing the identity of the senior or lead authors.
Many scientists would agree that there is a need for reforming the peer review process and that we need to reduce biased assessments of submitted manuscripts. However, I am not convinced that increasing blindness is necessarily the best approach. In addition to the asymmetry of anonymity in contemporary peer review, there is another form of asymmetry that should be addressed: Manuscripts are eventually made public, but the comments of peer reviewers usually are not.
This asymmetry allows some peer reviewers to be sloppy in their assessments of manuscripts. While some peer reviewers provide thoughtful and constructive criticism, others just make offhanded comments, either dismissing a manuscript for no good reason or sometimes accepting it without carefully evaluating all its strengths and weaknesses. The solution to this problem is not increasing “blindness”, but instead increasing transparency of the peer review process. The open access journal F1000Research has a post-publication review process for scientific manuscripts, in which a paper is first published and the names and assessments of the referees are openly disclosed. The open access journal PeerJ offers an alternate approach, in which peer reviewers can choose to either disclose their names or to stay anonymous and authors can choose to disclose the comments they received during the peer review process. This “pro-choice” model would allow reviewers to remain anonymous even if the authors choose to publicly disclose the reviewer comments.
Scientific peer review can play an important role in ensuring the quality of science, if it is conducted appropriately and provides reasonably objective and constructive critiques. Constructive criticism is essential for the growth of scientific knowledge. It is important that we foster a culture of respect for criticism in science, whether it occurs during the peer review process or when science writers analyze published studies. “Double blind” is an excellent way to collect experimental data, because it reduces the bias of the experimenter, but it may not be the best way to improve peer review. When it comes to peer review and scientific criticism, we should strive for more transparency and a culture of mutual respect and dialogue.
The ENCODE (Encyclopedia Of DNA Elements) project received quite a bit of attention when its results were publicized last year. This project involved a very large consortium of scientists with the goal of identifying all the functional elements in the human genome. In September 2012, 30 papers were published in a coordinated release and their extraordinary claim was that roughly 80% of the human genome was “functional”. This was in direct contrast to the prevailing view among molecular biologists that the bulk of human DNA was just “junk DNA”, i.e. sequences of DNA to which one could not assign any specific function. The ENCODE papers contained huge amounts of data, collating the work of hundreds of scientists who had worked on this for nearly a decade. But what garnered the most attention among scientists, the media and the public was the “80%” claim and the supposed “death of junk DNA”.
Soon after the discovery of DNA, the primary function ascribed to DNA was its role as a template from which messenger RNA could be transcribed and then translated into functional proteins. Using this definition of “function”, only 1-2% of human DNA would be functional, because only that portion actually encodes proteins. The term “junk DNA” was coined to describe the 98-99% of non-coding DNA which appeared to primarily represent genetic remnants of our evolutionary past without any specific function in our present-day cells.
However, in the past decades, scientists have uncovered more and more functions for the non-coding DNA segments that were previously thought to be merely “junk”. Non-coding DNA can, for example, act as a binding site for regulatory proteins and exert an influence on protein-coding DNA. There has also been an increasing awareness of the presence of various types of non-coding RNA molecules, i.e. RNA molecules which are transcribed from the DNA but not subsequently translated into proteins. Some of these non-coding RNAs have known regulatory functions, others may not have any or their functions have not yet been established.
Despite these discoveries, most scientists were in agreement that only a small fraction of DNA was “functional”, even when all the non-coding pieces of DNA with known functions were included. The bulk of our genome was still thought to be non-functional. The term “junk DNA” was used less frequently by scientists, because it was becoming apparent that we were probably going to discover even more functional elements in the non-coding DNA.
In September 2012, everyone was talking about “junk DNA” again, because the ENCODE scientists claimed their data showed that 80% of the human genome was “functional”. Most scientists had expected that the ENCODE project would uncover some new functions for non-coding DNA, but the 80% figure was way out of proportion to what everyone had expected. The problem was that the ENCODE project used a very low bar for “function”. Binding to the DNA or any kind of chemical DNA modification was already seen as a sign of “function”, without necessarily proving that these pieces of DNA had any significant impact on the function of a cell.
3. Josh Witten points out the irony of Graur accusing ENCODE of seeking hype, even though Graur and his colleagues seem to use sarcasm and ridicule to also increase the visibility of their work. I think Josh’s blog post is an excellent analysis of the problems with ENCODE and the problems associated with Graur’s tone.
On Twitter, I engaged in a debate with Benoit Bruneau, my fellow Scilogs blogger Malcolm Campbell and Jonathan Eisen and I thought it would be helpful to share the Storify version here. There was a general consensus that even though some of the points mentioned by Graur and colleagues are indeed correct, their sarcastic tone was uncalled for. Scientists can be critical of each other, but can and should do so in a respectful and professional manner, without necessarily resorting to insults or mockery.
[View the story “ENCODE controversy and professionalism in scientific debates” on Storify: //storify.com/jalees_rehman/encode-debate]
Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, & Elhaik E (2013). On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution. PMID: 23431001
According to most of these news reports, the design of the study was rather straightforward. Schoolchildren ages 9 to 11 in a Vancouver school district were randomly assigned to two groups for a four-week intervention: Half of the children were asked to perform kind acts, while the other half were asked to keep track of pleasant places they visited. Happiness and acceptance by their peers were assessed at the beginning and the end of the four-week intervention period. The children were allowed to choose the “acts of kindness” or the “pleasant places”. The “acts of kindness” group chose acts such as sharing their lunch or giving their mothers a hug. The “pleasant places” group chose to visit places such as the playground or a grandparent’s house.
At the end of the four-week intervention, both groups of children showed increased signs of happiness, but the news reports differed in terms of the impact of the intervention on the acceptance of the children.
The students were asked to report how happy they were and identify classmates they would like to work with in school activities. After four weeks, both groups said they were happier, but the kids who had performed acts of kindness reported experiencing greater acceptance from their peers – they were chosen most often by other students as children the other students wanted to work with.
The Huffington Post interpretation (a re-post from Livescience) was that the children performing the “acts of kindness” became more accepted by others, i.e. more popular.
Which of the two interpretations was the correct one? Furthermore, how significant were the improvements in happiness and acceptance?
I decided to read the original PLOS One paper and I was quite surprised by what I found:
The manuscript (in its published form, as of December 27, 2012) had no figures and no tables in the “Results” section. The entire “Results” section consisted of just two short paragraphs. The first paragraph described the affect and happiness scores:
Consistent with previous research, overall, students in both the kindness and whereabouts groups showed significant increases in positive affect (γ00 = 0.15, S.E. = 0.04, t(17) = 3.66, p<.001) and marginally significant increases in life satisfaction (γ00 = 0.09, S.E. = 0.05, t(17) = 1.73, p = .08) and happiness (γ00 = 0.11, S.E. = 0.08, t(17) = 1.50, p = .13). No significant differences were detected between the kindness and whereabouts groups on any of these variables (all ps>.18). Results of t-tests mirrored these analyses, with both groups independently demonstrating increases in positive affect, happiness, and life satisfaction (all ts>1.67, all ps<.10).
There are no actual values given, so it is difficult to know how big the changes are. If a starting score is 15, then a change of 1.5 is only a 10% change. On the other hand, if the starting score is 3, then a change of 1.5 represents a 50% change. The Methods section of the paper also does not describe the statistics employed to analyze the data. Just relying on arbitrary p-value thresholds is problematic, but if one were to use the infamous p-value threshold of 0.05 for significance, one can assume that there was a significant change in the affect or mood of children (p-value <0.001), a marginally significant trend of increased life satisfaction (p-value of 0.08) and no really significant change in happiness (p-value of 0.13).
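To make this arithmetic concrete, here is a minimal Python sketch. The baseline scores of 15 and 3 are the hypothetical values used above, since the paper does not report the real baselines, and `relative_change` is just a helper name chosen for this illustration.

```python
def relative_change(baseline: float, change: float) -> float:
    """Return an absolute change expressed as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * change / baseline


# Hypothetical baselines: the paper does not report the actual starting scores.
for baseline in (15.0, 3.0):
    pct = relative_change(baseline, 1.5)
    print(f"baseline {baseline:g}: a change of 1.5 is a {pct:.0f}% change")
```

The same absolute change can therefore look trivial or substantial depending on the starting score, which is why the missing baseline values matter.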
It is surprising that the authors do not show the actual scores for each of the two groups. After all, one of the goals of the study was to test whether performing “acts of kindness” has a bigger impact on happiness and acceptance than visiting “pleasant places” (the “whereabouts” group). There is a generic statement, “No significant differences were detected between the kindness and whereabouts groups on any of these variables (all ps>.18).”, but what were the actual happiness and satisfaction scores for each of the groups? The next sentence is also cryptic: “Results of t-tests mirrored these analyses, with both groups independently demonstrating increases in positive affect, happiness, and life satisfaction (all ts>1.67, all ps<.10).” Does this mean that p<0.1 was the threshold of significance? Do these p-values refer to the post-intervention versus pre-intervention analysis for each tested variable in each of the two groups? If yes, why not show the actual data for both groups?
The second (and final) paragraph of the Results section described acceptance of the children by their peers. Children were asked who they “would like to be in school activities [i.e., spend time] with”:
All students increased in the raw number of peer nominations they received from classmates (γ00 = 0.68, S.E. = 0.27, t(17) = 2.37, p = .02), but those who performed kind acts (M = +1.57; SD = 1.90) increased significantly more than those who visited places (M = +0.71; SD = 2.17), γ01 = 0.83, S.E. = 0.39, t(17) = 2.10, p = .05, gaining an average of 1.5 friends. The model excluded a nonsignificant term controlling for classroom size (p = .12), which did not affect the significance of the kindness term. The effects of changes in life satisfaction, happiness, and positive affect on peer acceptance were tested in subsequent models and all found to be nonsignificant (all ps>.54). When controlling for changes in well-being, the effect of the kindness condition on peer acceptance remained significant. Hence, changes in well-being did not predict changes in peer acceptance, and the effect of performing acts of kindness on peer acceptance was over and above the effect of changes in well-being.
This is again just a summary of the data, and not the actual data itself. Going to “pleasant places” increased the average number of “friends” (I am not sure I would use “friend” to describe someone who nominates me as a potential partner in a school activity) by 0.71, whereas performing “acts of kindness” increased the average number of friends by 1.57. This did answer the question that was raised by the conflicting news reports. According to the presented data, the “acts of kindness” kids were more accepted by others, and there were no data on whether they also became more accepting of others. I then looked at the Methods section to understand the statistics and models used for the analysis and found that there were no details included in the paper. The Methods section just ended with the following sentences:
Pre-post changes in self-reports and peer nominations were analyzed using multilevel modeling to account for students’ nesting within classrooms. No baseline condition differences were found on any outcome variables. Further details about method and results are available from the first author.
Based on reviewing the actual paper, I am quite surprised that PLOS One accepted it for publication. The paper presents minimal data, reports no actual baseline scores for peer acceptance or happiness and describes its methods incompletely. The title, “Kindness Counts: Prompting Prosocial Behavior in Preadolescents Boosts Peer Acceptance and Well-Being”, is also rather grand considering the marginally significant data. One is left with many unanswered questions:
1) What if kids had not been asked to perform additional “acts of kindness” or additional visits to “pleasant places” and had instead merely logged these positive activities that they usually performed as part of their routine? This would have been a very important control group.
2) Why did the authors only show brief summaries of the analyses and omit all of the actual affect, happiness, satisfaction and peer acceptance data?
3) Did the kids in both groups also become more accepting of their peers?
It is quite remarkable that going to places one likes, such as a shopping mall, is just as effective as pro-social behavior (performing “acts of kindness”) in terms of improving happiness and well-being. The visits to pleasant places also helped gain peer acceptance, just not quite as much as performing acts of kindness. However, the somewhat selfish-sounding headline “Hanging out at the mall makes kids happier and a bit more popular” is not as attractive as the warm and fuzzy headline “Random acts of kindness can make kids more popular”. This may be the reason why the “prosocial” or “kindness” aspect of this study was emphasized so strongly by the news media.
In summary, the limited data in this published paper suggest that children who are asked to intentionally hang out at places they like and keep track of these for four weeks seem to become happier, similar to kids who make an effort to perform additional acts of kindness. Both groups of children gain acceptance by their peers, but the children who perform acts of kindness fare slightly better. There are no clear descriptions of the statistical methods, no actual scores for the two groups (only the changes in scores are shown) and important control groups (such as children who keep track of their positive activities, without increasing them) are missing. Therefore, definitive conclusions cannot be drawn from these limited data. Unfortunately, none of the above-mentioned news reports highlighted these weaknesses; instead, they jumped on the bandwagon of interpreting this study as scientific evidence for the importance of kindness. Some of the titles of the news reports even made references to bullying, even though bullying was not at all assessed in the study.
This does not mean that we should discourage our children from being kind. On the contrary, there are many moral reasons to encourage our children to be kind, and there is no need for a scientific justification for kindness. However, if one does invoke science as a reason for kindness, it should be based on scientifically rigorous and comprehensive data.