Murder Your Darling Hypotheses But Do Not Bury Them

“Whenever you feel an impulse to perpetrate a piece of exceptionally fine writing, obey it—whole-heartedly—and delete it before sending your manuscript to press. Murder your darlings.”

Sir Arthur Quiller-Couch (1863–1944). On the Art of Writing. 1916


Murder your darlings. The British writer Sir Arthur Quiller Crouch shared this piece of writerly wisdom when he gave his inaugural lecture series at Cambridge, asking writers to consider deleting words, phrases or even paragraphs that are especially dear to them. The minute writers fall in love with what they write, they are bound to lose their objectivity and may not be able to judge how their choice of words will be perceived by the reader. But writers aren’t the only ones who can fall prey to the Pygmalion syndrome. Scientists often find themselves in a similar situation when they develop “pet” or “darling” hypotheses.

Hypothesis via Shutterstock
Hypothesis via Shutterstock

How do scientists decide when it is time to murder their darling hypotheses? The simple answer is that scientists ought to give up scientific hypotheses once the experimental data is unable to support them, no matter how “darling” they are. However, the problem with scientific hypotheses is that they aren’t just generated based on subjective whims. A scientific hypothesis is usually put forward after analyzing substantial amounts of experimental data. The better a hypothesis is at explaining the existing data, the more “darling” it becomes. Therefore, scientists are reluctant to discard a hypothesis because of just one piece of experimental data that contradicts it.

In addition to experimental data, a number of additional factors can also play a major role in determining whether scientists will either discard or uphold their darling scientific hypotheses. Some scientific careers are built on specific scientific hypotheses which set apart certain scientists from competing rival groups. Research grants, which are essential to the survival of a scientific laboratory by providing salary funds for the senior researchers as well as the junior trainees and research staff, are written in a hypothesis-focused manner, outlining experiments that will lead to the acceptance or rejection of selected scientific hypotheses. Well written research grants always consider the possibility that the core hypothesis may be rejected based on the future experimental data. But if the hypothesis has to be rejected then the scientist has to explain the discrepancies between the preferred hypothesis that is now falling in disrepute and all the preliminary data that had led her to formulate the initial hypothesis. Such discrepancies could endanger the renewal of the grant funding and the future of the laboratory. Last but not least, it is very difficult to publish a scholarly paper describing a rejected scientific hypothesis without providing an in-depth mechanistic explanation for why the hypothesis was wrong and proposing alternate hypotheses.

For example, it is quite reasonable for a cell biologist to formulate the hypothesis that protein A improves the survival of neurons by activating pathway X based on prior scientific studies which have shown that protein A is an activator of pathway X in neurons and other studies which prove that pathway X improves cell survival in skin cells. If the data supports the hypothesis, publishing this result is fairly straightforward because it conforms to the general expectations. However, if the data does not support this hypothesis then the scientist has to explain why. Is it because protein A did not activate pathway X in her experiments? Is it because in pathway X functions differently in neurons than in skin cells? Is it because neurons and skin cells have a different threshold for survival? Experimental results that do not conform to the predictions have the potential to uncover exciting new scientific mechanisms but chasing down these alternate explanations requires a lot of time and resources which are becoming increasingly scarce. Therefore, it shouldn’t come as a surprise that some scientists may consciously or subconsciously ignore selected pieces of experimental data which contradict their darling hypotheses.

Let us move from these hypothetical situations to the real world of laboratories. There is surprisingly little data on how and when scientists reject hypotheses, but John Fugelsang and Kevin Dunbar at Dartmouth conducted a rather unique study “Theory and data interactions of the scientific mind: Evidence from the molecular and the cognitive laboratory” in 2004 in which they researched researchers. They sat in at scientific laboratory meetings of three renowned molecular biology laboratories at carefully recorded how scientists presented their laboratory data and how they would handle results which contradicted their predictions based on their hypotheses and models.

In their final analysis, Fugelsang and Dunbar included 417 scientific results that were presented at the meetings of which roughly half (223 out of 417) were not consistent with the predictions. Only 12% of these inconsistencies lead to change of the scientific model (and thus a revision of hypotheses). In the vast majority of the cases, the laboratories decided to follow up the studies by repeating and modifying the experimental protocols, thinking that the fault did not lie with the hypotheses but instead with the manner how the experiment was conducted. In the follow up experiments, 84 of the inconsistent findings could be replicated and this in turn resulted in a gradual modification of the underlying models and hypotheses in the majority of the cases. However, even when the inconsistent results were replicated, only 61% of the models were revised which means that 39% of the cases did not lead to any significant changes.

The study did not provide much information on the long-term fate of the hypotheses and models and we obviously cannot generalize the results of three molecular biology laboratory meetings at one university to the whole scientific enterprise. Also, Fugelsang and Dunbar’s study did not have a large enough sample size to clearly identify the reasons why some scientists were willing to revise their models and others weren’t. Was it because of varying complexity of experiments and models? Was it because of the approach of the individuals who conducted the experiments or the laboratory heads? I wish there were more studies like this because it would help us understand the scientific process better and maybe improve the quality of scientific research if we learned how different scientists handle inconsistent results.

In my own experience, I have also struggled with results which defied my scientific hypotheses. In 2002, we found that stem cells in human fat tissue could help grow new blood vessels. Yes, you could obtain fat from a liposuction performed by a plastic surgeon and inject these fat-derived stem cells into animal models of low blood flow in the legs. Within a week or two, the injected cells helped restore the blood flow to near normal levels! The simplest hypothesis was that the stem cells converted into endothelial cells, the cell type which forms the lining of blood vessels. However, after several months of experiments, I found no consistent evidence of fat-derived stem cells transforming into endothelial cells. We ended up publishing a paper which proposed an alternative explanation that the stem cells were releasing growth factors that helped grow blood vessels. But this explanation was not as satisfying as I had hoped. It did not account for the fact that the stem cells had aligned themselves alongside blood vessel structures and behaved like blood vessel cells.

Even though I “murdered” my darling hypothesis of fat –derived stem cells converting into blood vessel endothelial cells at the time, I did not “bury” the hypothesis. It kept ruminating in the back of my mind until roughly one decade later when we were again studying how stem cells were improving blood vessel growth. The difference was that this time, I had access to a live-imaging confocal laser microscope which allowed us to take images of cells labeled with red and green fluorescent dyes over long periods of time. Below, you can see a video of human bone marrow mesenchymal stem cells (labeled green) and human endothelial cells (labeled red) observed with the microscope overnight. The short movie compresses images obtained throughout the night and shows that the stem cells indeed do not convert into endothelial cells. Instead, they form a scaffold and guide the endothelial cells (red) by allowing them to move alongside the green scaffold and thus construct their network. This work was published in 2013 in the Journal of Molecular and Cellular Cardiology, roughly a decade after I had been forced to give up on the initial hypothesis. Back in 2002, I had assumed that the stem cells were turning into blood vessel endothelial cells because they aligned themselves in blood vessel like structures. I had never considered the possibility that they were scaffold for the endothelial cells.

This and other similar experiences have lead me to reformulate the “murder your darlings” commandment to “murder your darling hypotheses but do not bury them”. Instead of repeatedly trying to defend scientific hypotheses that cannot be supported by emerging experimental data, it is better to give up on them. But this does not mean that we should forget and bury those initial hypotheses. With newer technologies, resources or collaborations, we may find ways to explain inconsistent results years later that were not previously available to us. This is why I regularly peruse my cemetery of dead hypotheses on my hard drive to see if there are ways of perhaps resurrecting them, not in their original form but in a modification that I am now able to test.



Fugelsang, J., Stein, C., Green, A., & Dunbar, K. (2004). Theory and Data Interactions of the Scientific Mind: Evidence From the Molecular and the Cognitive Laboratory. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 58 (2), 86-95 DOI: 10.1037/h0085799


Note: An earlier version of this article first appeared on 3Quarksdaily.

New White House Budget: NIH funding will not be restored to pre-sequester levels

The Federation of American Societies for Experimental Biology (FASEB) recommended that the White House increase the annual NIH budget to $32 billion dollars to help restore US biomedical research funding levels to those of 2003 (link):


The broad program of research supported by NIH is essential for advancing our understanding of basic biological functions, reducing human suffering, and protecting the country against new and re-emerging disease threats. Biomedical research is also a primary source of new innovations in health care and other areas.

Exciting new NIH initiatives are poised to accelerate our progress in the search for cures. It would be tragic if we could not capitalize on the many opportunities before us. The development of a universal vaccine to protect adults and children against both seasonal and pandemic flu and development of gene chips and DNA sequencing technologies that can predict risk for high blood pressure, kidney disease, diabetes, and obesity are just a few of the research breakthroughs that will be delayed if we fail to sustain the investment in NIH. 

As a result of our prior investment, we are the world leader in biomedical research. We should not abdicate our competitive edge. Without adequate funding, NIH will have to sacrifice valuable lines of research. The termination of ongoing studies and the diminished availability of grant support will result in the closure of laboratories and the loss of highly skilled jobs. At a time when we are trying to encourage more students to pursue science and engineering studies, talented young scientists are being driven from science by the disruption of their training and lack of career opportunities. 

Rising costs of research, the increasing complexity of the scientific enterprise, and a loss of purchasing power at NIH due to flat budgets have made it increasingly competitive for individual investigators to obtain funding. Today, only one in six grant applications will be supported, the lowest rate in NIH history. Increasing the NIH budget to $32.0 billion would provide the agency with an additional $1.36 billion which could restore funding for R01 grants (multiyear awards to investigators for specified projects) back to the level achieved in 2003 and support an additional 1,700 researchers while still providing much needed financial support for other critical areas of the NIH portfolio.

Unfortunately, the newly released White House budget for 2015 (PDF) will only provide a minimal increase in annual NIH funding from $29.9 billion to $ 30.2 billion, which is still lower than the pre-sequester $30.6 billion.

It is much lower than what FASEB had suggested and it is going to be increasingly difficult for US biomedical research to sustain its competitive edge. The White House budget also emphasizes neuroscience and Alzheimer’s research:


Biomedical research contributes to improving the health of the American people. The Budget includes $30.2 billion for NIH to support research at institutions across the United States, continuing the Administration’s commitment to investment in Alzheimer’s research and NIH’s contribution to the multiagency BRAIN (Brain Research through Advancing Innovative Neurotechnologies) initiative. The Budget increases funding for innovative, high-risk high-reward research to help spur development of new therapeutics to treat diseases and disorders that affect millions of Americans, such as cancer and Alzheimer’s disease. The Budget includes funding for a new advanced research program modeled after the cutting-edge Defense Advanced Research Projects Agency (DARPA) program at the Department of Defense. NIH will also implement new policies to improve transparency and reduce administrative costs. The Opportunity, Growth, and Security Initiative includes an additional $970 million for NIH, which would support about 650 additional new grants and further increase funding for the BRAIN and DARPA-inspired initiatives, and invest in other critical priorities.

While this is good news for neuroscientists, the essentially flat NIH budget will force the NIH to cut funding to basic biomedical research in non-neuroscience areas including basic cell biology, molecular biology and biochemistry.

The outlook for US biomedical research remains gloomy.


Note: This article was first published on the “Fragments of Truth” blog.

NIH Grant Scores Are Poor Predictors Of Scientific Impact

The most important federal funding mechanism for biomedical research in the United States is the R01 grant proposal submitted to the National Institutes of Health (NIH). Most scientists submitting R01 proposals request around $250,000 per year for 5 years. This may sound like a lot of money, but these requested funds have to pay for the salaries of the research staff including the salary of the principal investigator. The money that is left over once the salaries are subtracted has to cover the costs of new scientific equipment, maintenance contracts for existing equipment, monthly expenses for research reagents such as chemicals, cell lines, cell culture media and molecular biology assay kits, housing animals, user fees for research core facilities….. basically a very long list of expenditures. Universities that submit the grant proposals to the NIH add on their own “indirect costs” to pay for general expenses such as maintaining the building and providing general administrative support, but the researchers and their laboratories rarely receive any of these “indirect costs”.

Instead, the investigators who receive a notification that their R01 proposals have been awarded often find out that the NIH has reduced the requested money by either cutting the annual budget or by shortening the funding period from 5 years to 4 years. They then have to decide how to ensure that their laboratory will survive with the reduced funding, how they can ensure that nobody is forced to lose their jobs and that the research can be conducted under these financial constraints without compromising its scientific rigor. These scientists are the lucky ones, because the vast majority of the R01 proposals do not get funded. And the lack of R01 funding in recent years has forced many scientists to shut down their research laboratories.


When an R01 proposal is submitted to the NIH, it is assigned to one of its institutes such as the NHLBI (National Heart Lung and Blood Institute) or the NCI (National Cancer Institute) depending on the main research focus. Each institute of the NIH is allotted a certain budget for funding extramural applicants, so the institute assignment plays an important role in determining whether or not there is money available to fund the proposal. In addition to the institute assignment, each proposal is also assigned to a panel of expert peer reviewers, so called “study sections”. The study section members are active scientists who review the grant proposals and rank them by assigning scores to each grant. The grant proposals describe experiments that the respective applicants plan to conduct during the next five years. The study section members try to identify grant proposals that describe research which will have the highest impact on the field. They also have to take into account that the proposed work is based on solid preliminary data, that it will yield meaningful results even if the scientific hypotheses of the applicants turn out to be wrong and that the applicants have the necessary expertise and resources to conduct the work.


Identifying the grants that fall in the lower half of the rank list is not too difficult, because study section members can easily spot the grants which present a disorganized plan and rationale for the proposed experiments. But it becomes very challenging to discriminate between grants in the top half. Some study section members may think that a grant is outstanding (e.g. belongs in the top 10th percentile) whereas others may think that it is just good (e.g. belongs in the top 33rd percentile). After the study section members review each other’s critiques of the discussed grant, they usually come to a consensus, but everyone is aware of the difficulties of making such assessments. The very nature of research is the unpredictability of its path. It is impossible to make an objective assessment of the impact of a proposed five-year scientific project because a lot can happen during those five years. For example, nowadays one comes across many grant applications that propose to use the CRISPR genome editing tool to genetically modify cells. This technique has only become broadly available during the last 1-2 years and is quite exciting but we also do not know much about potential pitfalls of the approach. Some study section members are bound to be impressed by applicants who want to use this cutting-edge genome editing technique and rank their proposal highly, whereas other study section members may find this approach too premature. Small differences in the subjective assessments of the potential impact between study section members can result in a grant proposal receiving a 10th percentile score versus a 19th percentile score.


Ten or fifteen years ago, this difference in the percentile score would not have been too tragic because the NIH was funding more than 30% of the submitted research grant applications, but now the success rate has dropped down to 17%! Therefore, the subjective assessment of whether a grant deserves a 10th percentile versus a 19th percentile research impact score can determine whether or not the grant will be funded. This determination in turn will have a major impact on the personal lives and careers of the graduate students, postdoctoral fellows, research assistants and principal investigators who may depend on the funding of the submitted grant in order to keep their jobs and their laboratory running. It would be reassuring to know that the score assigned to a grant application is at least a good prognostic indicator of how much of a scientific impact the proposed research will have. It never feels good to deny research funding to a laboratory, but we also have a duty to fund the best research. If there was indeed a clear association between grant score and future impact, one could at least take solace in the fact that grant applications which received poor scores would have really not resulted in meaningful research.


A recent paper published in Circulation Research, a major cardiovascular research journal, challenges the assumption that the scores a grant application receives can reliably predict the future impact of the research. In the study “Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants” by Danthi and colleagues, researchers at the National Heart Lung and Blood Institute (NHLBI) reviewed the percentile ranking scores of 1,492 R01 grant applications assigned to the NHLBI as well as the productivity of the funded grants. They assessed grants funded 2001-2008 and the scientific publications ensuing as a result of the funding. Their basic finding is that there is no obvious correlation between the percentile score and the scientific impact, as assessed by the number of publications as well as the number of citations each publication received. The funded R01 grant applications were divided in three categories: Category 1= <10.0 % (i.e. the cream of the crop); Category 2 = 10.0 – 19.9% (i.e. pretty darn good) and Category 3 = 20.0 – 41.8% (good but not stellar). The median number of publications was 8.0 for Category 1, 8.0 for Category 2 and 8.5 for Category 3. This means that even though category 3 grants were deemed to be of significantly worse quality or impact than Category 1 applications, they resulted in just as many scientific publications. But what about the quality of the publications? Did the poorly scored Category 3 grant applications fund research that was of little impact? No, the scientific impact as assessed by citations of the published papers was the same for no matter how the grant applications had been ranked. In fact, the poorly scored grants (Category 3 grants) received less funding but still produced the same amount of publications and citations of the published research as their highly scored Category 1 counterparts.


There are few important limitations to this study. The scientific impact was measured as number of publications and number of citations, which are notoriously poor measures of impact. For example, a controversial paper may be refuted but if it is frequently cited in the context of the refutation, it would be considered “high impact”. Another limitation was the assessment of shared funding. In each category, the median number of grants acknowledged in a paper was 2.5. Because a single paper often involves the collaboration of multiple scientists, the collaborative papers routinely acknowledge all the research funding which contributed to the publication. In order to correct for this, the study adjusted the counts for publications and citations by dividing by the number of acknowledged grants. For example, if a paper cited three grants and garnered 30 citations, each grant would be credited with only a third of a publication (0.3333…) and with 10 citations. This is a rather crude method because it does not take into account that some papers are primarily funded by one grant and other grants may have just provided minor support. It is also not clear from the methodology how the study accounted for funding from other government agencies (such as other NIH institutes or funding from the Department of Veterans Affairs). However, it is noteworthy that when they analyzed the papers that were only funded by one grant, they still found no difference in the productivity of the three categories of percentile scores.  The current study only focused on NHLBI grants (cardiovascular, lung and blood research) so it is not clear whether these findings can be generalized to all NIH grants. A fascinating question that was also not addressed by the study is why the Category 3 grants received the lower score. Did the study section reviewers feel that the applicants were proposing research that was too high-risk? Were the grant applicants unable to formulate their ideas in a cogent fashion? Performing such analyses would require reviewing the study sections’ summary statements for each grant but this cumbersome analysis would be helpful in understanding how we can reform the grant review process.


The results of this study are sobering because they remind us of how bad we are at predicting the future impact of research when we review grant applications. The other important take-home message is that we are currently losing out on quite a bit of important research because the NIH does not receive adequate funding. Back in the years 2001-2008, it was still possible to receive grant funding for grants in Category 3 (percentile ranking 20.0 – 41.8%). However, the NIH budget has remained more or less flat or even suffered major cuts (for example during the sequester) despite the fact that the cost of biomedical research continues to rise and many more investigators are now submitting grant applications to sustain their research laboratories. In the current funding environment, the majority of the Category 3 grants would not be funded despite the fact that they were just as productive as Category 1 grants. By maintaining the current low level of NIH funding, many laboratories will not receive the critical funding they need to conduct cutting edge biomedical research, some of which could have far greater impact than the research conducted by researchers receiving high scores.


Going forward, we need to devise new ways of assessing the quality of research grants to identify the most meritorious grant applications, but we also need to recognize that the NIH is in dire need of a major increase in its annual budget.
Narasimhan Danthi, Colin O Wu, Peibei Shi, & Michael S Lauer (2014). Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants Circulation Research