Is the Analysis of Gene Expression Based on an Erroneous Assumption?

The MIT-based researcher Rick Young is one of the world’s top molecular biologists. His laboratory at the Whitehead Institute for Biomedical Research has helped define many of the key principles of how gene expression is regulated, especially in stem cells and cancer cells. At a symposium organized by the International Society for Stem Cell Research (ISSCR), Rick presented some provocative data today that are bound to spark debate about how researchers should assess gene expression.

Ptolemy’s world map from Harmonia Macrocosmica

It has become very common for molecular biology laboratories to use global gene expression analyses to understand the molecular signature of a cell. These global analyses can measure the gene expression of thousands of genes in a single experiment. By comparing the gene expression profiles of different groups of cells, such as cancer cells and their healthy counterparts, many important new genes or new roles for known genes have been uncovered. The Gene Expression Omnibus is a public repository for the huge amount of molecular information that is generated. So far, more than 800,000 samples have been analyzed, covering the gene expression in a vast array of organisms and disease states.

Rick himself has extensively used such expression analyses to characterize cancer cells and stem cells, but at the ISSCR symposium, he showed that most of these analyses are based on the erroneous assumption that the total RNA content in cells remains constant. When the gene expression in cancer cells is compared to that of healthy non-cancer cells, the analysis is routinely performed by normalizing or standardizing the RNA content. Equal amounts of RNA from the cancer cells and the non-cancer cells are loaded, so the global analyses detect only relative differences in gene expression. A problem arises when one cell type generates far more RNA per cell than the cell type it is being compared to.

In a paper that was published today in the journal Cell entitled “Revisiting Global Gene Expression Analysis”, Rick Young and his colleagues discuss their recent discovery that the cancer-linked gene regulator c-Myc increases total gene expression two- to three-fold. Cells expressing the c-Myc gene therefore contain far more total RNA than cells that don’t express it. This means that most genes will be expressed at substantially higher levels in the c-Myc cells. However, if one were to perform a traditional gene expression analysis comparing c-Myc cells with cells without c-Myc, one would “control” for these differences in RNA amount by using the same amount of RNA for both cell types. This traditional standardization makes a lot of sense; after all, how would one be able to compare the gene expression profiles of the two samples if one loaded different amounts of RNA? The problem with this common-sense standardization is that it misses global shifts in gene expression, such as those initiated by potent regulators like c-Myc. According to Rick Young, one answer to the problem is to include an additional control by “spiking” the samples with defined amounts of known RNA. This additional control would then allow us to determine whether there is also an absolute change in gene expression, in addition to the relative changes that current gene expression analyses can detect.
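To make the argument concrete, here is a minimal toy calculation in Python. It is only a sketch under stated assumptions, not the method used in the paper: the gene names, transcript counts, the 2.5-fold global amplification and the extra 2-fold boost for one gene are all invented, and I assume that the spike-in RNA is added in the same amount per cell to both samples.

```python
# Toy model: all transcript counts and fold changes are invented for illustration.
SPIKE = "spike_in"       # hypothetical name for the synthetic control RNA
SPIKE_PER_CELL = 50.0    # assumed: the same number of spike-in molecules per cell

# Transcripts per cell in cells without c-Myc (made-up numbers).
control = {"geneA": 100.0, "geneB": 200.0, "geneC": 700.0}

# Cells with c-Myc: every gene amplified 2.5-fold, geneA boosted a further 2-fold.
myc = {gene: count * 2.5 for gene, count in control.items()}
myc["geneA"] *= 2.0


def measure(cell):
    """Simulate loading an equal amount of RNA: signal = share of the loaded RNA."""
    loaded = dict(cell)
    loaded[SPIKE] = SPIKE_PER_CELL
    total = sum(loaded.values())
    return {name: count / total for name, count in loaded.items()}


def relative_levels(signal):
    """Conventional view: each gene as a share of all endogenous signal."""
    total = sum(v for name, v in signal.items() if name != SPIKE)
    return {name: v / total for name, v in signal.items() if name != SPIKE}


def spike_normalized_levels(signal):
    """Spike-in view: each gene relative to the known spike-in signal."""
    return {name: v / signal[SPIKE] for name, v in signal.items() if name != SPIKE}


def fold_change(before, after):
    """Per-gene ratio of the level after to the level before."""
    return {gene: after[gene] / before[gene] for gene in before}


control_signal, myc_signal = measure(control), measure(myc)
relative_fc = fold_change(relative_levels(control_signal), relative_levels(myc_signal))
absolute_fc = fold_change(spike_normalized_levels(control_signal),
                          spike_normalized_levels(myc_signal))

for gene in control:
    print(f"{gene}: relative {relative_fc[gene]:.2f}x, "
          f"spike-normalized {absolute_fc[gene]:.2f}x")
```

In this toy model, the conventional analysis reports geneB and geneC as slightly decreased (about 0.91-fold) even though each cell contains 2.5 times more of both transcripts, and it shrinks the 5-fold rise of geneA to about 1.8-fold. Dividing by the spike-in signal recovers the per-cell changes, but only because the spike was added in proportion to cell number rather than to the amount of RNA.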

In some ways, this seems like a minor technical point, but I think that it actually points to a very central problem in how we perform gene expression analysis, as well as many other assays in cell biology and molecular biology. One is easily tempted to use exciting large-scale analyses to study the genome, epigenome, proteome or phenome of cells. These high-tech analyses generate mountains of data and we spend an inordinate amount of time trying to make sense of the data. However, we sometimes forget to question the very basic assumptions that we have made. My mentor Till Roenneberg taught me how important it is to use the right controls in every experiment. The key word here is “right” controls, because merely including controls without thinking about their appropriateness is not sufficient. I think that Rick Young’s work is an important reminder for all of us to continuously re-evaluate the assumptions we make, because such a re-evaluation is a prerequisite for good research practice.



One thought on “Is the Analysis of Gene Expression Based on an Erroneous Assumption?”

  1. Paul Orwin

    A related issue has been seen in bacterial gene expression studies for a long time, because there are no “housekeeping genes” that can be reliably used. Lisa Alvarez-Cohen at Berkeley came up with a technique that uses a spike of firefly luciferase as the internal standard in qPCR. The same approach should work in eukaryotes (maybe use a bacterial gene). This certainly points to some needed circumspection when analyzing global patterns of gene expression.

