ENCODE DNA Project: Big Data to Solve Genome Mysteries

Despite decades of impressive medical advances, the human body continues to harbor its share of mysteries. Why are some individuals or groups prone to cancer, or more likely to lose their eyesight in old age? Or, to paraphrase the old Bill Hicks joke: how can Keith Richards survive years’ worth of rock-star bacchanalia in apparently fine shape, while a runner and health nut like Jim Fixx dies relatively young?

The answer almost certainly lurks in the genes. And while the Human Genome Project identified and mapped the human genome, it’s been the task of ENCODE, the Encyclopedia of DNA Elements, to more thoroughly explore all the genome’s elements, especially the “dark regions” between the protein-coding genes. Once dismissed as the “useless” part of DNA, those regions are turning out to have a major influence on our health and individuality.

ENCODE is overseen by the National Human Genome Research Institute, or NHGRI, a branch of the U.S. National Institutes of Health. It involved 440 scientists divided into 32 groups, with each group conducting 24 types of experiment on 150 cell lines to determine the function of those DNA regions. Their combined work resulted in more than 30 papers published across a variety of scientific journals on Sept. 6.

Those experiments generated an enormous volume of data. “The real fun starts when the various data sets are layered together,” Brendan Maher wrote in a summary piece in Nature. That overlapping data forms a more comprehensive map of the genome, revealing how the regions beyond the genes nonetheless contribute to the genes’ function. So much for “junk DNA,” the derisive term once applied to genomic areas that researchers believed served no purpose; as it turns out, those areas are vital to the genome’s operations.

Now that they’ve developed a “manual” of sorts for human DNA, researchers are exploring how variations in those regions might contribute to disease. If a switch that controls gene function is modified, for example, will that result in some type of cancer? “We are now, thanks to ENCODE, able to attack much more complex diseases,” Manolis Kellis, a computational genomicist at the Massachusetts Institute of Technology, told Nature. But there’s much more data-crunching to be done, accompanied by years of trials and experimentation and mountains of research papers, before these discoveries can translate into meaningful therapies.

Meanwhile, Keith Richards is still going strong.


Image: Sashkin/Shutterstock.com