GW Computational Biology Institute: Making Sense of Big Data

Researchers at the interdisciplinary institute are investigating how complex biological systems operate and evolve.

From left, Max Alekseyev, Keith Crandall, Marcos Perez-Losada and Jeremy Goecks of the Computational Biology Institute
April 14, 2014

By Lauren Ingeno

It has been a mere decade since the Human Genome Project was successfully completed, but today, DNA sequencing and genetic data sharing have the potential to revolutionize health care.

It is drastically easier and cheaper to sequence genomes than it was even a few years ago, and new technologies have opened up the possibility of personal genome sequencing as a tool for diagnosing and treating fatal diseases. The resulting data sets are enormous, however: each human genome contains an estimated 30,000 genes and 3.2 billion base pairs, amounting to about 100 gigabytes of data at decent coverage, or roughly 20 percent of a typical laptop’s storage space.
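The storage figure above can be reproduced with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the 30x coverage, the one byte per sequenced base and the 500-gigabyte laptop drive are assumptions, not figures from the researchers, and real file sizes vary with file format, quality scores and compression.

```python
def raw_sequence_gigabytes(genome_bases, coverage, bytes_per_base=1.0):
    """Estimate the size of raw sequencing reads in gigabytes (10**9 bytes).

    coverage is the average number of times each base is read.
    bytes_per_base is an assumed storage cost, not a property of any real format.
    """
    total_bases_read = genome_bases * coverage
    return total_bases_read * bytes_per_base / 1e9


GENOME_BASES = 3.2e9        # base pairs in a human genome (from the article)
ASSUMED_COVERAGE = 30       # assumption: a common reading of "decent coverage"
ASSUMED_LAPTOP_GB = 500     # assumption: hypothetical laptop drive size

size_gb = raw_sequence_gigabytes(GENOME_BASES, ASSUMED_COVERAGE)
laptop_share = size_gb / ASSUMED_LAPTOP_GB

print(f"Estimated raw data: {size_gb:.0f} GB")
print(f"Share of a {ASSUMED_LAPTOP_GB} GB drive: {laptop_share:.0%}")
```

Under these assumptions the estimate comes to a little under 100 gigabytes per genome, in line with the figure quoted above.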

So how will researchers and physicians store and manage that enormous amount of data, and more importantly, will they be able to understand it?

A team of researchers at the George Washington University’s Computational Biology Institute (CBI) is poised to answer these and other challenging questions about what is called “big data,” a buzzword used to describe collections of data sets too large to be handled by conventional software.

The nine-month-old institute, led by Director Keith Crandall, is opening new doors of discovery that could benefit millions.

“The data collection is no longer the hurdle in biology. That part is relatively trivial,” Dr. Crandall said. “It’s making sense of all that genomic data that you get, that is the real trick. It is a key time to be developing a research program in computational biology.”

Computational biology, explained Dr. Crandall, is an interdisciplinary field that involves the application of data-analysis approaches to biological systems through software development, computer simulation and mathematical modeling. Using computational methods, Dr. Crandall and his team have already pioneered ways to trace disease outbreaks, study the evolution of drug resistance and diagnose the root cause of illnesses faster and more effectively.

For example, when a deadly, unknown virus was infecting salmon farms in Chile, Dr. Crandall and his research partners were able to sequence DNA from the fish and identify a virus that had never been seen before, which researchers at the CBI are now in the process of describing.

“That’s the problem with the current technique—you can only find what’s already been seen,” he said. “And there is a lot out there that hasn’t been seen before. We can figure that out.”

When Dr. Crandall began his position as director of the CBI in July 2013, he was tasked with establishing the institute’s research priorities and building it from the ground up.

He was given the freedom to recruit a team of faculty members who would join him at the institute, which is housed on GW’s Virginia Science and Technology Campus. To date, he has hired two tenure-track researchers and two research professors. Dr. Crandall is looking to add two more faculty members to the CBI by next fall. The three focal areas of the institute are biodiversity informatics, systems biology and translational medicine.

When hiring faculty, Dr. Crandall thought carefully about building a team with a variety of specialties and interests, from the fields of biology, computer science, engineering, math and statistics. 

“The first thing I want is high quality. But secondly, I want folks with skill sets that are complementary to one another that integrate well into an overall package,” he said. “I’m the jack of all trades, the master of none. So I’m in a good position to put all this stuff together.”

His first two hires were exactly what he was looking for: Max Alekseyev, the “math guy,” and Jeremy Goecks, the “visualization guy.”

Dr. Alekseyev, an associate professor in the Department of Mathematics, is a theoretician whose research focuses on developing new methods of discrete mathematics to solve open biological problems, with a particular interest in evolutionary biology.

Dr. Goecks, an assistant professor of computational biology, focuses on developing approaches and software to better interact with high-throughput genomics data.

“Visualization people are very hard to get in academia, because it turns out they all want to write gaming software and make lots of money. Not many of them are interested in things like cancer genomics and making data easier for physicians to deal with,” Dr. Crandall said. “But Jeremy is one of those people.”

Dr. Goecks and a team of researchers from other universities have developed an open, web-based platform, called Galaxy, intended for data-intensive biomedical research. The primary function of Galaxy is to make it easier for physicians, who likely do not have computer programming experience, to perform, reproduce and share complete analyses using complicated computational tools.

“Galaxy is about accessibility,” Dr. Goecks said. “Genome data are incredibly large, and you can’t make sense of them unless you use computers. So, ultimately somebody has to be able to manage your computer, to make it do what you want it to do. Biologists, oncologists and clinicians who want to do sequence analysis need something to help them get started.”

The Galaxy platform enables researchers to easily re-run another’s analysis as well as share data, making it an excellent collaborative tool, Dr. Goecks said.

The second area of Dr. Goecks’ research is cancer genomics: discovering which gene mutations cause a cell to become cancerous and tying those variations to drugs for treatment.

“Now that there are large cancer gene variant databases available, we might be able to say, ‘These mutations have been seen before, and we know these drugs act on these mutations,’” he said.

With that knowledge, it might be possible to sequence a cancer patient’s genes and run the data through the Galaxy pipeline, which would identify a potential drug for the patient. Dr. Goecks admits that the system is “very cautious,” as targeting a single drug for cancer treatment is not always effective.

While Dr. Alekseyev and Dr. Goecks are computer programmers, CBI faculty members Taylor Maxwell and Marcos Perez-Losada, both research professors of computational biology, are bioinformaticians who use the software written by programmers to analyze state-of-the-art data sets.

Dr. Maxwell’s background is in evolutionary biology and population genetics, with a research focus in human statistical genetics. He studies gene-by-gene interactions, looking at how a specific location of a gene or DNA sequence affects the relationship between known risk factors and diseases such as coronary heart disease and Alzheimer’s.

Dr. Perez-Losada’s research interests are in biodiversity informatics and infectious diseases. By studying the genetic diversity and molecular dynamics of gonorrhea and HIV, Dr. Perez-Losada has been able to suggest better control and prevention strategies for these diseases.

The researchers will benefit from one another’s complementary skill sets, said Dr. Maxwell.

“The Galaxy resource that Dr. Goecks has developed, for example, is very suitable for all of us in the CBI, as we struggle to deal with all of the empirical sequence data coming down the line,” he said. “Dr. Goecks’ work will help us establish a system to manage, process and do standard analyses.”

In turn, the data will help Dr. Goecks determine what directions he may want to take Galaxy and may provide access to data for other questions of interest, Dr. Maxwell added.

With such close proximity to Washington, D.C., the CBI has developed a number of partnerships with organizations such as the National Museum of Natural History, the National Zoo and the National Institutes of Health, to name a few.

In collaboration with the Children’s National Medical Center, the CBI is working on a research study to better understand the causes of asthma in children. The CBI is also working with the D.C. Developmental Center for AIDS Research, attempting to unlock new secrets in personalized medicine based on individual genetics and biomarkers. That work has particular local relevance: Washington, D.C., has the highest rate of HIV infection in the nation.

Additionally, the institute has partnered with the U.S. Fish and Wildlife Service, along with schools and departments within GW, including the School of Medicine and Health Sciences, the Milken Institute School of Public Health and the School of Engineering and Applied Science.

Dr. Crandall said the CBI is breaking down the silos between academic disciplines and bringing many different voices to the table, which in turn will lead to better science. The CBI will serve as a model for future initiatives at GW, such as the forthcoming Genomics Institute, said Vice President for Research Leo Chalupa.

“The team Dr. Crandall has put together is stellar, and the CBI is evolving exactly as we had hoped,” Dr. Chalupa said. “I am excited to use this model for institutional initiatives and institutes moving forward.”