By Ruth Steinhardt
Ali Rahnavard, an assistant professor of biostatistics and bioinformatics and the Computational Biology Institute at George Washington University’s Milken Institute School of Public Health, received a $200,000 Rapid Response Research (RAPID) grant from the National Science Foundation. He will use machine learning to study genetic variability in SARS-CoV-2, the coronavirus that causes COVID-19, and to identify associations with clinical variables such as health outcomes.
Like all organisms, viruses evolve—more rapidly than, say, mammals, but still within a traceable web of genomic exchanges and mutations. Dr. Rahnavard’s team is developing a computational platform that will compare the genetic sequences of previous strains of this coronavirus to the newest ones, identifying the sites of genetic difference. These differences are then tested for associations with clinical variables like geographic origin and health outcome.
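As a rough illustration of the two steps just described, and not the team's actual platform, the sketch below finds the positions where a handful of pre-aligned sequences differ and then checks whether one variant at such a position tracks with a binary health outcome. The function names, the toy sequences and the outcome labels are all invented for this example, and Fisher's exact test is only one of many association tests that could be used here.

```python
from math import comb


def variable_sites(aligned):
    """Return 0-based positions where the aligned sequences are not all identical."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


def site_outcome_table(aligned, outcomes, site, allele):
    """Count samples by (carries allele at site?, outcome True?) as (a, b, c, d)."""
    a = b = c = d = 0
    for seq, outcome in zip(aligned, outcomes):
        if seq[site] == allele:
            if outcome:
                a += 1
            else:
                b += 1
        elif outcome:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )


# Toy data, invented for illustration only (not real SARS-CoV-2 sequences).
sequences = ["ATGCA", "ATGCA", "ATTCA", "ATTCA", "ATGCA", "ATTCA"]
severe = [False, False, True, True, False, True]

for site in variable_sites(sequences):
    table = site_outcome_table(sequences, severe, site, "T")
    print(site, table, fisher_exact_2x2(*table))
```

With real data the sequences would first need to be aligned, many sites would be tested at once, and the resulting p-values would need correction for multiple comparisons.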
“These analyses will help explain why SARS-CoV-2 is so uniquely infectious and powerful, and could point to ways to defeat it,” Dr. Rahnavard said.
The platform also uses biomarker discovery techniques to characterize the virus’s effects on the human body, helping explain the observed diversity of health outcomes with COVID-19 and how these relate to human genetic variations.
“We think of humans as the platform this virus runs on,” Dr. Rahnavard said. “It doesn’t act by itself. It’s just genetic material wrapped in proteins, but when it gets to other species, human or animal, it can replicate and have an effect.”
One region in the virus's RNA, for example, codes for the three-dimensional protein shapes on the virus envelope that allow it to bind to receptors in the human body and begin replicating. Researchers could use their understanding of the genetic code that encodes these structures to create vaccines with identical structures. These would bind to the receptors in the human body, essentially blocking off the portals by which the virus wreaks its destructive effect.
“We want to look at all these regions of the genetic code and prioritize those that would help with vaccine development,” Dr. Rahnavard stated.
Dr. Rahnavard began paying close attention to the COVID-19 pandemic in December of last year, when he was expecting his new child and considering how to keep his family healthy.
“I found it an interesting topic with respect to the science, but I didn’t know it would rapidly become a major global issue,” he said.
As the disease spread, he tracked the information emerging from laboratories in affected countries, particularly China.
“I saw that all the scientific questions were rooted in the nature of the virus,” he said.
He also saw that powerful machine learning techniques would be necessary to synthesize the sheer volume and complexity of data emerging about SARS-CoV-2 and its antecedents.
“This is very complicated information, and we need the tools to gather and understand it,” Dr. Rahnavard said.
Dr. Rahnavard hopes the algorithms he and his team create can serve as a framework that helps scientists develop not only answers but also useful questions about the SARS-CoV-2 virus.
“Our work addresses the key question about the root of the disease,” he said. “That gives us a long-term understanding of the virus, which is very important for the medical and scientific community as well as the public.”