GW Professor Leads Revolutionary Health Data Integration Project Funded by ARPA-H

SMHS’ Raja Mazumder is co-leader of a major project funded by the agency, which backs high-potential, high-impact, unconventional research.

October 28, 2024

Co-lead Raja Mazumder sees the project as a potential bridge between the clinical and research worlds. (Raja Mazumder)

A professor at the George Washington University’s School of Medicine and Health Sciences is part of a multi-team national project building a secure, integrated health data ecosystem that will bridge data silos to enable broader biomedical research and innovation.

Raja Mazumder, professor of biochemistry and molecular medicine and co-director of the McCormick Genomic Proteomic Center, will co-lead a team as part of the Biomedical Data Fabric (BDF) toolbox funded by the Advanced Research Projects Agency for Health (ARPA-H).

Mazumder is co-principal investigator on the project, partnering with big data analytics platform DNA-HIVE to develop the Federated Ecosystem for Analytics and Standardization Technologies (FEAST), an infrastructure that would enable secure internal queries of cancer health records, medical device data and more while protecting the privacy of patients and their health data.

The “federated” aspect of FEAST refers to its opt-in, secure structure. Having been adopted by a hospital system or research institution, FEAST would be accessed internally, using AI to query internal data and contributing its analysis—not, in most cases, the data itself—in readable format to a dashboard shareable with other FEAST institutions. Because it does not require opening any database to external query, this model makes a data breach highly unlikely.
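As a rough illustration of that federated pattern, the Python sketch below has each institution run a query against its own records and release only aggregate counts, which a shared dashboard then combines. It is a minimal sketch and not FEAST's actual design: the `Site` class, the `summarize` and `combine` functions, the record fields and the `MIN_CELL_SIZE` threshold are all hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical threshold: counts below this are withheld so that a small
# group of patients cannot be singled out from the shared summary.
MIN_CELL_SIZE = 5


@dataclass
class Site:
    """One participating institution (hypothetical model)."""
    name: str
    records: list  # patient-level rows; these never leave the site

    def summarize(self, diagnosis):
        """Run the query locally and return aggregate counts only."""
        counts = Counter(
            row["marker"] for row in self.records if row["diagnosis"] == diagnosis
        )
        return {marker: n for marker, n in counts.items() if n >= MIN_CELL_SIZE}


def combine(summaries):
    """Merge per-site summaries, as a shared dashboard might."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return dict(total)


# Made-up example data for two sites.
site_a = Site("A", [{"diagnosis": "X", "marker": "P1"}] * 6
                   + [{"diagnosis": "X", "marker": "P2"}] * 2)
site_b = Site("B", [{"diagnosis": "X", "marker": "P1"}] * 5)

shared = combine([site_a.summarize("X"), site_b.summarize("X")])
```

In this toy run, the two-patient `P2` group at site A is suppressed before anything is shared, and only the combined `P1` count reaches the dashboard. Neither site ever accepts a query from outside.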

“We emphasize the word federated, because the kind of data we want to make available for use has a long history of being in a very closed and secure environment and rightly so,” Mazumder said. “You cannot just take patient data and give it away. Our project would keep those checks and balances in place, but also use technology to bridge the silos that exist right now.”

As increasing amounts of data are generated, access becomes more expensive, more complicated and more important, both for researchers hunting down potential causes and cures for diseases like cancer and for patients afflicted by those diseases. Hospitals and healthcare organizations use different and often incompatible data platforms to collect and store patient data, meaning that even if they have permission to share data, their formats may not work with one another. These systems also have in place necessary firewalls and other security measures to protect patients’ data privacy. Though these are crucial protections, they have created a siloed health data landscape across the U.S. and internationally, restricting sharing even by those who have opted in to contributing their information to further research. Simple, secure, readable data, within an opt-in system, would allow researchers and practitioners to understand, treat and cure diseases more readily.

Mazumder saw this firsthand recently, when he heard from a patient research organization with a trove of data it was authorized and eager to share with researchers at other institutes. But with no standardized mechanism in place to make that data shareable across systems and institutions, a representative from the organization had to come to Mazumder’s office in Ross Hall with a physical storage drive and encryption key. From that drive, he had to determine how to extract shareable analysis manually.

To Mazumder, the problem was a surprising one to encounter in 2024, when instantaneous digital access to information at any distance is almost a given. Fortunately, in this particular case, the group was situated close enough to Foggy Bottom to make physical data sharing possible. But this is exactly the kind of situation that FEAST would be able to address.

“Right now, there are scenarios where patients have said ‘My data can be used and should be used,’ but those data are sitting in some inaccessible environment because we haven't broken down the barriers of usage,” Mazumder said.

An institution using FEAST need not open its databases to external queries, eliminating potential privacy concerns. It also would not have to fully change or update its data platform to become compatible with others, a process that might be inconvenient, costly and perhaps even legally problematic: The data platforms most widely used by hospitals and healthcare systems are products of for-profit companies, with Epic and Oracle Cerner holding the two largest market shares. Requiring every institution to use the same system in order to share information would create a de facto corporate monopoly.

FEAST also could integrate with public biomedical databases to alert researchers to potential avenues of inquiry. If hospital data indicates that a certain protein is disproportionately expressed among patients with a certain type of cancer, for instance, FEAST could query UniProt—a vast repository of information about protein sequence and function—for more information about that protein, including its mutations and potential overexpression in other diseases. That creates the potential to discover new connections, propose new hypotheses and develop new treatment options.
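To make the UniProt lookup concrete, the Python sketch below builds a search request against UniProt's public REST service for a given gene. It is an illustrative assumption about what such a query could look like, not a description of how FEAST would do it: the `build_uniprot_query` helper, the example gene and the chosen return fields are picked for this example only.

```python
from urllib.parse import urlencode

# UniProt's public REST search endpoint for the UniProtKB database.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_query(gene, organism_id=9606, size=5):
    """Return a search URL for reviewed entries matching a gene name.

    organism_id 9606 is the NCBI taxonomy ID for humans. The helper name
    and its defaults are hypothetical choices for this sketch.
    """
    params = {
        "query": f"gene:{gene} AND organism_id:{organism_id} AND reviewed:true",
        "format": "json",
        "fields": "accession,gene_names,protein_name",
        "size": size,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


# Example: a request for the human TP53 gene. Fetching the URL (for
# instance with urllib.request.urlopen) requires network access.
url = build_uniprot_query("TP53")
```

The sketch only constructs the request. A real integration would still need to fetch the result, handle errors and map the returned entries back to the institution's own findings.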

“There are two worlds—the clinical world and the biomedical informatics world—and they don’t always talk to each other,” Mazumder said. “So, our project is not only to make electronic medical record data shareable across these different places, but also to map it to biomedical knowledge that has information on genes, proteins, glycans, metabolites and more. All of this data can allow better decision-making for researchers, for regulatory scientists and in the long term for patients.”

ARPA-H was established in 2022 by the Biden administration. Its mandate is to support high-potential, high-impact biomedical and health research that for reasons of cost or complexity cannot be developed through traditional commercial or research avenues.

FEAST qualified for an ARPA-H award because of its massive potential to enable breakthroughs and its vast scope, Mazumder said. Widespread adoption is far off, however, and would almost certainly require new legislation regulating data sharing, machine learning and patient consent protocols.

Recently, when all teams on the BDF project met in Colorado for the first time to announce their projects and share their work, it became clear to Mazumder that FEAST could be a bridge not only between established institutions in the clinical and research world, but also between colleagues on the BDF project as it develops. The teams will be able to share each other’s advances, adopt each other’s tools and reap the rewards of each other’s research, with potentially huge results for all.

“If all of these other projects work, our project will benefit also, and vice versa,” Mazumder said. “What we are proposing will be able to connect a lot of these things together.”