NIH’s “All of Us” Data Browser Unveiled

Joshua Denny, M.D., of Vanderbilt heads the All of Us Data and Research Center.
Researchers can access aggregate survey and EHR data from 100,000 participants—and counting.

For the past year, the NIH has been recruiting participants for the All of Us Research Program—one of the largest biomedical databases in existence. The NIH’s target is one million enrollees, each providing EHR data, survey responses and biologic samples. The data are aggregated and publicly accessible through a Research Hub that launched in May 2019.

Integrating data for All of Us is a massive undertaking, said Joshua Denny, M.D., professor of biomedical informatics at Vanderbilt University Medical Center. Denny heads the All of Us Data and Research Center, which is a partnership between Vanderbilt, the Broad Institute, Verily, and Columbia University. The team has spent much of the last year designing and beta-testing the program's Data Browser (now live) and Workbench (anticipated Winter 2019).

“We’re building a national infrastructure,” Denny said. “There are now over 300 sites entering data through our web systems. We’re making one giant dataset out of it.”

Focusing on Diversity

All of Us will support precision medicine healthcare and research for a diverse group of participants spanning different backgrounds, ages and regions. At last count, nearly 50 percent of the 104,440 survey respondents were from racial and ethnic minority groups.

“The diversity angle is important. This is one of the largest genomic cohorts in the world, and it provides an opportunity to learn about people normally underrepresented in biomedical research,” Denny said.

Harmonizing Data

All of Us participants provide survey responses—collected via a smartphone app or website—and visit a nearby clinic to provide biologic samples that are then shipped to a biobank at Mayo Clinic for analysis.

The Data and Research Center team has been coding and designing tech infrastructure that integrates deidentified data from millions of biologic samples with EHRs and survey responses. They use a common data model, OMOP, as a coding structure that can accommodate healthcare terminology from different datasets.

“It’s a lot of data harmonization. We use a common data model to streamline how we represent data characteristics, like lab values,” Denny explained. “It’s a monumental effort. There are over 10 EHRs contributing data, plus the enrollment sites—and all the data need to be accessible.”
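To make the idea of harmonization concrete, here is a minimal sketch of how lab values from different sources might be mapped into one shared representation. It is an illustration only, not the program's actual pipeline or the OMOP schema: the site names, local lab codes, lookup tables and the `harmonize` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical lookup: (source system, local lab code) -> shared concept name.
LOCAL_TO_COMMON = {
    ("site_a", "GLU"): "glucose",
    ("site_b", "LAB_0042"): "glucose",
}

# One agreed unit per concept (an assumption made for this sketch).
STANDARD_UNIT = {"glucose": "mg/dL"}

# Factors that convert a reported unit into the agreed unit.
# 1 mmol/L of glucose is roughly 18.016 mg/dL.
TO_STANDARD_UNIT = {
    ("glucose", "mg/dL"): 1.0,
    ("glucose", "mmol/L"): 18.016,
}


@dataclass
class LabResult:
    concept: str
    value: float
    unit: str


def harmonize(site: str, local_code: str, value: float, unit: str) -> LabResult:
    """Translate one site-specific lab reading into the shared representation."""
    concept = LOCAL_TO_COMMON[(site, local_code)]
    factor = TO_STANDARD_UNIT[(concept, unit)]
    return LabResult(concept, round(value * factor, 1), STANDARD_UNIT[concept])


# Two sites report glucose with different codes and units;
# after harmonization both use the same concept and unit.
print(harmonize("site_a", "GLU", 95.0, "mg/dL"))
print(harmonize("site_b", "LAB_0042", 5.5, "mmol/L"))
```

Once every record uses the same concept names and units, readings from different EHR systems can be compared or pooled directly, which is the kind of streamlining Denny describes.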

Open Access for Everyone

What sets All of Us apart from other biomedical data hubs is its broad accessibility. The program “values researchers of all types as partners” and even crowdsources research questions. Its anonymized data are primed for citizen science projects, Denny said.

“We hope to have good ‘onramps’ for very diverse audiences. The data can be used to support students of all levels, patient populations looking to learn more about their condition, in addition to higher-level biomedical research.”

Already, aggregate data available through the Data Browser can provide a rationale for studying a specific condition or population. For example, researchers can use it to highlight at-risk groups in funding applications or to help drive recruitment efforts.

Growing Possibilities

In the next phase of the program, researchers will be able to apply for access to the All of Us Workbench. They will complete an authentication process—including ethics training—to access more granular data.

The team is creating Workbench features that include project-specific workspaces that can be shared among collaborators, and a Cohort Builder to help researchers build specific studies. Statistical analysis software will be integrated directly in the Workbench, as well as a Jupyter Notebook feature to support analyses across other computer programming languages.

The All of Us program also plans to expand enrollment approaches to further reach minority populations, and to add data from additional survey questions to the browser, all to better focus research efforts.

“There has been a very real need for better precision medicine tools and approaches,” Denny said. “Now we’re up and running and growing rapidly.”