A new model captures and automates clinical suspicion of a genetic disease.

At Vanderbilt University Medical Center, staff data scientist Theodore Morley and geneticist Douglas Ruderfer, Ph.D., an associate professor of medicine, are developing tools for shortcutting the process of characterizing rare disease gene variants. Using thousands of deidentified patient EHRs, the team trained a predictive algorithm that can connect the dots between the diagnostic odyssey of symptoms and genetic test orders.

“We were aiming to build a model that captures and automates clinical suspicion of a genetic disease,” Morley said. The team hypothesized that the clinical suspicion that led physicians to order chromosomal microarrays (CMAs) would, in itself, be a strong predictor of rare disease variants. The work was recently published in Nature Medicine.

“Our work could contribute to more systematic and timely screening, alerting providers of patients that might benefit from a genetic test,” Ruderfer said.

An Inverted Analysis

“Under current practice, genetic testing is not equally or completely provided to those who might benefit most,” Ruderfer said. As a result, many patients with rare genetic diseases slog through a diagnostic odyssey – years of specialist appointments and testing, and the accompanying emotional roller coaster – before getting a diagnosis.

A newer approach inverts the usual scientific process, a reversal often favored because it enables rapid translation from laboratory to clinic. Using the wealth of EHR data now available, researchers like Ruderfer’s team are working backwards from phenotype to genotype, with nearly immediate clinical utility.

In this model, the potential validation barrier is that some unknown proportion of those getting a CMA would not actually have the suspected disease. “Not everyone who gets a genetic test gets a diagnosis, where we know what variant caused their disease,” Ruderfer noted. “What we have done instead is capture an informed suspicion of genetic disease and demonstrate that this data has a high association with certain symptom clusters.”

To date, specimen banks haven’t accumulated enough genetically sequenced samples from the rare disease population to determine who has a confirmed diagnosis, which underscores the need for alternative approaches. Said Morley, “Even where we have a large amount of data on rare diseases, the dataset is usually comprised of a large number of different diseases, due to their rarity.”

Testing the Model

In their most recent study, the team trained a machine learning algorithm on deidentified EHRs of 1,818 patients who had a CMA order and 7,236 matched controls. The trained algorithm correctly identified 87 percent of cases where genetic testing had been ordered and 96 percent of controls where no genetic testing was ordered.
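For readers who want the arithmetic behind those two percentages, they correspond to what statisticians call sensitivity (the share of cases caught) and specificity (the share of controls correctly left alone). The short Python sketch below is an illustration only, not the team's code: it back-calculates approximate patient counts from the rounded figures quoted above, so the counts are estimates and the names `ConfusionCounts` and `counts_from_rates` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of correct and incorrect calls for cases and controls."""
    true_pos: int   # cases (CMA ordered) the model identified
    false_neg: int  # cases the model missed
    true_neg: int   # controls (no CMA ordered) the model left unflagged
    false_pos: int  # controls the model identified as cases

    @property
    def sensitivity(self) -> float:
        # Share of cases that were correctly identified.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Share of controls that were correctly identified.
        return self.true_neg / (self.true_neg + self.false_pos)


def counts_from_rates(n_cases: int, n_controls: int,
                      sensitivity: float, specificity: float) -> ConfusionCounts:
    """Estimate counts from rounded rates (an approximation, not study data)."""
    true_pos = round(n_cases * sensitivity)
    true_neg = round(n_controls * specificity)
    return ConfusionCounts(
        true_pos=true_pos,
        false_neg=n_cases - true_pos,
        true_neg=true_neg,
        false_pos=n_controls - true_neg,
    )


# Figures quoted in the article: 1,818 cases, 7,236 controls, 87% and 96%.
approx = counts_from_rates(1818, 7236, 0.87, 0.96)
print(approx)
print(f"sensitivity={approx.sensitivity:.2f}, specificity={approx.specificity:.2f}")
```

Because the published percentages are rounded, the counts this produces are only approximate; the paper itself is the authority on the exact numbers.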

They then assessed the algorithm’s ability to identify patients with specific genetic diseases among 6,445 patients in BioVU, Vanderbilt’s DNA biobank. For example, the algorithm successfully identified patients with pathogenic copy number variations, the type of genetic abnormality behind DiGeorge syndrome.

Finally, they validated it across a larger population at Vanderbilt, and externally at a second hospital. “We were concerned that differences in procedures, data availability, and coding might mean it performed differently, but we found similar results in both of these cases,” Morley said. “This model can be applied en masse, and is portable for use in other medical centers’ systems.”

Next on Tap: Diagnostic Odyssey Matches

The team envisions clinicians and geneticists using the model as a screening tool. “If someone’s recent combination of diagnoses surpasses the threshold of where a CMA genetic test may be warranted, the record is flagged,” Ruderfer said.

“Our goal is obviously to identify as many patients as possible that we think we can help through genetic testing, but not to burden providers with more flags or screens,” he said. “I can imagine a scenario where a genetic counselor sees the flag, then does a quick review and consults with the provider on potential referral for a genetic test.”
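The flagging step Ruderfer describes can be pictured as a simple cutoff applied to each record's model score. The Python sketch below is a hypothetical illustration, not the team's implementation: the function name `flag_records`, the record IDs, the scores, and the 0.5 threshold are all made up for this example.

```python
from typing import Iterable, List, Tuple


def flag_records(scores: Iterable[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the IDs of records whose score is strictly above the threshold."""
    return [record_id for record_id, score in scores if score > threshold]


# Hypothetical record IDs and model scores, invented for illustration.
example_scores = [("record-A", 0.12), ("record-B", 0.81), ("record-C", 0.50)]

# A score must surpass the threshold to be flagged, so 0.50 does not qualify.
flagged = flag_records(example_scores, threshold=0.5)
print(flagged)
```

Where the cutoff sits is the trade-off Ruderfer points to: a lower threshold flags more patients who might benefit from testing, while a higher one puts fewer flags in front of providers.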

Ruderfer and Morley are now working on training a model on actual diagnoses. “Our first study was a proof of concept, saying it looks like this person is a candidate for genetic testing at large,” Ruderfer said. “Now, we are working on whether we can make a prediction on the presence of a genetic disease, as opposed to clinical suspicion of one, and which genetic disease is most likely. This work is being made possible by the growing number of confirmed diagnoses we have in our records.”

About the Experts

Douglas Ruderfer, Ph.D.

Douglas Ruderfer, Ph.D., is an assistant professor of genetic medicine, psychiatry and behavioral sciences and biomedical informatics at Vanderbilt University Medical Center. His research focuses on elucidating the genetic etiology of behavioral health traits and psychiatric diagnoses, and using genetics/genomics to understand biological mechanisms and interventional strategies of psychiatric disorders.

Theodore Morley

Theodore Morley is a data scientist at Vanderbilt University Medical Center. His computational work has focused primarily on biology, natural language processing, and machine learning.