One challenge in developing new treatments for skin conditions is reliably quantifying affected areas. Research dermatologists must pore over images to demarcate lesion boundaries—and stay consistent in their assessments between images and patients. Researchers in pathology and radiology rely on similar approaches.
New research suggests that the process could be automated using machine learning and crowdsourcing. The goal is to produce enough annotated images of a particular condition to reliably support research into new treatments.¹
Researchers from Vanderbilt University Medical Center have been applying artificial intelligence in the context of graft-versus-host-disease (GVHD), a serious complication associated with bone marrow and allogeneic hematopoietic stem cell transplants. Experts agree that 30 to 70 percent of transplant recipients develop GVHD—but they do not always agree on the appearance of GVHD-associated skin lesions.
“We have a lot of GVHD therapies that seem to work in our patients, but when we get to larger studies, they don’t pan out. Part of the problem is that we’re not able to measure the disease consistently,” said Eric Tkaczyk, M.D., assistant professor of dermatology and biomedical engineering at Vanderbilt University Medical Center.
Step One: Finding Lesions
Through a series of studies, Tkaczyk has been working with Madan Jagasia, M.D., Chief Medical Officer of the Vanderbilt-Ingram Cancer Center, to standardize GVHD skin measurements. “We’re trying to automate the process we currently do with our eyes,” Tkaczyk said. “We’re using machine learning to recognize affected skin in photos and help enhance automation.”
They started by teaming up with Benoit Dawant, Ph.D., director of the Vanderbilt Institute for Surgery and Engineering. Together they assembled computer scientists and bioengineers to automate “segmentation,” the process of defining lesion boundaries in patient photos, the same kind of boundary-tracing used to train self-driving cars.
Jianing Wang, a graduate student in Dawant’s laboratory, created a labeled dataset of more than 400 3D patient photos broken into 4,000 2D images. She then trained a deep convolutional neural network (CNN), a machine learning model, on a small collection of expert-labeled images to identify skin lesions. Results published in the Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE) showed the newly trained model could segment GVHD lesions in new images on its own.
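The article does not reproduce the network itself, but the underlying idea, learning a pixel-level lesion/non-lesion classifier from expert-labeled examples, can be sketched with a toy stand-in. The sketch below uses plain logistic regression on synthetic pixel colors; all data, colors, and parameters are illustrative, not taken from the study:

```python
import numpy as np

# Toy stand-in for learned segmentation: classify each pixel as lesion
# or non-lesion from its RGB value. (The actual study used a deep CNN;
# this only illustrates learning a pixel labeler from labeled examples.)
rng = np.random.default_rng(0)

# Synthetic "expert-labeled" training pixels: lesion pixels skew red.
lesion  = rng.normal([0.80, 0.30, 0.30], 0.05, size=(200, 3))
healthy = rng.normal([0.70, 0.60, 0.50], 0.05, size=(200, 3))
X = np.vstack([lesion, healthy])
y = np.array([1] * 200 + [0] * 200)

# Logistic regression fit by plain gradient descent.
w, b = np.zeros(3), 0.0
for _ in range(1000):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted lesion probability
    grad = p - y
    w -= (X.T @ grad) / len(y)
    b -= grad.mean()

# Apply the learned pixel classifier to unseen pixels: first half
# drawn from the lesion distribution, second half from healthy skin.
test = np.vstack([rng.normal([0.80, 0.30, 0.30], 0.05, size=(50, 3)),
                  rng.normal([0.70, 0.60, 0.50], 0.05, size=(50, 3))])
mask = (1 / (1 + np.exp(-(test @ w + b)))) > 0.5
```

Run on the held-out pixels, the classifier should label the lesion-drawn half mostly lesion and the healthy-drawn half mostly not, which is the per-pixel decision a segmentation CNN makes at much larger scale and with spatial context.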
Step Two: Distinguishing Lesions
When the researchers began to compare results from their automated system to those of experts, they noticed a few differences. While the size of a lesion may not change, over time the skin affected by GVHD can change color or texture—an important clinical indication.
In a follow-up study published in the Nature journal Bone Marrow Transplantation, the researchers again used machine learning, this time to zero in on GVHD lesion color. They successfully applied automated image processing to overcome disagreement between experts on lesion severity and to sort identified lesions by color.
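The paper's exact color pipeline is not described in the article; as a hypothetical illustration, once lesions are segmented, each region can be reduced to a simple color statistic and sorted. Mean red-channel dominance is used below as a stand-in feature, and the lesion regions are synthetic:

```python
import numpy as np

def redness(pixels):
    """Mean red-channel dominance of an N x 3 array of RGB pixels (0..1)."""
    pixels = np.asarray(pixels, dtype=float)
    return float((pixels[:, 0] - pixels[:, 1:].mean(axis=1)).mean())

# Three toy lesion regions, from faint pink to deep red (illustrative).
lesions = {
    "faint":    np.array([[0.80, 0.70, 0.68]] * 10),
    "moderate": np.array([[0.80, 0.55, 0.50]] * 10),
    "severe":   np.array([[0.75, 0.30, 0.30]] * 10),
}

# Sort lesion regions from least to most red.
ranked = sorted(lesions, key=lambda name: redness(lesions[name]))
print(ranked)  # → ['faint', 'moderate', 'severe']
```

Any color statistic could stand in for `redness` here; the point is that a fixed, automated measurement sidesteps the inter-expert disagreement on severity the article describes.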
The study is an important first step toward using artificial intelligence to recognize lesions in diverse skin tones, Tkaczyk said. “The networks aren’t brilliant. They’re incredibly good memorizers. Our research shows they can recognize variants, but if we haven’t shown them a diverse population they aren’t going to know what to do with a new skin tone or lesion color.”
Step Three: Crowdsourcing
The key to successful machine learning is a large and diverse training dataset. Crowdsourcing provides an opportunity to increase the number of demarcated images and further enhance the model.
In a pilot study published in Skin Research and Technology, the researchers teamed up with Daniel Fabbri, Ph.D., assistant professor of biomedical informatics at Vanderbilt, and recruited seven nurses and medical students to identify GVHD lesions in patient images. They asked a board-certified dermatologist with GVHD expertise to do the same. The crowd workers' only knowledge of GVHD came from a brief slide presentation.
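The article does not say how the seven workers' annotations were combined into a collective answer; a common way to fuse several annotators' binary masks is a pixel-wise majority vote, sketched here with toy 2×3 masks:

```python
import numpy as np

def majority_mask(masks):
    """Fuse several binary annotation masks by pixel-wise majority vote."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    # A pixel is "lesion" if a strict majority of annotators marked it.
    return stacked.sum(axis=0) * 2 > len(masks)

# Three hypothetical annotators' masks for the same tiny image.
a = np.array([[1, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 0]])
c = np.array([[1, 1, 0], [0, 0, 1]])

fused = majority_mask([a, b, c])
print(fused.tolist())  # → [[True, True, False], [False, False, False]]
```

Pixels marked by only one of the three annotators are dropped, so isolated disagreements wash out while consensus regions survive.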
Across 410 2D images, the pixel-by-pixel match between the crowd workers and the expert was 76 percent, as measured by the “Dice index,” a standard metric that quantifies the overlap between two regions. Tkaczyk noted: “This places this group of crowd workers, as a collective, very much on par with expert evaluation for chronic GVHD.”
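The Dice index has a standard definition: twice the area of the intersection of two regions, divided by the sum of their areas, giving 1.0 for identical masks and 0.0 for disjoint ones. A minimal implementation on flattened binary masks (the example masks are invented):

```python
import numpy as np

def dice(pred, truth):
    """Dice index: 2*|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * np.logical_and(pred, truth).sum() / denom

# Toy 6-pixel masks: crowd and expert agree on 2 of 3 marked pixels each.
crowd  = np.array([0, 1, 1, 1, 0, 0])
expert = np.array([0, 1, 1, 0, 1, 0])
print(round(dice(crowd, expert), 2))  # → 0.67
```

A Dice index of 0.76, as in the study, therefore means roughly three-quarters overlap between the crowd's and the expert's marked regions.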
The studies are building blocks toward using artificial intelligence to improve GVHD diagnoses and monitoring. The researchers believe similar approaches could support pathology and radiology, too. “If we are able to create clear definitions and expert consensus to drive machine learning on subtle images of rashes, our approach could apply to many other visual medical problems, particularly those with a large degree of expert variation,” Tkaczyk said.