Researchers designed their new AI image-retrieval tool to help pathologists locate similar case images for reference in diagnostics, research, and education
Researchers at Stanford University turned to an unusual source—the X social media platform (formerly known as Twitter)—to train an artificial intelligence (AI) system that can look at clinical laboratory pathology images and then retrieve similar images from a database. The choice reflects the fact that pathologists increasingly collect and store images of representative cases in their social media accounts, then consult those libraries when working on new cases with unusual or unfamiliar features.
The Stanford Medicine scientists trained their AI system—known as Pathology Language and Image Pretraining (PLIP)—on the OpenPath pathology dataset, which contains more than 200,000 images paired with natural language descriptions. The researchers collected most of the data by retrieving tweets in which pathologists posted images accompanied by comments.
“It might be surprising to some folks that there is actually a lot of high-quality medical knowledge that is shared on Twitter,” said researcher James Zou, PhD, Assistant Professor of Biomedical Data Science and senior author of the study, in a Stanford Medicine SCOPE blog post, which added that “the social media platform has become a popular forum for pathologists to share interesting images—so much so that the community has widely adopted a set of 32 hashtags to identify subspecialties.”
“It’s a very active community, which is why we were able to curate hundreds of thousands of these high-quality pathology discussions from Twitter,” Zou said.
The Stanford researchers published their findings in the journal Nature Medicine titled, “A Visual-Language Foundation Model for Pathology Image Analysis Using Medical Twitter.”
“The main application is to help human pathologists look for similar cases to reference,” James Zou, PhD (above), Assistant Professor of Biomedical Data Science, senior author of the study, and his colleagues wrote in Nature Medicine. “Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical artificial intelligence for enhancing diagnosis, knowledge sharing, and education.” Leveraging pathologists’ use of social media to store case images for future reference has worked out well for the Stanford Medicine study. (Photo copyright: Stanford University.)
Retrieving Pathology Images from Tweets
“The lack of annotated publicly-available medical images is a major barrier for innovations,” the researchers wrote in Nature Medicine. “At the same time, many de-identified images and much knowledge are shared by clinicians on public forums such as medical Twitter.”
In this case, the goal “is to train a model that can understand both the visual image and the text description,” Zou said in the SCOPE blog post.
Because X is popular among pathologists, the United States and Canadian Academy of Pathology (USCAP) and the Pathology Hashtag Ontology project have recommended a standard series of hashtags, including 32 hashtags for subspecialties, the study authors noted.
Examples include:
- #EyePath for Ophthalmic Pathology,
- #GIPath for Gastrointestinal and Liver Pathology,
- #HemePath for Hematopathology, and
- #IDpath for Infectious Disease (clinical) Pathology.
“Pathology is perhaps even more suited to Twitter than many other medical fields because for most pathologists, the bulk of our daily work revolves around the interpretation of images for the diagnosis of human disease,” wrote Jerad M. Gardner, MD, a dermatopathologist and section head of bone/soft tissue pathology at Geisinger Medical Center in Danville, Pa., in a blog post about the Pathology Hashtag Ontology project. “Twitter allows us to easily share images of amazing cases with one another, and we can also discuss new controversies, share links to the most cutting edge literature, and interact with and promote the cause of our pathology professional organizations.”
The researchers used the 32 subspecialty hashtags to retrieve English-language tweets posted from 2006 to 2022. Images in the tweets were “typically high-resolution views of cells or tissues stained with dye,” according to the SCOPE blog post.
The researchers collected a total of 232,067 tweets and 243,375 image-text pairs across the 32 subspecialties, they reported. They augmented this with 88,250 replies that received the highest number of likes and had at least one keyword from the ICD-11 codebook. The SCOPE blog post noted that the rankings by “likes” enabled the researchers to screen for high-quality replies.
They then refined the dataset by removing duplicates, retweets, non-pathology images, and tweets marked by Twitter as being “sensitive.” They also removed tweets containing question marks, as this was an indicator that the practitioner was asking a question about an image rather than providing a description, the researchers wrote in Nature Medicine.
They cleaned the text by removing hashtags, Twitter handles, HTML tags, emojis, and links to websites, the researchers noted.
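The cleaning steps described above can be sketched as a short text-processing routine. This is a hypothetical illustration of the kind of filtering the paper describes, not the researchers' actual code; the function names and exact regular expressions are assumptions.

```python
import re

def clean_tweet_text(text: str) -> str:
    """Strip hashtags, @-handles, HTML tags, emojis, and links,
    mirroring the cleaning steps described in the study.
    (Illustrative only; the paper's exact rules may differ.)"""
    text = re.sub(r"https?://\S+", "", text)        # remove URLs
    text = re.sub(r"<[^>]+>", "", text)             # remove HTML tags
    text = re.sub(r"[@#]\w+", "", text)             # remove handles and hashtags
    text = text.encode("ascii", "ignore").decode()  # crude emoji removal
    return " ".join(text.split())                   # collapse extra whitespace

def keep_tweet(text: str) -> bool:
    """Drop tweets containing question marks, which the researchers
    treated as questions about an image rather than descriptions."""
    return "?" not in text
```

A caption such as `"Check this #GIPath case @pathdoc https://t.co/x <b>liver</b>"` would reduce to `"Check this case liver"` under these rules.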
The final OpenPath dataset included:
- 116,504 image-text pairs from Twitter posts,
- 59,869 from replies, and
- 32,041 image-text pairs scraped from the internet or obtained from the LAION dataset.
The latter is an open-source database from Germany that can be used to train text-to-image AI software such as Stable Diffusion.
Training the PLIP AI Platform
Once they had the dataset, the next step was to train the PLIP AI model. This required a technique known as contrastive learning, the researchers wrote, in which the AI learns to associate features from the images with portions of the text.
As explained in Baeldung, an online technology publication, contrastive learning is based on the idea that “it is easier for someone with no prior knowledge, like a kid, to learn new things by contrasting between similar and dissimilar things instead of learning to recognize them one by one.”
“The power of such a model is that we don’t tell it specifically what features to look for. It’s learning the relevant features by itself,” Zou said in the SCOPE blog post.
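The contrastive objective described above can be sketched numerically. The snippet below shows a CLIP-style symmetric contrastive loss, in which each image embedding is pulled toward its paired text embedding and pushed away from every other caption in the batch. This is a minimal sketch of the general technique, not PLIP's actual training code; the temperature value and function shape are assumptions.

```python
import numpy as np

def contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                     temperature: float = 0.07) -> float:
    """Symmetric image-text contrastive loss over a batch of paired
    embeddings. Matching pairs sit on the diagonal of the similarity
    matrix; all off-diagonal pairings are treated as negatives."""
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    def xent(l: np.ndarray) -> float:
        # cross-entropy with the diagonal (true pairs) as targets
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return float(-np.mean(np.log(np.diag(p))))

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned pairs the loss approaches zero; shuffling the captions so each image faces the wrong text drives it up, which is exactly the signal that teaches the model which visual features match which descriptions.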
The resulting PLIP tool will enable “a clinician to input a new image or text description to search for similar annotated images in the database—a sort of Google Image search customized for pathologists,” SCOPE explained.
“Maybe a pathologist is looking at something that’s a bit unusual or ambiguous,” Zou told SCOPE. “They could use PLIP to retrieve similar images, then reference those cases to help them make their diagnoses.”
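The retrieval step described above amounts to a nearest-neighbor search in embedding space: the query image (or text) is embedded, then compared against the stored case embeddings by cosine similarity. The helper below is a hypothetical sketch of that idea, not PLIP's actual API.

```python
import numpy as np

def retrieve_similar(query_emb: np.ndarray, db_embs: np.ndarray,
                     k: int = 3) -> list:
    """Return indices of the k database images whose embeddings are
    most similar (by cosine) to the query, most similar first.
    Illustrative only; PLIP's real retrieval pipeline may differ."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                       # cosine similarity to every image
    return list(np.argsort(-sims)[:k])  # top-k indices
```

A pathologist-facing tool would then map those indices back to the annotated cases (and their text descriptions) so the clinician can compare them against the unusual specimen at hand.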
The Stanford University researchers continue to collect pathology images from X. “The more data you have, the more it will improve,” Zou said.
Pathologists will want to keep an eye on the Stanford Medicine research team’s progress. The PLIP AI tool may be a boon to diagnostics and improve patient outcomes and care.
—Stephen Beale
Related Information:
New AI Tool for Pathologists Trained by Twitter (Now Known as X)
A Visual-Language Foundation Model for Pathology Image Analysis Using Medical Twitter
AI + Twitter = Foundation Visual-Language AI for Pathology
Pathology Foundation Model Leverages Medical Twitter Images, Comments
A Visual-Language Foundation Model for Pathology Image Analysis Using Medical Twitter (Preprint)