News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


Google DeepMind Says Its New Artificial Intelligence Tool Can Predict Which Genetic Variants Are Likely to Cause Disease

Genetic engineers at the lab used the new tool to generate a catalog of 71 million possible missense variants, classifying 89% as either benign or pathogenic

Genetic engineers continue to use artificial intelligence (AI) and deep learning to develop research tools that have implications for clinical laboratories. The latest development involves Google’s DeepMind artificial intelligence lab, which has created an AI tool that, its engineers say, can predict whether a single-letter substitution in DNA, known as a missense variant (or missense mutation), is likely to cause disease.

The Google engineers used their new model—dubbed AlphaMissense—to generate a catalog of 71 million possible missense variants. They were able to classify 89% as likely to be either benign or pathogenic mutations. That compares with just 0.1% that have been classified using conventional methods, according to the DeepMind engineers.

This is yet another example of how Google is investing to develop solutions for healthcare and medical care. In this case, DeepMind might find genetic sequences that are associated with disease or health conditions. In turn, these genetic sequences could eventually become biomarkers that clinical laboratories could use to help physicians make earlier, more accurate diagnoses and allow faster interventions that improve patient care.

The Google engineers published their findings in the journal Science in a paper titled, “Accurate Proteome-wide Missense Variant Effect Prediction with AlphaMissense.” They also released the catalog of predictions online for use by other researchers.


“AI tools that can accurately predict the effect of variants have the power to accelerate research across fields from molecular biology to clinical and statistical genetics,” wrote Google DeepMind engineers Jun Cheng, PhD (left), and Žiga Avsec, PhD (right), in a blog post describing the new tool. Clinical laboratories benefit from the diagnostic biomarkers generated by this type of research. (Photo copyrights: LinkedIn.)

AI’s Effect on Genetic Research

Genetic experiments to identify which mutations cause disease are both costly and time-consuming, Google DeepMind engineers Jun Cheng, PhD, and Žiga Avsec, PhD, wrote in a blog post. However, artificial intelligence sped up that process considerably.

“By using AI predictions, researchers can get a preview of results for thousands of proteins at a time, which can help to prioritize resources and accelerate more complex studies,” they noted.

Of the 71 million possible variants, approximately 6%, or four million, have already been seen in humans, they wrote, noting that the average person carries more than 9,000 of them. Most are benign, “but others are pathogenic and can severely disrupt protein function,” causing diseases such as cystic fibrosis, sickle-cell anemia, and cancer.

“A missense variant is a single letter substitution in DNA that results in a different amino acid within a protein,” Cheng and Avsec wrote in the blog post. “If you think of DNA as a language, switching one letter can change a word and alter the meaning of a sentence altogether. In this case, a substitution changes which amino acid is translated, which can affect the function of a protein.”
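The “one letter changes the word” idea can be sketched in a few lines of Python. This is only an illustration: the codon table below is a tiny, hand-picked subset of the standard genetic code, the function names are hypothetical, and the GAG-to-GTG change is the textbook substitution behind sickle-cell anemia rather than an example taken from the DeepMind blog post.

```python
# Tiny subset of the standard genetic code, just enough for this illustration.
CODON_TO_AMINO_ACID = {
    "GAG": "Glu",  # glutamic acid
    "GAA": "Glu",  # glutamic acid (a second codon for the same amino acid)
    "GTG": "Val",  # valine
}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def classify_substitution(codon: str, position: int, new_base: str):
    """Apply one single-letter change and report whether the amino acid changes."""
    before = CODON_TO_AMINO_ACID[codon]
    mutated = substitute(codon, position, new_base)
    after = CODON_TO_AMINO_ACID[mutated]
    kind = "missense" if before != after else "synonymous"
    return mutated, before, after, kind


# Middle letter A -> T: glutamic acid becomes valine (a missense variant).
sickle = classify_substitution("GAG", 1, "T")
# Last letter G -> A: still glutamic acid (no change in the protein).
silent = classify_substitution("GAG", 2, "A")

print(sickle)
print(silent)
```

The second call shows why not every single-letter change is a missense variant: some substitutions leave the amino acid unchanged.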

In the Google DeepMind study, AlphaMissense predicted that 57% of the 71 million variants are “likely benign,” 32% are “likely pathogenic,” and 11% are “uncertain.”
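To put those percentages in absolute terms, the short calculation below converts them into approximate variant counts. It assumes the rounded figures quoted in this article (71 million variants and whole-number percentages), so the results are estimates and not the exact counts from the Science paper.

```python
# Rounded figures as quoted in the article; the exact counts may differ.
TOTAL_VARIANTS = 71_000_000

share_percent = {
    "likely benign": 57,
    "likely pathogenic": 32,
    "uncertain": 11,
}

# Integer arithmetic avoids floating-point rounding surprises.
approx_counts = {
    label: TOTAL_VARIANTS * percent // 100
    for label, percent in share_percent.items()
}

# Variants given a benign or pathogenic label, as a share of the total.
classified_percent = share_percent["likely benign"] + share_percent["likely pathogenic"]

for label, count in approx_counts.items():
    print(f"{label}: about {count:,}")
print(f"classified as benign or pathogenic: {classified_percent}%")
```

The two classified categories add up to the 89% figure cited at the top of the article.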

The AlphaMissense model is adapted from an earlier model called AlphaFold, which uses amino acid sequences to predict the structure of proteins.

“AlphaMissense was fed data on DNA from humans and closely related primates to learn which missense mutations are common, and therefore probably benign, and which are rare and potentially harmful,” The Guardian reported. “At the same time, the program familiarized itself with the ‘language’ of proteins by studying millions of protein sequences and learning what a ‘healthy’ protein looks like.”

The model assigned each variant a score between 0 and 1 to rate the likelihood of pathogenicity [the potential for a variant to cause disease]. “The continuous score allows users to choose a threshold for classifying variants as pathogenic or benign that matches their accuracy requirements,” Avsec and Cheng wrote in their blog post.
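A minimal sketch of how a user might turn such a continuous score into a label is shown below. The cutoff values, variant names, and scores are all hypothetical and chosen only for illustration; they are not the thresholds or predictions published by DeepMind.

```python
def classify(score: float, benign_below: float, pathogenic_above: float) -> str:
    """Map a 0-to-1 pathogenicity score to a label using user-chosen cutoffs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if benign_below > pathogenic_above:
        raise ValueError("benign cutoff must not exceed pathogenic cutoff")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "uncertain"


# Hypothetical scores for three made-up variants.
scores = {"variant_a": 0.05, "variant_b": 0.45, "variant_c": 0.95}

# Cautious cutoffs leave a wide "uncertain" band in the middle.
strict = {name: classify(s, 0.2, 0.8) for name, s in scores.items()}

# A single cutoff at 0.5 labels almost everything, at the cost of accuracy.
lenient = {name: classify(s, 0.5, 0.5) for name, s in scores.items()}

print(strict)
print(lenient)
```

The same score of 0.45 comes out as “uncertain” under the cautious cutoffs and “likely benign” under the single cutoff, which is the trade-off the engineers describe.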

However, they also acknowledged that the score doesn’t indicate exactly how the variation causes disease.

The engineers cautioned that the predictions in the catalog are not intended for clinical use. Instead, they “should be interpreted with other sources of evidence.” However, “this work has the potential to improve the diagnosis of rare genetic disorders, and help discover new disease-causing genes,” they noted.

Genomics England Sees a Helpful Tool

BBC noted that AlphaMissense has been tested by Genomics England, which works with the UK’s National Health Service. “The new tool is really bringing a new perspective to the data,” Ellen Thomas, PhD, Genomics England’s Deputy Chief Medical Officer, told the BBC. “It will help clinical scientists make sense of genetic data so that it is useful for patients and for their clinical teams.”

AlphaMissense is “a big step forward,” Ewan Birney, PhD, Deputy Director General of the European Molecular Biology Laboratory (EMBL) told the BBC. “It will help clinical researchers prioritize where to look to find areas that could cause disease.”

Other experts, however, who spoke with MIT Technology Review were less enthusiastic.

“DeepMind is being DeepMind,” Insilico Medicine founder/CEO Alex Zhavoronkov, PhD, told the MIT publication. “Amazing on PR and good work on AI.”

Heidi Rehm, PhD, co-director of the Program in Medical and Population Genetics at the Broad Institute, suggested that the DeepMind engineers overstated the certainty of the model’s predictions. She told the publication that she was “disappointed” that they labeled the variants as benign or pathogenic.

“The models are improving, but none are perfect, and they still don’t get you to pathogenic or not,” she said.

“Typically, experts don’t declare a mutation pathogenic until they have real-world data from patients, evidence of inheritance patterns in families, and lab tests—information that’s shared through public websites of variants such as ClinVar,” the MIT article noted.

Is AlphaMissense a Biosecurity Risk?

Although DeepMind has released its catalog of variations, MIT Technology Review notes that the lab isn’t releasing the entire AI model due to what it describes as a “biosecurity risk.”

The concern is that “bad actors” could try using it on non-human species, DeepMind said. But one anonymous expert described the restrictions “as a transparent effort to stop others from quickly deploying the model for their own uses,” the MIT article noted.

And so, genetics research takes a huge step forward thanks to Google DeepMind, artificial intelligence, and deep learning. Clinical laboratories and pathologists may soon have useful new tools that help healthcare providers diagnose diseases. Time will tell. But the developments are certainly worth watching.

—Stephen Beale

Related Information:

AlphaFold Is Accelerating Research in Nearly Every Field of Biology

A Catalogue of Genetic Mutations to Help Pinpoint the Cause of Diseases

Accurate Proteome-wide Missense Variant Effect Prediction with AlphaMissense

Google DeepMind AI Speeds Up Search for Disease Genes

DeepMind Is Using AI to Pinpoint the Causes of Genetic Disease

DeepMind’s New AI Can Predict Genetic Diseases

Proteomics-based Clinical Laboratory Testing May Get a Major Boost as Google’s DeepMind Research Lab Is Making Public Its Entire AI Database of Human Protein Predictions

DeepMind hopes its unrivaled collection of data, enabled by artificial intelligence, may advance development of precision medicines, new medical laboratory tests, and therapeutic treatments

’Tis the season for giving, and one United Kingdom-based artificial intelligence (AI) research laboratory is making a sizeable gift. After using AI and machine learning to create “the most comprehensive map of human proteins” in existence, DeepMind, a subsidiary of Alphabet Inc. (NASDAQ:GOOGL), parent company of Google, plans to give away for free its database of millions of protein structure predictions to the global scientific community and to all of humanity, The Verge reported.

Pathologists and clinical laboratory scientists developing proteomic assays understand the significance of this gesture. They know how difficult and expensive it is to determine a protein’s structure from its sequence of amino acids. That’s because interactions among the various types of amino acids cause the [amino acid] string to “fold.” Thus, the availability of this data may accelerate the development of more diagnostic tests based on proteomics.

“For decades, scientists have been trying to find a method to reliably determine a protein’s structure just from its sequence of amino acids. Attraction and repulsion between the 20 different types of amino acids cause the string to fold in a feat of ‘spontaneous origami,’ forming the intricate curls, loops, and pleats of a protein’s 3D structure. This grand scientific challenge is known as the protein-folding problem,” a DeepMind statement noted.

Enter DeepMind’s AlphaFold AI platform to help iron things out. “Experimental techniques for determining structures are painstakingly laborious and time consuming (sometimes taking years and millions of dollars). Our latest version [of AlphaFold] can now predict the shape of a protein, at scale and in minutes, down to atomic accuracy. This is a significant breakthrough and highlights the impact AI can have on science,” DeepMind stated.

Release of Data Will Be ‘Transformative’

In July, DeepMind announced it would begin releasing data from its AlphaFold Protein Structure Database which contains “predictions for the structure of some 350,000 proteins across 20 different organisms,” The Verge reported, adding, “Most significantly, the release includes predictions for 98% of all human proteins, around 20,000 different structures, which are collectively known as the human proteome. By the end of the year, DeepMind hopes to release predictions for 100 million protein structures.”

According to Edith Heard, PhD, Director General of the European Molecular Biology Laboratory (EMBL), the open release of such a dataset will be “transformative for our understanding of how life works,” The Verge reported.  


“I see this as the culmination of the entire 10-year-plus lifetime of DeepMind,” company CEO and co-founder Demis Hassabis (above), told The Verge. “From the beginning, this is what we set out to do: to make breakthroughs in AI, test that on games like Go and Atari, [and] apply that to real-world problems, to see if we can accelerate scientific breakthroughs and use those to benefit humanity.” The release of DeepMind’s entire protein prediction database will certainly do that. Clinical laboratory scientists worldwide will have free access to use it in developing new precision medicine treatments based on proteomics. (Photo copyright: BBC.)

Free Data about Proteins Will Accelerate Research on Diseases, Treatments

Research into how proteins fold and, thereby, function could have implications for fighting diseases and developing new medicines, according to DeepMind.

“This will be one of the most important datasets since the mapping of the human genome,” said Ewan Birney, PhD, Deputy Director General of the EMBL, in the DeepMind statement. EMBL worked with DeepMind on the dataset.

DeepMind protein prediction data are already being used by scientists in medical research. “Anyone can use it for anything. They just need to credit the people involved in the citation,” said Demis Hassabis, DeepMind CEO and Co-founder, in The Verge.

In a blog article, Hassabis listed several projects and organizations already using AlphaFold.

“As researchers seek cures for diseases and pursue solutions to other big problems facing humankind—including antibiotic resistance, microplastic pollution, and climate change—they will benefit from fresh insights in the structure of proteins,” Hassabis wrote.

Because of the deep financial backing that Alphabet/Google can offer, it is reasonable to predict that DeepMind will make progress with its AI technology that regularly adds capabilities and accuracy, allowing AlphaFold to be effective for many uses.

This will be particularly true for the development of new diagnostic assays that will give clinical laboratories better tools for diagnosing disease earlier and more accurately.

—Donna Marie Pocius

Related Information:

DeepMind Creates ‘Transformative’ Map of Human Proteins Drawn by Artificial Intelligence

AlphaFold Can Accurately Predict 3D Models of Protein Structures and Has the Potential to Accelerate Research in Every Field of Biology

Putting the Power of AlphaFold into the World’s Hands

Highly Accurate Protein Structure Prediction with AlphaFold

International Team of Genetic Researchers Claim to Have Successfully Mapped the Entire Human Genome

With 100% of the human genome mapped, new genetic diagnostic and disease screening tests may soon be available for clinical laboratories and pathology groups

Utilizing technology developed by two different biotechnology/genetic sequencing companies, an international consortium of genetic scientists claims to have sequenced 100% of the human genome, “including the missing parts,” STAT reported. This will give clinical laboratories access to the complete 3.055 billion base pair (bp) sequence of the human genome.

Pacific Biosciences (PacBio) of Menlo Park, Calif., and Oxford Nanopore Technologies of Oxford Science Park, United Kingdom (UK), independently developed the technologies that aided the group of scientists, known collectively as the Telomere-to-Telomere (T2T) Consortium, in the complete mapping of the human genome.

If validated, this achievement could greatly impact future genetic research and genetic diagnostics development. That also will be true for precision medicine and disease-screening testing.

The T2T scientists presented their findings in a paper titled “The Complete Sequence of a Human Genome,” posted on bioRxiv, an open-access biology preprint server hosted by Cold Spring Harbor Laboratory.

Completing the First “End-to-End” Genetic Sequencing

In June of 2000, the Human Genome Project (HGP) announced it had successfully created the first “working draft” of the human genome. But according to the National Human Genome Research Institute (NHGRI), the draft did not include 100% of the human genome. It “consists of overlapping fragments covering 97% of the human genome, of which sequence has already been assembled for approximately 85% of the genome,” an NHGRI press release noted.

“The original genome papers were carefully worded because they did not sequence every DNA molecule from one end to the other,” Ewan Birney, PhD, Deputy Director General of the European Molecular Biology Laboratory (EMBL) and Director of EMBL’s European Bioinformatics Institute (EMBL-EBI), told STAT. “What this group has done is show that they can do it end-to-end. That’s important for future research because it shows what is possible,” he added.

In their published paper, the T2T scientists wrote, “Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to the human reference genome since its initial release.”

Tale of Two Genetic Sequencing Technologies

Humans have a total of 46 chromosomes in 23 pairs that carry tens of thousands of individual genes. Each gene consists of many base pairs, and there are billions of base pairs within the human genome. In 2000, scientists estimated that humans have only 30,000 to 35,000 genes, but that number has since been reduced to just above 20,000 genes.

According to STAT, “The work was possible because the Oxford Nanopore and PacBio technologies do not cut the DNA up into tiny puzzle pieces.”

PacBio used HiFi sequencing, which is only a few years old and provides the benefits of both short and long reads. STAT noted that PacBio’s technology “uses lasers to examine the same sequence of DNA again and again, creating a readout that can be highly accurate.” According to the company’s website, “HiFi reads are produced by calling consensus from subreads generated by multiple passes of the enzyme around a circularized template. This results in a HiFi read that is both long and accurate.”
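The idea of “calling consensus” from repeated passes can be illustrated with a simple majority vote at each position. This is a deliberately simplified sketch, not PacBio’s actual algorithm: it assumes the passes are already aligned, have equal length, and contain only substitution errors, and the reads shown are made up.

```python
from collections import Counter


def consensus(subreads: list[str]) -> str:
    """Majority-vote consensus over pre-aligned, equal-length reads."""
    if not subreads:
        raise ValueError("at least one subread is required")
    length = len(subreads[0])
    if any(len(read) != length for read in subreads):
        raise ValueError("this sketch assumes all subreads have equal length")
    # zip(*subreads) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*subreads)
    )


# Five hypothetical passes over the same template, three of them with one error.
passes = [
    "ACGTAC",
    "ACGTTC",  # error at position 4
    "ACCTAC",  # error at position 2
    "ACGTAC",
    "TCGTAC",  # error at position 0
]

print(consensus(passes))
```

Because each error appears in only one pass, the vote at every position recovers the underlying sequence, which is the intuition behind reads that are both long and accurate.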

Oxford Nanopore uses electrical current in its sequencing devices. In this technology, a single strand of DNA is threaded through a microscopic nanopore. As the bases pass through, they disrupt an electrical current flowing across the pore, and those changes in current enable scientists to determine which bases are present and, in turn, read the full strand.

The T2T Consortium acknowledged in their paper that they had trouble with approximately 0.3% of the genome but said that, although there may be a few errors, there are no gaps.


“You’re just trying to dig into this final unknown of the human genome,” Karen Miga (above), Assistant Professor in the Biomolecular Engineering Department at the University of California, Santa Cruz (UCSC), Associate Director at the UCSC Genomics Institute, and lead author of the T2T Consortium study, told STAT. “It’s just never been done before and the reason it hasn’t been done before is because it’s hard.” (Photo copyright: University of California, Santa Cruz.)

Might New Precision Medicine Therapies Come from T2T Consortium’s Research?

The researchers claim in their paper that the number of known base pairs has grown from 2.92 billion to 3.05 billion and that the number of known genes has increased by 0.4%. Through their research, they also discovered 115 new genes that code for proteins.

The T2T Consortium scientists also noted that the genome they sequenced for their research did not come from a person but rather from a hydatidiform mole, a rare growth that occasionally forms on the inside of a woman’s uterus. A hydatidiform mole occurs when a sperm fertilizes an egg that has no nucleus. As a result, the cells examined for the T2T study contained only 23 chromosomes instead of the full 46 found in most humans.

Although the T2T Consortium’s work is a huge leap forward in the study of the human genome, more research is needed. The consortium plans to publish its findings in a peer-reviewed medical journal. In addition, both PacBio and Oxford Nanopore plan to develop a way to sequence the entire 46-chromosome human genome in the future.

The future of genetic research and gene sequencing is to create technologies that will allow researchers to identify single nucleotide polymorphisms (SNPs) within longer strings of DNA. Because these SNPs in the human genome correlate with medical conditions and response to specific genetic therapies, advancing knowledge of the genome can ultimately provide beneficial insights that may lead to new genetic tests for medical diagnoses and help medical professionals determine the best, personalized therapies for individual patients.

—JP Schlingman

Related Information:

Scientists Say They’ve Finally Sequenced the Entire Human Genome. Yes, All of It.

Researchers Claim They Have Sequenced the Entirety of the Human Genome—Including the Missing Parts

The Complete Sequence of a Human Genome

HiFi Reads for Highly Accurate Long-Read Sequencing

President Clinton Announces the Completion of the First Survey of the Entire Human Genome

Genome the Crowning Achievement of Medicine in 2000

International Human Genome Sequencing Consortium Announces “Working Draft” of Human Genome
