News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel

Proteomics-based Clinical Laboratory Testing May Get a Major Boost as Google’s DeepMind Research Lab Is Making Public Its Entire AI Database of Human Protein Predictions

DeepMind hopes its unrivaled collection of data, enabled by artificial intelligence, may advance development of precision medicines, new medical laboratory tests, and therapeutic treatments

‘Tis the season for giving, and one United Kingdom-based artificial intelligence (AI) research laboratory is making a sizeable gift. After using AI and machine learning to create “the most comprehensive map of human proteins” in existence, DeepMind, a subsidiary of Alphabet Inc. (NASDAQ:GOOGL), parent company of Google, plans to give its database of millions of protein structure predictions, free of charge, to the global scientific community and to all of humanity, The Verge reported.

Pathologists and clinical laboratory scientists developing proteomic assays understand the significance of this gesture. They know how difficult and expensive it is to determine protein structures experimentally, and how hard it has been to predict a structure from its amino acid sequence alone. That’s because attraction and repulsion among the various types of amino acids cause the amino acid string to “fold.” Thus, the availability of this data may accelerate the development of more diagnostic tests based on proteomics.

“For decades, scientists have been trying to find a method to reliably determine a protein’s structure just from its sequence of amino acids. Attraction and repulsion between the 20 different types of amino acids cause the string to fold in a feat of ‘spontaneous origami,’ forming the intricate curls, loops, and pleats of a protein’s 3D structure. This grand scientific challenge is known as the protein-folding problem,” a DeepMind statement noted.

Enter DeepMind’s AlphaFold AI platform to help iron things out. “Experimental techniques for determining structures are painstakingly laborious and time consuming (sometimes taking years and millions of dollars). Our latest version [of AlphaFold] can now predict the shape of a protein, at scale and in minutes, down to atomic accuracy. This is a significant breakthrough and highlights the impact AI can have on science,” DeepMind stated.

Release of Data Will Be ‘Transformative’

In July, DeepMind announced it would begin releasing data from its AlphaFold Protein Structure Database, which contains “predictions for the structure of some 350,000 proteins across 20 different organisms,” The Verge reported, adding, “Most significantly, the release includes predictions for 98% of all human proteins, around 20,000 different structures, which are collectively known as the human proteome. By the end of the year, DeepMind hopes to release predictions for 100 million protein structures.”

According to Edith Heard, PhD, Director General of the European Molecular Biology Laboratory (EMBL), the open release of such a dataset will be “transformative for our understanding of how life works,” The Verge reported.  

“I see this as the culmination of the entire 10-year-plus lifetime of DeepMind,” company CEO and co-founder Demis Hassabis (above) told The Verge. “From the beginning, this is what we set out to do: to make breakthroughs in AI, test that on games like Go and Atari, [and] apply that to real-world problems, to see if we can accelerate scientific breakthroughs and use those to benefit humanity.” The release of DeepMind’s entire protein prediction database will certainly do that. Clinical laboratory scientists worldwide will have free access to use it in developing new precision medicine treatments based on proteomics. (Photo copyright: BBC.)

Free Data about Proteins Will Accelerate Research on Diseases, Treatments

Research into how proteins fold and, thereby, function could have implications for fighting diseases and developing new medicines, according to DeepMind.

“This will be one of the most important datasets since the mapping of the human genome,” said Ewan Birney, PhD, Deputy Director General of the EMBL, in the DeepMind statement. EMBL worked with DeepMind on the dataset.

DeepMind protein prediction data are already being used by scientists in medical research. “Anyone can use it for anything. They just need to credit the people involved in the citation,” Demis Hassabis, DeepMind CEO and co-founder, told The Verge.

In a blog article, Hassabis noted that several projects and organizations are already using AlphaFold.

“As researchers seek cures for diseases and pursue solutions to other big problems facing humankind—including antibiotic resistance, microplastic pollution, and climate change—they will benefit from fresh insights in the structure of proteins,” Hassabis wrote.

Because of the deep financial backing that Alphabet/Google can offer, it is reasonable to expect that DeepMind will continue to improve the capabilities and accuracy of its AI technology, allowing AlphaFold to be effective for many uses.

This will be particularly true for the development of new diagnostic assays that will give clinical laboratories better tools for diagnosing disease earlier and more accurately.

—Donna Marie Pocius

Related Information:

DeepMind Creates ‘Transformative’ Map of Human Proteins Drawn by Artificial Intelligence

AlphaFold Can Accurately Predict 3D Models of Protein Structures and Has the Potential to Accelerate Research in Every Field of Biology

Putting the Power of AlphaFold into the World’s Hands

Highly Accurate Protein Structure Prediction with AlphaFold

ProteomeTools Researchers Announce Milestone Creation of 330,000-Peptide Human Proteome, Creating a Resource for Developing New Medical Laboratory Tests

Project should provide treasure-trove of molecular information on human protein and lead to development of new biomarkers for use in clinical laboratory tests and personalized medicine

Human proteins provide clinical laboratories and anatomic pathology groups with a rich source of biomarkers used in medical tests and personalized medicine. Pathologists, therefore, should take note of a major milestone achieved by researchers from the Technical University of Munich (TUM) that moves science closer to developing a way to understand the complete human proteome.

Scientists participating in the ProteomeTools project have announced the synthesis of a library of more than 330,000 peptides representing essentially all canonical proteins of the human proteome.

Translating Human Proteome into Molecular and Digital Tools

The ProteomeTools project is “a joint effort of TUM, JPT Peptide Technologies, SAP SE, and Thermo Fisher Scientific … dedicated to translating the human proteome into molecular and digital tools for drug discovery, personalized medicine, and life science research.” Over the course of the project, 1.4 million synthetic peptides covering essentially all human gene products will be synthesized and analyzed using multimodal liquid chromatography-tandem mass spectrometry (LC-MS/MS).

The ProteomeTools researchers detailed their work in their first paper, “Building ProteomeTools Based on a Complete Synthetic Human Proteome,” published in Nature Methods.

“ProteomeTools was started as a collaborative effort bringing together academic and industrial partners to make important contributions to the field of proteomics. It is gratifying to see that this work is now producing a wealth of significant results,” stated TUM researcher Bernhard Kuster, PhD, one of the leaders of the effort and senior author on the Nature Methods paper, in a TUM news release.

Thousands of New Biomarkers for Clinical Laboratories, and More!

Kuster discussed the significance of the consortium’s work in an article published in GenomeWeb, which described ProteomeTools as “a resource that provides the proteomics community with a set of established standards against which it can compare experimental data.”

“In proteomics today, we are doing everything by inference,” Kuster told GenomeWeb. “We have a tandem mass spectrum and we use a computer algorithm to match it to a peptide sequence that [is generated] in silico to simulate what their spectrum might look like without us actually knowing what it looks like. That is a very fundamental problem.”

Bernhard Kuster, PhD (above center), of the Technical University of Munich (TUM), led a team of researchers from the ProteomeTools project who completed a tandem mass spectrometry analysis of more than 330,000 synthetic tryptic peptides representing essentially all of the canonical human gene products. The resource eventually will cover all one million peptides. (Photo copyright: Andreas Heddergott/TUM.)

In the GenomeWeb article, Kuster provides an example of how researchers could use the information developed by ProteomeTools, noting it could be useful for confirming peptide identification in borderline cases. “Because the spectra for these synthetic peptides are available to everyone, you could look up a protein or peptide ID that you find exciting, but where the [experimental] data might not totally convince you as to whether it is true or not,” he explained.

Kuster also states that he believes the resource has the potential to allow “the field to move away from conventional database searching methods toward a spectral matching approach.”
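To give a sense of what a spectral matching approach involves, here is a minimal Python sketch. It is not the ProteomeTools consortium's method or software. It assumes each spectrum is a simple list of (m/z, intensity) peaks, groups peaks into fixed-width m/z bins, and scores a query spectrum against each reference spectrum using cosine similarity, one common way to compare spectra. The peptide names, peak values, and function names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (0.0 to 1.0)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query_peaks, library, bin_width=1.0):
    """Return (name, score) of the reference spectrum most similar to the query."""
    query = bin_spectrum(query_peaks, bin_width)
    scores = {
        name: cosine_similarity(query, bin_spectrum(peaks, bin_width))
        for name, peaks in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference library: peptide label -> list of (m/z, intensity) peaks.
LIBRARY = {
    "PEPTIDE_A": [(175.1, 40.0), (304.2, 100.0), (417.3, 60.0)],
    "PEPTIDE_B": [(147.1, 80.0), (262.1, 30.0), (375.2, 100.0)],
}

# Hypothetical experimental spectrum with small shifts and one noise peak.
QUERY = [(175.1, 35.0), (304.2, 90.0), (417.2, 70.0), (500.0, 5.0)]

match_name, match_score = best_library_match(QUERY, LIBRARY)
print(match_name, round(match_score, 3))
```

In this toy case the query shares all three major peaks with PEPTIDE_A and none with PEPTIDE_B, so PEPTIDE_A is returned as the best match. Real spectral library search tools add steps this sketch omits, such as precursor mass filtering, intensity scaling, and false discovery rate control.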

The TUM news release notes that the ProteomeTools project “will generate a further one million peptides and corresponding spectra with a focus on splice variants, cancer mutations, and post-translational modifications, such as phosphorylation, acetylation, and ubiquitinylation.” The end result could be a treasure-trove of molecular information on the human proteome and the development of thousands of new biomarkers for clinical use, therapeutic drugs, and more.

“Representing the human proteome by tandem mass spectra of synthetic peptides alleviates some of the current issues with protein identification and quantification. The libraries of peptides and spectra now allow us to develop new and improve upon existing hardware, software, workflows, and reagents for proteomics. Making all the data available to the public provides a wonderful opportunity to exploit this resource beyond what a single laboratory can do. We are now reaching out to the community to suggest interesting sets of peptides to make and measure as well as to create LC-MS/MS data on platforms not available to the ProteomeTools consortium,” Kuster stated in the TUM news release.

All data from the ProteomeTools project is available at the ProteomeXchange Consortium. Pathologists and clinical laboratory professionals working to develop new assays will find it to be a valuable resource.

—Andrea Downing Peck

Related Information:

Researchers Build Complete Synthetic Human Proteome

Building ProteomeTools Based on a Complete Synthetic Human Proteome

Milestone for the Analysis of Human Proteomes

Researchers Produce First Map of Human Proteome, Generating Promise for Developing Novel Medical Laboratory Tests and New Therapeutics

The human proteome map provides a catalog of proteins expressed in nondiseased tissues and organs to use as a baseline in understanding changes that occur in disease

Given the growing importance of proteins in medical laboratory testing, pathologists will want to know about a major milestone recently achieved in this field. Researchers have announced that drafts of the complete human proteome have been released to the public.

Experts are comparing this to the first complete map of the human genome that was made public in 2000. Clinical laboratory managers and pathologists know how the availability of this information provided the foundation for rapid advances in understanding different aspects involving DNA and RNA.