Results of the UK study confirm for clinical laboratory professionals the importance of fully understanding the design and function of SNP chips they may be using in their labs
Here is another example of a long-established clinical laboratory test that, upon new evidence, turns out to be not as accurate as once thought. According to research conducted at the University of Exeter in Devon, UK, single-nucleotide polymorphism (SNP) chips (aka SNP microarrays), a technology commonly used in commercial genetic testing, are inadequate at detecting rare gene variants that can increase breast cancer risk.
A news release announcing the results of the large-scale study states, “A technology that is widely used by commercial genetic testing companies is ‘extremely unreliable’ in detecting very rare variants, meaning results suggesting individuals carry rare disease-causing genetic variants are usually wrong.”
Why is this a significant finding for clinical laboratories? Because medical laboratories performing genetic tests that use SNP chips should be aware that rare genetic variants—which are clinically relevant to a patient’s case—may not be detected and/or reported by the tests they are running.
UK Researchers Find ‘Shockingly High False Positives’
The conclusion reached by the Exeter researchers, the BMJ study states, is that “SNP chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.”
Leigh Jackson, PhD, Lecturer in Genomic Medicine at University of Exeter and co-author of the BMJ study, said in the news release, “The number of false positives on rare genetic variants produced by SNP chips was shockingly high. To be clear: a very rare, disease-causing variant detected using [an] SNP chip is more likely to be wrong than right.”
Large-Scale Study Taps UK Biobank Data
The Exeter researchers were concerned about cases of women scheduling unnecessary invasive medical procedures after tests reported rare genetic variants in BRCA1 (breast cancer type 1) and BRCA2 (breast cancer type 2).
“The inherent technical limitation of SNP chips for correctly detecting rare genetic variants is further exacerbated when the variants themselves are linked to very rare diseases. As with any diagnostic test, the positive predictive value for low prevalence conditions will necessarily be low in most individuals. For pathogenic BRCA variants in the UK Biobank, the SNP chips had an extremely low positive predictive value (1-17%) when compared with sequencing. Were these results to be fed back to individuals, the clinical implications would be profound. Women with a positive BRCA result face a lifetime of additional screening and potentially prophylactic surgery that is unwarranted in the case of a false positive result,” they wrote.
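The link the researchers draw between low prevalence and low positive predictive value (PPV) follows from Bayes' rule. The sketch below is illustrative only: the sensitivity, specificity, and carrier frequencies are hypothetical values chosen for the example, not figures from the BMJ study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical assay performance, held constant for both scenarios.
SENSITIVITY = 0.99
SPECIFICITY = 0.9999

# Hypothetical carrier frequencies: a common variant vs. a very rare one.
common_ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.10)
rare_ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, 1 / 100_000)

print(f"PPV, common variant (1 in 10): {common_ppv:.4f}")
print(f"PPV, rare variant (1 in 100,000): {rare_ppv:.4f}")
```

With these assumed inputs, the same assay goes from a PPV above 99% for the common variant to roughly 9% for the very rare one, even though its sensitivity and specificity have not changed.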
Using UK Biobank data from 49,908 participants (55% were female), the researchers compared next-generation sequencing (NGS) to SNP chip genotyping. They found that SNP chips, which test genetic variation at hundreds of thousands of specific locations across the genome, performed well when compared to NGS for common variants, such as those related to type 2 diabetes and ancestry assessment, the study noted.
“Because SNP chips are such a widely used and high-performing assay for common genetic variants, we were also surprised that the differing performance of SNP chips for detecting rare variants was not well appreciated in the wider research or medical communities. Luckily, we had recently received both SNP chip and genome-wide DNA sequencing data on 50,000 individuals through the UK Biobank—a population cohort of adult volunteers from across the UK. This large dataset allowed us to systematically investigate the performance of SNP chips across millions of genetic variants with a wide range of frequencies, down to those present in fewer than 1 in 50,000 individuals,” wrote Wright and Michael Weedon, PhD, Associate Professor of Bioinformatics and Human Genetics at Exeter, in a BMJ blog post.
The Exeter researchers also analyzed data from a small group of people in the Personal Genome Project who had both SNP genotyping and sequencing information available. They focused their analysis on rare pathogenic variants in BRCA1 and BRCA2 genes.
The researchers found:
The rarer the variant, the less reliable the test result. For example, for “very rare variants” present in fewer than one in 100,000 people, 84% of those found by SNP chips were false positives.
Low positive predictive values of about 16% for very rare variants in the UK Biobank.
Nearly all (20 of 21) customers of commercial genetic testing had at least one rare disease-causing variant that was a false positive.
SNP chips detect common genetic variants “extremely well.”
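The false positive share and the positive predictive value in the findings above are two views of the same comparison: SNP chip calls checked against sequencing as the reference. A minimal sketch of that comparison follows. The participant IDs and counts are made up for illustration, and the function name is hypothetical; this is not the study's analysis code.

```python
def compare_to_sequencing(chip_carriers, sequencing_carriers):
    """Score chip-called carriers against sequencing-confirmed carriers."""
    chip = set(chip_carriers)
    confirmed = set(sequencing_carriers)
    true_pos = len(chip & confirmed)    # called by chip, confirmed by sequencing
    false_pos = len(chip - confirmed)   # called by chip, not confirmed
    missed = len(confirmed - chip)      # confirmed carriers the chip did not call
    called = true_pos + false_pos
    ppv = true_pos / called if called else float("nan")
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "missed_by_chip": missed,
        "ppv": ppv,
    }


# Hypothetical data: the chip flags 25 participants as carriers,
# but sequencing confirms only 4 of them (plus one the chip missed).
chip_calls = [f"P{i:02d}" for i in range(25)]
sequencing_calls = ["P00", "P01", "P02", "P03", "P99"]

result = compare_to_sequencing(chip_calls, sequencing_calls)
print(result)
```

In this toy example, 21 of the 25 chip calls are unconfirmed, so 84% are false positives and the PPV is 16%.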
Advantages and Capabilities of SNP Chips
Compared to next-gen genetic sequencing, SNP chips are less costly. The chips use “grids of hundreds of thousands of beads that react to specific gene variants by glowing in different colors,” New Scientist explained.
Common variants of BRCA1 and BRCA2 can be found using SNP chips with 99% accuracy, New Scientist reported based on study data.
However, when the task is to find thousands of rare variants in BRCA1 and BRCA2 genes, SNP chips do not fare so well.
“It is just not the right technology for the job when it comes to rare variants. They’re excellent for the common variants that are present in lots of people. But the rarer the variant is, the less likely they are to be able to correctly detect it,” Wright told CNN.
SNP chips can’t detect all variants because they struggle to cluster needed data, the Exeter researchers explained.
“SNP chips perform poorly for genotyping rare genetic variants owing to their reliance on data clustering. Clustering data from multiple individuals with similar genotypes works very well when variants are common,” the researchers wrote. “Clustering becomes more difficult as the number of people with a particular genotype decreases.”
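One way to see why clustering gets harder as a genotype becomes rarer is to ask how precisely a cluster's center can be located from its members. The sketch below uses the standard error of a mean as a rough stand-in. It is a statistical illustration under assumed values (the signal spread is hypothetical), not the genotype-calling algorithm used by any SNP chip.

```python
import math


def cluster_center_uncertainty(signal_sd, n_members):
    """Standard error of a cluster's estimated center given n members."""
    if n_members < 1:
        raise ValueError("a cluster needs at least one member")
    return signal_sd / math.sqrt(n_members)


# Hypothetical per-sample spread of the intensity signal (arbitrary units).
SIGNAL_SD = 0.07

# Number of individuals sharing a genotype, from common to very rare.
for n_members in (5000, 50, 3, 1):
    uncertainty = cluster_center_uncertainty(SIGNAL_SD, n_members)
    print(f"{n_members:>5} carriers -> center uncertainty {uncertainty:.4f}")
```

Under these assumptions, a genotype shared by thousands of people yields a tightly located cluster, while a genotype carried by one or two people gives a center that is no better defined than a single noisy measurement.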
Clinical Laboratories Using SNP Chips
The researchers at Exeter uncovered important information that pathologists and medical laboratory professionals will want to understand and monitor. Cancer patients with rare genetic variants may not be diagnosed accurately because SNP chips were not designed to reliably identify rare genetic variants. Those patients may need additional testing to validate diagnoses and prevent harm.
Protecting patient privacy is of critical importance, and yet researchers reidentified data using only a few additional data points, casting doubt on the effectiveness of existing federally required data security methods and sharing protocols
Recent coverage in The Guardian, which reported on how easily so-called “deidentified data” can be reidentified with just a few additional data points, should be of particular interest to clinical laboratory and health network managers and stakeholders.
“We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual such as medical procedures and year of birth,” Culnane stated in a UM news release. “This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy.”
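The linking process described in the quote can be sketched as a simple filter: keep the records whose unencrypted fields match what is already known about a person. Everything below is hypothetical, including the field names, the record values, and the function name; it is a minimal illustration of the idea, not the researchers' method.

```python
def link_records(deidentified_records, known_facts):
    """Return records whose unencrypted fields match all known facts."""
    return [
        record
        for record in deidentified_records
        if all(record.get(field) == value for field, value in known_facts.items())
    ]


# Hypothetical deidentified dataset: the ID is encrypted, other fields are not.
records = [
    {"record_id": "enc-01", "year_of_birth": 1961, "procedure": "hip replacement"},
    {"record_id": "enc-02", "year_of_birth": 1961, "procedure": "appendectomy"},
    {"record_id": "enc-03", "year_of_birth": 1975, "procedure": "hip replacement"},
    {"record_id": "enc-04", "year_of_birth": 1988, "procedure": "appendectomy"},
]

# Facts an outsider might already know about one individual (assumed).
known = {"year_of_birth": 1961, "procedure": "hip replacement"}

matches = link_records(records, known)
print(f"{len(matches)} matching record(s): {[m['record_id'] for m in matches]}")
```

When the filter returns exactly one record, that record is effectively reidentified without any decryption, and every other field in it is exposed.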
In a similar study published in Scientific Reports, Yves-Alexandre de Montjoye, PhD, a computational privacy researcher, used location data on 1.5 million people from a mobile phone dataset collected over 15 months to identify 95% of the people in an anonymized dataset using four unique data points. With just two unique data points, he could identify 50% of the people in the dataset.
“Location data is a fingerprint. It’s a piece of information that’s likely to exist across a broad range of data sets and could potentially be used as a global identifier,” Montjoye told The Guardian.
The problem is exacerbated by the fact that everything we do online these days generates data—much of it open to the public. “If you want to be a functioning member of society, you have no ability to restrict the amount of data that’s being vacuumed out of you to a meaningful level,” Chris Vickery, a security researcher and Director of Cyber Risk Research at UpGuard, told The Guardian.
This privacy vulnerability isn’t restricted to just users of the Internet and social media. In 2013, Latanya Sweeney, PhD, Professor and Director at Harvard’s Data Privacy Lab, performed a similar analysis on approximately 579 participants in the Personal Genome Project who provided their zip code, date of birth, and gender to be included in the dataset. Of those analyzed, she named 42% of the individuals. The Personal Genome Project later confirmed 97% of her submitted names, according to Forbes.
In testimony before the Privacy and Integrity Advisory Committee of the Department of Homeland Security (DHS), Latanya Sweeney, PhD, Professor and Director at Harvard’s Data Privacy Lab, stated, “One problem is that people don’t understand what makes data unique or identifiable. For example, in 1997 I was able to show how medical information that had all explicit identifiers, such as name, address and Social Security number removed could be reidentified using publicly available population registers (e.g., a voter list). In this particular example, I was able to show how the medical record of William Weld, the Governor of Massachusetts of the time, could be reidentified using only his date of birth, gender, and ZIP. In fact, 87% of the population of the United States is uniquely identified by date of birth (e.g., month, day, and year), gender, and their 5-digit ZIP codes. The point is that data that may look anonymous is not necessarily anonymous. Scientific assessment is needed.”
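Sweeney's point about date of birth, gender, and ZIP code can be checked on any dataset by counting how many records carry a combination of those fields that no other record shares. The sketch below does this on a tiny, made-up table. The records and the function name are hypothetical and say nothing about real population figures.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical records with explicit identifiers already removed.
people = [
    {"dob": "1950-01-15", "gender": "M", "zip": "02138"},
    {"dob": "1950-01-15", "gender": "F", "zip": "02138"},
    {"dob": "1972-03-14", "gender": "F", "zip": "02139"},
    {"dob": "1972-03-14", "gender": "F", "zip": "02139"},
    {"dob": "1988-11-02", "gender": "M", "zip": "02139"},
]

by_zip_only = unique_fraction(people, ("zip",))
by_all_three = unique_fraction(people, ("dob", "gender", "zip"))

print(f"Unique by ZIP alone: {by_zip_only:.0%}")
print(f"Unique by date of birth, gender, and ZIP: {by_all_three:.0%}")
```

In this toy table nobody is unique by ZIP code alone, but three of the five records become unique once date of birth and gender are added, which is the pattern the testimony describes at national scale.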
“Open publication of deidentified records like health, census, tax or Centrelink data is bound to fail, as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records,” Dr. Teague noted in the UM news release. “We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data.”
While studies are mounting to show how vulnerable deidentified information might be, there’s little in the way of movement to fix the issue. Nevertheless, clinical laboratories should consider carefully any decision to sell anonymized (aka blinded) patient data for data mining purposes. The data may still contain enough identifying information to be used inappropriately. (See Dark Daily, “Coverage of Alexion Investigation Highlights the Risk to Clinical Laboratories That Sell Blinded Medical Data,” June 21, 2017.)
Should regulators and governments address the issue, clinical laboratories and healthcare providers could find more stringent regulations on the sharing of data—both identified and deidentified—and increased liability and responsibility regarding its governance and safekeeping.
Until then, any healthcare professional or researcher should consider the implications of deidentification, both to patients and businesses, should people use the shared data in unexpected and potentially malicious ways.
Because of isolation from the worldwide DNA pool for the past 1,200 years, the Faroese population is vulnerable to recessive gene disorders
Because of the dramatic, and still continuing, fall in the cost of DNA sequencing, an ambitious project is launching with the goal of sequencing the full DNA of all 50,000 residents of the Faroe Islands. When completed, this project has the potential to reshape molecular diagnostics and clinical laboratory testing.
FarGen is the name of this effort, and pathologists and clinical laboratory managers will want to follow its progress. Organizers of this unique effort expect that it will speed up the adoption of personalized medicine in mainstream care. This tiny, self-governing Danish territory, located between Iceland and Norway, is moving forward with plans to decipher complete DNA sequences for every one of its 50,000 citizens.