Nov 9, 2018 | Instruments & Equipment, Laboratory Instruments & Laboratory Equipment, Laboratory Management and Operations, Laboratory News, Laboratory Pathology
Studies show consumer genealogy databases are much broader than is generally known. If your cousins are in such a database, it’s likely you are too
Recent news stories highlighted crime investigators who used the DNA data in consumer genetic genealogy databases to solve cold cases. Though not widely known, such use of direct-to-consumer DNA databases is becoming more commonplace, which might eventually lead to requests for clinical laboratories to assist in criminal investigations involving DNA data.
Case in point: investigators found the Golden State Killer, a serial killer/rapist/burglar who terrorized multiple California counties over a dozen years in the 1970s to 1980s, after uploading a DNA sample from the crime scene to GEDmatch, an open-data genomics database that features tools for genealogy research. They made the arrest after discovering a distant relative’s DNA in the genealogy database and matching it to the suspect, CBS News revealed in a 60 Minutes Overtime online report.
These and other investigators are using a technique called familial DNA testing (AKA, DNA Profiling), which enables them to use genetic material from relatives to solve crimes.
Clinical laboratories oversee DNA databases. Could DNA databases—developed and managed over years by medical laboratories for patient care—be subpoenaed by law enforcement investigating crimes?
The question raises many issues for society and for labs, including privacy responsibilities and appropriate use of genetic information. On the other hand, the genetic genie is already out of the bottle.
Leveraging Familial DNA to Solve Crimes a New Trend
“The solving of the Golden State Killer case opened this method up as a possibility, and other crime labs are taking advantage of it. Clearly, a trend has started,” Ruth Dickover, PhD, Director of Forensic Science, University of California, Davis, told the Los Angeles Times.
Indeed, the use of familial DNA testing is moving forward. The Verge reported 19 cold case samples have been identified in recent familial DNA testing and public database searches. It also said two new published studies may propel the technique further.
One study, published in the journal Science, suggests nearly every American of European ancestry may soon be identified through familial DNA testing.
The other study, published in Cell, shows that a person’s relatives can be detected when forensic DNA data are compared with consumer genetic databases.
Noah Rosenberg, PhD (above left), Professor of Population Genetics and Society Biology at Stanford University, is shown above working with Jaehee Kim, PhD (right), a Postdoctoral Research Fellow in Biology, on math that could be used to track down relatives in genealogy databases based on forensic DNA. “This could be a way of expanding the reach of forensic genetics, potentially for solving even more cold cases. But at the same time, it could be exposing participants in those databases to forensic searches they might not have anticipated,” he told Wired. (Photo copyright: Stanford University/L.A. Cicero.)
15 Million People Already in Genealogy Databases
Researchers at Columbia University in New York and Hebrew University of Jerusalem told Science they were motivated by the recent trend of investigations leveraging third-party consumer genomics services to find criminals. But they perceived a gap.
“The big limitation is coverage. And even if you find an individual it requires complex analysis from that point,” Yaniv Erlich, PhD, Associate Professor at Columbia and Chief Science Officer at MyHeritage, told The Verge. MyHeritage is an online genealogy platform.
Others offering consumer genetic testing and family history exploration include 23andMe and Ancestry. As of April 2018, more than 15 million people had participated in direct-to-consumer genetic testing, the researchers noted.
The study aimed to find the likelihood that a person can be identified using a long-range familial search. It included these steps and findings:
- Statistical analysis of 1.28 million people in the MyHeritage database;
- Pairs of people with close “identity-by-descent” relationships, such as first cousins and closer relatives, were removed to avoid bias;
- Researchers looked for a third cousin or closer relative for each person in the database;
- 60% of the 1.28 million people were matched with a third cousin or closer relative.
“We project that about 60% of the searches for individuals of European-descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future,” the researchers wrote.
In an interview with Wired, Erlich added, “The takeaway is it doesn’t matter if you’ve been tested or not tested. You can be identified because the databases already cover such large fractions of the US—at least for European ancestry.”
Matching Forensic and Consumer Genetic Data
Meanwhile, the study published in Cell by researchers at Stanford University, University of California, Davis, and the University of Michigan also suggests investigators could compare forensic DNA samples with consumer genetic databases to find people related to criminals.
That study found:
- 30% to 32% of people in a forensic database could be related to a child or parent in a consumer database;
- 35% to 36% could be tied to a sibling.
These studies reveal that genetic data and familial DNA testing can help law enforcement find suspects, which is a good thing for society. But people who uploaded DNA data to some direct-to-consumer databases may find themselves caught up in searches they do not know about. So may their cousins.
Dark Daily recently covered other similar studies that showed it takes just one person’s DNA to reveal genetic information on an entire family. (See, “The Problems with Ancestry DNA Analyses,” October 18, 2018.) These developments in the use of DNA databases to identify criminals should be an early warning to clinical laboratories building databases of genetic information that, at some future point, law enforcement agencies might want access to those databases as part of ongoing criminal investigations.
—Donna Marie Pocius
Related Information:
Could Your DNA Help Solve a Cold Case?
So Many People Have Had Their DNA Sequenced That They’ve Put Other People’s Privacy in Jeopardy
The DNA Technique That Caught the Golden State Killer is More Powerful than We Thought
Identity Inference of Genomic Data Using Long-Range Familial Searches
Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci
Genome Hackers Show No One’s DNA is Anonymous Anymore
Stanford Researchers Discover a New Way to Find Relatives from Forensic DNA
The Problems with Ancestry DNA Analyses
Oct 25, 2017 | Instruments & Equipment, Laboratory Instruments & Laboratory Equipment, Laboratory Management and Operations, Laboratory News, Laboratory Pathology
Researchers demonstrated it was feasible to encode digital malware onto a strand of synthesized DNA and infect the gene sequencers and computer networks used by medical laboratories
As if anatomic pathology groups and clinical laboratory leaders don’t already have enough to think about, here comes a security vulnerability right out of a sci-fi thriller. Researchers at the University of Washington (UW) have encoded digital malware into a physical strand of synthesized DNA, and that malware is capable of establishing a remote connection to the computer network on which the sequenced DNA is read!
Stated differently, researchers have now demonstrated that it is possible for bad guys to hack into a medical laboratory’s instrument systems and computer network using a physical strand of synthesized DNA that is encoded with digital malware.
Another Threat to Clinical Laboratories, Pathology Groups?
Does this translate into an immediate security issue for medical laboratories? For now, the threat is only theoretical. While researchers did succeed, their study findings should provide some comfort to pathology groups or medical laboratories worried about the implications of DNA-based malware. The UW researchers published their findings at the 2017 USENIX Security Symposium.
Synthetic DNA Malware Exploit is More Proof-of-Concept than Immediate Threat
At its core, computer code (AKA source code) is similar to DNA in that it is composed of a set number of states: in the case of binary, zeroes and ones. This led UW researchers to question whether they could translate the AGCT elements (adenine, guanine, cytosine, and thymine) of DNA into binary code capable of hacking DNA sequencers and accessing the information they contain.
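To make the idea of translating between bases and binary concrete, here is a minimal sketch in Python. The two-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; the article does not state the exact mapping the researchers used.

```python
# Hypothetical mapping: two bits per base. This is an illustrative choice,
# not a mapping confirmed by the article.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand whose length is a multiple of four back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)                # CAGACGGC
    print(dna_to_bytes(strand))  # b'Hi'
```

Under this mapping every byte becomes exactly four bases, so any binary payload has a corresponding base sequence, and vice versa.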
In an article in The Atlantic, Tadayoshi Kohno, PhD, Short-Dooley Professor in the Department of Computer Science and Engineering at UW, who led the research team, noted that, “The present-day threat is very small, and people don’t need to lose sleep immediately. But we wanted to know what was possible and what the issues are down the line.”
Complexity of Engineering a DNA-Powered Computer Virus
To begin the process, researchers needed to create a specific DNA strand encoded with the exact sequence of bases that would later convert into their exploit. An article in Ars Technica suggests this would be a challenge due to the physical properties of DNA’s double-helix design.
In the article, John Timmer, PhD, wrote, “DNA with Gs and Cs forms a stronger double-helix. Too many of them, and the strand won’t open up easily for sequencing. Too few, and it’ll pop open when you don’t want it to.”
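The constraint Timmer describes can be pictured as a check on the fraction of G and C bases in a candidate strand. The sketch below is illustrative only: the 40% to 60% window and the function names are hypothetical, not thresholds reported by the researchers or by Ars Technica.

```python
def gc_fraction(strand: str) -> float:
    """Return the fraction of bases in the strand that are G or C."""
    strand = strand.upper()
    if not strand:
        raise ValueError("strand must not be empty")
    return sum(base in "GC" for base in strand) / len(strand)


def within_gc_window(strand: str, low: float = 0.4, high: float = 0.6) -> bool:
    """Check GC content against a hypothetical acceptable window."""
    return low <= gc_fraction(strand) <= high


if __name__ == "__main__":
    for candidate in ("ATGC", "GGGGCCCC", "AAAATTTT"):
        print(candidate, gc_fraction(candidate), within_gc_window(candidate))
```

A payload that must also satisfy a check like this cannot use arbitrary bytes, which is one reason a workable sequence could take several attempts to find.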
The study shows it took multiple attempts to find a DNA sequence that would both carry the malware code and withstand the synthesizing and sequencing processes. Even then, researchers needed an exploit for the software used on sequencers in clinical laboratories and other diagnostics providers to prove their theory. Study authors used their own modified version of an open-source sequencing software, adding an exploit they could target, instead of a version of the software already publicly in use.
Lee Organick (above left), Karl Koscher (center), and Peter Ney (right) worked with Luis Ceze and Tadayoshi Kohno, PhD, at the University of Washington to develop the DNA sequence containing the malware code. The researchers determined that it was feasible for the gene instruments used by clinical laboratories to be infected with the malware, which could then move to infect a clinical lab’s computer network. (Photo copyright: University of Washington.)
With their DNA synthesized and customized software in place, researchers still faced challenges getting the code to trigger. “With reads randomly appearing in an FASTQ file,” the researchers noted, “we would expect the modified program to be exploited 37.4% of the time.”
As with genetic code, the binary code of a program is highly sensitive to errors. Any misread bases or splitting of the code resulted in failure. When sequencers only read a few hundred bases at a time, ensuring the code doesn’t hit one of these splits is a challenge.
One unique difference between binary and genetic code also caused trouble—genetic sequences aren’t direction dependent, while binary sequences are. If the code is read in reverse, it won’t execute properly.
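The direction problem can be sketched as follows. A sequencer may report a strand or its reverse complement, and the two decode to different bytes. The two-bits-per-base mapping (A=00, C=01, G=10, T=11) and the helper names are assumptions for illustration, not details taken from the study.

```python
# Hypothetical two-bits-per-base mapping, chosen for illustration only.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the strand as it would read from the opposite direction."""
    return strand.translate(COMPLEMENT)[::-1]


def decode(strand: str) -> bytes:
    """Decode a strand (length a multiple of four) into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    forward = "CAGACGGC"
    backward = reverse_complement(forward)
    print(decode(forward))   # b'Hi'
    print(backward)          # GCCGTCTG
    print(decode(backward))  # b'\x96\xde'
```

Both reads describe the same physical molecule, yet only one of them yields the intended bytes, so a payload read in the wrong orientation is just noise to the program.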
Future Concerns for Clinical Laboratories and Genetic Researchers
Today, the threat to medical laboratories and the sensitive data generated by sequencing is minor. However, tomorrow that threat could be more common.
In a WIRED article on the subject, Jason Callahan, Chief Information Security Officer for Illumina, stated, “This is interesting research about potential long-term risks. We agree with the premise of the study—that this does not pose an imminent threat and is not a typical cyber security capability.”
Don Rule, founder of Translational Software, agrees. When asked about the threat posed to clinical laboratories, he said, “… if you have to pre-introduce the hack in the analytics program, this is a pretty circuitous way to take over a computer. I can see how it is feasible and right now Norton Antivirus is not looking for viruses encoded in the AGCT code set, but we are right not to lose a lot of sleep over it.”
However, as genetic sequencing becomes a common part of medicine, attackers might have increased reason to disrupt services or intercept data. The UW researchers cite “important domains like forensics, medicine, and agriculture” as potential targets.
While their successful attack was highly engineered, their research into open-source sequencing software revealed a range of common security weaknesses. Many clinical laboratories and anatomic pathology groups also run proprietary analysis software or use hardware with embedded software.
The researchers recommend that medical laboratories work to centralize software updates and create ways to verify data and patches through digital signatures or other secure measures.
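As one example of such a measure, a lab could compare a downloaded patch against a SHA-256 digest published by the vendor through a separate trusted channel. The sketch below uses only Python's standard library; the function names and the workflow are hypothetical, and a digest check is weaker than a true digital signature, which would require a public-key cryptography library.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_patch(data: bytes, expected_hex: str) -> bool:
    """Return True only if the data matches the expected SHA-256 digest.

    expected_hex is assumed to come from the vendor via a trusted channel.
    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


if __name__ == "__main__":
    patch = b"example patch contents"
    published = sha256_hex(patch)  # stand-in for a vendor-published digest
    print(verify_patch(patch, published))               # True
    print(verify_patch(patch + b"tampered", published))  # False
```

Any change to the patch contents, even a single byte, produces a different digest and causes the check to fail.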
Already, genetic researchers take care to avoid synthesizing potentially dangerous sequences, and to contain tests and data. But this study shows that not all threats come from within the research or clinical laboratory environment. Both engineers of sequencing technology and hardware—and the medical laboratories using them—will need to optimize operations and monitor trends closely to see how security issues evolve alongside sequencing capabilities.
—Jon Stone
Related Information:
These Scientists Took Over a Computer by Encoding Malware in DNA
Computer Security and Privacy in DNA Sequencing
Computer Security, Privacy, and DNA Sequencing: Compromising Computers with Synthesized DNA, Privacy Leaks, and More
This Speck of DNA Contains a Movie, a Computer Virus, and an Amazon Gift Card
Researchers Encode Malware in DNA, Compromise DNA Sequencing Software
Biohackers Encoded Malware in a Strand of DNA
The Ultimate Virus: How Malware Encoded in Synthesized DNA Can Compromise a Computer System
Researchers Hacked into DNA and Encoded It with Malware