
University of Washington and Microsoft Research Encode Data into DNA, Demonstrating Potential New Use for Genetic Sequences

Sep 23, 2019 | Instruments & Equipment, Laboratory Instruments & Laboratory Equipment, Laboratory Management and Operations, Laboratory News, Laboratory Operations, Laboratory Pathology, Laboratory Testing, Management & Operations

The proof-of-concept experiment showed data can be encoded in DNA and retrieved using automated systems, a development that may have positive significance for clinical laboratories

It may seem far-fetched, but computer scientists and research groups have worked for years to discover whether it is possible to store data in deoxyribonucleic acid (DNA). Now, Microsoft Research (MR) and the University of Washington (UW) have achieved just that, and the implications of their success could be far-reaching.

Clinical pathologists are increasingly performing DNA sequencing in their medical laboratories to identify biomarkers for disease, help clinicians understand their patients’ risk for a specific disease, and track the progression of a disease. The ability to store data in DNA would take that to another level and could have an impact on diagnostic pathology. Pathologists familiar with DNA sequencing may find a whole new area of medical service open to them.

The MR/UW researchers recently demonstrated a fully automated system that encoded data into DNA and then recovered the information as digital data. “In a simple proof-of-concept test, the team successfully encoded the word ‘hello’ in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system,” Microsoft stated in a news release.
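To make the idea concrete, here is a minimal Python sketch of how text could be mapped onto the four DNA bases and read back. The two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are illustrative assumptions, not the encoding the MR/UW team used; practical schemes also add error correction and avoid troublesome sequences such as long runs of a single base.

```python
# Illustrative mapping only: two bits per nucleotide (an assumption,
# not the MR/UW encoding scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_text(text: str) -> str:
    """Turn text into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(bases: str) -> str:
    """Reverse encode_text: bases back to bits, bits back to text."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    strand = encode_text("hello")
    print(strand)                # the five letters become 20 bases
    print(decode_bases(strand))  # round-trips back to the original text
```

Under this toy mapping each byte becomes four bases, so the five-letter word occupies a 20-base strand. In a real system, synthesis would write that strand and sequencing would read it back.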

The MR/UW team published their findings in Nature Scientific Reports.

DNA’s Potential Storage Capacity and Why We Need It

Thus far, the challenge of using DNA for data storage has been that there wasn’t a way to easily code and retrieve the information. That, however, seems to be changing quite rapidly. Several major companies have invested heavily in research, with consumer offerings expected soon.

At Microsoft Research, ‘consumer interest’ in genetic testing has driven the research into using DNA for data storage. “As people get better access to their own DNA, why not also give them the ability to read any kind of data written in DNA?” asked Doug Carmean, an Architect at Microsoft, during an interview with Wired.

Scientists are interested in using DNA for data storage because humanity is creating more data than ever before, and the pace is accelerating. Currently, most of that data is stored on tape, which is inexpensive, but has drawbacks. Tape degrades and has to be replaced every 10 years or so. But DNA, on the other hand, lasts for thousands of years!

“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete,” Yaniv Erlich, PhD, Chief Science Officer at MyHeritage, an online genealogy platform located in Israel, and Associate Professor, Columbia University, told Science Mag.

Tape also takes up an enormous amount of physical space compared to DNA. A single gram of DNA can hold 215 petabytes (215 million gigabytes) of data. Wired puts the storage capacity of DNA into perspective: “Imagine formatting every movie ever made into DNA; it would be smaller than the size of a sugar cube. And it would last for 10,000 years.”

Researchers at the University of Washington claim, “All the movies, images, emails and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube.” (Photo and caption copyright: Tara Brown/University of Washington.)
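The density figure makes it easy to estimate how little DNA such a data set would need. The Python sketch below does the arithmetic with decimal units. It treats 215 petabytes per gram as an upper bound and ignores the overhead (redundant copies, error correction, addressing) that any working system would add, so the result is a rough lower limit, not a measured value.

```python
# Decimal (SI) units, in bytes.
GB = 10**9
PB = 10**15

# Density figure quoted in the article; treated here as an upper bound.
BYTES_PER_GRAM = 215 * PB


def grams_of_dna(num_bytes: float) -> float:
    """Mass of DNA needed at the quoted density, ignoring all overhead."""
    return num_bytes / BYTES_PER_GRAM


if __name__ == "__main__":
    grams = grams_of_dna(10_000 * GB)
    print(f"{grams * 1e6:.1f} micrograms")  # about 46.5 micrograms
```

At that density, the 10,000 gigabytes in the caption would need less than 50 millionths of a gram of DNA.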

Victor Zhirnov, Chief Scientist at Semiconductor Research Corporation, says the worries over storage space aren’t simply theoretical. “Today’s technology is already close to the physical limits of scaling,” he told Wired, which stated, “Five years ago humans had produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040.”

MIT Technology Review agrees, stating, “Humanity is creating information at an unprecedented rate—some 16 zettabytes every year. And this rate is increasing. Last year, the research group IDC calculated that we’ll be producing over 160 zettabytes every year by 2025.”
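The tenfold jump in these figures implies a steep compound growth rate. The short Python sketch below works it out. The nine-year span (a 2016 baseline running to 2025) is an assumption, since the quoted passages do not state the base year, and the function name is illustrative.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # 16 and 160 zettabytes per year are the quoted figures;
    # the 9-year span is an assumption, not stated in the article.
    rate = annual_growth_rate(16, 160, 9)
    print(f"{rate:.1%} per year")
```

Under that assumption the rate comes to roughly 29 percent a year; a shorter span would imply even faster growth.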

Heavy Investment by Major Players

The whole concept may seem like something out of a science fiction story, but the fact that businesses are investing real dollars into it suggests that DNA data storage will likely be a reality in the near future. Currently, there are three barriers, but work is underway to overcome them.

First, synthesizing DNA in a medical laboratory for the specific purpose of data storage must become cheaper for the solution to be viable. Second, the sequencing process to read the information must also become less expensive. And third is the problem of how to extract the data stored in the DNA.

In a paper published in ASPLOS ’16, the MR/UW scientists wrote: “Today, neither the performance nor the cost of DNA synthesis and sequencing is viable for data storage purposes. However, they have historically seen exponential improvements. Their cost reductions and throughput improvements have been compared to Moore’s Law in Carlson’s Curves … Important biotechnology applications such as genomics and the development of smart drugs are expected to continue driving these improvements, eventually making data storage a viable application.”

Automation appears to be the final piece of the puzzle. Currently, too much human labor is necessary for DNA to be used efficiently as data storage.

“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service—bits are sent to a datacenter and stored there and then they just appear when the customer wants them,” said Microsoft principal researcher Karin Strauss in the Microsoft news release. “To do that, we needed to prove that this is practical from an automation perspective.” (Photo copyright: Microsoft Research/YouTube.)

It may take some time before DNA becomes a viable medium for data storage. However, savvy pathology laboratory managers should be aware of, and possibly prepared for, this coming opportunity.

While it’s unlikely the average consumer will see much difference in how they save and retrieve data, medical laboratories may find themselves very much in demand because of their expertise in sequencing DNA and interpreting gene sequences.

—Dava Stewart

Related Information:

With a “Hello,” Microsoft and UW Demonstrate First Fully Automated DNA Data Storage

Demonstration of End-to-End Automation of DNA Data Storage

UW Team Stores Digital Images in DNA—and Retrieves Them Perfectly

Microsoft and UW Demonstrate First Fully Automated DNA Data Storage

Storing Data in DNA Is A Lot Easier than Getting It Back Out

DNA Could Store All of the World’s Data in One Room

The Rise of DNA Data Storage

Forget Silicon—SQL On DNA Is the Next Frontier for Databases