Harvard researchers George Church, Yuan Gao and Sriram Kosuri have figured out how to encode massive amounts of binary data as DNA sequences, each tagged with an address “barcode” so that the sequences can be read back and easily reassembled in order. The new method can store up to 700 terabytes (700 trillion bytes) of data in a single gram of DNA. Pretty incredible stuff!
Abstract from “Next-Generation Digital Information Storage in DNA”
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. Here, we develop a strategy to encode arbitrary digital information in DNA, write a 5.27-megabit book using DNA microchips, and read the book using next-generation DNA sequencing.
Excerpt from “Harvard cracks DNA storage, crams 700 terabytes of data into a single gram” by Sebastian Anthony
It is only with recent advances in microfluidics and labs-on-a-chip that synthesizing and sequencing DNA has become an everyday task, though. While it took years for the original Human Genome Project to analyze a single human genome (some 3 billion DNA base pairs), modern lab equipment with microfluidic chips can do it in hours. Now this isn’t to say that Church and Kosuri’s DNA storage is fast — but it’s fast enough for very-long-term archival.
Just think about it for a moment: One gram of DNA can store 700 terabytes of data. That’s 14,000 50-gigabyte Blu-ray discs… in a droplet of DNA that would fit on the tip of your pinky. To store the same kind of data on hard drives — the densest storage medium in use today — you’d need 233 3TB drives, weighing a total of 151 kilos. In Church and Kosuri’s case, they have successfully stored around 700 kilobytes of data in DNA — Church’s latest book, in fact — and proceeded to make 70 billion copies (which they claim, jokingly, makes it the best-selling book of all time!) totaling 44 petabytes of data stored.
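The comparisons quoted above are easy to sanity-check. The short sketch below assumes decimal units (1 TB = 10^12 bytes) and takes the book as a round 700 kilobytes; neither assumption is stated in the excerpt, so treat the results as rough.

```python
# Assumption: decimal (SI) units; the excerpt does not say which convention it uses.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15

capacity_per_gram = 700 * TB

# How many 50 GB Blu-ray discs hold 700 TB?
blu_ray_discs = capacity_per_gram // (50 * GB)

# How many full 3 TB hard drives is that? (700 / 3 is about 233.3)
hard_drives = capacity_per_gram // (3 * TB)

# Rough total for 70 billion copies of a ~700 KB book, in petabytes.
book_bytes = 700 * KB   # "around 700 kilobytes", rounded
copies = 70 * 10**9
total_pb = book_bytes * copies / PB

print(blu_ray_discs, hard_drives, total_pb)
```

The disc and drive counts come out at 14,000 and 233, matching the excerpt. The copies total lands in the tens of petabytes, the same range as the quoted 44; the exact value depends on how the size of one copy is counted.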
Excerpt from “Writing the Book in DNA” by R. Alan Leo
About four grams of DNA theoretically could store the digital data humankind creates in one year.
Although other projects have encoded data in the DNA of living bacteria, the Church team used commercial DNA microchips to create standalone DNA. “We purposefully avoided living cells,” Church said. “In an organism, your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it.”
In another departure, the team rejected so-called “shotgun sequencing,” which reassembles long DNA sequences by identifying overlaps in short strands. Instead, they took their cue from information technology, and encoded the book in 96-bit data blocks, each with a 19-bit address to guide reassembly. Including jpeg images and HTML formatting, the code for the book required 54,898 of these data blocks, each a unique DNA sequence. “We wanted to illustrate how the modern world is really full of zeroes and ones, not As through Zs alone,” Kosuri said.
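To make the addressing scheme described above concrete, here is a minimal Python sketch. The 96-bit blocks and 19-bit addresses come from the excerpt. Everything else is an assumption made for illustration: the one-bit-per-base mapping (0 becomes A or C, 1 becomes G or T), the rule for alternating bases, the zero padding of the last block, and the function names. It is not the team's actual pipeline.

```python
BLOCK_BITS = 96     # payload bits per block (from the excerpt)
ADDRESS_BITS = 19   # address bits per block (from the excerpt)

# Assumption: one bit per base; 0 -> A or C, 1 -> G or T.
ZERO_BASES = "AC"
ONE_BASES = "GT"


def bits_to_dna(bits: str) -> str:
    """Map each bit to a base, using the alternate base to avoid repeating the previous one."""
    bases = []
    for bit in bits:
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        bases.append(second if bases and bases[-1] == first else first)
    return "".join(bases)


def dna_to_bits(dna: str) -> str:
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


def encode(data: bytes) -> list[str]:
    """Split data into addressed blocks and return one DNA string per block."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Assumption: pad the final block with zeros so every block has BLOCK_BITS bits.
    bits += "0" * (-len(bits) % BLOCK_BITS)
    count = len(bits) // BLOCK_BITS
    if count > 2**ADDRESS_BITS:
        raise ValueError("too many blocks for a 19-bit address")
    strands = []
    for index in range(count):
        payload = bits[index * BLOCK_BITS:(index + 1) * BLOCK_BITS]
        address = f"{index:0{ADDRESS_BITS}b}"
        strands.append(bits_to_dna(address + payload))
    return strands


def decode(strands: list[str], length: int) -> bytes:
    """Reassemble strands given in any order; length is the original byte count."""
    payloads = {}
    for strand in strands:
        bits = dna_to_bits(strand)
        payloads[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    if not payloads:
        return b""
    bits = "".join(payloads[i] for i in sorted(payloads))
    return int(bits, 2).to_bytes(len(bits) // 8, "big")[:length]


message = b"hello DNA storage"
strands = encode(message)
recovered = decode(list(reversed(strands)), len(message))  # order does not matter
print(len(strands), recovered)
```

Because every strand carries its own address, the reader can sort the blocks back into order without looking for overlaps between neighbours, which is the contrast with shotgun sequencing drawn in the excerpt. The numbers also line up: 54,898 blocks of 96 bits is 5,270,208 bits, the 5.27 megabits cited in the abstract.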