Saturday, January 26, 2013

Can you store data in DNA?

Welcome to the latest Science for Writers post. Last time we discussed a variety of topics in The Top 4 Questions you Never Knew you Wanted to Ask. In this post I will be investigating a piece of cutting-edge technology that could revolutionise data storage this century. Even if it doesn't, it can certainly have a massive impact on your science fiction stories.

I have put important words in bold. These are key scientific terms, and I will refer to them throughout the post. You don't need to know their exact meanings; as long as you get the gist of what I'm talking about, you will have no trouble following this post.

Writing Links are in italics and these discuss how the science could be used in writing.

What is DNA?

DNA stands for deoxyribonucleic acid. It is a molecule shaped like a double helix (a twisted ladder), and it essentially carries the genetic instructions that every living organism follows.

Each 'rung' on this ladder-like structure consists of two nucleobases. These bases make up the genetic code. There are a total of 4 bases that can appear in a DNA molecule: Adenine, Thymine, Cytosine, and Guanine (A, T, C, and G for short).

There are specific rules for how these bases can combine: A can only pair with T, and C can only pair with G, so every base pair is one of those two pairings. A is joined to T by 2 Hydrogen Bonds (weak attractions between a hydrogen atom on one base and a nitrogen or oxygen atom on the other), and C is joined to G by 3 Hydrogen Bonds.
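The pairing rules above are simple enough to express in a few lines of code. This is a minimal Python sketch (the function names are illustrative, not from any library) that builds the partner strand base by base and counts the Hydrogen Bonds holding the pairs together. It simplifies real biology: the partner strand actually runs in the opposite direction, which this sketch ignores.

```python
# Pairing rules: A pairs with T, C pairs with G
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hydrogen Bonds per base pair: 2 for A-T, 3 for C-G
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}

def partner_strand(strand: str) -> str:
    """Return the base that pairs with each base in the strand."""
    return "".join(PAIRS[base] for base in strand)

def hydrogen_bonds(strand: str) -> int:
    """Total Hydrogen Bonds joining this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)

strand = "ATCGGA"
print(partner_strand(strand))  # TAGCCT
print(hydrogen_bonds(strand))  # 15 (2 + 2 + 3 + 3 + 3 + 2)
```

Because the pairing is one-to-one, applying `partner_strand` twice gives back the original strand, which is exactly why one side of the ladder is enough to rebuild the other.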

But DNA is more than just 4 types of bases. Each base is also part of what is known as a Nucleotide. A Nucleotide consists of a Phosphate group, a sugar with 5 Carbon atoms (called deoxyribose), and the base (A, T, C, or G). The Phosphate group of one Nucleotide connects to the sugar of the next, and a chain of these forms one side of the ladder. The other side is formed in exactly the same way. The two sides are then held together by the Hydrogen Bonds between bases that I mentioned earlier.

Writing Link:  If DNA is what codes for everything in our bodies, imagine the power a villain would have if she could control and manipulate it. What if, in the distant future, an authoritarian government had changed the fundamental DNA of all humans so they could be tracked remotely? What would happen if an evil person got hold of the tracking technology and started taking down national heroes?


It is said that in 2011 we created at least 1.8 Zettabytes of data (about 1.8 billion Terabytes). It is estimated that in 2020 we will create 50 times that amount. The problem with this quantity of data is the immense amount of physical space required to store it, and our current storage technology will struggle to keep up with this growing demand.
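Storage sizes are quoted in two different systems, which makes these conversions easy to get wrong: decimal units (1 zettabyte = 10^21 bytes, 1 terabyte = 10^12 bytes) and binary units (1 zebibyte = 2^70 bytes, 1 tebibyte = 2^40 bytes). This quick Python check uses exact fractions to avoid rounding; the helper name is illustrative. For comparison, 2^70 / 2^40 = 1,073,741,824 is the number of tebibytes in one zebibyte, a figure that is easy to confuse with the decimal one.

```python
from fractions import Fraction

# Decimal (SI) units, in bytes
ZB, TB = 10**21, 10**12
# Binary (IEC) units, in bytes
ZIB, TIB = 2**70, 2**40

def zettabytes_to_terabytes(zettabytes) -> Fraction:
    """Convert decimal zettabytes to decimal terabytes exactly."""
    return Fraction(zettabytes) * ZB / TB

print(zettabytes_to_terabytes("1.8"))  # 1800000000 (data created in 2011)
print(50 * Fraction("1.8"))            # 90 (zettabytes, the 2020 estimate)
print(ZIB // TIB)                      # 1073741824 (tebibytes per zebibyte)
```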

A small part of a Google Data Centre
The term cloud computing is misleading. We don't actually store data in the clouds. When you save something to Dropbox, save a game online, post a Facebook update, upload an image to Flickr, search on Google, publish a blog article, or do anything else that stores data so it can be retrieved later, you are saving something to a hard drive. It may not be in your house, but Facebook and Google have massive data centres filled to the brim with networked hard drives.

Despite being saved to the 'cloud', your data is, in actuality, stored on a hard drive in a building on the ground.

Writing Link:  Massive data storage solutions are vital for any modern civilisation. If you have a future civilisation where everything is connected, you need to think about where all that data is stored. If every street has a government camera connected to an artificial intelligence that picks out important events, and every phone call and email is kept permanently for future use, all of that data must be stored somewhere. Perhaps the story could revolve around the protagonist destroying a data centre in the name of freedom. Remember, though, that a government like that would have multiple backups and redundancies in place.

DNA Data Storage

Ewan Birney and Nick Goldman, both of the European Bioinformatics Institute, were in a bar discussing how to store the vast amounts of data the institute produced. As they are both genomicists, they 'joked that DNA, which is incredibly compact and sturdy, [with] a rather lengthy history of storing [genetic] information' would be a suitable candidate. That joke sparked an idea, and they hurriedly jotted their thoughts down on napkins. (Quoted from

Over the next few years they developed a technique which I think is rather elegant, even if quite slow. First you take whatever you want to store and convert it into binary (that's 1s and 0s). This binary code is then converted, using software Goldman wrote himself, into base code. Base code consists of A, T, C, and G. Remember them? They're the bases for DNA. So a string of 1s and 0s is converted into a string of As, Ts, Cs, and Gs. The software also makes sure the same base never appears twice in a row, because long runs of a single base are a common source of errors when DNA is made and read. This code is sent off to Agilent (a company which prints DNA) where, using what is essentially an inkjet printer filled with bases, the DNA is synthetically made and then posted back. If all is well, the DNA that comes back can be read using a standard DNA sequencer and decoded back into the original file.
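To make the idea concrete, here is a minimal Python sketch of a Goldman-style encoder. It is simplified and partly assumed: it turns each byte into six base-3 digits (Goldman's published scheme reportedly used a Huffman code instead), then uses each digit to pick one of the three bases that differ from the previous base, so no base is ever repeated. The function names, the starting base, and the exact rotation order are illustrative assumptions.

```python
BASES = "ACGT"

def bytes_to_trits(data: bytes) -> list[int]:
    """Turn each byte (0-255) into 6 base-3 digits, since 3**6 = 729 > 256."""
    trits = []
    for byte in data:
        value, digits = byte, []
        for _ in range(6):
            digits.append(value % 3)
            value //= 3
        trits.extend(reversed(digits))  # most significant digit first
    return trits

def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Each digit picks one of the three bases that differ from the previous one.
    The starting base "A" is an assumption for this sketch."""
    out = []
    for t in trits:
        base = BASES[(BASES.index(prev) + 1 + t) % 4]
        out.append(base)
        prev = base
    return "".join(out)

def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Undo trits_to_dna by reading each base relative to the one before it."""
    trits = []
    for base in dna:
        trits.append((BASES.index(base) - BASES.index(prev) - 1) % 4)
        prev = base
    return trits

def trits_to_bytes(trits: list[int]) -> bytes:
    """Regroup the base-3 digits into bytes, six at a time."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)

message = b"Hi"
dna = trits_to_dna(bytes_to_trits(message))
print(dna)                                # CGCACGTCGCAC
print(trits_to_bytes(dna_to_trits(dna)))  # b'Hi'
```

Using base 3 rather than base 4 is the trade-off here: each base carries a little less information, but in exchange the encoder always has a "safe" base to choose that breaks up repeats.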

Nick Goldman with the DNA, which is no bigger than a speck of dust
Image from EMBL
You'd imagine that with multiple conversion steps there would be quite a lot of errors in the process. Well, surprisingly, there weren't. Birney and Goldman encoded an unusual combination of files onto DNA: all of Shakespeare's sonnets, a PDF of Watson and Crick's paper describing the double helix of DNA, a 26-second MP3 from Martin Luther King's 'I Have a Dream' speech, a text file of a compression algorithm, and a colour JPEG photo of the institute. When they read the data back, they initially found 2 small errors. However, after a minor modification, they were able to read back the data with 100% accuracy.

Amazingly, the DNA this data was stored on was no bigger than a speck of dust!
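Part of what keeps errors down in DNA storage is redundancy: each fragment is typically present and read many times, so the copies can be compared. A common way to combine them is a per-position majority vote, sketched below in Python. This is a simplified illustration of the general idea, not the exact error-handling method Birney and Goldman used, and the `consensus` function name is hypothetical.

```python
from collections import Counter

def consensus(reads: list[str]) -> str:
    """Per-position majority vote across several reads of the same fragment.
    Ties go to the base seen first, because Counter.most_common keeps
    first-encountered order for equal counts."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("need at least one read, all of the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

# Four noisy reads of one fragment: two of them each contain a single error
reads = ["ACGTAC", "ACGTTC", "AGGTAC", "ACGTAC"]
print(consensus(reads))  # ACGTAC
```

With enough copies, a random error in one read is simply outvoted by the others, which is one reason a speck of DNA holding many copies can be read back so reliably.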

Writing Link:  What would you store in DNA if you had the resources? What uses can you think of for storing data in DNA? Is there a story in any of your ideas?

More data, usage, and the future

In a single cup of DNA you could hold the equivalent of 1 million CDs. Just think about that for a moment...

Another team, George Church and Sri Kosuri, has reported a far bigger figure than Goldman: a whopping 5.5 petabits (roughly 700 Terabytes). Put another way, that's the equivalent of 14,000 Blu-ray discs (1 disc is 50GB) in a droplet of DNA. To be honest, although this is amazing, it is a little misleading. The 5.5 petabits describes how densely data could be packed (per cubic millimetre of DNA), not how much different data they actually stored: Church and Kosuri stored a single 700 kilobyte book (Church's own) 70 billion times. Their technique is pretty much the same as Goldman and Birney's. If you wanted to store 700 Terabytes on 3-terabyte hard drives, you would need about 233 of them, and they would weigh 151kg!
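The unit conversions in this paragraph are easy to check. This Python sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and the 50 GB Blu-ray and 3 TB drive sizes quoted above; `ceil_div` is an illustrative helper. Starting from exactly 5.5 petabits gives slightly smaller numbers than the rounded 700 TB figure.

```python
BITS_PER_BYTE = 8
TB = 10**12           # decimal terabyte, in bytes
BLU_RAY = 50 * 10**9  # 50 GB per disc, as quoted above
DRIVE = 3 * 10**12    # 3 TB hard drive, as quoted above

def ceil_div(a: int, b: int) -> int:
    """How many b-sized containers are needed to hold a (rounds up)."""
    return -(-a // b)

# 5.5 petabits = 5.5 * 10**15 bits, written as an integer to stay exact
total_bytes = 55 * 10**14 // BITS_PER_BYTE

print(total_bytes / TB)                # 687.5 (terabytes)
print(ceil_div(total_bytes, BLU_RAY))  # 13750 Blu-ray discs
print(ceil_div(total_bytes, DRIVE))    # 230 hard drives
```

Starting from the rounded 700 TB figure instead, `ceil_div(700 * TB, BLU_RAY)` gives exactly 14,000 discs, matching the number quoted in the text.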

Church's technique has a little more information about it online, so I can tell you that the bases A and C code for the binary bit 0, and T and G code for 1. Having two choices for each bit means the encoder can pick whichever base avoids awkward sequences, such as long runs of the same base. In order to piece it all back together, each section starts with a '19-bit address'. The software finds these addresses and puts the sections in order prior to decoding.

Church's method
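Here is a minimal Python sketch of a one-bit-per-base code in the spirit of Church's method: 0 becomes A or C, 1 becomes G or T, and each fragment starts with a 19-bit address so the pieces can be put back in order. How the encoder chooses between the two bases (here: whichever differs from the previous base) and the 96-bit block size are assumptions for illustration, and all function names are hypothetical.

```python
import random

ZERO, ONE = "AC", "GT"  # A or C means 0, G or T means 1
ADDRESS_BITS = 19       # address length quoted in the text
BLOCK_BITS = 96         # assumed data bits per fragment (illustrative)

def encode_bits(bits: list[int]) -> str:
    """One bit per base; pick whichever option differs from the previous base."""
    out, prev = [], ""
    for bit in bits:
        first, second = ZERO if bit == 0 else ONE
        base = first if first != prev else second
        out.append(base)
        prev = base
    return "".join(out)

def decode_bases(seq: str) -> list[int]:
    return [0 if base in ZERO else 1 for base in seq]

def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in reversed(range(width))]

def from_bits(bits: list[int]) -> int:
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

def make_fragments(data_bits: list[int]) -> list[str]:
    """Split the data into blocks and prefix each block with its address."""
    fragments = []
    for address, start in enumerate(range(0, len(data_bits), BLOCK_BITS)):
        block = to_bits(address, ADDRESS_BITS) + data_bits[start:start + BLOCK_BITS]
        fragments.append(encode_bits(block))
    return fragments

def reassemble(fragments: list[str]) -> list[int]:
    """Decode fragments in any order, sort them by address, then join the data."""
    decoded = [decode_bases(f) for f in fragments]
    decoded.sort(key=lambda bits: from_bits(bits[:ADDRESS_BITS]))
    return [bit for bits in decoded for bit in bits[ADDRESS_BITS:]]

# Usage: shuffle the fragments (a sequencer returns them in no particular order)
rng = random.Random(42)
data = [rng.randint(0, 1) for _ in range(500)]
fragments = make_fragments(data)
rng.shuffle(fragments)
assert reassemble(fragments) == data
```

The address is what makes the scheme work at scale: the DNA fragments float around in a jumble, so each one has to carry its own position.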

Data stored with current technology can't be kept long term without degrading. The best method at the moment for long-term backup is magnetic tape, but tapes need replacing every 20 years or so. DNA, on the other hand, could last for tens of thousands of years. Don't believe me? Well, we have been able to read the genome of the woolly mammoth from remains that are thousands of years old, and those certainly weren't kept under controlled conditions.

All DNA needs to hold its information is a dark, dry, cool room. No power is needed for the DNA itself. It doesn't need to be frozen, though it could be if you wanted to be on the safe side. There is also very little chance of it becoming unreadable in the future. Have you ever tried to open a really old file only to find your computer no longer supports that file type? Well, DNA isn't suddenly going to become unreadable: for as long as humans have DNA, technology will keep developing ways to decode it. Who knows? In 300 years' time the method for decoding DNA may be so fast that archives we make in the next fifty years could be read in seconds.

At the moment, DNA is too expensive to be a mainstream backup solution. It has been calculated that, at current costs, DNA storage would only beat the alternatives financially if you needed to keep the data for 600 to 5,000 years, which is not something most people need to do. However, the costs of writing and decoding DNA are falling rapidly, and soon the price may make it suitable for 'sub-50-year archiving'. (Quote from
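The break-even idea behind the 600-to-5,000-year figure can be sketched as a simple cost comparison. This Python version is deliberately simplified and is not the calculation the quoted researchers used: it assumes DNA is written once, while tape has to be re-copied every 20 years, and it ignores reading costs, storage rooms, power, and falling prices. The cost numbers in the usage line are hypothetical placeholders, not real prices.

```python
def break_even_years(dna_cost: float, tape_cost_per_copy: float,
                     replace_every_years: float = 20) -> float:
    """Years of tape storage that cost as much as one DNA copy.

    Each tape copy's cost is spread evenly over its working life, so tape
    costs tape_cost_per_copy / replace_every_years per year, while DNA is
    paid for once up front.
    """
    if tape_cost_per_copy <= 0:
        raise ValueError("tape cost must be positive")
    return dna_cost * replace_every_years / tape_cost_per_copy

# Hypothetical placeholder: writing to DNA costs 100x as much as one tape copy
print(break_even_years(dna_cost=100.0, tape_cost_per_copy=1.0))  # 2000.0
```

The shape of the formula shows why falling DNA prices matter so much: the break-even time shrinks in direct proportion to the cost of writing DNA.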

Writing Link:  Imagine the possibilities DNA Data Storage could bring. Need I say more? If I weren't currently writing a novel, I could quite easily come up with a plot where DNA data storage is important.


So there you have it: DNA data storage could well be viable in the near future. That's it for this post. Please comment below; I'd love to hear from you. Share this post if you enjoyed it. There are social media buttons at the bottom of the post for your convenience.


