Today, January 30th, 2013, my first co-first-author primary paper was published online in Genome Biology. It has been a long road from conception to final product. Center stage for the conception was Simon Chan. For him, too, this is a first: his first posthumous paper.
Simon joined UC Davis as a faculty member back in 2006, and not long after he arrived he started talking to Ian Korf. “Would it be possible to identify centromere tandem repeats from whole genome shotgun data?” Ian replied in typical Ian style: “Sure, we’ll just write a Perl script to do the work for us!” And so the project was born, though at this stage it was no more developed than a newborn baby.
The basic principle of the project was that the most abundant tandem repeat in a genome is the prime suspect for being the centromere tandem repeat, so we set out to identify and quantify it. In 2006 it was strongly suspected that the centromeres of most animal and plant genomes consist of tandem repeats. Over time more and more evidence came forward that this is indeed true for most genomes, but certainly not all. In some species the centromeres consist of chromosome-specific or unique sequences. In other cases several different tandem repeat sequences occupy the centromeres. And it gets even more bizarre when you consider species such as Caenorhabditis elegans, whose chromosomes lack a primary constriction and thus a classic centromere. Rather, the spindles bind to unique sequences along the entire length of the chromosome. These holocentric chromosomes are more common than often appreciated, as we review in another paper.
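The core idea can be sketched in a few lines of code. This is a deliberately simplified illustration, not our actual pipeline (which began life as that Perl script): it detects reads dominated by an exact tandem repeat and tallies the repeat monomers, calling the most abundant one the centromere candidate. Function names are my own, and real pipelines must additionally handle sequencing errors, repeat variants, and the fact that reads start at different phases of the monomer.

```python
from collections import Counter

def smallest_period(seq):
    """Return the smallest p such that seq exactly repeats with period p,
    or len(seq) if no shorter period exists."""
    n = len(seq)
    for p in range(1, n // 2 + 1):
        # seq has period p if it equals itself shifted by p bases
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n

def most_abundant_repeat(reads):
    """Tally candidate repeat monomers across reads; return the most
    abundant (monomer, read_count) pair, or None if no repeats found."""
    tally = Counter()
    for read in reads:
        p = smallest_period(read)
        if p < len(read) // 2:      # read is dominated by a tandem repeat
            tally[read[:p]] += 1    # note: ignores rotation/phase of monomer
    return tally.most_common(1)[0] if tally else None
```

For example, given reads mostly made of an “ACG” repeat plus some unique sequence, `most_abundant_repeat` nominates “ACG” as the candidate.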
A few rotation students started to make some headway, and by the time I did my rotation the basic principle had just been proven. Now it was time to expand to multiple genomes. First we focussed on WGS data obtained via old-fashioned Sanger sequencing. By this time I had joined both Simon’s and Ian’s labs. We then expanded to genomes sequenced on Illumina or 454 platforms. The sequences obtained by these next-generation platforms posed a new problem: we had to assemble the very sequences most people throw in the trash, as they care more about genes than about junk DNA, and we explicitly wanted to get our hands on the junk DNA. Help came from the DeRisi lab, where postdoc Graham Ruby was writing PRICE, a short-read assembler that allows you to align imperfect reads. Exactly what we needed, as the sequence variation between individual repeat copies can be over 30% (on a stretch of 170 bp).
After three years of hard work we had data from 282 species. We even did some PacBio sequencing ourselves (with help from PacBio, the sequencing core at UC Davis, and Tim Smith at the USDA Meat Animal Research Center). This data was very useful for identifying the very long tandem repeats found in bovine species. Now we had to make sense of the data. This would have been much easier if we were looking at genes, but we weren’t: we were looking at repetitive DNA in tandem. To simplify our analysis we focussed on the consensus repeat sequences, but even that did not make things much easier. The problem remained that most sequences are too different to be compared. This observation was in line with published work: centromere DNA evolves very fast. Even the 125 bp point centromere of Saccharomyces cerevisiae evolves three times faster than non-selected stretches of DNA in its genome. Of course we also looked at various theories on centromere evolution, and we found that our data is compatible with both predominant theories: centromere female meiotic drive and the library hypothesis.
We wrote the paper and went through many versions, mainly because Simon decided that writing it differently would make it better (and it usually did). Sadly, Simon was only directly involved in the first submission. Just before he was hospitalized, I took care of the second submission. The third and final submission Simon never knew about, as he had already died. That is what makes this his first posthumous paper, accompanied by Luca Comai’s comments. I am thrilled it is published, but sad that I cannot share the joy with the person who started the project back in 2006.
Melters DP, Bradnam KR, Young HA, Telis N, May MR, et al. (2013) Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biology 14: R10.
UC Davis Energy Institute Seminar, Wednesday, 30 January 2013, 4:10 – 5:00 p.m.
Location: Energy Institute Main Offices, UC Davis West Village, 1605 Tilia St., Ste. 100, Davis CA 95616
Title: Gasification of Biomass in Fluidized Beds – Thermodynamic Modeling and Experimental Analysis of Single and Multi-bed Reactors
Speaker: Malay Karmakar, Ph.D., Senior Scientist, Thermal Engineering Department
CSIR-Central Mechanical Engineering Research Institute (CSIR-CMERI)
Council of Scientific & Industrial Research, Government of India, New Delhi, India.
ATCGTAGACTATCAGAGACATCGA = 001101101100100001110011010010001000010011011000 (A = 00, C = 01, G = 10, T = 11) – electricity- and magnet-safe
In a timely publication in Nature, the field of synthetic genetics has succeeded in archiving Martin Luther King Jr.’s “I Have a Dream” speech, along with a series of William Shakespeare’s sonnets, a PDF file, and a photograph of the lab that accomplished these feats, in DNA, the code of all living organisms.
By translating computerised files into DNA similar to that found in plants and animals, the researchers claim it is possible to store a billion books’ worth of data for thousands of years in just a small test tube.
Although the method is expensive, it could still be much more efficient than hard drives or magnetic tape for long-term storage of large sets of data such as government records, the scientists said.
Within a decade, they expect the technique to have become cheap enough that DNA storage could become cost-efficient for the public to store lifelong keepsakes like wedding videos.
Dr Nick Goldman of the European Bioinformatics Institute, who led the study, said: “We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it.
“It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.”
Please find the entire article here
For those with access to academic journals, find the original scientific publication here; the lead author is Nick Goldman and the last author Ewan Birney, both of whom hail from the European Bioinformatics Institute (EBI).
For those who can’t access the journal article: the cost of this technology, according to the authors, is estimated at $25,000 per megabyte to encode and $220 per megabyte to decode. Keep in mind that the cost of synthesizing DNA continues to decrease as the technology improves. Those interested in trends can take a peek at this plot, which gives the cost of synthesis per base (nucleotide) over the past years. Remember, although the price costs cash money now (as with all technologies in their infancy, before they become commercial), DNA is a form of information storage that is stable for thousands and thousands of years (we’re still trying to resurrect the mammoth, and BMCDB anticipates Jurassic Park 4 to be a summer blockbuster in 2014), whereas conventional computer drives tend to degrade within a decade. These drives are also physically huge compared to the same amount of data stored in DNA at the nanoscale – think of DNA as a penny and a computer drive as the size of AT&T Park, the home of the 2012 World Series Champions, the San Francisco Giants.
Are you looking for potential opportunities to use your science knowledge outside of academia? How about an internship as a journalist writing about science and technology? The Economist has a call out for applications for a summer internship, writing about science and technology for 3 months. http://www.economist.com/news/science-and-technology/21569376-richard-casement-internship
Deadline to submit a cover letter and a writing sample is February 1st.
There is an awesome opportunity to see a talk by Garth Lenz TODAY, Tuesday, Jan 15th at 5:10pm.
Garth Lenz <http://www.garthlenz.com> is a photojournalist who has recently documented the changes taking place in the Alberta Tar Sands in the boreal forests of Canada. He has given a TED talk <http://www.youtube.com/watch?v=84zIj_EdQdM> on these issues and his work, and will be visiting UC Davis to share a slideshow of his photos and talk to us about this important environmental issue. It is a pertinent topic, as the country moves toward a final decision about the XL pipeline, which would drive the expansion of the mining operations in the Tar Sands. And hearing this story from a world-renowned photojournalist like Garth is truly a unique opportunity.
His talk is being sponsored by the Society for Conservation Biology, Davis and the Geology Students. It is titled “The True Cost of Oil: Images of Beauty and Devastation”.
Join us TODAY, Jan 15th at 5pm in 2 Wellman Hall to see his presentation. Admission is free for students and the public.
Please bring your own coffee cup if you can!
No experience in wine is required, but eligible participants must be 21+ years old and have no medical reason for not drinking alcohol. This study was approved by the UC Davis IRB for Human Research Protection.
Participants will smell and taste a set of commercial red wines and simply rate their overall opinion of each wine in a computer-based questionnaire.
The study will take place on Saturday, the 2nd of February 2013, specific time TBA, at the Robert Mondavi Institute for Wine and Food Science (RMI Sensory, room 2003) on the UC Davis campus in Davis, CA.
Please find exact location using this link: http://campusmap.ucdavis.edu/?b=126
Free parking on Saturdays is available close by at the South Parking Structure – for directions please see http://campusmap.ucdavis.edu/?l=32