A happy first publication. A sad first publication.
Today, January 30th, 2013, my first co-first author primary paper was published online in Genome Biology. It has been a long road from conception to final product. Center stage at the conception was Simon Chan. For him, too, this is a first: his first posthumous paper.
Simon joined UC Davis as a faculty member back in 2006, and not long after he arrived he started talking to Ian Korf. “Would it be possible to identify centromere tandem repeats from whole genome shotgun data?” Ian replied in typical Ian style: “Sure, we’ll just write a Perl script to do the work for us!” And so the project was born, but at this stage it was no more developed than a newborn baby.
The basic principle of the project was to identify and quantify tandem repeats, on the premise that the most abundant tandem repeat is the prime suspect to be the centromere tandem repeat. In 2006 it was strongly suspected that the centromeres of most animal and plant genomes consist of tandem repeats. Over time more and more evidence came forward that this is indeed true for most genomes, but certainly not all. In some species the centromeres consist of chromosome-specific sequences or unique sequences. In other cases several different tandem repeat sequences occupy the centromeres. And it gets even more bizarre when you think about species such as Caenorhabditis elegans. C. elegans’ chromosomes lack a primary constriction and thus a classic centromere. Rather, the spindles bind along the entire length of the chromosome to unique sequences. These holocentric chromosomes are more common than often appreciated, as we review in another paper.
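To make the "most abundant tandem repeat" idea concrete, here is a toy Python sketch. It is not the Perl pipeline we actually used. The function names are made up for illustration, and it assumes each read is an exact tandem repeat, which real centromere reads certainly are not. It finds the repeat unit in each read, rotates the unit into a canonical form so that shifted copies of the same monomer are counted together, and ranks the units by copy number.

```python
from collections import Counter
from typing import List, Optional, Tuple


def smallest_period(seq: str, min_copies: int = 2) -> Optional[int]:
    """Return the shortest period p for which seq is an exact tandem
    repeat with at least min_copies full copies, or None if there is none."""
    n = len(seq)
    for p in range(1, n // min_copies + 1):
        # seq has period p if every base equals the base p positions earlier
        if all(seq[i] == seq[i - p] for i in range(p, n)):
            return p
    return None


def canonical_rotation(unit: str) -> str:
    """Pick the lexicographically smallest rotation, so that the same
    monomer read from a different starting base compares equal."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def rank_tandem_repeats(reads: List[str]) -> List[Tuple[str, int]]:
    """Count full copies of each repeat unit across reads, most abundant first."""
    counts: Counter = Counter()
    for read in reads:
        p = smallest_period(read)
        if p is None:
            continue  # toy assumption: skip reads that are not exact repeats
        counts[canonical_rotation(read[:p])] += len(read) // p
    return counts.most_common()


# Hypothetical toy reads: two are the same 3 bp repeat in different phases.
reads = ["ACGACGACG", "CGACGACGA", "TTAGGTTAGG", "ACGTTGCA"]
print(rank_tandem_repeats(reads))
```

The top-ranked unit is the candidate centromere repeat. A real analysis would also need to tolerate mismatches between copies and treat a repeat and its reverse complement as the same sequence, both of which this sketch ignores.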
A few rotation students started to make some headway, and by the time I did my rotation the very basic principles had just been proven. Now it was time to expand it to multiple genomes. First we focussed on WGS data obtained via old-fashioned Sanger sequencing. By this time I had also joined both Simon’s and Ian’s labs. We then expanded to genomes sequenced by Illumina or 454 sequencing. The sequences obtained by these next-generation platforms posed a new problem. We had to assemble the very sequences most people throw in the trash, as they care more about genes than about junk DNA. And we explicitly wanted to get our hands on the junk DNA. Help came from the DeRisi lab. DeRisi’s postdoc Graham Ruby was writing PRICE, a short-read assembler that allows you to align imperfect reads. That was exactly what we needed, as the sequence variation between individual repeats can be over 30% (on a stretch of 170 bp).
After 3 years of hard work we had data for 282 species. Sure, we did some PacBio sequencing ourselves (with help from PacBio, the sequencing core at UC Davis, and Tim Smith at the USDA Meat Animal Research Center). This data was very useful for identifying the very, very long tandem repeats found in bovine species. Now we had to make sense of the data. This would be much easier if we were looking at genes, but we aren’t. We look at repetitive DNA in tandem. To simplify our analysis we focussed on the consensus repeat sequences, but even that did not make things much easier. The problem remained that most sequences are too different to be compared. This observation was in line with published work: centromere DNA evolves very fast. Even the 125 bp point centromere of Saccharomyces cerevisiae is evolving 3 times faster than non-selected stretches of DNA in its genome. Of course we also looked at various theories on centromere evolution, and we found that our data is compatible with both predominant theories: centromere female meiotic drive and the library hypothesis.
We wrote the paper and went through many versions, mainly because Simon decided that writing it differently would make it better (and it usually did). Sadly, Simon was only directly involved in the first submission. Just before he was hospitalized I took care of the second submission. Simon never knew about the third and final submission, as he had already died. That is what makes this his first posthumous paper, accompanied by Luca Comai’s comments. I am thrilled it is published, but sad that I cannot share the joy with the person who started the project back in 2006.
Melters DP, Bradnam KR, Young HA, Telis N, May MR, et al. (2013) Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biology 14: R10.