by George M Church, live from the 9th International Meeting on Human Genome Variation and Complex Genome Analysis, Sep 6-8, 2007 in Barcelona.
Although Jim Watson’s genome hasn’t been through peer review yet, and Craig Venter’s genome doesn’t yet have a slick web browser like Jim’s, we’ve seen enough to ask: what next? Someone at the meeting today accidentally got some laughs by saying that they were comparing Craig’s genome to the human genome. Clearly this is a time requiring great caution. So our first question is: where are we with these first two complete diploid genomes? Well, they’re neither complete nor the first. The Craigome has over 4500 gaps (a bit more than the 341 gaps in the haploid 2004 HGP genome). The nod for first human diploid sequence goes to the 269 HapMap genomes published in Oct 2005. Nevertheless, we now have the first two non-anonymous personal genomes (hopefully millions someday). Oh, and what is it with the press-release claim that our genomes have higher variation than previously thought? The 0.5% variation observed includes a near-perfect fit to the long-known 0.06% SNP frequency and a 0.08% frequency of smaller indels (about twice that seen in 330 genes from Seattle studies), with the remainder being copy-number variants (CNVs), 87% of which have been described previously. Just as with the number of genes in the genome in 2001, the beauty and the news are in the details, not in the summary stats.
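To make the "remainder" in that breakdown concrete, here is a minimal Python sketch that splits the 0.5% figure into its three classes. It assumes the three classes are non-overlapping and simply subtract from the total, which is a simplification of mine rather than something stated in the genome papers; the function and variable names are illustrative only.

```python
# Percent-of-genome figures quoted in the paragraph above.
TOTAL_VARIATION_PCT = 0.5
SNP_PCT = 0.06
SMALL_INDEL_PCT = 0.08


def variation_breakdown(total_pct, snp_pct, indel_pct):
    """Split total variation into SNP, small indel, and CNV (the remainder).

    Assumes the classes do not overlap, so CNV = total - SNP - indel.
    """
    cnv_pct = total_pct - snp_pct - indel_pct
    return {"SNP": snp_pct, "small indel": indel_pct, "CNV": cnv_pct}


breakdown = variation_breakdown(TOTAL_VARIATION_PCT, SNP_PCT, SMALL_INDEL_PCT)
for name, pct in breakdown.items():
    share = 100 * pct / TOTAL_VARIATION_PCT
    print(f"{name}: {pct:.2f}% of genome, {share:.0f}% of observed variation")
```

Under that assumption the CNV remainder is about 0.36% of the genome, or roughly 72% of the observed variation, which is why the headline number is dominated by a class of variants that was largely already described.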
We can get from genome variations to systems biology “with focused population association studies, animal models, and functional genomics on the cells from the subjects” (Church 2005). To do genome-wide association studies (GWAS), we must ask where technology costs are heading. Given the drop in price between the arrivals of the two genomes in the NCBI Personal Genomics directory (Craig on June 27 at a cost of $70M, and Jim nine days later at a cost of $1M), an over-zealous extrapolationist might be disappointed that the $1K genome did not arrive on July 25. Seriously now, the point is that neither study is inexpensive enough to scale to genome-wide association studies. SNP-chips at $250 each are scalable, but tend to miss new and/or rare SNPs and small indels. Next-generation sequencing and short read-pairs (Shendure et al. 2005) may bring down costs by a factor of 10. Read-pairs seem ripe to become the method of choice for CNVs, smaller indels, and even inversions. Enrichment by hybridization, so that at least one read of each pair falls in an exon or cis-regulatory site, might bring costs down by another factor of 50. Even if these GWAS efficiently get us beyond “linked alleles” to “causative alleles”, they will generate gloriously more hypotheses than they test.
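The cost argument can be followed with a few lines of arithmetic. The sketch below applies the two hoped-for reductions (a factor of 10, then a factor of 50) to the $1M figure quoted for Jim's genome. Treating $1M as the baseline and using a cohort of 1,000 subjects are both assumptions made here for illustration; neither comes from the text.

```python
# Figures quoted in the paragraph above.
JIM_GENOME_COST_USD = 1_000_000
SNP_CHIP_COST_USD = 250
NEXT_GEN_FACTOR = 10      # next-generation sequencing with read-pairs
ENRICHMENT_FACTOR = 50    # enrichment by hybridization

# Hypothetical association-study size, chosen only for illustration.
COHORT_SIZE = 1_000


def projected_cost(baseline_usd, *reduction_factors):
    """Divide a baseline cost by each reduction factor in turn."""
    cost = float(baseline_usd)
    for factor in reduction_factors:
        cost /= factor
    return cost


per_genome = projected_cost(JIM_GENOME_COST_USD, NEXT_GEN_FACTOR, ENRICHMENT_FACTOR)
print(f"Projected cost per sample: ${per_genome:,.0f}")
print(f"Sequencing cohort of {COHORT_SIZE}: ${per_genome * COHORT_SIZE:,.0f}")
print(f"SNP-chip cohort of {COHORT_SIZE}: ${SNP_CHIP_COST_USD * COHORT_SIZE:,.0f}")
```

With these inputs the projected per-sample cost is about $2,000, still eight times the price of a $250 SNP-chip, so even both improvements together would leave sequencing the more expensive route for a large cohort.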
So, back to the other routes to systems biology: animal or cell models could be made to test the 4 million variants per genome (and combinations; oh my!), clearly indicating a need for automated homologous recombination methods and/or prioritization of these tests using the third route to systems biology, “personal functional genomics”. Unlike the HapMap genomes, the Jimome and Craigome are not yet accompanied by extensive phenotypic trait data, nor by any cell lines with which to gather such data. SNPs and CNVs that affect RNA levels have been elegantly mapped by Spielman et al. 2007 and Stranger et al. 2007. Most effects map close to the transcription start sites. Assaying RNA from individuals, by these standard assays or by next-generation sequencing (Kim et al. 2007), enables comparisons of the sum of the two allelic expression levels among the two types of homozygotes (AA and aa) and the heterozygote (Aa) in a variety of different genetic backgrounds and cell-states. In contrast, genome-wide, allele-specific RNA assays would measure the expression from each haplotype within a heterozygote, under the most nearly identical background state that can be arranged. The missing technology is one to gain access to all human tissues (since the list of volunteers for brain biopsies is short). This is yet another reason that we will be watching for methods to derive pluripotent stem cells from adult human tissues. Personal functional genomics assays on such personal cell lines are likely to arrive much earlier than (and indeed to pave the way for) therapeutic applications.
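The contrast between summed and allele-specific expression can be sketched in code. In the minimal Python example below, the read counts for the two alleles in one heterozygote are hypothetical. The exact binomial test against a 50:50 null is a common choice that I am assuming here; it is not taken from any of the cited papers.

```python
from math import comb


def allelic_ratio(reads_A, reads_a):
    """Fraction of allele-informative reads that come from allele A."""
    total = reads_A + reads_a
    if total == 0:
        raise ValueError("no allele-informative reads")
    return reads_A / total


def imbalance_p_value(reads_A, reads_a):
    """Exact two-sided binomial p-value against a 50:50 null.

    Sums the probability of every outcome that is no more likely than the
    observed one. Integer arithmetic avoids floating-point overflow.
    """
    n = reads_A + reads_a
    observed = comb(n, reads_A)
    tail = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return tail / 2**n


# Hypothetical counts from one heterozygous (Aa) individual at one SNP.
reads_A, reads_a = 70, 30

# A total-expression assay sees only the sum of the two alleles.
print("summed expression:", reads_A + reads_a)

# An allele-specific assay resolves the two haplotypes within the same cells.
print("allelic ratio (A):", allelic_ratio(reads_A, reads_a))
print("imbalance p-value:", imbalance_p_value(reads_A, reads_a))
```

Because both alleles are measured in the same sample, differences in genetic background and cell-state cancel out of the ratio, which is the advantage the paragraph above attributes to allele-specific assays.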
Church GM (2005) The Personal Genome Project. Mol Syst Biol 1:2005.0030
IHGSC et al. (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945.
International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437:1299-1320.
Kim JB, Porreca GJ, Gorham JM, Church GM, Seidman CE, Seidman JG (2007) Polony multiplex analysis of gene expression (PMAGE) in a mouse model of hypertrophic cardiomyopathy. Science 316(5830):1481-4.
Levy et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biol 5(10):e254.
Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM (2005) Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 309(5741):1728-32.
Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. (2007) Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 39(2):226-31.
Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavare S, Deloukas P, Hurles ME, Dermitzakis ET (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315(5813):848-53.