Community consensus criteria

A group of researchers funded by the NIH Post-GWAS Initiative has produced a draft standards document for the cancer genomics field entitled “Principles for the post-GWAS functional characterisation of risk loci”, which can be found in preprint form at the Nature Precedings archive. This document is intended for publication as a Perspective in Nature Genetics. Because field-specific criteria supported by broad consensus are more likely to be useful, we have decided to invite all stakeholders to become co-authors or peer referees of this document in place of editorially supervised peer review. This is very much an experiment: some editorial judgment may be needed, and formal expert review may or may not be needed in this case.

Background on the journal’s involvement in developing field-specific standards by soliciting community consensus can be found in two Editorials, “Discussing Standards” and “On beyond GWAS”.

To become a contributor you have several options:

1) Comment using the Nature Precedings comment function.

2) Email your comments and text to the corresponding authors at snpfunction(at) whether or not you want to be considered for co-authorship.

3) Post your comments to the Nature Genetics blog freeassociation(at) letting the moderator know whether you want your comments to be attributed to you or anonymous.

4) Although our aim is consensus, substantial distinct and dissenting perspectives may need to be submitted as preprints to Nature Precedings. These can be combined into a themed Collection of preprints that will not preclude their eventual publication in journals, for example as Correspondence.

Substantial conceptual contributions and substantial contributions to the writing of the submitted document will be acknowledged with authorship in the final document if submitted before the community review deadline, December 20th 2010. If you wish to contribute to the document as a referee only, you may stipulate that you do not want co-authorship.

What were you on about again?

Editorials June 2006-May 2009

2004-6 Editorials

The Journal’s Standards

GWAS and CNV associations for neuropsychiatric studies

MIAME compliant transcriptome data

Transparent papers are cited better

Contributor ID and attribution

Incentives to make data public

The Human Variome Project 1, 2

Microattribution reviews

Corresponding authors and consortium authorship

Publisher metadata and PubMed

Advances in fields of genetics

Human methylome

Malaria genomic resources

GWAS for prostate cancer


Transcriptional networks

RNA isoforms

Agricultural genomics

CNV detection

de novo CNV


ASHG 2006

Nephrogenetics 2007



ICG2008 1, 2

Social and Political

Preserving diverse science during NIH budget cuts

US research competitiveness

EU funding for research

Arab genomics

Genetic nondiscrimination legislation

Genetics for developing nations

Encouraging creative research

Personal genomics 1, 2

Darwin’s legacy

History of the journal

Limits of scientific analogies

Extending a successful network of collaboration

My thanks to the delegates of the second Genomics of Common Diseases Conference:

Many thanks to all of you, the participants, a highly interconnected network of collaborators and competitors who form a core area of Nature Genetics’s content. You are our readers, authors and referees. That core, like the journal, is expanding internationally. Just as, when we send your research to review, we try to add a new referee alongside those experienced and calibrated experts we have relied upon to produce papers that are useful research tools, I hope that you will be able to extend your networks of excellence in a similar manner.

It is fascinating to see these collaborative networks, built by the NHGRI, the Broad and Wellcome, extend internationally to address diseases that are of local concern: malaria, HIV, tuberculosis, diabetes and cancer. As the techniques develop, these long-range collaborations can meet with success.

But I know from lab visits in many countries that building local capacity matters a great deal to developing and emerging nations. As DNA sequencing becomes cheaper and the tools of discovery become reliable commodities, it is populations with particular characteristics and demographics that are crucial. Here I would cite the success of the Pan-Asian SNP initiative from Singapore in mobilizing the participation of regional collaborators, and the new genome centers of Mexico, South Africa and several nations in the Arab world, which are building centers of excellence to address genetic problems of local significance. These patterns of collaboration go beyond the paradigm of long-range collaboration with established experts in developed countries to build sustainable capacity with immediate neighbors.

For a developing nation, making an essential contribution to the global understanding of our shared human genome is an important source of pride. With the new concentration on genomics come beneficial coat-tail effects: education and provision of primary health care. So, I hope you will join me in a toast to international collaboration with a commitment to capacity building, in the hope that we can make some of these common diseases less common.

Microattribution for community annotation of the human genome

Human Variome Project Planning Meeting 25 – 29 May 2008, San Feliu de Guixols, Costa Brava, Spain


Microcitation is a way to incentivize public data deposition by extending the practice of citing journal articles to database entries and by providing quantitative citation for every unique author.


A pilot project commissioning peer-refereed locus reviews as journal articles, with microattribution for individual variants, was introduced in a recent Editorial and was expanded upon in detail in this blog.

Each journal article should have a publicly accessible Supplementary Table 1 listing all the accessions cited in the article. The accessions must be indexed to a unique sequence indicating a nucleotide position (an ssID in NCBI) and a unique allelic state. Each string must have an author ID and a unique locator for the citing journal. Thus a citation string is formed as a list of parameters carried on a URL that resolves to the appropriate database:

(ss71650991, A, TSC2DB, doi1038/ng.123, NM_000548.2:c.138+1G>A, OMIM191100, Popfreq=ALFRED#XXX,)

used as a URL, this resolves to:

even though it does not cite all of the data parameters related to that accession in dbSNP and the string also carries a parameter pointing to the accession number of population frequency information that was submitted to another database.
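As a rough illustration of carrying such a citation string on a URL, here is a minimal Python sketch. The resolver address and the parameter names (ssid, allele, db and so on) are hypothetical, invented only for this example; the values are the ones in the string above, and nothing here describes an existing service.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical resolver address; the post does not name one.
RESOLVER = "https://resolver.example.org/cite"


def citation_url(ss_id, allele, database, doi, hgvs, omim, popfreq):
    """Carry one microattribution citation string as URL query parameters.

    The parameter names are assumptions made for this sketch.
    """
    params = {
        "ssid": ss_id,       # unique sequence position (ssID)
        "allele": allele,    # unique allelic state
        "db": database,      # database holding the accession
        "doi": doi,          # locator for the citing journal article
        "hgvs": hgvs,        # variant description
        "omim": omim,        # phenotype reference
        "popfreq": popfreq,  # pointer to population frequency data
    }
    # urlencode percent-escapes characters such as '+', '>' and '#'
    # so that they survive the trip inside a URL.
    return RESOLVER + "?" + urlencode(params)


url = citation_url(
    "ss71650991", "A", "TSC2DB", "doi1038/ng.123",
    "NM_000548.2:c.138+1G>A", "OMIM191100", "ALFRED#XXX",
)

# Decoding the query string recovers the original parameters.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(url)
print(decoded["hgvs"])
```

The point of the round trip is that a variant description containing characters like '+' and '>' has to be escaped before it can ride on a URL, and the receiving database can recover it unchanged.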

Microattribution can operate locally, with journals and databases each reporting quantitative citation of accessions. However, depositing the proposed Supplementary Table 1 in a central registry of cited accessions (at publication) has three great virtues. First, different users can create citation-counting interfaces to the same information. Second, if the site is a proxy, it can record all microattribution (web traffic and vendor information as well as microcitation). Finally, the central site can be mined for citations associated with unique author identities and with each author’s publications and database entries.
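To make the idea of mining a central registry concrete, here is a small Python sketch of one possible citation-counting interface. The row layout, the author and article identifiers, and the second accession are all invented for illustration; a real registry would define its own fields.

```python
from collections import Counter

# Invented rows standing in for deposited Supplementary Table 1 entries:
# (accession, allele, author_id, citing_article)
rows = [
    ("ss71650991", "A", "author-1", "article-1"),
    ("ss71650991", "A", "author-1", "article-2"),
    ("ss00000002", "T", "author-2", "article-1"),  # made-up accession
    ("ss71650991", "A", "author-1", "article-2"),  # duplicate deposit
]


def count_citations(rows):
    """Count citations per author and per (accession, allele) pair.

    Identical rows are collapsed first, so a table deposited twice
    does not inflate anyone's count.
    """
    unique = set(rows)
    by_author = Counter(author for _, _, author, _ in unique)
    by_accession = Counter((acc, allele) for acc, allele, _, _ in unique)
    return by_author, by_accession


by_author, by_accession = count_citations(rows)
print(by_author)
print(by_accession)
```

Because the counts are derived from the shared rows rather than stored, anyone with access to the same registry could build a different interface on top of it.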

To anticipate storage problems, parameter-rich accessions (ssID, allele, phenotype_tableID, submitter, curator, LSDB_ID, PBD_ID, ArrayExpress ID, GeneTests_ID, PharmGKB_ID, local_confidentialrecord) would be stored for frequent online access, whereas less intensively curated accessions (ssID, allele, submitter, platform) might be stored on hard disks for occasional searching.

OpenURL conventions used by publishers in the CrossRef citation system already lay out rules for constructing parameter strings to be carried upon URLs. This group is also developing a publishers’ version of author disambiguation and there are already web-wide projects that could be tapped, like OpenID.

I suggest that parameter sets be nested within existing conventions to allow committees of publishers, microattribution activists, genome annotators, and mendelian mutation curators to define and update parameter forms that work for their communities.

[Figure: nested parameter sets, running from citer-defined to target-defined]


Thanks to HVP, HGVS, HUGO, NCBI, EBI, UCSF, SNPedia, Genome Commons, and INSIGHT for their time and ideas. These ideas are not limited to the genome community but we have a unique indexing system in the genome and have an opportunity to demonstrate best scientific practice in accurate citation.


Publication, credit & incentives

Help! I’m becoming more normal.

Association of common variants to diseases is still in a phase of rapid discovery. One immediate consequence is that the relative risk for an individual, predicted from very partial information, can change rapidly as more information is added. For example, three of the 18 risk predictions have changed in my profile on the deCODEme site since November:

Restless legs syndrome: OR 1.94>0.97, was 1, now 4 SNPs

Prostate cancer: OR 1.05>0.77, was 5, now 8 SNPs

Type 2 diabetes: OR 1.45>1.10, still 8 SNPs

In the first two cases, the new information is the association of new variants that can be added to the risk calculation. In the third case I do not know if this is a recalculation or the result of more studies on the same 8 SNPs. I’ll make a rash prediction and be prepared to be proven wrong, but I think the more common variants are added, the more instances of (individual, disease) will tend to OR=1.0. Maybe I am just indulging in the fallacy called “the law of averages”, but it is at least a conservative, testable hypothesis.
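The prediction can be checked against the three changes listed above with a few lines of Python. The yardstick used here, the absolute value of ln(OR), is my own assumption for measuring distance from OR=1.0; it simply treats an OR of 2 and an OR of 0.5 as equally far from neutral.

```python
import math

# (condition, earlier OR, later OR) as listed above
changes = [
    ("Restless legs syndrome", 1.94, 0.97),
    ("Prostate cancer", 1.05, 0.77),
    ("Type 2 diabetes", 1.45, 1.10),
]


def distance_from_neutral(odds_ratio):
    """Assumed yardstick: |ln(OR)|, which is zero when OR = 1.0."""
    return abs(math.log(odds_ratio))


# True if the later OR is closer to 1.0 than the earlier one.
moved_toward_one = {
    name: distance_from_neutral(after) < distance_from_neutral(before)
    for name, before, after in changes
}

for name, before, after in changes:
    direction = "toward" if moved_toward_one[name] else "away from"
    print(f"{name}: {before} -> {after}, moved {direction} 1.0")
```

On this measure, restless legs syndrome and type 2 diabetes moved toward 1.0 while prostate cancer moved away from it, so three data points neither confirm nor refute the hypothesis.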

Full genotypes are now available for download, so the project is about to become interesting.

Do I need a personal trainer or a personal genetic counsellor?

David Hunter, Muin Khoury and Jeffrey Drazen’s Perspective “Letting the Genome out of the Bottle- will we get our wish?” (NEJM 2008. 358;2 105-107) goes further than we have done (below) in its skepticism of consumer genomics services like deCODEme and Navigenics. While Nature Genetics as a research journal can welcome the opportunities for public research participation and ever larger experiments and longitudinal studies, the doctors worry that they will soon be deluged with patients asking them what all this risk information means for them. The authors have some good points, but largely ignore the unpredictable motivational potential inherent in handing people their genomes and asking them to participate in finding out more about their variation and phenotypes. Sometimes, the best doctor will say “we don’t know yet, let’s find out together”. However, we do applaud efforts to build genetics into every stage of medical education and appreciate the authors making their editorial an opportunity to hammer that point.

I take some quotes from the article as the starting point for a few of my own thoughts:

“..premature attempts at popularizing genetic testing…”

Not so much premature as truly disruptive. Personal genomic testing confers a personal stake in the ongoing research effort and a huge incentive to find out more. A personal stake in finding something that was not previously known is the key to getting students into research and may well be a powerful tool to interest individuals in the details of their own health and functioning.

“transparent quality control monitoring”

It is true that the use of multiplex genotyping platforms for genetic epidemiology gains some buffering from large numbers of subjects and replication. At the individual level, it is hard to check for a single miscalled SNP, so it might be desirable to build some redundancy into the genotyping using haplotype information. For the genetics enthusiast, it would be reassuring to see the SNP calls in cluster plot format together with those of other anonymized participants.

“clinical validity…predictive value…the area in which the data are in the greatest flux.”

This is true for conventional exposures and markers too. Physicians no longer prescribe smoking for its “health benefits”, at least not to their patients. Blood pressure and BMI limits now elicit more vigorous treatment from physicians as better information has been gained. It is to be expected that an individual’s risk profile will change with each new study since new variants will be discovered that moderate or exacerbate their individual risk.

“full accounting of disease susceptibility awaits the identification of these multiple variants and their interaction in well-designed studies.”

I don’t think more retrospective studies will entirely solve this problem. In the meantime, individual genomics might be a great recruiting tool for a longitudinal study. Deliver a detailed genotype to the participants right at the beginning.

“assumption that interventions that have proven successful in the general population will behave the same way in a genetically at-risk population.”

Optimizing generally applicable interventions with the enthusiastic participation of the research subjects themselves may be the best way forward.

“interventions – such as smoking cessation, weight loss, increased physical activity and control of blood pressure – are likely to be broadly beneficial in relation to many diseases, regardless of a person’s genetic susceptibility to a specific disease.”

And we pay a physician for this information……why?

It may take years of personal experimentation with different drugs, doses and dose regimens to achieve a balance of blood pressure control and side effects. Would it not be better to have a few clues as to who is likely to achieve blood pressure control via breathing exercises or salt reduction, who by exercise and weight reduction, and who actually needs the beta blockers? OK, gene tests don’t help at the moment with ACE inhibitors or diuretic dosing, but they could when your atherosclerosis gets to the thrombosis stage and you are deciding how much rat poison you want to eat.

“patients who test negative may be falsely reassured”

I don’t think I am “falsely reassured” by normal cholesterol, kidney function and blood pressure readings. Individuals have a remarkable ability to use information in their own self interest, indeed to integrate family history and the evidence of their own eyes with the gene test information. For a stunningly candid example of a family affected by Alzheimer disease without the major genetic risk factor, read the Tangled Neuron.

“but a detailed consumer report may be beyond most physicians’ skill sets.”

It should not be beyond most physicians’ ability to explain the quantitative risks conferred by – and the research underlying – the predictors they currently use: BMI, cholesterol, blood pressure, age and sex. A physician should also be able to relate quantitative risks conferred by a family history containing one or more affected relatives. A physician should also be able to advise the patient whether or not to participate in research tests such as that for C-reactive protein, with reasons not including “the insurer doesn’t cover that” or “taking additional tests will remain on your insurance record”.

“More information is needed…potential value that genomic profiles can add to that of simpler tools, such as family health history.”

Genomic profiles of families will be much more informative to individuals than their comparison with epidemiological studies. For many, the personal genomic profile is the stimulus to explore family history. Discussion of personal genetic risk factors may act to release family medical history kept private because it had no perceived use.

“encourage them to enroll in formal scientific studies.”

Maybe participation in personal genomics would have this desirable effect. With no personal stake and no current health problems, it may not occur to many individuals that their participation would be welcomed by scientists and the public alike. It would be interesting to compare the general health and health awareness of eg. Framingham Heart Study participants with nonparticipants; I just haven’t had the time to read any such sociology.

“genomic services will galvanize…translational research for the rational integration of genomic information into medical training and practice.”

Here the authors get to the heart of their worries: this is a disruptive technology that has caught doctors in the middle of their effort to bring genetics to its proper place in medical education and they might get deluged with questions they aren’t yet ready to answer. For more on the change of emphasis in medical education toward genetics and the individual, see David Valle’s article referenced below.

“better off spending their money on gym membership or a personal trainer….follow a diet and exercise regimen that we know will decrease his risk of heart disease and diabetes.”

Why not both? Different people have different motivations. Socially body conscious individuals may prefer the gym, the introverted but vain may prefer the attention of the trainer. Intellectually motivated people may choose their sport, or only exercise out of curiosity to see whether they can fulfil the promise of their “elite athlete” SNPs.

Now, some of my favorite quotes from “A Science of the Individual: Implications for a Medical School Curriculum” Childs, Wiener and Valle. Ann Rev. Genomics Hum Genet 2005. 6;313-330.

“Although hypertension is an elevation in systemic blood pressure, each patient reaches that phenotype by different paths, each determined by a unique combination of genetic makeup and experiences of the environment…..Individuality is also apparent in the treatment of hypertension.”

“contradiction between the singularity of the patient and the generality of treatments and prevention.”

“a tradition of typological thinking about biology and disease that does not accommodate variation.”

Finally, highlights from the “Risky Business” Editorial by my colleague, Alan Packer (Nat Genet. 2007. 39;12 1415).

“The need for both physicians and their patients to be better educated about complex genetics has taken on added urgency of late.”

“With the possible exception of age-related macular degeneration, how much can we say with confidence about the spectrum of risk?”

Variants with low relative risk make poor classifiers. This point was made by Alan, and by the NEJM authors. So, individual genomics may not induce anyone to take a clinical test until the list of risk variants adds up to the point where it can identify some particularly unlucky individuals. In the meantime, it will have informed thousands by providing a personal stake in one of the most exciting areas of medical research and it may recruit enthusiastic participants in a massive longitudinal study that they will have funded partially from their own pockets.

That being said, they are participants, not patients, and the experiment will be conducted on their own terms!

Adventures in personal genomicsland


I recall a joke that probably plenty of folks have told; I heard it from Francis Collins, the head of NIH’s Genome Project. A previously-married woman heads to bed for the first time with her new beau, and to his surprise, she admits to being a virgin. When he wonders why, she says, “Well, I was married to a genome biologist, and every night, he just sat in bed and talked about how great our sex life would be someday.”

The Genetic Genealogist

myDNAchoice – Are Your Surfing Habits the Result of Your Genome?

The Gene Sherpa

One of these companies will get sued

The Personal Genome

For example, I share parts of my Y-chromosome with my father (I didn’t ask his permission to post parts of it online either).


Googling around, I found that the APOE gene on chromosome 19 is of particular interest, specifically APOE e2, e3 and e4. In the Genome Explorer, I can type in APOE, and it takes me to a listing of 19 SNPs on the APOE gene. Ok, great. But I have no idea which one(s) of those SNPs are the ones we’re talking about and what the mutations are. Without this last bit, the Genome Explorer is basically meaningless.

Everything you need to last you two lifetimes

I’m roadtesting personal genomics services, starting with deCODEme, since many of the genotype-phenotype associations they report have been published in a reputable journal. I am the guinea pig and here are the ground rules: I will reveal everything I believe to be useful to future research. If that seems too coy, please comment and I will answer truthfully. I reserve the right to move the detailed discussion elsewhere, since space in Nature Genetics is limited, even in the blog (thanks for the space, Alan, apologies if this seems TMI).

These services offer the opportunity for real people to participate in research and to address for the very first time the question, “I have this genotype, what will happen to me?”. The tests offered are not clinical tests, so insurers, employers, physicians and family, please comment as fellow research participants and don’t try to make more of these information services than they purport to be. By real people, I mean individuals with their own responses and interpretations of the research as it affects them, rather than the anonymized people genetic epidemiology uses to make its predictions.

The first figure below shows my thoughts on the subject before I started to look at the results. My initial impression was that I was not going to pay attention to SNPs that on published precedent suggest my lifetime risk of any condition is less than 30%. I guessed I would research any biological hypothesis in the 30-60% range and possibly seek a clinically approved genetic test and medical advice for any genetic prediction of elevated risk over 60%. Given the predictable response of my fellow commuters to eg. seat belts, anti-lock brakes and airbags, I feared I might compensate behaviorally if I got a hint of protective alleles (eg. ADH2*2, CCL3L1). Impressed by Andrew Niccol’s prescient (if insufficiently palindromic) GATTACA, I assumed that there might even be SNPs that would convince Uma Thurman to have my babies.


The first unexpected problem was to identify myself. Since the website is very new and I don’t have the raw Illumina SNP calls or any population samples with which to examine the cluster plots myself, I can’t verify the raw data. Even if I could do so, I have little but consistency with other genotyping services to ensure I am looking at my own genes. From Genographic, I know I have Cambridge mitochondria (H, 16188G, 16311C, 16519C) and an R1b1c Y chromosome and from DNAPrint Genomics’s “proprietary AIMs”, I know only that I am mostly of European ancestry (which luckily tallies with the origins of my great great great grandparents: 12 German, 9 English, 5 Dutch, 2 Irish, 2 Welsh, 1 Swiss, 1 Scottish). Thanks to Daniel Gudbjartsson’s recent paper, I also know I am quite likely to have brown hair and brown eyes. The main problem, apart from lack of SNP data to search on my own via Greg Lennon and Michael Cariaso’s wonderful SNPedia site and the underlying literature, is the problem of self recognition.

Individual taste preferences might provide a solution, so I suggested the self-recognition problem might be solved via an olfactory SNP-social-network-wine-club. Another consumer genomics company does report on “taste-related” alleles but they haven’t invited me to the party….. I guess the Icelanders are still running the gas chromatograph on the 70cl of wine the last foreign visitor brought into the country in their duty-free allowance.

Using the information at my disposal, I first plotted my risk ratios against the prevalence of the conditions.


The rheumatoid arthritis results caught my attention since the combination of major and minor contributors and the mix of risk and protection alleles is pretty typical of the other common diseases for which more than one locus has been implicated. Here are the actual results:


I next tried plotting my risk against the population mean risk, as in my sketch above. A decade of research reveals several things that I couldn’t have predicted on the back of an index card. My zone of solidarity is – of course – a cone of solidarity, since the variance of the risk increases with the risk.


Seen this way, SNPs associated with larger risks of rarer conditions fall into perspective. So what can I offer Uma? It goes without saying that I wouldn’t kick her out of bed for eating cookies. In the very unlikely event that I were to do so, I would draw her attention to my elevated risk of “restless leg syndrome”. I would not be eating the cookies myself, because of the theoretical worry of type 2 diabetes. Observing a BMI of 27 from a safe distance, she would not be particularly convinced by my predicted genetic chance of resisting the onset of middle-age spread.