“I haven’t got time to read all this, what are you on about?”
Providing incentive for global community annotation of the human genome. Giving database accessions the same citation conventions and indices that journal articles currently enjoy, so that genotype-phenotype-frequency annotations are credited to their contributors as genotype-author-phenotype-frequency statements. Microattributions should be complemented by high-level peer-refereed reviews and a convenient browser interface. Annotators should form a social network that displays their publications, microattributions, affiliations and credentials. Vendors can sustain free access by sponsoring particular sets of content (e.g. loci, pathways, probesets) and via annotator endorsements. These models might be appropriate for a Variome browser, Variome reviews and a network of human genome annotators, HUGONet.
“Who are you anyway?”
We are the world, we are the children. Seriously though, there are a lot of clever, highly motivated people who weren’t around when the human genome project went down and who would like to get involved, perhaps by adopting a neglected region of the genome of local interest. There are also a lot of expert old lags hanging onto lovingly curated, real pathogenic human mutation data collected in the pregenomic era, which needs to be brought out to create a framework of human variation before the trace archive tsunami hits. And there are plenty of puzzled physicians who want to know which variants are disease-associated.
“Are you trying to build your own database?”
No databases will be harmed in the making of the Variome project. We are building a filtering and highlighting site that credits databases and has a feed (Variome track) to the Ensembl and UCSC browsers. The Variome server will store only the microattribution counter and the information required to mark variant ssIDs and annotations from NCBI as peer reviewed or not; it will have the ability to mark a region between genome coordinates as open, under review or published, and it will host a microattribution wiki for community comments. The MTS peer review database can be left to the commissioning journals.
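To make the division of labor concrete, here is a minimal sketch of the data the Variome server itself would hold, per the description above: a microattribution counter and peer-review flag per ssID, locally databased wiki comments, and an open/under-review/published status per genome region. All class and field names here are illustrative assumptions, not a real Variome schema.

```python
# Sketch of the minimal Variome server data model (names are hypothetical).
from dataclasses import dataclass, field
from enum import Enum

class RegionStatus(Enum):
    OPEN = "open"
    UNDER_REVIEW = "under review"
    PUBLISHED = "published"

@dataclass
class VariantRecord:
    ss_id: str                      # NCBI submitted-variant accession
    peer_reviewed: bool = False     # has this entry survived Variome review?
    citations: int = 0              # the microattribution counter
    wiki_comments: list = field(default_factory=list)

    def cite(self) -> None:
        """Record one forward citation of this ssID."""
        self.citations += 1

@dataclass
class Region:
    chrom: str
    start: int
    end: int
    status: RegionStatus = RegionStatus.OPEN
```

Everything else (the variant annotations themselves, the peer-review manuscripts) stays in NCBI and the journals' own systems, which is the point of the filtering-site design.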
“Won’t Ensembl just pick all the information directly from the LSDB and NCBI?”
Maybe, but even if intensive curation is enough to get the information right, does ending up on a browser track give the data producers enough credit to induce them to participate in the data transfer, checking and indexing process? Why not offer them microattribution and a publication as well?
“Why not just use the UCSC wiki browser?”
We would if it had a graphic track showing quantitative microattribution in the published literature and a tally of the wiki comments loaded onto each ssID, links to the appropriate database entries, and the ability to close commenting across a region and to mark a region as published. I think we would still need a server on which to store a local copy while the annotation committee (authors, editor and referees) worked on the regions to publish.
“But surely, all that is required is for variants to be submitted to NCBI and they will appear on the genome browser”
This has not happened. LSDBs hold a large number of well-annotated variants (representing years of work and many grants) that have not been moved to a common indexing system. There is no pipeline from the clinical labs, and we are about to be deluged with resequencing data deposited in sequence trace archives with little human annotation.
“Why doesn’t NCBI just provide citation statistics for every ssID, author handle and annotation?”
They should. The microattribution concept means that every database entry should not only link to its forward citations in papers and database entries but also provide a current count of them. If NCBI did this, we would still need the high-level reviews to ensure locus annotation quality and to give high-level priority to the annotator community.
“Why not just provide browser feeds from the LSDBs?”
LSDB server capacity and maintenance are variable; the databases are in a variety of formats; the variants are not indexed to the genome; duplicating NCBI on a small scale is inefficient; and LSDB curators lack the resources to convert their data.
“Why is it not sufficient to use existing wiki (SNPedia) or wiki browser (UCSC)?”
Wiki comments, even from an approved annotator community, will vary in credibility and detail. Even if in theory they are scrutinized by readers, it is not certain that they will be corrected unless there is a high-level incentive (the Variome Review) to encourage correction.
“Why do we need a wiki at all, since anyone can submit variant annotations to NCBI curators?”
The activation barrier to wiki commenting is very low, and because the results can be seen immediately, alert students as well as experienced researchers can readily make a big difference.
Curators need to sleep, whereas the wiki is automated and can take comments from all time zones.
“Why peer review? Surely the data producers together with NCBI curators can provide authoritative variant reports indexed to the genome?”
This works for the variant report alone, but doesn’t provide consistent quality control or scrutiny across the locus. David Ravine’s review of PKD1 variants (Nature Genetics, April 2007), which is the model for a Variome Review journal article, revealed that 5% of the published variants were wrong. Community scrutiny under a uniform, editorially controlled process not only provides high-level reviews to bring credit to the data producers, but highlights which datasets can be used with confidence.
“Is there a better way to view genotype-author-phenotype compound statements and their associated accession numbers than a microattribution wiki browser or a database table?”
Undoubtedly. If you have built one, we will use it.
“I have a grant to build/have built/have thought of and am going to build/a better solution and/or I am a hugely influential funding body and we have a plan that doesn’t include you so you are wasting your time.”
Fine, we are providing only what is missing: respect and credit for database builders, curators, authors, data producers, cogs in big teams, funding bodies, research participants (with their permission) and yes, eventually even journals and editors. Maybe you’d like to be appreciated too?
“Microattribution is not even a new idea, it is obvious.”
Yes, isn’t it.
“Isn’t Gogol going to index teh everything anyway?”
Maybe, but the ssID is a pretty cool way to index data (thanks, Donna!) and wouldn’t you rather trust your peers to evaluate genome variants in the first instance, so that the plex will develop their genome tools in ways we will be able to use?
“Won’t your filter become redundant if all NCBI entries are correct and journals provide quantitative attribution via a whizzy interface?”
Yes, I look forward to this day. If this process is catalytic and every database and journal provides microattribution credit, the Variome Browser filter will have served its purpose.
“Why not organize the HVP along the lines of the DECIPHER network?”
The problems are fundamentally different. DECIPHER exists to catalog the clinical importance of structural genome variation, starting from scratch. For mendelian mutations, existing annotator communities have done much of the work, but now need credit for reformatting and indexing their variation collections to the genome and for scrutinizing the annotations, locus by locus.
Databases of this kind have no intention to create systematic reviews and journal articles, and none provides quantitative citation credit, although they reference the original sources of the variants.
“Aren’t you just duplicating the effort of HUGENET?”
HUGENET provides disease-centered meta-analysis of genetic epidemiology; Variome integrates findings on rare variants in rare mendelian diseases, rare variants in common diseases, and common variants in common diseases.
“How will you distinguish peer reviewed from entries that were curated but not reviewed?”
We make no distinction at the browser level between curated entries and those added by wiki, only between reviewed and non-reviewed information. The source will be evident upon clicking the link, since these lead to NCBI and to the locally databased wiki respectively.
NCBI entries and wiki comments will be visible on the Variome browser in gray, the information that survived collaborative annotation and peer review will be in black.
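The gray/black rule above can be sketched as a simple rendering function. This is a hypothetical illustration, not actual browser code: it assumes entries arrive as dictionaries, that NCBI-derived entries carry an ssID, and that wiki comments are databased locally.

```python
# Hypothetical Variome track rendering rule: only information that has
# survived collaborative annotation and peer review renders black; raw
# NCBI entries and wiki comments render gray. The click-through link,
# not the colour, reveals the source.
def track_style(entry: dict) -> dict:
    colour = "black" if entry.get("peer_reviewed") else "gray"
    # Assumption: NCBI-derived entries carry an ssID key; wiki comments
    # live only in the local Variome database.
    link = "ncbi" if "ss_id" in entry else "local_wiki"
    return {"colour": colour, "link": link}
```

So a reviewed NCBI variant renders black and links out to NCBI, while an unreviewed wiki comment renders gray and links to the local wiki record.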
“Surely the peer review process will entail rewriting and correcting NCBI entries, rather than merely filtering out those that are wrong?”
Irretrievably wrong or unattributable entries will need to be excluded. If the original data producers are unwilling to reannotate them correctly, new entries will be made. If the original data are largely correct, an annotation (corrigendum) will be appended to the existing NCBI entry.
“If competing journals have commissioned reviews on different regions of the genome, how can we distinguish one annotation group from another (some authors will overlap)?”
This happens at the moment with papers. Editors, referees and authors behave remarkably ethically. Authors can be added at the editor’s and senior author’s discretion. With microattribution, if an author on a collaborative annotation contributes no annotations between the locus genome coordinates, their author status may be questioned.
“Won’t this confuse journal impact factors? What microcitation measure is the right one?”
Professional bibliometrists, get to work! Since when was more information a bad thing? The existence of the $20 bill does not do away with a need for $1 bills.
“How will authors know if they are citing the right entity?”
This problem already exists in journal articles: authors will cite a GEO platform accession (GPL number) rather than the appropriate series accession (GSE number). As journals become more database-like, the database part of the article will reference the correct compound (ssID, author, phenotype, frequency) to support the assertion made (this variant is always associated with this disease).
These objectives are on the HVP wish list for Informatics but outside the remit of Publication and Credit. The physician community will probably want to design a pipeline from the clinical sequencing labs to NCBI and a disease-oriented browser interface that starts with the variants for which clinical tests are available, then moves on to a list of variants sorted by whether they are pathogenic or not. These projects will be enabled by – but are outside the immediate scope of – the microcitation proposal.
“Will other journals provide microcitation statistics?”
Hurry! Nature Genetics already does and we are now looking for better ways to display and use the information (data producer index, annotator index, collaboration index, mentorship index).
“What else could HUGONet members put on their face page?”
If you were the referee who provided the decisive experiment for a highly cited Nature paper, you might want to reveal that fact (after publication) and post your review on your face page, even if you didn’t want your comments made public on the Nature web site. If you were sparsely linked within the network but had a postdoc position you needed to fill, why not pay HUGO to link you up for a month to the highest-priority part of the network (or to everyone within the network)? HUGO needs money and you need a postdoc.
“What about genes where tests for variants have been patented?”
Companies holding a lot of patents on particular genome regions (e.g. BRCA1) might be motivated to adopt that gene within the Variome system, providing information, advertising and funding. I see no reason why their annotations should not be subjected to peer review in the usual way, since journals are happy to publish corporate research so long as methods are transparent and materials are available.
“What kinds of annotation will be counted?”
Ideally all, but the intention is to highlight all forms of annotation that require human effort, so the citation counter tallies papers and reviews separately from annotations with phenotypic information and annotations without phenotypic information. The compound (ssID, author, phenotype, frequency) has more value than the compound (ssID, author, vendor, probe) or the fundamental microcitation particle (ssID, author), even if it is cited less often.
For example, the SNP Consortium will be recognized for its discovery of a SNP every time a user cites an ssID or its rsID synonym. Affymetrix will get a microcitation for every paper that lists a probe used in high-throughput genotyping. A mendelian geneticist will be interested in a single clinical report (dbGaP entry) coupled to a sequence report (ssID) and frequency (NCBI annotation ID).
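The counting described above can be sketched as a small tally over compound statements. Every accession, author name and frequency below is made up for illustration; the only structural assumptions come from the text: (ssID, author) is the fundamental microcitation particle, and four-element compounds carry phenotype and frequency.

```python
from collections import Counter

# Illustrative compound statements as tuples (all names/accessions invented).
citations = [
    ("ss100", "TSC"),                                  # bare discovery citation
    ("ss100", "TSC"),                                  # cited again elsewhere
    ("ss100", "SmithLab", "beta-thalassaemia", 0.02),  # ssID-author-phenotype-frequency
    ("ss200", "Affymetrix", "probe:AX-0001"),          # ssID-author-probe compound
]

def tally(cites):
    """Count forward citations per (ssID, author) particle, and separately
    count compounds that carry phenotypic information."""
    per_author = Counter((c[0], c[1]) for c in cites)
    with_phenotype = sum(1 for c in cites if len(c) == 4)
    return per_author, with_phenotype
```

Here the SNP Consortium ("TSC") accrues two microcitations for ss100 however the variant is cited, while the higher-value phenotype-bearing compound is counted in its own column.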
“What happens when the journals are no longer interested in publishing reviews?”
Some loci receive more interest than others. Annotations of genes with many mutations and phenotypes may comprise publications on their own, while other genes may be annotated together with mutations affecting the same pathway or process. Annotation of neglected regions will accrue progressively more attention via the microattribution process, and those regions will be adopted by interested groups for publication.
In the spirit of working with all interested parties, it is not anticipated that Variome will launch a journal to house the reviews, but this is an option to discuss.