Human Variome Project Planning Meeting 25 – 29 May 2008, San Feliu de Guixols, Costa Brava, Spain
PROBLEM BEING ADDRESSED
Microcitation is a way to incentivize public data deposition by extending the practice of citing journal articles to database entries and by providing quantitative citation for every unique author.
SYSTEMS AND PLANS
A pilot project commissioning peer-refereed locus reviews as journal articles, with microattribution for individual variants, was introduced in a recent Editorial and expanded upon in detail in this blog.
Each journal article should have a publicly accessible Supplementary Table 1 listing all the accessions cited in the article. The accessions must be indexed to a unique sequence indicating a nucleotide position (an ssID in NCBI) and a unique allelic state. Each string must have an author ID and a unique locator for the citing journal. Thus a citation string is formed as a list of parameters carried on a URL that resolves to the appropriate database:
(ss71650991, A, TSC2DB, doi1038/ng.123, NM_000548.2:c.138+1G>A, OMIM191100, Popfreq=ALFRED#XXX,)
Used as a URL, this resolves to: https://www.ncbi.nlm.nih.gov/SNP/snp_ss.cgi?subsnp_id=71650991
The string resolves even though it does not cite all of the data parameters held for that accession in dbSNP, and it also carries a parameter pointing to the accession number of population frequency information that was submitted to another database.
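As a rough illustration of the resolution step described above, the following Python sketch splits the example citation string into fields and builds the dbSNP URL from the ssID. The function names are hypothetical, and the sketch assumes only that the ssID is the first field; the meaning of the remaining positions is left to whatever convention the community agrees on.

```python
from urllib.parse import urlencode

# Endpoint taken from the resolved URL shown in the text.
DBSNP_SS_ENDPOINT = "https://www.ncbi.nlm.nih.gov/SNP/snp_ss.cgi"


def parse_citation_string(text):
    """Split a parenthesised, comma-separated citation string into fields.

    Hypothetical helper: empty fields (such as the one left by a trailing
    comma) are dropped, and no meaning is assigned to field positions.
    """
    inner = text.strip().strip("()")
    return [field.strip() for field in inner.split(",") if field.strip()]


def dbsnp_url(ss_id):
    """Build the dbSNP URL for a submitted-SNP accession such as 'ss71650991'."""
    if not ss_id.startswith("ss") or not ss_id[2:].isdigit():
        raise ValueError(f"not an ssID: {ss_id!r}")
    return f"{DBSNP_SS_ENDPOINT}?{urlencode({'subsnp_id': ss_id[2:]})}"


citation = (
    "(ss71650991, A, TSC2DB, doi1038/ng.123, NM_000548.2:c.138+1G>A, "
    "OMIM191100, Popfreq=ALFRED#XXX,)"
)
fields = parse_citation_string(citation)
url = dbsnp_url(fields[0])  # assumption: the ssID is always the first field
print(url)
```

Only the ssID is needed to resolve the accession; the other fields travel with the citation and can be recorded by whichever service handles the request.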
Microattribution can operate locally, with journals and databases each reporting quantitative citation of accessions. However, depositing the proposed Supplementary Table 1 in a central registry of cited accessions at publication has three great virtues. First, different users can create citation-counting interfaces to the same information. Second, if the site is a proxy, it can record all microattribution (web traffic and vendor information as well as microcitation). Finally, the central site can be mined for citations associated with unique author identities and with each author’s publications and database entries.
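To make the idea of mining a central registry concrete, here is a minimal Python sketch of one possible citation-counting interface. The row layout (ssID, allele, author ID, citing-article locator) and all values other than the ssID from the example above are hypothetical placeholders, not real records.

```python
from collections import Counter

# Hypothetical registry rows: (ssID, allele, author ID, locator of citing article).
registry = [
    ("ss71650991", "A", "author-1", "article-1"),
    ("ss71650991", "A", "author-1", "article-2"),
    ("ss00000001", "T", "author-2", "article-2"),
    ("ss00000001", "T", "author-2", "article-2"),  # duplicate deposit, counted once
]


def citations_per_accession(rows):
    """Count distinct citing articles for each (ssID, allele) pair."""
    return Counter((ss_id, allele) for ss_id, allele, _author, _locator in set(rows))


def citations_per_author(rows):
    """Count distinct accession citations credited to each author ID."""
    return Counter(author for _ss_id, _allele, author, _locator in set(rows))


by_accession = citations_per_accession(registry)
by_author = citations_per_author(registry)
```

Because both counters read the same rows, different groups could build their own interfaces over the registry without changing what is deposited.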
To anticipate storage problems, parameter-rich accessions (ssID, allele, phenotype_tableID, submitter, curator, LSDB_ID, PBD_ID, ArrayExpress ID, GeneTests_ID, PharmGKB_ID, local_confidentialrecord) would be stored for frequent online access, whereas less intensively curated accessions (ssID, allele, submitter, platform) might be stored on hard disks for occasional searching.
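One way to picture the two storage tiers is a rule that routes an accession by how many curated parameters it carries. The Python sketch below is an assumption for illustration only: the tier names, the rule itself, and the sample values are hypothetical, with the minimal parameter set taken from the less intensively curated example in the text.

```python
# Assumption: the minimal set is the "less intensively curated" list from the text.
MINIMAL_PARAMETERS = {"ssID", "allele", "submitter", "platform"}


def storage_tier(accession):
    """Return 'online' for parameter-rich accessions and 'archive' otherwise.

    Hypothetical rule: any non-empty parameter beyond the minimal set marks
    the accession as curated enough for frequent online access.
    """
    curated = {
        key
        for key, value in accession.items()
        if key not in MINIMAL_PARAMETERS and value not in (None, "")
    }
    return "online" if curated else "archive"


sparse = {"ssID": "ss71650991", "allele": "A", "submitter": "lab-1", "platform": "array-1"}
rich = {
    "ssID": "ss71650991",
    "allele": "A",
    "submitter": "lab-1",
    "curator": "curator-1",
    "LSDB_ID": "TSC2DB",
}
```

Here `storage_tier(sparse)` gives `'archive'` and `storage_tier(rich)` gives `'online'`; a real registry would likely also weigh access frequency.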
OpenURL conventions used by publishers in the CrossRef citation system already lay out rules for constructing parameter strings to be carried on URLs. This group is also developing a publishers’ version of author disambiguation, and there are already web-wide projects that could be tapped, such as OpenID.
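The general mechanism of carrying a parameter string on a URL can be sketched with Python's standard library. The resolver address and the key names below are hypothetical and are not taken from the OpenURL standard; the point is only that key-value encoding lets variant descriptions survive the round trip.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical resolver address; not a real service.
RESOLVER = "https://resolver.example.org/cite"


def encode_citation(params):
    """Carry a dict of citation parameters on a URL as key=value pairs."""
    return f"{RESOLVER}?{urlencode(params)}"


def decode_citation(url):
    """Recover the citation parameters from a URL built by encode_citation."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# Hypothetical key names; values come from the example citation string.
params = {
    "ssid": "ss71650991",
    "allele": "A",
    "hgvs": "NM_000548.2:c.138+1G>A",
}
url = encode_citation(params)
round_trip = decode_citation(url)
```

Percent-encoding matters here: the `+` in `c.138+1G>A` is written as `%2B`, since a bare `+` in a query string would be decoded as a space.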
I suggest that parameter sets be nested within existing conventions to allow committees of publishers, microattribution activists, genome annotators, and Mendelian mutation curators to define and update parameter forms that work for their communities.
(citer defined)
    (microattribution)
        (g,e,n,,m,e)
         | | | … |
        (g,e,n,o,m,)
    (microattribution)
(target defined)
COLLABORATION AND SHARING CAPACITY
Thanks to HVP, HGVS, HUGO, NCBI, EBI, UCSF, SNPedia, Genome Commons, and INSIGHT for their time and ideas. These ideas are not limited to the genome community, but we have a unique indexing system in the genome and an opportunity to demonstrate best scientific practice in accurate citation.
TOPIC SECTION
Publication, credit & incentives