Archive by category: Data availability

Nature Cell Biology joins call for microattribution of datasets

Nature Cell Biology (11, 1273; 2009) joins in the call for 'microattribution' in its November Editorial, stating that reference datasets should be accessible independently of scientific papers in a citable form. The problem, from a cell biological perspective:

"Scholarly publication remains essential for describing and contextualizing findings, but it is inadequate as the only document of research activity. Most journals require a significant conceptual advance, and format constraints typically allow only for the presentation of representative qualitative, or statistically processed quantitative data. Consequently, the majority of raw data never emerges from lab hard drives, and a wealth of information, hard work and funding is wasted. High throughput platforms generate reams of data that cannot be captured in traditional papers. Moreover, methods sections fail to adequately describe metadata essential for the comparison and reproduction of experiments. Databases are essential for comprehensively archiving both published and unpublished data, but have only become fully integrated into the scientific process in a few cases, such as DNA sequencing and microarray data. For many types of data, including light microscopy, no databases exist at all."

Prepublication deposition into databases is relatively new to biology, but is essential, according to the Editorial, whether or not some embargo condition is imposed by authors, funders or publishers. Journals, in their turn, need to systematically link online to data and other material in databases, in order to remain relevant. The Editorial concludes that "Large reference datasets that benefit the wider community and that cannot be analysed efficiently by the data producers should enter the public domain without delay, as long as appropriate attribution and credit can and is given. Scientific culture has to change so that data is valued alongside publications."

See also: 'Accreditation and attribution in data sharing' by Gudmundur A. Thorisson of the Department of Genetics, Leicester, UK (Correspondence to Nature Biotechnology 27, 984-985; November 2009).

Data producers deserve citation credit, says Nature Genetics

Datasets released to public databases in advance of (or with) research publications should be given digital object identifiers to allow databases and journals to give quantitative citation credit to the data producers and curators, according to the October Editorial of Nature Genetics (41, 1045; 2009).
After reviewing the arguments for assigning a citable credit to data, particularly those which are released publicly before formal publication in a journal, as is increasingly the case in some fields (and required by some funders), the Editorial asks: "What form should citable data identifiers take? They must work with existing unique resource identifier conventions and with the existing well-funded stable repositories used by research communities. However, these identifiers are not just for locating data but are for stably identifying the data units and versions with particular data producers, curators, funders and affiliations in a citable form. Because publications are currently the main source of scientific credit and because publishers have already developed citable digital object identifiers (DOI), it would seem to be their opportunity to grasp or to fumble. We propose citing DOIs that tag a combination of repository, database, accession, version, contributor and funder.
Of course, precise citation of all research output represents the bare minimum of respect for colleagues and competitors. This journal also endorses communication between data producers and data users. Whereas it is impossible for journals to restrict the use of data already in the public domain, we can show evidence of communication between producers and users to referees. Many funders of large resource projects now require a data release policy and plan for global analysis by the data producers. These parts of the successfully refereed grant should be published as a 'marker paper' or deposited in a citable preprint archive such as Nature Precedings. At very least, the details of the producers' work and intents should be available to users in a citable form in the database holding the data. Data users can submit an email demonstrating that they have contacted the data producers with their plan for use of the data and showing that they have read the producers' data release policy, conditions and plan for analysis."
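As a rough illustration of the proposal, the fields the Editorial lists (repository, database, accession, version, contributor and funder) can be modelled as one record whose key changes whenever the version does. This is a minimal sketch, not a DOI scheme: the class name `DataUnit`, the key format and the citation wording are hypothetical, and real DOIs are registered through a registration agency rather than assembled from strings like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataUnit:
    """Hypothetical record of the fields the Editorial suggests a data identifier should tag."""

    repository: str
    database: str
    accession: str
    version: int
    contributors: tuple[str, ...]
    funders: tuple[str, ...]

    def key(self) -> str:
        # One key per data unit *and* version, so a revised data set gets a distinct identifier.
        return f"{self.repository}/{self.database}/{self.accession}.v{self.version}"

    def citation(self) -> str:
        # Human-readable credit line naming producers/curators and funders.
        people = ", ".join(self.contributors)
        funding = "; ".join(self.funders)
        return (
            f"{people}. {self.database} accession {self.accession}, "
            f"version {self.version} ({self.repository}). Funded by {funding}."
        )


# Usage with placeholder values (not a real repository, database or accession).
unit = DataUnit("ExampleRepo", "ExampleDB", "EX0001", 2,
                ("A. Producer", "B. Curator"), ("Funder X",))
print(unit.key())  # ExampleRepo/ExampleDB/EX0001.v2
print(unit.citation())
```

Making the record immutable (`frozen=True`) reflects the point that an identifier should pin down one specific version of the data; a correction would be issued as a new version rather than an edit.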

Please see also the continuing Nature Network online discussions about pre-publication and post-publication data release. We welcome your views there.

Nature Biotechnology: Personal genome data on the line

Continuing the theme of yesterday's post about data sharing, Nature Biotechnology is running an Editorial this month (Nature Biotech. 27, 777; 2009), 'DNA confidential', pointing out that as "the cost of human genome sequencing plunges and large-scale genome-phenotype studies become possible, society should do more to reward those individuals who choose to disclose their data, despite the risks". The Editorial continues:

"The genome sequence of Patient Zero is disclosed on p. 847 of this issue. The paper is notable not only because it provides the first description of the performance of a single-molecule platform in sequencing a human genome (90% of it, at least), but also because Stanford professor Stephen Quake (aka Patient Zero) opted to tell the world that it was his DNA that had been sequenced. Like scientific pioneers before him, Quake is heroically self-experimenting—testing the risks in publishing identifiable personal information of the most intimate kind."

The Editorial goes on to weigh up some of the risks and benefits to an individual and to society at large if people's genome sequences are generally available, covering healthcare, privacy issues and costs, concluding that "There will be some individuals, like Steve Quake, who will provide samples and data without an incentive; but when it comes to exploring the basis of being human and moving toward the goal of genomic medicine, society needs to do more to provide personal incentives to those who choose to disclose their data, despite the risks. After all, everybody will ultimately benefit—both those who share and those who choose not to."


Single-molecule sequencing of an individual human genome
Dmitry Pushkarev, Norma F Neff & Stephen R Quake
Nature Biotechnology 27, 847-850; 2009. doi:10.1038/nbt.1561

Data sharing discussed at Nature and Nature Network

Sharing data is good. But sharing your own data? That can get complicated. As two research communities who held meetings on this question in Rome and in Toronto in May report their proposals to promote data sharing in biology, a special issue of Nature (10 September 2009) examines the cultural and technical hurdles that can get in the way of good intentions. Some of the authors of these proposals are participating in two online forums (Rome and Toronto) at Nature Network - so please accept our invitation to visit and have your say on these questions.
More details:
The two research communities held meetings with a broad range of stakeholders to discuss ways to promote data sharing in biology. Data producers and users met at a workshop in Toronto to discuss the benefits and best practices of rapid data release prior to publication. Ewan Birney, Tom Hudson and colleagues report the main conclusions of these discussions in a community statement, free to access online.
The Toronto group propose that the principles for early release of genomics data should be extended to other large datasets in biology and medicine. A grace period should be allowed, if requested, to enable data producers to analyse and publish their dataset, but this should be limited to one year. The authors also suggest a set of best practices for funding agencies, scientists and journal editors.
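The one-year cap on the grace period comes down to simple date arithmetic. The sketch below assumes the clock starts at public data release and reads "one year" as 365 days; neither detail is taken from the Toronto statement, and `grace_period_end` and `producers_have_priority` are hypothetical helpers.

```python
from datetime import date, timedelta

# Assumption: "one year" is read as 365 days, counted from public data release.
MAX_GRACE_DAYS = 365


def grace_period_end(release: date, requested_days: int) -> date:
    """Hypothetical helper: end of the producers' grace period, capped at one year."""
    days = min(max(requested_days, 0), MAX_GRACE_DAYS)  # reject negative or over-long requests
    return release + timedelta(days=days)


def producers_have_priority(today: date, release: date, requested_days: int) -> bool:
    """True while the (capped) grace period is still running."""
    return today < grace_period_end(release, requested_days)


release = date(2009, 9, 10)
print(grace_period_end(release, 500))  # 2010-09-10 (request capped at 365 days)
print(grace_period_end(release, 90))   # 2009-12-09
```

A request longer than a year is silently capped rather than refused; a real policy might instead reject it, which would be a one-line change.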
The recommendations are intended to spark community discussion on this subject. Ewan Birney, Tom Hudson and others will be responding to reader comments in our Nature Network forum. Be sure to have your say.
Mouse researchers, along with funding agencies and publishers, met in Rome to address the barriers preventing more effective sharing of data and biomaterials, particularly mouse strains and embryonic stem cells. Their agenda, free to access online, suggests guidelines to enable sharing of materials under the least restrictive terms, avoiding material transfer agreements where possible.
The Rome participants argue that funding organizations, journals and researchers need to work together to encourage better use of public repositories and to promote a ‘research commons’ in mouse biology.
The recommendations are intended to spark community discussion on this subject. Paul Schofield and others will be responding to reader comments in our Nature Network forum. Be sure to have your say.
See also the Editorial (free to access online) in the same issue of Nature (461, 145; 2009), 'Data's shameful neglect', which argues that research cannot flourish if data are not preserved and made accessible, and that all concerned must act accordingly.
Nature's special issue on data sharing.

Nature Cell Biology on research integrity and accessibility

The cell biology literature contains manipulated data that distort findings, usually in an attempt to 'beautify' and, rarely, to commit fraud, states the September Editorial in Nature Cell Biology (11, 1045; 2009, free to read online). According to the Editorial, a National Academy of Sciences (NAS) report, 'Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age', "arrives at no hard and fast rules; the panel found that different fields have quite different requirements. In the words of panel chairs Phillip Sharp and Daniel Kleppner, 'the report provides a framework for dealing with the challenges to the community generated by the onrush of digital technology.' Nevertheless, the key tenets that researchers are responsible for ensuring the integrity and accuracy of their data and appropriate training in the management of research data, that all data and experimental details from papers be publicly accessible and carefully archived to allow verification and to facilitate future discoveries, and that field-specific standards have to be developed by researchers, funders, societies and journals, benefit from being spelled out in one document."
Many of the recommendations in the report are already the policy of Nature Cell Biology and the other Nature journals. The Editorial provides further information about these, including references to past Editorials, with particular emphasis on various aspects of data manipulation and on plagiarism, which, although this is not widely realised, extends to concepts as well as to copied text and illustrations.

Metagenomics analysed at Nature Methods

Metagenomics sprang from advances in sequencing technology, and continued improvements are providing data in quantities unimaginable a few years ago. But without concerted efforts, the amount of data will quickly outpace the ability of scientists to analyse it. The September Editorial of Nature Methods (6, 623; 2009), 'Metagenomics versus Moore's law', draws attention to articles in the same issue of the journal that illustrate some of the dangers and problems, as well as the solutions that are being sought.
Three years ago, the Editorial continues, the first two second-generation metagenomes were reported at less than 40 megabases each. Now, there are more than 4,000 sequenced metagenomes that would take years or tens of years to analyse (depending on the processing power used). Major initiatives are needed to avoid metagenome-analysis gridlock: according to the Editorial, funding agencies need to increase support for data analysis; and the community needs to improve data-sharing through standards and centralized coordination and by aggregating computationally intensive operations. The conclusion:
"This summer, after discussions at the International Conference on Intelligent Systems for Molecular Biology, community members formed the M5 (metagenomics, metadata, metaanalysis, multiscale-models and metainfrastructure) Consortium under the roof of the Genomic Standards Consortium to devise a solution to the coming gridlock. Their proposed 'M5 Platform', to be announced later this year, deserves the support of the community, funding agencies and those who hold the keys to the high-performance computing centers. Unless major efforts are taken immediately, researchers will find they have a wealth of data but no way to interpret it."
Readers' comments and discussion of this Editorial are welcomed at Methagora, the Nature Methods blog.

No restrictions on tissue distribution

The distribution of human cell lines used in research should not be hindered by restrictions from donors, states an Editorial in Nature last week (460, 933; 2009; free to access online). The occasion of the Editorial is a Corrigendum relating to a paper published in the journal last year ('Generation of pluripotent stem cells from adult human testis' by S. Conrad et al., Nature 456, 344-349; 2008). In the Corrigendum, the authors explain how the original patient consent forms to collect the material used to derive the pluripotent stem cells precluded distribution to third parties under the regulations of the relevant hospital ethics committee. (The authors also explain that they are going to cultivate new cells, under different terms of consent, which they can then distribute upon request.)
As the Editorial points out, failures to distribute cell lines are incompatible with Nature journal policies and with the efficient progression of scientific knowledge. The Corrigendum alerts investigators to this situation and the steps being taken to rectify it. Even when clinicians, researchers and their local ethics board follow internal procedures that promote both donor safety and medical research, serious problems can arise regarding the unhindered distribution of samples.
Here is a slightly shortened version of the rest of the Editorial:
The community was not that surprised by this situation — six of seven researchers contacted by Nature thought this could happen again. Researchers developing cell lines must investigate the restrictions associated with the human tissue they are using, particularly if someone else collected the samples, if the samples come from multiple clinical sources or if they come from several legal jurisdictions. If a scientist needs to create cell lines that might be used for as-yet-unforeseen purposes, only tissue with no restrictions should be used.
Journals can remind authors in their policy guidelines that authors of submissions that involve consent forms must make editors aware of any limits that result from those forms. The Nature journals will be revising their policies to make this clearer.
Most importantly, patients, researchers, clinicians, and review and ethics boards worldwide need to agree on conventions that are acceptable to most parties under most circumstances. Internationally standardized consent forms for the donation of human tissue should cover new uses, genomic comparisons, patents and product development, and should discourage limiting access or lifespan.
Ethics and review boards are set up to protect individuals, but can also go much further to promote research. No one can deny that donors need to understand the risks and benefits of a procedure, trial or donation. However, it seems most ethically responsible, given the value of research, for the boards to explain the consequences that restricted access and time limits can have on the value of a donor's tissue.

Not always so simple to share mouse strains

This is the text of a Correspondence by Richard Behringer of the University of Texas, published in the current issue of Nature (460, 324; 16 July 2009).
I was disappointed by the view expressed in your Editorial 'The sharing principle' (Nature 459, 752; 2009 - free to read online) that the mouse community does not share its strains. This is untrue. Most labs are very collegial, spending a considerable amount of time and effort on distributing their mouse strains. Although there are a few labs that withhold distribution, any community may contain such individuals. The fact that a mouse strain is not found in a repository does not mean that it is not being shared.
I was also puzzled by the conclusion of May's CASIMIR workshop, noted in the Editorial, that "the sharing problem urgently needs resolution" with regard to international mouse gene-knockout projects. Such mutant alleles will mostly be archived as embryonic stem-cell lines. Readers should also realize that repositories cannot keep all their mouse strains live 'on the shelf': most strains are frozen. The cost for a user to have a strain thawed is thousands of dollars and it takes many months before the recovered mice become available. This is a big disadvantage for labs on tight budgets. With regard to funding agencies: in grant proposals to the US National Institutes of Health, for example, applicants are required to write a 'resource-sharing plan' that includes genetically modified mice.
It was suggested that sharing avoids duplication of effort. But it is essential that more than one group generates mutations in the same gene as a crosscheck. No two labs generate the same allele, and every geneticist knows that the expression of different alleles can lead to very distinct phenotypes.
Your claim that sharing mice "has never been easier" is questionable, considering all the paperwork, health certificates, veterinary screens, special serology screens, costs, time and logistics involved. This is quite different from uploading DNA sequences in the comfort of your office.
It would be great if funding agencies supplemented grants involving the generation of mouse strains to cover the costs of sending the strains to a repository. In these tough financial times, that seems unlikely.

Department of Genetics, University of Texas, M. D. Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, Texas 77030, USA.

Nature journals' policy on data and materials availability.

Nature Biotechnology calls for better data-sharing practices

A universal tagging system that links data sets with the author(s) who generated them is essential to promote data sharing within the proteomics and other research communities. The July Editorial in Nature Biotechnology (27, 579; 2009) reports the results of the journal's survey of author compliance in depositing proteomics and molecular-interaction data underlying the papers they published. The editors found that even authors who are proponents of data deposition are not making data available in all of the papers they publish. Inhibitory factors include data quality and the user-unfriendliness of some databases. The Editorial concludes:
"One option would be to provide researchers who release data to public repositories with a means of accreditation. This would take the form of a universally standardized tag for data that could be searched and recognized by both funding agencies and employers. An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share. In essence, the tag would be a digital object identifier (DOI), currently best known for its use in unambiguously identifying papers online.
Similar to citation information about publications, citation information about a researcher's data DOIs could be gathered by funders assessing future support and used by institutions in performance evaluation. Researchers who disclose data sets that subsequently prove particularly useful to the community would end up with highly cited data DOIs, and could thereby be rewarded accordingly.
Such a system would not solve all the problems slowing data disclosure in proteomics and elsewhere. But it would provide greater incentive than the present system of evaluation, which is skewed almost exclusively to publications in high-profile journals and citation metrics. Data DOIs would not only enhance a researcher's reputation but also establish priority of data generation. Most important of all, they would provide a way to acknowledge the time and effort individuals must invest in sharing data, which ultimately benefits the scientific community as a whole."
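The accreditation idea amounts to counting citations per data DOI and then rolling those counts up to the producer, much as citations to papers are tallied today. A minimal sketch, assuming each paper's reference list has already been reduced to the data DOIs it cites; the function names and the identifiers in the usage example are placeholders, not real DOIs or an existing service.

```python
from collections import Counter


def data_citation_counts(papers: dict[str, list[str]]) -> Counter:
    """Count, for each data DOI, how many distinct papers cite it."""
    counts: Counter = Counter()
    for cited in papers.values():
        counts.update(set(cited))  # a paper citing the same DOI twice counts once
    return counts


def producer_credit(counts: Counter, producer_of: dict[str, str]) -> Counter:
    """Roll data-DOI citation counts up to the producer who released each data set."""
    credit: Counter = Counter()
    for doi, n in counts.items():
        credit[producer_of[doi]] += n
    return credit


# Placeholder identifiers, not real DOIs.
papers = {"paper1": ["D1", "D2"], "paper2": ["D1", "D1"], "paper3": ["D3"]}
counts = data_citation_counts(papers)
print(counts["D1"])  # 2
print(producer_credit(counts, {"D1": "Lab A", "D2": "Lab B", "D3": "Lab A"}))
```

Counting distinct citing papers, rather than raw mentions, is one design choice among several; a funder could equally weight citations by year or by the citing paper's field.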

See also a Correspondence in the same issue of Nature Biotechnology (27, 597-598; 2009): PRIDE Converter: making proteomics data-sharing easy, by Harald Barsnes, Juan Antonio Vizcaíno, Ingvar Eidhammer and Lennart Martens, a collaboration between the University of Bergen and the European Bioinformatics Institute.

Nature journal policies on data and materials availability.

US scientist jailed for sharing sensitive data

From Nature News (Nature 460, 163; 8 July 2009):
A former University of Tennessee professor has been sentenced to four years in prison for sharing sensitive technologies with his Chinese and Iranian graduate students.
J. Reece Roth, an emeritus professor of electrical engineering, was sentenced on 1 July by a Tennessee district court for violating the Arms Export Control Act. He had been developing ways to reduce the drag on unmanned planes, and employed two research assistants without obtaining the required licence (see Nature 442, 232–233; 2006). Roth plans to appeal the verdict.
In a separate case, a Chinese-born scientist who has lived in the United States for 23 years is suing the US government for rights violations for expelling him last year from the NASA Ames Research Center, California.
Haiping Su, a US citizen who received his doctorate in 1991 from Kansas State University in Manhattan, alleged in a case filed on 24 June in a San Jose federal court that a 2007 security badge-issuing process led to his illegal ousting.
Su was working on airborne systems for imaging forests. His attorneys say he had no involvement with classified material.

Chemical biologists could help accelerate drug discovery

This month's (July) Nature Chemical Biology includes two articles describing how access to the highest quality chemical probes will ensure their prominent position in the biological and drug discovery toolboxes.
Aled M Edwards, Chas Bountra, David J Kerr and Timothy M Willson, in their Commentary 'Open access chemical and clinical probes to support drug discovery' (Nature Chemical Biology 5, 436-440; 2009), say that drug discovery resources in academia and industry are not used efficiently, to the detriment of industry and society. Duplication could be reduced and productivity increased, they write, by performing basic biology and clinical proofs of concept within open-access industry-academia partnerships. Chemical biologists could play a central role in this effort.
The authors' main argument is that the development of new medicines is being hindered by the way in which academia and industry advance innovative targets. If academia and industry generated freely available chemical and clinical probes and performed open-access science, the authors argue, the overall system would produce a wider range of clinically validated targets for the same total resources, arguably the most effective way to spur the development of treatments for unmet needs.
In a related article in the same issue of the journal, 'A crowdsourcing evaluation of the NIH chemical probes', Tudor I. Oprea et al. (Nature Chemical Biology 5, 441-447; 2009) write that between 2004 and 2008, the US National Institutes of Health Molecular Libraries and Imaging initiative pilot phase funded 10 high-throughput screening centres, resulting in the deposition of 691 assays into PubChem and the nomination of 64 chemical probes. The authors 'crowdsourced' the Molecular Libraries and Imaging initiative output to 11 experts, who expressed medium or high levels of confidence in 48 of these 64 probes. Crowdsourcing is a cross-disciplinary alternative way to assess confidence for both chemical probes and drug leads: it pools multiple levels of expertise from translational disciplines, providing a rigorous chemical-probe evaluation process.

Nature Chemical Biology website.
Nature Chemical Biology guide to authors.
Nature Chemical Biology focuses and supplements.
Nature Chemical Biology symposium 2009: Chemical biology in drug discovery.

New rules for presentation of statistics in cell biology

New rules for the presentation of statistics in the Nature journals are described in the June Editorial of Nature Cell Biology (11, 667; 2009). From the Editorial:

Thanks to advanced imaging technologies and better integration with molecular and systems approaches, cell biology is undergoing something of a renaissance as a quantitative science. Robust conclusions from quantitative data require a measure of their variability. Cell biology experiments are often intricate and measure complex processes. Consequently the number of independent repeats of a measurement can be limited for practical reasons, yet the variability of the measurements can be rather high. Cell biologists have developed good intuition to guide their analysis of such constrained datasets. Biological complexity and the reliance on intuition can cause culture shock to physical scientists crossing over into cell biology (a kind of extension of the celebrated 'two cultures' concept of C. P. Snow).
With the arrival of quantitative information and '-omic' datasets, statistical analysis becomes a necessity to complement instinct. The problem is that statistical tools are built on basic assumptions such as the independence of replicate measurements and the normality of data distribution. Usually, sizeable datasets are a prerequisite for statistical analysis. Alas, these can be as hard to come by as a biostatistician (n is typically well below 5). The result is that all too often statistics (frequently undefined 'error bars') are applied to data where they are simply not warranted.
There are no easy solutions to rectify the prevalence of poor statistics in cell biology studies. However, an obvious recommendation is to consult a statistician when planning quantitative experiments. Consider whether n represents independent experiments (you may actually be publishing a measure of the quality of your pipette!) and whether it is large enough for the test applied. Avoid showing statistics when they are not justified; instead, show 'typical' data or, better still, all the measurements. Importantly, displaying unwarranted statistics attributes a misleading level of significance to the data. Always describe and justify any statistical analysis applied. We have updated our guidelines to reflect these recommendations. One key rule: if the number of independent repeats is less than the fingers of one hand, show the actual measurements rather than error bars. If you wish to present error bars, include the actual measurements alongside them.
Finally, please remember that you are interrogating a complex system — be careful not to discard 'outlier' data points on a whim, as they may well be as relevant as clustered measurements. One is naturally inclined to ignore data that does not match the hypothesis tested, but biology is rarely as black and white as we would like. Do not make 'hypothesis driven' research become 'hypothesis forced'!
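The "fingers of one hand" rule can be written down directly: always keep the raw measurements, and add a summary statistic only when there are at least five independent repeats. This sketch assumes the list holds one value per independent experiment (not technical replicates such as repeated pipetting of one sample) and uses the sample standard deviation as the error-bar statistic; that choice is an assumption, and whichever statistic is used should still be described and justified as the Editorial asks.

```python
import statistics

# "Fewer than the fingers of one hand": error bars only from n >= 5 independent repeats.
MIN_REPEATS_FOR_ERROR_BARS = 5


def summarize(measurements: list[float]) -> dict:
    """Always return the raw points; add mean and sample SD only when n is large enough."""
    n = len(measurements)
    summary = {"n": n, "points": list(measurements)}
    if n >= MIN_REPEATS_FOR_ERROR_BARS:
        summary["mean"] = statistics.mean(measurements)
        summary["sd"] = statistics.stdev(measurements)  # sample SD (n - 1 denominator)
    return summary


print(summarize([1.2, 0.9, 1.4]))            # n = 3: points only, no error bars
print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))  # n = 5: points plus mean and SD
```

Note that the raw points are returned in both cases, matching the guideline that any error bars should be shown alongside the actual measurements.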

Genetically modified mouse strains must be made available

This is the text of one of the Editorials in the current issue of Nature, The sharing principle (Nature 459, 752; 2009):
Back in 1996, human-genome scientists signed up to the Bermuda agreement to share their data without delay. Since then, the sharing principle has entered the mainstream — it now applies to all genomic data generated using public funding, as well as to all the relevant resources cited in publications.
But this principle is not universally observed for genetically modified mice, designed as vital resources in the quest to unpick basic biological mechanisms or to model human disease. The size of the problem is unclear, but existing surveys, combined with extensive anecdotal evidence, suggest it is substantial. In April 2006, for example, scientists at the US National Institutes of Health found that nearly 4,000 unique mouse strains had been created, yet barely 700 had been placed in a repository.
Some scientists say they have neither the time nor the money to breed and distribute their mice, or even to send the animals to publicly funded mouse repositories such as the European Mouse Mutant Archive, the Jackson Laboratory in Bar Harbor, Maine, and RIKEN BioResource Centre in Ibaraki, which would do those chores for them. Others claim that the careers of young and vulnerable researchers (or old and vulnerable researchers) could be harmed if they lost their exclusive access to a resource they made for their own research projects. Or, they say their institution's technology-transfer offices or companies will not let them part with mouse strains that could perhaps be made to turn a profit.
Such attitudes were noted with concern last month at a workshop in Rome hosted by CASIMIR, a European Union project to coordinate and sustain mouse resources internationally (see background documents). The workshop brought together representatives from funding agencies, publishers and the mouse repositories from Europe, the United States and Australasia. They concluded that the sharing problem urgently needs resolution — not least because international projects to systematically generate mouse lines deficient in each gene in the genome will generate thousands of new strains in the next five years or so.
To solve the problem, however, journals and funding agencies must take a tougher line. The Nature journals are among the very few actually requiring that authors use established public repositories wherever possible as a condition of publication. Most journals simply 'encourage' their authors to make mice used in their publications freely available to other laboratories, or 'suggest' that the mice be deposited in repositories. Funding agencies similarly prefer such cajoling terms as 'encourage' in their policies on sharing mouse resources, and rarely police the outcome.
Journals should now require researchers to place their mice in repositories as a condition of publication. And funding agencies should require repository plans to be included in all grant applications that are likely to generate new mouse strains. Part of the grant money should be reserved for this task and final reports or evaluations of the grants should refer to the repository used. The repositories themselves should help the journals and funding agencies by finding a way to generate a unique accession number for each mouse strain.
The sharing principle allows biology to progress efficiently. It avoids duplication of effort and allows different laboratories to use the same tools. It is essential that scientists sign up to it. Sharing mice has never been easier — the repositories around the world are efficient and professional, and they are coordinated. Just a few changes in the modus operandi of key institutions could ensure that the makers of mice will have no possible excuse not to use them.
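The Editorial's request that repositories generate a unique accession number for each mouse strain can be illustrated with a small registry that issues one identifier per strain and returns the same one on re-deposit. Everything here is hypothetical: the `StrainRegistry` class, the prefix and the six-digit format are invented for the sketch, and matching strains by exact name is a simplification of how repositories actually curate strain nomenclature.

```python
from itertools import count


class StrainRegistry:
    """Hypothetical registry issuing one accession number per distinct strain name."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._serial = count(1)  # yields 1, 2, 3, ...
        self._by_strain: dict[str, str] = {}

    def accession_for(self, strain_name: str) -> str:
        # A re-deposited strain keeps its existing accession, so papers cite it consistently.
        if strain_name not in self._by_strain:
            self._by_strain[strain_name] = f"{self.prefix}{next(self._serial):06d}"
        return self._by_strain[strain_name]


registry = StrainRegistry("MSTR")
print(registry.accession_for("GeneA knockout line 1"))  # MSTR000001
print(registry.accession_for("GeneB knockout line 1"))  # MSTR000002
print(registry.accession_for("GeneA knockout line 1"))  # MSTR000001 again
```

A journal or funder could then ask for this accession in a paper or final grant report, in the same way that sequence accession numbers are cited today.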
Nature journals' policy on availability of data and materials.

Nature Methods announces online methods

Nature Methods follows in the footsteps of Nature by ushering in an online methods section, fully integrated with the paper, for all original research articles. Details of the service are described in the journal's current (May) Editorial (Nature Methods 6, 313; 2009), and the editors welcome comments on the service at Methagora, the Nature Methods blog.
Daniel Evanko, Chief Editor of Nature Methods, writes: "We are relieved that we will no longer have to relegate important methodological details to Supplementary Information and we expect our authors will appreciate being able to include more citations in their papers. A potential downside of this change is that the print and online versions of papers have quite different levels of methodological detail. What do you think? Those of you who are online readers may not have very strong opinions on this, but what about our print readers? If anyone who regularly receives a print copy of the journal is reading this, we would like your feedback as well."
From the Editorial: "We expect that our readers and authors will appreciate the advantages that Online Methods bring to Nature Methods. With this change effectively increasing the length of Nature Methods papers—and more than doubling the length of Brief Communications—our authors will have far more space to communicate their new methodologies and cite previous work. But by limiting the increase in length to the methods section we continue to emphasize the value of succinct scientific reports. The body of the paper will remain short enough that casual readers can easily obtain the important information. The details required for more in-depth understanding or reproduction of the work will be easily accessible if needed. We hope our authors and readers are as excited by this change as we are."

Nature Methods journal website.
Nature Methods guide to authors.
Nature's formats for methods.
Methods in full, the Editorial announcing Nature's introduction of this service (Nature 445, 684; 2007).

Nature Methods on "big data" and the scientific method

The rise of 'omics' methods and data-driven research presents new possibilities for discovery but also stimulates disagreement over how science should be conducted and even how it should be defined. Is the ability of these methods to amass extraordinary amounts of data altering the nature of scientific inquiry? These are the issues discussed in the April Editorial of Nature Methods (6, 237; 2009).
"Methodological developments are now making it possible to obtain massive amounts of 'omics' data on a variety of biological constituents. These immense datasets allow biologists to generate useful predictions (for example, gene-finding and function or protein structure and function) using machine learning and statistics that do not take into account the underlying mechanisms that dictate design and function—considerations that would form the basis of a traditional hypothesis.
Now that the bias against data-driven investigation has weakened, the desire to simplify 'omics' data reuse has led to the establishment of minimal information requirements for different types of primary data. The hope is that this will allow new analyses and predictions using aggregated data from disparate experiments."
The Editorial goes on to ask whether the generation of parts lists and correlations in the absence of functional models is, in fact, science. "Based on the often accepted definition of the scientific method, the answer would be a qualified no. But the rise of methodologies that generate massive amounts of data does not dictate that biology should be data-driven. In a return to hypothesis-driven research, systems biologists are attempting to use the same 'omics' methods to generate data for use in quantitative biological models. Hypotheses are needed before data collection because model-driven quantitative analyses require rich dynamic data collected under defined conditions and stimuli.
Correlations in large datasets may be able to provide some useful answers, but not all of them: 'omics' data can provide information on the size and composition of biological entities and thus determine the boundaries of the problem at hand. Biologists can then proceed to investigate function using classical hypothesis-driven experiments. It is still unclear whether even this marriage of the two methods will deliver a complete understanding of biology, but it arguably has a better chance than either method on its own."

Comment on this Editorial at Nature Methods' Methagora blog.

Raising the bar for microarray standards

"Investigating the compliance of our publications with MIAME standards (minimum information about a microarray experiment; Editorial, Nat. Genet. 38, 1089; 2006), we found that even when authors and referees are aware of community standards and even with editors mandating both data deposition and accession linking as a condition of publication, a proportion of microarray datasets were at that time unavailable or incomplete." So begins the Editorial in this month's (February) issue of Nature Genetics (41, 135; 2009).
In an Analysis article in the same issue of the journal (Nat. Genet. 41, 149-155; 2009), John P. A. Ioannidis and collaborators (four teams) treated the findings of a number of microarray papers published in Nature Genetics between 2005 and 2006 as their gold standard, and attempted to replicate a sample of the analyses conducted on each of them, with frankly dismal results.
According to the Editorial, "the findings of this Analysis should be used to improve practice rather than to criticize the authors and referees of these publications. A certain amount of both skepticism and initiative must of course be assumed on behalf of all readers and users of research publications. Equally, there must be enough goodwill and professionalism in the research community to permit critical reanalysis of research findings at any and every moment without this core scientific practice implying any personal criticism. Any scientist should be prepared to reexamine published work, one's own and one's colleagues' alike. In doing so it always helps to make clear one's needs and assumptions, and the Analysis in this issue does indeed explain the limits of the analysts' requirements and critical aims."
The journal, and other Nature journals that publish papers describing microarray experiments, now insist that authors deposit their data in GEO or ArrayExpress before the submitted paper is sent for peer review.

Nature journals' policies on data and materials availability, including microarray deposition.
MGED website, specifying MIAME standards necessary to interpret and reproduce microarray data.

Incentives needed for genome annotation

Roy Welch and Laura Welch of Syracuse University, New York, examine why researchers seem reluctant to be more directly involved in the annotation of microbial genomes in the February issue of Nature Reviews Microbiology (7, 90; 2009). They write:

"To annotate an organism's genome, biological information about the organism must be matched to the genes and genetic elements in the sequenced genome. The process is iterative and open-ended: new information is constantly incorporated into the annotation. It can also be recursive: analysis of the annotation may provide insight about the organism that in turn leads to changes to the annotation. Unfortunately, the generation of new information and annotation of the genome are at present completely separate processes. Often new information does not become incorporated into the annotation in a timely manner, a costly loss for those who rely on it to advance their research.
The community of expert researchers who study an organism produce most of the information that becomes part of the annotation and are also the primary group of end-users. It is therefore curious that the annotation process is circuitous and inefficient: researchers communicate new information not as direct updates to the annotation, but as research papers that must later be interpreted and incorporated into the annotation separately — most often by a third party! Indeed, some information never finds its way into the annotation. It would be far more efficient for the research community to contribute directly to genome annotation. Yet the life science community as a whole remains stuck in the old, inefficient paradigm."

The authors go on to argue that technology is not the impediment, given the wide availability of wikis (collaborative editing websites) and the databases that have been created using these technologies, including EcoliWiki, GONUTS, Myxopedia and Wikipathways. Rather, state the authors, the impediment seems to be sociological: until contributions to a genome-annotation collaborative information repository can be credited by inclusion in a PhD thesis, curriculum vitae, tenure application or grant proposal, direct collaborative annotations are unlikely to fulfil their promise and potential to accelerate scientific achievement.

Call for authors to deposit microarrays in public databases

In a Correspondence to Nature Methods (5, 991; December 2008) responding to an Editorial in the March 2008 issue of the journal (Nat. Meth. 5, 209; 2008), Scott A Ochsner, David L Steffen, Christian J Stoeckert, Jr and Neil J McKenna report a study showing that researchers are not routinely depositing supporting raw microarray datasets into a public database.
The Correspondence authors surveyed papers from the 2007 issues of 20 journals, searching the text for references to the deposition of a microarray dataset. They found that fewer than 50 per cent of the datasets had been deposited. They acknowledge the effort required to deposit these complex data in public microarray repositories, even though the repositories are simplifying submission while encouraging compliance with MIAME (minimum information about a microarray experiment) standards. They write: "Although microarray datasets are most useful to bioinformaticians in their raw, unnormalized forms, which facilitate cross-comparison with other datasets, processed datasets are more useful to the bench scientist. Moreover, unless a description of the experimental details is available, neither form of the data are biologically interpretable." They urge repositories to require deposition by authors and propose that journals require a statement in the manuscript identifying a repository and accession number at the time of submission, with the record embargoed until the paper is accepted. (Of the 16 Nature journal papers included in the survey, accession numbers were provided in 15.) They conclude: "Seven years after the elaboration of the MIAME principles, the emerging discipline of microarray meta-analysis, exemplified by the cancer gene expression resource Oncomine, continues to be hobbled by the mundane, time-consuming and often fruitless exercise of tracking down annotated full datasets. We call for a renewed collective effort from researchers, publishers and funding organizations to redress this situation and secure these data-rich research resources for posterity."
The full text of the Nature Methods Correspondence, with supporting data, is here.
Policy note: the Nature journals have for some years required authors to submit MIAME-compliant microarray data to the GEO or ArrayExpress public repository. Details of the journals' policies can be found here.


Structural genomics - December update

The Structural Genomics Update for December reports a centralized system created by the Protein Structure Initiative (PSI) that gives investigators an easy way to submit protein target suggestions to the ten PSI structural genomics centres. These proposals are evaluated for feasibility and consistency with the overall goal of the programme. The four large-scale production centres are pursuing structural studies of more than 1,400 community-nominated targets. The six PSI specialized centres, which focus on various structure-determination bottlenecks, also consider target nominations. For further information, guidelines and the submission service, see the news article.
In the rest of the December update, see the featured molecule (scavenger decapping enzyme DcpS), selected free-to-access research articles from across Nature Publishing Group journals, as well as other articles and news, including an events calendar.
The PSI-Nature structural genomics knowledgebase is a free service, designed to turn the products of the Protein Structure Initiative into knowledge that is important for understanding living systems and disease. Use the site to explore the PSI's work, and stay informed about advances in structural biology and structural genomics by signing up to the monthly e-newsletter.

Nautilus post announcing launch of the structural genomics knowledgebase.
Previous Nautilus posts about structural genomics.

Nature Methods, looking back and moving forward

The fourth anniversary of Nature Methods' arrival on the publishing scene and a change in leadership offer an opportunity for reflection and editorial fine-tuning, as described in the journal's November Editorial (5, 911; 2008).
From the Editorial: "When Nature Methods made its debut in October 2004, just over 4 years ago, it was an anxious but exciting time for our founding chief editor Veronique Kiermer and manuscript editors Nicole Rusk and Daniel Evanko. We were all novices at scientific publishing and more comfortable calibrating a pipette than editing a fledgling journal." The Editorial goes on to outline developments and other changes at the journal since then. Veronique is taking on the role of publisher for Nature Methods and Nature Protocols, and Daniel is taking over as Chief Editor of Nature Methods. Reviews, Perspectives and Research Highlights are to be expanded, while the Protocols section is closing. (Authors are encouraged to submit their protocols to the online publication Nature Protocols.) The Editorial concludes: "We hope that our journal has helped dispel the notion that methods are less important than results and deserving only of small print at the end of a paper. Debunking this myth has been and will continue to be our main mission. We will persist in our efforts to bring you, every month, a journal that allows methods to be featured prominently in their own right—as the cornerstones upon which results are based."
Nature Methods guide to authors.
How to submit to Nature Methods.
Aims and scope of the journal.
Methagora, the Nature Methods blog.

Historical microbiology archive made free to all

In its November Editorial, Nature Reviews Microbiology (6, 794; 2008) reports that the archive of the International Journal of Systematic and Evolutionary Microbiology (IJSEM) has been made available free online: a boon for scientists, historians and the public. The Society for General Microbiology publishes IJSEM on behalf of the International Committee on Systematics of Prokaryotes of the International Union of Microbiological Societies. The society has now provided funding for the entire back archive of the journal to be made freely available worldwide without a journal subscription. (The current content, covering the past two years, remains subject to access controls.)
From the Nature Reviews Microbiology Editorial: "Systematics is the foundation for studies of all types of organisms, because it helps us to understand how one organism relates to another. The value of systematics is often underappreciated, however, for bacteria and viruses. For example, there is a huge imbalance between the 7,000 named bacterial species and the 1,000,000 named insect species. This is particularly important given that it is now well-known that bacteria and viruses are the most populous organisms on Earth, and furthermore, that more than 99% of bacteria have yet to be cultivated. Why should we be interested in naming and characterizing different species of bacteria? The advent of metagenomics has swelled the literature with ever-increasing estimates of numbers and types of bacteria and viruses in the biosphere. An important adjunct to genomics-based approaches is the detailed characterization of these myriad species and investigation of the relationships between them. The availability of the IJSEM archive will hopefully spur renewed interest in this area."
Jean Euzeby, the IJSEM list editor, maintains an incredibly useful web resource that details all those species that have been ratified — the List of Prokaryotic names with Standing in Nomenclature. Another useful site named Bacterial Nomenclature Up-to-Date has an up-to-date list of bacteria and is based on the work of Norbert Weiss, who maintained the database until his retirement in February 2003. The current database is maintained under the supervision of Manfred Kracht. Finally, a comprehensive taxonomy of the Bacteria and Archaea can be found in the Taxonomic Outline of Bacteria and Archaea (TOBA) Release 7.7, which was last updated in 2007.
Other useful resources are described in the Editorial.

To show or not to show data

'Data not shown' is an outdated caveat that obscures the transparency of a scientific report and weakens the peer review process, according to Nature Chemical Biology (4, 575; 2008).
"Technology and competition perpetually raise the bar for the quality and quantity of experimental data that authors must include to publish a high-impact manuscript. Almost uniformly, journals have amended their formats to accommodate the increased volume of data while maintaining page restrictions by providing the supplementary information option online for authors and readers. Despite these changes, many authors still rely on the caveat of 'data not shown'. At Nature Chemical Biology, we discourage the use of this phrase and the omission of important data for two major reasons. First, the exclusion of essential data undermines the peer review process, and second, readers need access to data to form independent opinions about and to replicate the results of published papers. Thus, we suggest that the time for 'data not shown' has passed."
Read the rest of this Editorial here.
For additional details on presenting and consolidating methods, see the journal's Guide to Authors.
See the NPG authors' and referees' website for more details on our data sharing and database deposition policies.

Launch of Protein Structure Initiative-Nature Structural Genomics Knowledgebase

Last week, Matt Day announced Nature Publishing Group (NPG)'s latest website: the Protein Structure Initiative (PSI)-Nature Structural Genomics Knowledgebase. Matt writes that the new addition to NPG's existing collection of gateways and databases is "a collaboration with the Protein Structure Initiative, a large scale NIH-funded consortium to develop and apply high-throughput techniques for protein structure determination. They've been highly successful in generating new technologies that are available for others to use, and they've shown that structure determination work can be scaled up significantly.
Now that the site is launched, we'll be providing monthly editorial updates that put developments in structural work into context for a wide range of biomedical researchers....The website is hosted at Rutgers University by the same team that hosts one of the most significant and long-established databases, the Protein Data Bank".
The Structural Genomics Knowledgebase (SGKB) offers researchers and others an easy way of keeping abreast of developments both by the PSI and more generally in the fields of structural genomics and structural biology. It is a regularly updated portal to research data and other resources from the PSI, with NPG providing a monthly update with synopses of important research advances, recent additions to a categorized library of research articles, as well as news and events in structural biology. You can register to receive a monthly email newsletter and subscribe to RSS feeds. NPG resources and publications relevant to the Protein Structure Initiative can be accessed here.

Video journal to be indexed in Medline and PubMed

The Journal of Visualized Experiments (JoVE) has announced that its online video protocols will be indexed in the popular US National Library of Medicine repositories MEDLINE and PubMed.
Founder and chief executive Moshe Pritsker views the MEDLINE–PubMed listing as a sign that the scientific community has accepted video-based publications. "It was a very important decision for us, and for scientific publishing," he says.
Since JoVE was founded in 2006 with support from an angel investor, the journal has published more than 200 videos, most produced by professional videographers. It aims to improve the reproducibility of scientific results by using videos to clarify subtle experimental details. The journal was itself an experiment in video publishing and remains the only video-based scientific journal.
From Nature 455, 13 (2008).

Nature's special issue on 'big data'

The Big Data special package of articles in this week’s issue of Nature (4 September 2008) looks at how massive influxes of data are changing the way science is done in many fields, and includes a feature story on ‘Wikiomics’ that might be of particular interest to scientists who work with "web 2.0" tools. Coping with floods of data is now one of science's biggest challenges, so the Nature special issue assesses the need to complement smart science with smart searching; looks at what the next Google will be; interviews the pioneering biologists who are trying to use wiki-type web pages to manage and interpret data; and recalls that the first mass data crunchers were not computers, but the remarkable women of Harvard's Observatory. All the articles, as well as downloadable PDFs of the print versions, are free online for two weeks from the publication date. We encourage you to download everything you are interested in, and then to spread the word to friends and colleagues about what you like (and don’t like!) by email, on your blog, by commenting online at the Nature website, or by other means. And of course, Nature always welcomes Correspondence submissions.
The contents of the Big Data 'special' in full:
Editorial: Community cleverness required
Researchers need to adapt their institutions and practices in response to torrents of new data — and need to complement smart science with smart searching.
Special Report: The next Google
Ten years ago this month, Google's first employee turned up at the garage where the search engine was originally housed. What technology at a similar early stage today will have changed our world as much by 2018? Nature asked some researchers and business people to speculate — or lay out their wares. Their responses are wide ranging, but one common theme emerges: the integration of the worlds of matter and information, whether it be by the blurring of boundaries between online and real environments, touchy-feely feedback from a phone or chromosomes tucked away on databases.
Party of One column: Data wrangling
Collecting and releasing environmental data have stirred up controversy in Washington, says David Goldston, and will continue to do so.
Features: Welcome to the petacentre
What does it take to store bytes by the tens of thousands of trillions? Cory Doctorow meets the people and machines for which it's all in a day's work.
Features: Wikiomics
Pioneering biologists are trying to use wiki-type web pages to manage and interpret data, reports Mitch Waldrop. But will the wider research community go along with the experiment?
Commentary: How do your data grow?
Scientists need to ensure that their results will be managed for the long haul. Maintaining data takes big organization, says Clifford Lynch.
Books & Arts: Distilling meaning from data
Buried in vast streams of data are clues to new science. But we may need to craft new lenses to see them, explain Felice Frankel and Rosalind Reid.
Essay: The Harvard computers
The first mass data crunchers were people, not machines. Sue Nelson looks at the discoveries and legacy of the remarkable women of Harvard's Observatory.
Review: The future of biocuration
To thrive, the field that links biologists and their data urgently needs structure, recognition and support. Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P. Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White & Seung Yon Rhee
Podcast Extra: Big Data
As Google celebrates its 10th anniversary, we find out how science is coping with massive datasets generated by unprecedented computing power. BoingBoing blogger Cory Doctorow tells us about his visits to the LHC data storage facility and the genome sequencing Sanger Centre.

Method of the Year 2008: cast your vote!

When the Nature Methods editors sat down last year to select a Method of the Year for 2007, it was with the firm intention of initiating a yearly tradition. This year, the editors are asking for your opinion, so please nominate candidate methods as well as vote and comment on posted suggestions.

From the editorial in the September issue of Nature Methods (5, 749; 2008):
The Method of the Year event is a celebration of methods development and innovation because we think that methods developers should have their share of the limelight. It is also a fun opportunity to assemble Commentaries, technical information and news items about a method we consider particularly important among the developments that we, as editors, continuously observe across a broad range of disciplines. But we also wanted to take the pulse at the bench and see what you, with firsthand experience, think of recent methods developments. This online one-click voting and nomination process is your opportunity to speak up.
We are interested in methods that have come into their own in 2008 and have had a proven impact, but also in your views on burgeoning methods which, while they are not quite ready for prime time, are worth watching.

Cultural media at Nature Reviews Microbiology

Chris Condayan, manager of the Public Education Outreach Initiative at the American Society for Microbiology, writes an Editorial in the September issue of Nature Reviews Microbiology (6, 646; 2008) about how self-created audio and video content enables more microbiologists to share knowledge and news online. From his article:

As the science audiences for newspapers, radio and television decline, the future for audio and video podcasts, blogs and social networking looks bright. On the horizon we are starting to see the emergence of science-related social networks and a movement towards 'open science' that allows scientists and researchers to collaborate on projects, communicate results, share data and publish papers with the same recognition that is afforded to colleagues who publish in print journals. Specific details of how open science will work are still murky, and concerns over citation, peer review, accuracy, scooping and accountability resound even among its strongest supporters. But this has not stopped microbiologists from engaging with one another on wikis, such as EcoliWiki, TOPSAN or Proteopedia, or prevented thousands of scientists from sharing their poster presentations, lectures or laboratory methods through iTunes or video destination sites, such as YouTube, SciVee and JoVE. Podcasting for audio or video is generally defined as episodic content that listeners or viewers can subscribe to for free and which they can consume at any time and on any device, whether it be a computer, iPod, mobile telephone or television set. Audio podcasts are easy and affordable to create, and can be used to make a radio-style show or to make lectures available to a wider audience. Uploading and sharing videos on websites such as YouTube, or creating a video podcast show, is more time consuming and requires more equipment and a video-editing software program.

Chris Condayan goes on to provide some examples of microbiologists who produce online resources to educate and promote microbiological research, and describes the MicrobeWorld service of the American Society for Microbiology, which includes the popular resources MicrobeWorld Radio and MicrobeWorld Video.

Changing the publication landscape with Nature Precedings

Nature Cell Biology wishes Nature Precedings a happy birthday in its July Editorial (Nature Cell Biology 10, 753; 2008), in the process taking stock of the usefulness of "web 2.0" publishing ventures to cell biologists (and other scientists).
Nature Precedings allows rapid posting of unpublished (and unreviewed) manuscripts, conference posters and slide presentations. Entries are subject-tagged, searchable and citable. Postings are screened by curators for scientific legitimacy, plagiarism and scope, but not peer-reviewed for novelty or data quality; commenting and voting by readers is encouraged.
The immediate question for most cell biologists and other scientists is whether posting material on a preprint server is worth the risk of being scooped. Nature Cell Biology concludes that although many cell biologists are still wary of preprint servers, they could consider posting solid data that are not likely to be published in traditional journals because they are confirmatory or negative. Well-controlled negative data are immensely useful to colleagues, so documenting them in a citable form on a preprint server is a valuable community service.

Related posts:
Peer to Peer on the first year of Nature Precedings.
Nature Network post by Hilary Spencer of Nature Precedings.
Nature Network online discussion forum for the Nature Cell Biology editorial.
Further information about Nature Precedings is available here.


Preservation of content in electronic journals

Via Knowledgespeak press release:
Two years after a meeting calling for urgent action to preserve scholarly e-journals, the results of a survey of 1,371 library directors of four-year colleges and universities in the United States have been released.
Most library directors who responded believe their own institution has a responsibility to take action to prevent intolerable loss of scholarly records. But although larger libraries support one or more e-journal preservation initiatives, most respondents from smaller libraries have yet to support any preservation effort that would secure permanent access to e-journals for their institutions.
The survey, conducted by Portico and Ithaca, raises questions about how the responsibility for preservation of critical electronic resources should be supported by the community, even as electronic resources expenditures expand substantially at libraries across the spectrum. The organizers hope that the report will be a catalyst for leaders of libraries, consortia, and other organizations to provide a mechanism for digital preservation. The full report is available for download as a PDF. (A summary is available here.) Readers are also invited to share comments and reactions in the provided online discussion space.

Proposal for a centralized grant repository

Noam Y. Harel of Yale University writes in Nature's Correspondence page (Nature 452, 409; 2008):

Writing grant proposals is difficult enough; keeping track of different deadlines makes for an endless cycle of procrastination and frantic preparation. The added stack of bureaucratic forms, with arcane variations from agency to agency, can tip one over the edge as a deadline nears.
Is it almost too obvious to wish for a centralized proposal repository? Investigators could submit proposals at any time, in a common format that highlights the science rather than obliterates it with red tape. Funding agencies could search the repository for proposals matching their interests. A minimum of bureaucratic information would be required up front. Budget details could be worked out between funding agencies and investigators as necessary.
Ideally, all proposals would be publicly accessible. However, most of the scientific community has not yet accepted the inevitable dawn of truly open science. Submissions to a central repository could therefore be made accessible only to funding agencies that agree to keep proposals private (unless a submitting investigator indicates a willingness to share his or her proposal publicly).
The repository would make life easier for scientists by eliminating the hassle of searching for suitable grant mechanisms and the stress of meeting various deadlines. It would make life easier for funding agencies by expanding the pool of applications from which to choose. Of course, the best proposals could attract offers from multiple agencies. Rather than forcing investigators to choose non-overlapping sources of funding for each project, why not use the repository to mediate shared funding agreements that could benefit everyone involved? In effect, it would serve as the mediator between grant-seekers and grant-providers.
In a world where eBay, Facebook and Google powerfully demonstrate the communal nature of the Web, it is a pity that scientists and funding agencies don’t have a similarly modern forum for matching their interests and offers.

Consistent guidelines for clinical interventions

An Institute of Medicine report recommends that the United States government create a programme to provide consistent guidelines for clinical interventions. The reliability of the guidelines will depend on the availability of the clinical data to be assessed, according to this month's (March 2008) Editorial in Nature Medicine (14, 223; 2008).
The problem, as the Editorial puts it: "Widespread regional variation in how health care providers treat some conditions in the United States reflects the sobering fact that, for many interventions, there is no consensus about what constitutes effective clinical care. Physicians and health care providers must try to make sense of innumerable and conflicting guidelines in order to choose the best available intervention for their patient. Scientific, systematic review of data from medical literature and clinical trials is crucial to forming a reliable evidence base of what actually works in health care. With this in mind, professional medical organizations, patient advocacy groups, government agencies and others have synthesized available data on the efficacy of particular interventions and have produced guidelines recommending certain courses of action for specific conditions. The problem is that there is no consensus among the approaches to systematic review, and, more troublesome, no clear understanding of the best methods for assessing the evidence."
The Institute of Medicine has stepped in with a three-part plan to help resolve conflicting medical advice (reported in a News story, Nature Medicine 14, 226; 2008): first, identify interventions that are priorities for evaluation; second, develop standardized and reliable methods for performing systematic reviews of all the available data about a given intervention; and third, develop standards for producing clinical guidelines. The Editorial discusses some of the practical difficulties and concludes that the Institute of Medicine report is an important step forward, but that legislation will be needed if it is to work.

Nature Methods recommends deposition of proteomics data

Starting this month (March 2008), Nature Methods strongly recommends that proteomics data be deposited in public repositories before manuscript submission. From the Editorial in the March issue of the journal (Nat. Methods 5, 209; 2008):
"Several proteomics data repositories are now available that differ in terms of their goals, structure and the formats they accept. They include PRIDE, PeptideAtlas, Global Proteome Machine Database (gpmDB) and the file distribution system Tranche. The newest addition, Human Proteinpedia, is a community-based annotation tool that hosts experimental data (Nat. Biotechnol. 26, 164; 2008).
Importantly, the major database administrators have shown their willingness to work with users and with each other to facilitate data deposition. At this stage, the process can still be labor-intensive, but a repository like PRIDE provides extensive technical assistance. Under the umbrella of the ProteomExchange consortium, the major repositories are also devising ways to share their data in a collaborative fashion, capitalizing on their complementarities to minimize submission hassle while maximizing benefits.
We support these efforts and consider it premature to recommend a particular repository. Rather we will rely on community experience to determine which database or combination of databases emerges as the most useful. However, there are specific features that editors favor. In particular, we like the possibility currently offered by PRIDE and Human Proteinpedia to provide peer reviewers with access to datasets associated with a manuscript before public release, in an anonymous fashion, and to coordinate public release of the data with publication."

Nature Methods welcomes comments on this Editorial, and the recommendations it makes, at the journal's blog Methagora.
The Nature journals' policies on data and materials availability, including links to editorials on these policies, can be found on the journals' website for authors and referees.

Non-traditional publishing choices for biologists

Zeba Wunderlich and Kishore Kuchibhotla of Harvard University write in Nature's Correspondence page (Nature 451, 887; 2008):
The paramount importance of publishing in biology dissuades many young scientists from making non-traditional choices with regard to where and how we publish our work. My colleagues and I believe it is in our own interests to identify the shortcomings of traditional publishing and to explore other publishing possibilities that are free of those problems.
What can we do? First, learn about our options. There are several innovative developments poised to change the publishing landscape dramatically. Video publications, preprint archives and high-throughput online journals are but a few that have recently surfaced (for a discussion, see Nature Network's Publishing in the New Millennium forum). The onus is on all of us to investigate these resources and to consider how they might enrich our science.
To make a difference, we also need to contribute. Frustrated by technical difficulties in reproducing published experiments? Then publish a video protocol in the Journal of Visualized Experiments. Have you benefited from a colleague's comments at a conference? Then extend the experience, and comment on articles published by PLoS One and posted on Nature Precedings. These initiatives will take hold and achieve their full potential only with strong support from the scientific community.
If we collectively embrace these ideas, publishing will become more effective. Although the psychological and social barriers to submitting a contribution initially are surprisingly high, becoming involved has proved to be rewarding. Ultimately, scientific progress and the published record have a symbiotic relationship — improved communication will enhance the pace, progress and efficiency of research.
[Note added by Maxine: In addition to the resources mentioned above, Nature Protocols is an online resource which welcomes the upload of protocols, in video or written form, and provides users with an interactive network for comments and additions.]

Research Information Network on data stewardship

The UK Research Information Network (RIN) has produced a framework of key principles and guidelines on the stewardship of digital research data, aimed at research institutions, libraries, publishers, societies and funders and drawn up after more than a year of wide consultation among these groups. A two-page summary of the framework is available as a PDF, as is the full 16-page report.
The framework addresses not only the basic issue of preserving research data, which is essential if results are to be evaluated and reassessed, but also new approaches to managing and providing access to data in an era of digitization, new technologies, aggregation and "adding value" to data through reuse.
The framework document identifies five key principles, summarized here:
1. The roles and responsibilities of researchers, research institutions and funders should be clearly defined, with codes of practice to ensure that creators and users of research data are aware of and fulfil their responsibilities.
2. Digital research data should be created and collected in accordance with international standards.
3. Digital research data should be easy to find, and access should be provided in an environment which maximises ease of use, and which provides credit for and protects the rights of those who have gathered or created data, and/or who have legitimate interests in how data are made accessible and used.
4. Models and mechanisms for managing and providing access to digital research data must be both efficient and cost-effective.
5. Digital research data of long-term value arising from current and future research should be preserved and remain accessible for current and future generations.
The full details are available at the RIN website.

Protein structures in the public domain

Aled Edwards of the Structural Genomics Consortium, University of Toronto, writes in a Correspondence in this month's issue of Nature Structural & Molecular Biology (15, 116; 2008):
The Structural Genomics Consortium (SGC) is a public-private partnership that places the three-dimensional structures of proteins of relevance to human health into the public domain without restriction on use. Over the past 3 years, the SGC has deposited the structures of more than 550 proteins from its Target List into the Protein DataBank (PDB); this accounts for about one-quarter of the new structures of human proteins in the PDB over this period ('new' is defined as <95% sequence identity to proteins whose structures were already available in the PDB) and the majority of the new structures from the human parasites that cause malaria, cryptosporidiosis and toxoplasmosis. Over the next 4 years, the SGC is committing to determining the structures of another 600 proteins from its Target List, including eight human integral membrane proteins.
The SGC has been releasing the coordinates for all the SGC structures into the PDB immediately after they meet the SGC quality criteria, even if the ultimate intention is to describe the work in the peer-reviewed literature. This data release policy, which has often meant that coordinates were available for several months before the manuscript was even written, has not limited the ability of our scientists to publish.
In keeping with our policy to make our data available as soon as possible, the SGC is now also providing 'pre-released' coordinates on its website when a new SGC structure is submitted to the PDB, allowing scientists to access the structural information while the deposition files are being processed. Scientists should ensure that the revised coordinate file is downloaded once it is released by the PDB.

A 'third way' for privatizing biomedical research

Ron A. Bouchard of the University of Alberta and Trudo Lemmens of the University of Toronto write in a Commentary in this month's Nature Biotechnology (Nat. Biotechnol. 26, 31–36; 2008) that the allocation of the risks and benefits of publicly sponsored biomedical research is becoming increasingly skewed toward for-profit entities and against the public interest. They propose that a legitimate solution to this imbalance would be to levy compulsory government royalty fees on commercial products made possible by public efforts.
The authors argue that "public–private partnerships can be particularly valuable in circumstances involving large transaction costs associated with novel biomedical inventions aimed at the global public good. That said, a combination of self-interest and anxiety in the face of globalization has led to wide swings of the pendulum of S&T policy and scholarship in recent years, with argument for expansive IPR rights on the one hand and their abolition in favor of a completely open source model on the other. Neither position is likely to be balanced or workable over the long term, as both may skew too far to private or public interests." A compulsory government royalty on technologies commercialized using public money, they argue in their Commentary, is a necessary 'third way' to protect the interests of for-profit entities and those of the public.

Where did the scientific method go?

Michela Noseda of Imperial College London and Gary R. McLean of the University of Texas Health Science Center write in this month's Nature Biotechnology (26, 28–29; 2008) in response to the Brief Communication published by Mazor et al. in the May 2007 issue (Nat. Biotechnol. 25, 563–565; 2007). Noseda and McLean's concern is not with the article's findings but with what they describe as "a lack of documented methodology and information that is essential to faithfully reproduce the science claimed in the manuscript. Surely, the aim of scientific publication is to disseminate scientific information to further advance our knowledge and to allow others to use such information for expansion and possible improvements to the work. Mazor et al. are clearly not the only authors being forced into abbreviated paper formats that follow this trend, which suggests the problem goes significantly deeper.
Admirably, Nature has recently implemented new guidelines for the addition of methods to their published research articles and letters. Authors are given multiple options for the appropriate presentation of methods within their manuscripts, avoiding the demotion of Methods to the supplementary section. This approach should be commended and we hope adopted universally by additional scientific periodicals. Aside from these rules, we should all make an extra effort as authors and reviewers to ensure that scientific methodology resumes its rightful position as the foundation of basic scientific research."

The Nature Biotechnology editors respond (Nat. Biotechnol. 26, 29; 2008):
"Noseda and McLean raise interesting points. With regard to the ability to reproduce a paper's methodology and findings, the fact that descriptions of methods in Supplementary Material online are not copy edited for grammar or clarity at Nature Biotechnology (or at any other Nature monthly journal) could be argued to potentially compromise the lucidness and ease with which a reader can repeat a published experiment. As the authors also point out, Nature's new guidelines for the addition of methods to its published papers provide authors with flexibility in how to present their methods within the final printed issue and online. One additional benefit to Nature's approach, not mentioned by Noseda and McLean, is that references to methods or protocols that appear in the Methods section remain in the printed paper rather than being relegated to online only (where they are less likely to be cited). We would welcome feedback from our readers as to whether they feel Nature Biotechnology should follow a similar model to Nature."

Expanded licence for reuse of genome papers

From an Editorial in today's (6 December) issue of Nature (450, 762; 2007):
Although Nature and the Nature journals are built on a business model funded by subscribers and other sources of revenue, various initiatives have been implemented to enhance the accessibility of the research papers published in these journals.
They have long been freely available to researchers in the 100 or so poorest countries through the World Health Organization's Hinari initiative and others like it. Machine access is being enhanced by the open text-mining initiative of the Nature Publishing Group (NPG). Preprints of original versions of papers can be deposited in arXiv and Nature Precedings without compromising their acceptability for publication. And final authors' versions of papers can be deposited in PubMed Central and other public servers from six months after publication. Authors retain copyright of their work, whereas NPG retains the licence to publish it.
For many years, a more generous arrangement has been made for papers reporting full genome sequences. (The paper reporting the sequence and analysis of 12 species of Drosophila is the most recent example, see Nature 450, 203; 2007). These papers are freely accessible on NPG's website from the moment of publication. This recognizes a consistent character of 'genome' papers: they represent the completion of a key and fundamental research resource, describing and reflecting on what has been revealed but not usually providing insights into mechanism. Although some papers in other disciplines might also be characterized in this way, the fundamental character of the genome has led NPG to make a systematic exception.
In the continuing drive to make papers as accessible as possible, NPG is now introducing a 'creative commons' licence for the reuse of such genome papers. The licence allows non-commercial publishers, however they might be defined, to reuse the pdf and html versions of the paper. In particular, users are free to copy, distribute, transmit and adapt the contribution, provided this is for non-commercial purposes, subject to the same or similar licence conditions and due attribution.
In 1996, as human genome sequencing was getting under way, leading players stated: "It was agreed that all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society". These principles have continued to guide the field, and NPG has consistently made genome papers freely available in keeping with them. This new licence allows us to formalize the arrangement.

Public accessibility of scientific databases

Last year, Nature Biotechnology ran an Editorial about the failure of a biological database:

Six weeks ago, the rights to one of biology's premier public databases were quietly sold to an informatics startup. The database in question, the Biomolecular Interaction Network Database (BIND), is arguably the most comprehensive freely accessible protein-protein interaction database available to the research community. Yet through a combination of bureaucratic delays, Canadian government fiscal nitpicking and a lack of community consensus, this important resource now finds itself on life support, its survival precariously linked to that of Unleashed Informatics, a private venture founded last April with little more than $1.0 million in seed funding from Sun Microsystems. BIND is a database of molecular associations that collates high-throughput data submissions and hand-curated information from the scientific literature……
(From Nature Biotechnology 24, 115; February 2006.)

One correspondent disagreed with the Editorial's assessment and wrote that in his opinion the enterprise had been a waste of taxpayers' money.

Rather than arguing for the importance of long-term database funding by granting agencies, BIND's saga in fact argues for greater caution and more demanding oversight when these agencies elect to fund a database's initial development.
(W. Busa, Nature Biotechnology 24, 1095; September 2006).

Now, some months later, the journal is able to publish a response from one of BIND's creators, and from another correspondent in support of the database:

On March 20 this year, Thomson Scientific (Philadelphia) acquired the BIND database together with a stable of software and services through the purchase of Unleashed Informatics (Toronto). These products were originally created by my laboratory using public funds. They were the intellectual property of my former host institution, Mount Sinai Hospital, in accordance with its employment contracts and policies. Confidentiality constraints from the outset of the discussion with Thomson Scientific, which predated Busa's letter, prevented me from addressing Busa's comments at the time. I would now like to address several misapprehensions and inaccuracies in his comments..........BIND has always had the broadest scope of any interaction database (all organisms) as well as the deepest annotation (down to atomic three-dimensional structures). BIND curators extracted information from figures—a feat no text mining tool can do and 85% of hand-curated BIND records have information arising from figures. It is the breadth, depth and quality of BIND that led to its commercial acquisition. And this was pursued only after having exhausted all possible means for continued public support.......
(C. Hogue, Nature Biotechnology 25, 971; September 2007.)
Researchers may not mind paying for the luxury of specialized databases, but data registries that cater to a broad set of users should be broadly and freely accessible to the research community. Although the initial development of databases, such as BIND, requires caution and close oversight of budgets, an equally important aim should be to ensure that data repositories of particular utility to the research community remain sustainable and publicly accessible. Databases, such as BIND, should not be left to the private sector. Ensuring public accessibility to data essential for research progress is the responsibility of the central planner, not Adam Smith's invisible hand in the marketplace.
(K. Wang, Nature Biotechnology 25, 971-972; September 2007.)

Nature journal policies on proteomics data

The August editorial in Nature Biotechnology, 'Time for leadership' (Nat. Biotechnol. 25, 821; 2007), describes how the example set by leading proteomics laboratories will be a major factor in determining the successful implementation of new reporting guidelines in the wider community.
The August issue of the journal includes two perspectives that propose reporting guidelines for proteomics and molecular-interaction data sets (p. 887 and p. 894). The "minimum information about a proteomics experiment" (MIAPE) and an associated module on molecular interaction experiments (MIMIx) were developed by the Proteomics Standards Initiative of the Human Proteome Organization with the aim of standardizing the reporting of proteomics research.
The editorial goes on to state: "Whether Nature Biotechnology ultimately elects to require compliance with the MIAPE guidelines will depend on their reception by the scientific community. This March, we began recommending (not requiring) that proteomics and molecular-interaction data sets be deposited in a public repository before the associated manuscript is submitted to this journal (Nat. Biotechnol. 25, 262, 2007). But we would not consider enforcing the MIAPE guidelines until such time as the proteomics community has reached a consensus that the benefits of compliance outweigh the burden.
Before this can happen, at least two critical pieces of infrastructure must be in place. First and foremost, appropriate software tools must be developed and made freely available to all. Second, databases must improve their capabilities for transferring and storing MIAPE-compliant data sets."
We welcome your comments as the Nature journals further develop their policies in this area.

August editorials on sharing, naming and credit

The Nature journals this month (August) feature several editorials on the publishing process. A short round-up follows:

Nature Genetics (39, 931; 2007), in 'Compete, collaborate, compel', calls for procedures for microattribution to be established by journals and databases so that data producers have an overwhelming incentive to deposit their results in public databases and thereby to receive quantitative credit for the use of every published data accession.

In 'Got data?', Nature Neuroscience (10, 931; 2007) points out that data sharing is not only good citizenship for researchers, but is also required by funding agencies and many journals. The scientific community needs to develop better incentives to encourage compliance and reward those who share.

And in 'Name that gene!', Nature Structural & Molecular Biology (14, 681; 2007) warns that scientists coin new terms, or neologisms, at a tremendous pace, but name choice can have unforeseen results.

Automated structured abstracts

Udo Hahn and colleagues add to the discussion on "making data available to all" by describing the benefits of automated, as opposed to manual, structured abstracts (see Nature 448, 130; 2007). They write:

Mark Gerstein and colleagues in Correspondence (Nature 447, 142; 2007) propose that journals should require authors to manually provide structured abstracts to facilitate text mining of biological information. There are three main difficulties in implementing such a proposal.
First, life-science terminologies are huge, diversified and complex. This means that identifying the correct content descriptors is almost impossible for inexperienced users of online term repositories. For example, Medical Subject Headings, the International Classification of Diseases and Gene Ontology are high-volume — tens of thousands of terms — and structurally complicated terminological systems, each with different design rationales, naming conventions and principles of structural organization. Even human indexers, search specialists and database curators with routine exposure to these resources have to invest much effort in understanding and keeping track of their content as well as terminological updates and revisions. Will scientists find the time to dive so deeply into this alien terminological territory, and be capable of finding exactly what they are looking for?
Second, the coverage of existing terminologies for the many subdomains in the life sciences is incomplete. The two main terminological umbrella systems for the life sciences, the Unified Medical Language System and the Open Biomedical Ontologies, contain impressive numbers of individual terminologies, but their coverage of the life sciences is still fragmentary and suffers from varying depths of description. The size of the terminology gap is likely to be even more pronounced if authors were required to encode relational descriptions, for example indicating a binding relation between two specific proteins, P1 and P2, by Bind(P1, P2), because such a vocabulary has not yet been determined.
Third, the quality and reliability of author-supplied content descriptions is quite a hurdle. Even if the first and second problems were to be solved, human indexers, even professional ones, are liable to error as well as to the possibility of intrinsic subjective bias (M. E. Funk and C. A. Reid Bull. Med. Libr. Assoc. 71, 176–183; 1983). This is not to say that authors of a structured abstract would consciously cheat, but rather there is a grey area of overstatement and overestimation of one's own results in a highly competitive scientific environment. If authors' structured entries were subject to peer review together with the submitted article, this would be more work for the reviewers as well as the authors — neither of them likely to have been trained as terminologists.
As an alternative, we suggest automated procedures for knowledge capture in which neither the authors nor the reviewers are in the loop. There has been significant progress in automatic text mining and information extraction as well as in the methodological foundations of life-science terminologies in terms of ontologies, knowledge representation languages and semantic encoding standards. These efforts in automating the generation of content descriptions and linking them directly to biological databases are strongly experimentally founded and would help to avoid additional workload and subjectivity — see, for example, the BioCreAtIvE competition results. Once automated mechanisms for content analysis are applied, this also increases the coverage and the recency of the literature entered into biological databases, as human input is complemented by computationally generated content.
Udo Hahn, Joachim Wermter
Friedrich Schiller University Jena, Germany
Rainer Blasczyk & Peter A. Horn
Hannover Medical School, Germany

What is "open science"?

Frank Gibson, a Research Associate at Newcastle University, UK, who is currently working on the e-neuroscience project CARMEN, has written an essay, 'Do scientists really believe in open science?', in which he collects current opinions on “Open Science”. He was prompted to write the essay by his role in the CARMEN project, which, he writes, has exposed him to a domain of the life sciences in which "data sharing and publicly exposing methodologies has not been readily adopted, largely it is claimed due to the size of the data in question and sensitive privacy issues."

The essay is available here. It addresses definitions of "open science" and summarizes the standards used in disciplines other than neuroscience. The Nature journals' policies on data availability, which apply to all the original research articles our journals publish, can be found here. Via this web page, you can provide us with your comments and views on recent journal editorials about emerging policies on data availability in a range of disciplines and circumstances.

Among other aspects of "open science", Dr Gibson discusses the "open notebook" approach pioneered by J-C Bradley. He also notes that Postgenomic produces an "up-to-the minute list of the open science discourse". (Postgenomic is a website that tracks hundreds of science blogs and "does interesting things with that data".) "Although early days", continues Dr Gibson, "maybe even the 'open science group' on Scintilla (still undecided on Scintilla) will be the place in future for fostering the open science community".
Scintilla is one of Nature Publishing Group's very latest products. It collects data from hundreds of news outlets, scientific blogs, journals and databases and then makes it easy for you to organize, share and discover exactly the type of information that you're interested in. For example, you can keep track of life science podcasts, or the latest papers on schizophrenia, DNA methylation, physics or immunology. It is free to join, so take a look at what it has to offer and, if you wish, contribute to the open science group, or join one of the many other interest groups there.

Corrigendum for Nature paper on stem cells

The authors of a controversial paper on stem cells publish a correction of their work in this week's issue of Nature (447, 880-881; 2007) but state in it that the errors do not affect the conclusions of the article. A News story also in this week's issue (Nature 447, 763; 2007) describes how the paper in question, published in 2002, claimed to find evidence for so-called 'multipotent adult progenitor cells', or MAPCs, in mouse bone marrow (Y. Jiang et al. Nature 418, 41–49; 2002). The work was led by Catherine Verfaillie, now director of the Stem Cell Institute at the Catholic University of Leuven.
From the News story: The paper challenged the prevailing idea that only stem cells derived from embryos were highly flexible. Some of its results have been reproduced by other labs, but no one has been able to replicate the work independently in its entirety. "I believe that despite the hype over the mistake, we and Nature made the conclusion that the final findings of the paper still stand," says Verfaillie.
This February, an investigation convened by the University of Minnesota — Verfaillie's former institution — found that her group had used incorrect procedures in the Nature paper, and that some of the data contained in it might be flawed. The investigation was a response to questions from a reporter from the magazine New Scientist, who pointed out that the figure from the Nature paper that has now been corrected was partly reproduced with different labels in another paper in another journal, Experimental Hematology (Y. Jiang et al. Exp. Hematol. 30, 896–904; 2002).
In response to the investigation, Nature convened a peer-review panel to analyse the data from the 2002 paper. According to Nature, the experts concluded that although the figure data were flawed, the paper's conclusions are still valid. No allegations of fraud or misconduct have been levelled at Verfaillie or anyone from her group. Verfaillie says her group cannot explain how the errors in the Nature paper occurred: "Why this happened, we have not been able to determine," she says.

Patent information could aid replication

Harry Thangaraj of the Oxford Centre for Innovation writes in Nature's Correspondence this week (Nature 447, 638; 2007):
Your News Feature 'The hard copy' (Nature 446, 485–486; 2007) accurately highlights the limited availability of information on stem-cell research methodologies — owing to competition among labs, the commercial value of such information and space restrictions in high-quality journals — which contributes to other labs' inability to replicate and verify the results.
It might sometimes repay scientists to look beyond conventional journals for information, in this or other disciplines, particularly to patents or patent applications. Thanks to the strict enablement requirements of patent law and patent offices in relation to inventions, one can often find more detailed methodology in patent documents than in journals with severe page limits.
A very good example of comprehensive detail in certain non-embryonic stem-cell methodologies is a PCT application WO/2006/028723 (Non-Embryonic Totipotent Blastomer-Like Stem Cells and Methods Therefor), which includes surgical procedures in organ removal, isolation of cells, and composition and preparation of culture media. In this instance, the level of detail and volume of text relating to methodology far exceeds that which many peer-reviewed journals can accommodate.
Some journals publish methodology and protocols online as Supplementary Information to the main paper or in separate publications (an example is Nature Protocols, which encourages user comments). Often, though, journals are only starting points in complex paper trails related to methods. In these circumstances, patent documents could contain the most methodology related to an invention in a single document.

A new look for chemical information

In its June Editorial, which is freely available, Nature Chemical Biology (3, 297; 2007) reports on new online features to enhance interdisciplinary communication and to increase the accessibility of chemical information for readers.

Most published chemical content is traditionally contained in the schemes, figures and tables of scientific papers. Authors also use abbreviations, acronyms or numbering schemes to identify specific molecules. Though these shorthand notations simplify the presentation of chemical information, they tend to make chemical papers less accessible to the general reader. This is a concern for chemical biology articles, which are intended to attract an interdisciplinary audience. Moreover, since the advent of the Internet, the way by which scientists acquire scientific information has changed. Though some scientists continue to read journal articles in print, most turn to the online HTML and PDF versions of published manuscripts. This expanded use of electronic resources offers an excellent opportunity to make chemical information more accessible and user-friendly to readers of scientific papers.

The Editorial provides details of the resources now available to authors and readers, and asks for your evaluation of what has been done so far, and your 'wish list' for new chemical or biological functionality that will foster communication and collaboration between researchers at the interface of chemistry and biology.

Integrating scientific cultures

In a meeting report in the current issue of Molecular Systems Biology (3, 105; 2007), Trey Ideker, Vineet Bafna and Thomas Lemberger write that "a key challenge of systems biology is that it must integrate several disciplines, each with a very different culture for disseminating results. Within biology, manuscripts describing new work are almost always published in peer-reviewed periodicals. In contrast, within computer science and the engineering fields, new methods and results are typically presented as full-length papers at meetings and workshops. Just as journals have editorial boards that handle review of manuscripts, such conferences assemble large and reputable programme committees, which fulfill the same purpose. Publication in the best conferences, as for the best journals, is highly competitive.

This past December, several hundred scientists convened in La Jolla, California for the Second Annual RECOMB Workshop on Systems Biology (December 1–3, 2006). The meeting, which was held jointly with the RECOMB Workshop on Computational Proteomics, took place at the California Institute for Information Technology and Telecommunications in the University of California San Diego campus. RECOMB, which stands for Research in Computational Biology, has for a decade sponsored conferences that attract high-quality papers in bioinformatics, primarily from computer science.

In an effort to integrate the computational and experimental biology communities, RECOMB and Molecular Systems Biology entered into a partnership by which original, peer-reviewed papers are presented orally at the Workshop on Systems Biology and then appear as full-length manuscripts in the pages of the journal. The precise publication model was formulated after much discussion between the editors of the journal and the organizers of RECOMB. It is original and, we hope, will serve as a case study for future conferences."

See the Molecular Systems Biology website for more news of this project.

Share your lab notes

Here is the full text of an Editorial in today's Nature (447, 1–2; 3 May 2007), which is freely available. For further details of the Nature journals' policy on fraud and fabrication, see the Author and Reviewers' website. Comments on this editorial are welcome.

The use of electronic laboratory notebooks should be supported by all concerned.

Too often when errors or cases of fraud occur in science, the lab data required to reconstruct what happened have gone astray. And too often, the co-authors failed to exert due scrutiny on their colleagues' activities in order to prevent such misfortunes. The damage to personal and institutional reputations can be severe and, in rare high-profile cases, public trust can be eroded.

It is therefore in everyone's interest to pre-empt such cases as far as possible. Electronic laboratory notebooks offer a partial solution — and have other advantages too. This is despite the fact that maximizing their benefits will require a change in culture that many researchers will no doubt initially resist.

Nature publishes full methods sections

For most journals, adequate space for methods is taken for granted. Nature now presents a new format to its papers that removes a longstanding shortcoming in this respect. From now on, all Nature papers requiring methods sections will be able to include all the necessary detail.
The full methods are published online only. The printed version contains a summary of up to 300 words, with a reference to the full online version. A key point is that the new online methods sections are not only detailed enough for researchers wishing to replicate the work (insufficient detail being a longstanding complaint about past Nature papers) but are also integral to the HTML (full text) and online PDF versions of the paper. (For completeness, both online versions also contain the methods summary that appears in the print version.)
One of the Articles in this week's issue (5 April), an exciting paper on targeted fast optical interrogation of neural circuitry, is the first to use this format. If you are thinking of submitting your own work to Nature, you might like to see how the methods are displayed in the three versions of the Article: full-text online, PDF online and PDF print. Here is the full-text (HTML) version, in which the full methods run on after the end of the main paper (the paper's references are all together in one indexed list). Here is the online PDF version, in which the full methods appear at the end of the main paper with their associated references. And if you look at the printed issue (5 April, vol 446, pages 633-669; 2007), you will see that the full methods are not there, although readers are directed to the online version.
We are delighted to be able to offer this service to authors. We hope you will be pleased, too.

Nature Methods on sharing of software

"An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available to readers promptly on request." This excerpt from our guide to authors may seem obvious, but judging from the number of discussions we have had with authors and referees, we would like to clarify one specific point: at Nature Methods, the definition of "materials, data and associated protocols" includes custom-designed software necessary for the method's implementation. Yet there are several ways of making software available, with various degrees of disclosure and in a choice of formats.
The details are provided in this month's Editorial in Nature Methods.


The Nature Methods editors welcome comments on this policy at Methagora, the Nature Methods blog. We also welcome your views on the application of this policy to other Nature journals.

Nature Biotechnology: democratizing proteomics data

Beginning this month (March 2007), Nature Biotechnology is recommending that authors deposit raw data from proteomics and molecular-interaction experiments in a public database before manuscript submission to the journal. The reason for this recommendation? "The lack of raw data sets associated with proteomics and molecular-interaction papers is a long-standing and pernicious problem. It not only stymies the exchange, comparison and reanalysis of experimental results, but also inhibits the development of new algorithms and statistics that could improve the confidence in data and conclusions. In addition, it undermines the ability of referees to fully evaluate the quality of data supporting a manuscript's conclusions, sometimes forcing them to assess results simply on 'good faith'. Contrast this with the situation in genome research and structural biology, where there is an abundance of public data sets from DNA microarrays, genome sequencing and X-ray crystallography studies, and it is not difficult to understand why progress in proteomics has lagged."

For further information and links to the public databases, see the full text of this editorial: Nature Biotechnology 25, 262 (2007); doi:10.1038/nbt0307-262b

See the authors' and peer-reviewers' website for a description of the Nature journals' policies on data and materials availability, with links to relevant editorials.

As ever, we welcome your feedback and comments.

Methods in full

From now on, Nature authors will be able to include more experimental details in their papers.

When in 1960 Theodor Maiman reported the creation of the laser, he did so in about 300 words. Most of these were about the principles. The experiment was described in two sentences (see Nature 187, 493–494; 1960).

Until now, Nature's style of research papers — although more generous in the space allowed than it once was — has been grounded in this telegraphic tradition, allowing comparatively little space for experimental detail. Consequently, with the advent of the Internet, the supplementary material published online has grown voluminous, and nearly ubiquitous — appended to every Article and Letter in this week's issue, for example. And some of it isn't supplementary at all — it is essential for anyone trying to replicate the work.

We have now taken steps to do better justice to what authors have to say, by letting them present full experimental methods as an integral part of their paper. It is clear that more and more people read papers only in their online versions. So we are expanding the online versions of our Articles and Letters, while condensing some of the technical detail in the printed version.

To be specific: in those papers requiring a separate methods section, the online version of the paper will allow authors to include enough detail to satisfy their peers. This is not a 'supplementary' methods file, but a component of the paper, with all the virtues of full-text linking and functionality. It will appear in all online versions, including the authors' versions of papers that can be loaded into PubMed Central and other open-access repositories six months after publication.

But Nature also rejoices in being a print publication. We have no wish to leave print readers lacking sufficient understanding of what was done to appreciate the authors' achievements. Accordingly, the print version will include a 300-word summary of the methods. This will also appear in the online version.

Norman Lockyer, the founding editor of Nature, might well deplore the loss of brevity in today's scientific reports. But our authors should bear in mind that readers still value succinctness — and that Nature's editors and copy-editors will continue to insist on it.

Republished from: Nature 445, 684 (15 February 2007) | doi:10.1038/445684a; Published online 14 February 2007

We welcome comments on this format for methods sections.

Biology databases go wiki

Jim Giles reports in Nature's news pages this week about a collaborative wiki approach for sharing biological information. From the article: "Barend Mons's first objective would be ambitious enough for most people: to meld some of the most important biomedical databases into a single information resource. But that's just the beginning. Mons, a bioinformatician at the Erasmus Medical Centre in Rotterdam, the Netherlands, also wants to apply the Wikipedia philosophy. He's inviting the whole research community to help update a vast store of interlinked data. If he and his colleagues can pull it off — and even the project's advocates are not sure they can — they could transform the databases that are central to the work of many life scientists.
A test version of the project, provisionally dubbed Wiki for Professionals, is due to launch in the next month."
The rest of the story is at the news@nature.com site, where readers can add their comments online.

For how long should data be archived?

Q. Dear Nature editors

I am a graduate student pursuing a master's degree in biotechnology at a university. I am taking an "Ethics and Professionalism" course, which deals with ethics, record keeping, laboratory notebooks and paper publishing. I have been assigned coursework to ask one of the NATURE editors: "What would be a reasonable time limit for keeping the data of a published paper?"

Could you please spare some time to reply and give me your opinion of what would be a reasonable time limit for keeping the data (record keeping) of a paper published in your journal?

I really appreciate your help with my coursework.

A. Dear graduate student

Permanently.

Yours sincerely
Nature

Community consultation at Nature Biotechnology

Following the MIAME (minimal information about microarray experiments) standards for reporting microarray data, various scientific communities are engaged in producing similar guidelines. Some of these standards papers are under consideration for publication in Nature Biotechnology. Because data-reporting standards are only as useful as the community finds them, we want to know what you think. The papers are freely available for at least a month, and your comments are welcome. New papers are added to the list on the Nature Biotechnology website as they become available, with a commenting facility for each one. The papers cover standards for 'minimum information about a genome sequence', proteomics, protein modification data, mass spectrometry, mass spectrometry informatics, gel electrophoresis, functional genomics, molecular interactions, and in situ hybridization and immunofluorescence experiments.

The Nature journals' policy on data availability, including MIAME standards, is available at the author and referees' website.

The figure police

Juan-Carlos Lopez, Chief Editor of Nature Medicine, discusses the question of data integrity of figures in his post on Spoonful of Medicine: The figure police.

Juan-Carlos discusses an editorial in the Journal of Cell Biology, which takes the line that "the progress of science depends on the reliability of the entire published record, and journal editors must do their part to ensure that reliability", urging editors to "participate in this dialogue with the scientific community, to help devise effective and practical standards that can be applied to the published literature".

Should scientific journals screen every image in every paper, as the Journal of Cell Biology editorial recommends? Or is a spot-checking system, such as that used by the Nature journals, preferable, on the grounds that the vast majority of published papers are not fraudulent, and that the journal could invest more usefully in other author and publication services? (This last point is particularly critical for small, society-owned journals that have limited resources.) Or does the responsibility for ensuring that all research is honest, and that the resulting papers accurately reflect the work done, lie with the scientific institution and/or the funder?

The Nature journals' policies on image integrity can be found at our Author and Referees' website.

RSCC plots

In tomorrow's Nature (14 December issue), Bernard Rupp suggests in Correspondence that peer-reviewers can judge the quality of structures by using the real-space correlation coefficient (RSCC) plot, rather than requiring the full coordinates. Prof Rupp says: "Generated as a part of validation during structure deposition, these plots can be produced without any additional work by authors. The plots can be provided with the manuscript or as supplemental material to convince reviewers of the model quality in critical areas, without forcing authors to reveal coordinates and structure factors prematurely."
Nature welcomes comments and opinions from authors and readers about the suggestion to make provision of RSCC plots mandatory at the submission stage. Please let us know your views in the comments section to this post.
Prof Rupp's letter is reproduced in the full version of this post.

Policies on data fabrication

There is much comment in Nature and elsewhere this week about the two fraudulent stem-cell papers published in and retracted from the journal Science. Science's editors commissioned an external committee to report on its handling of the papers, in the light of which the journal is likely to begin extra scrutiny for "high risk" papers. For more details, see Nature's news story in the current issue (vol 444, pp 658-659; 7 December 2006), and this special feature.

Other opinions about the panel's report can be seen in a statement by Don Kennedy, Editor-in-Chief of Science, at the journal's website; at the weblog Nobel Intent; and in The Scientist's online news service. The report itself can be seen here.

Inevitably, much of the discussion centres on the combined ability of journals and peer-reviewers to detect fabricated results. Everyone would agree that published papers have to be entirely above suspicion. But what of the authors' perspective? How much data, methodology or calculation is necessary to make a convincing case for a conclusion? Especially in fast-moving fields such as stem-cell research, how much time and effort is needed to accumulate such evidence and submit it to a journal, which may have to decline 90 per cent of submitted papers?

The Nature journals' policies on data availability can be seen here. As we scrutinize these policies in the light of current events, we welcome suggestions from authors, past, present or future, as to what you believe to be reasonable for a journal to demand to ensure that conclusions are solid. Please make your suggestions in the comments to this posting.