Editorial: Beyond the printed page

[‘Cross-posted’ from issue 3, this is our editorial that explains some of the innovations in the HTML versions of our articles. We’d really like as much feedback as possible, so comment away! Apologies for any formatting weirdness]

The publication of scientific discoveries remained tied to ink and paper for over 300 years, but the rise of the internet over the past few decades has transformed scholarly communication. Just how far this revolution can go depends not just on publishers, but on authors and readers too.

Although much has changed since 1665, when the first issue of The Philosophical Transactions of the Royal Society was published, the basic unit of scientific communication is not all that different. The role of an editor at Nature Chemistry is one that Henry Oldenburg, the first editor of Phil. Trans., would probably recognise. But now that most people read journal articles that they have downloaded from the web, rather than pulled down from a shelf, the article itself — not just its delivery — is on the verge of major changes. Some of these are linked to the vision of Tim Berners-Lee and others for the future of the internet — the semantic web[1, 2]. In this concept, information is labelled in such a way that computers can understand what it is, rather than just humans, as is generally the case at the moment.

Rather than the HTML version of the article being a narrow reflection of the printed page, it can offer enhancements beyond clicking to bring up figures or references, which simply mimics how people can flick through hardcopies anyway. Enhancements that further enrich articles are already being offered by other publishers, including the Royal Society of Chemistry with Project Prospect[3]. Among other services, this highlights words that are terms in the IUPAC Gold Book and links to their definitions. Beyond simply being an educational tool, this also means that papers on similar concepts are linked together. Apart from the fledgling ChemSpider Journal of Chemistry[4], there are few other publishers exploiting the full potential of their online articles. Although the American Chemical Society are testing several interesting and useful innovations on the JACS-beta website[5], such as downloadable PowerPoint and ChemDraw files, none of these so far enriches the text of the articles.

Nature Chemistry also offers a number of online enhancements that complement the traditional paper journal. For the large majority of numbered chemical compounds in research articles, a separate compound information web page is available (an example can be found here) that can be accessed by clicking on the bold compound number — even in the PDF file. These compound pages include information such as molecular weight and synonyms, as well as an interactive 3D model of the molecular structure that readers can manipulate. Chemical identifiers, such as InChIs[6] and SMILES[7] strings, are also included on these pages. These alphanumeric identifiers are machine-readable and can be used in databases and by publishers and chemists to identify and search for chemical compounds. Each numbered chemical compound for any given article is also deposited in the National Institutes of Health PubChem database and a link to the relevant record is included on the compound information page.
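To illustrate what 'machine-readable' means for these identifiers, here is a minimal Python sketch that splits an InChI string into its slash-separated layers. The function name `split_inchi` and the shape of the returned dictionary are our own illustrative choices, not part of any InChI software, and the sketch assumes a simple identifier whose second field is the chemical formula. The example string is the standard InChI for ethanol.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers.

    Hypothetical helper for illustration only. It assumes the second
    slash-separated field is the chemical formula, which holds for
    ordinary compounds but not for every valid InChI.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    # Each remaining layer begins with a one-letter prefix, for example
    # 'c' for atom connectivity and 'h' for hydrogen atoms. A list of
    # pairs is used because a prefix may occur more than once.
    layers = [(part[0], part[1:]) for part in parts[2:] if part]
    return {"version": version, "formula": formula, "layers": layers}


# Standard InChI for ethanol (SMILES: CCO)
record = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(record["formula"])
for layer_prefix, body in record["layers"]:
    print(layer_prefix, body)
```

Because every layer is delimited and labelled, a database can index or compare compounds on the formula or connectivity alone without any human interpretation of a drawn structure.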

Downloadable ChemDraw files for the structures are available for each individual compound, and the compound pages are grouped together on an article-by-article and issue-by-issue basis. A single ChemDraw file containing all of the structures from a particular paper can be downloaded; see an example. An example of a compound round-up page for an issue can be found here. All of this gives the reader quick access to much more information, as well as making the individual compounds more visible to the wider chemistry community. As with Supplementary Information, the chemical compound pages are freely available on our website.

A number of enhancements to the text of research articles have also been implemented. A pop-up box containing the chemical structure appears when the mouse cursor hovers over a bold compound number. Other, non-numbered chemical names can be highlighted by clicking the ‘Show compounds’ link in the right-hand navigation section of an article page. Clicking on a highlighted compound brings up a pop-up box that displays links to free databases where more information about the compound can be found. At present, Nature Chemistry is linking to the PubChem[8] and ChemSpider[9] databases. These databases not only display the structure and predicted properties of chemical entities but also provide access to other chemical literature, either by linking directly to articles or via other databases. There is potential for this process to be used to highlight other families of entities, such as compound classes and reaction types.

Of course, this is only the beginning. Where, and how far, this endeavour can go also depends on the authors and readers of the articles. Individual compound pages already link to compound data such as CIF files for X-ray structures (an example). We encourage authors to submit other types of primary data associated with techniques such as NMR spectroscopy or mass spectrometry so that it can be displayed or linked from their papers. Although ‘flat’ images of experimental data in supplementary material associated with a paper certainly serve an important purpose, the actual data is far more searchable. This is not an effort to detect scientific fraud (data can be faked just as images can), but an opportunity to enrich scientific publications.

How readers will use such resources remains to be seen, but your feedback is welcome in helping us guide our efforts. A blog post that mentions these thoughts is housed on ‘The Sceptical Chymist’[10] and we encourage you to leave comments.


    Tobias Kind said:

    Hi Neil,

    I like the three posted Nature compound annotations:

    My comments:

    1) The PubChem links and InChI and InChIKey enrichments from the three NATURE links above need to cover all Nature journals, not only Nature Chemical Biology and Nature Chemistry.

    I like the multi-format options.

    2) Reactions are not covered in this way. This could be done using SMARTS, CDX, CML or any other common RXN format, such as the documented MDL RDF format.

    3) Forget SMILES. I don’t need to make a case: SMILES are good for in-house use, nothing else. I use SMILES often, but not for external communications outside my cheminformatics package.

    MDL MOL, CML, INCHI, PUBCHEM CID are the way to go.

    4) Besides your epic fight against PLOS, see our PLOS ONE article, ‘How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry’, where we mention the substance annotations as positive examples for semantic (or at least chemical structure) enrichment.

    The problem with PLOS is that it is not a chemistry journal, therefore few at PLOS are interested in structure enrichment or chemical semantic enrichment as in RSC or Nature. The PLOS editors and staff are friendly and helpful, we had a very good experience, but in terms of chemistry annotations and semantics RSC and Nature are far, far ahead.

    5) I don’t care about PowerPoint and ChemDraw files; assume I don’t use PPT and ChemDraw under Linux. Open exchange formats are the way to go. PowerPoint and ChemDraw formats are convenient but not open.

    6) Open Access does not mean better semantic chemistry enrichments; RSC and Nature still lead here. Journal of Cheminformatics and Beilstein Journal of Organic Chemistry, both OA, have zero annotation of 1) structures 2) reactions 3) molecular spectra.

    Very sad.

    7) Chemists in general (99%) do not care about SEMANTICS. Prove me wrong. If you (the external reader) care, please check the power law and the long tail

    Meaning, if you read this blog and this comment: “You care”.

    8) Nature clearly has the power to lead by example, Nature does not need to beg “encourage authors to submit other types of primary data associated with techniques such as NMR spectroscopy or mass spectrometry”

    Nature journals have the power to mandate the submission of machine readable structure and reactions data and all molecular spectra in machine readable format.

    WHY? Due to the fact that reputation and Impact Factors play a big role in chemistry. Please prove me wrong.

    9) I like and read several ACS journals and their new features, but in terms of semantic, structure and spectra annotations they are mediocre compared to RSC and Nature.

    NMR and MS spectra in text-based format and the whole stuff packed into bitmap PDF. It’s destroyed information, hamburger to cow. The SciFinder links just put a second barrier in: first the subscription-based article and second the subscription-based SciFinder.

    I am a user and fan of SciFinder, but assume you don’t have SciFinder and you need the structures and reactions and spectra of an ACS paper: you are basically stranded. Due to the increased number of publications the SciFinder compound annotations also dropped tremendously; I find many journals without structure annotations in SciFinder. Again, ACS should follow the inevitable and submit structures to PubChem/ChemSpider and mandate submission in machine-readable format for their own sake and for the advancement of science and chemistry.

    Well, let’s talk about Don Quixote’s Sancho Panza.

    10) It basically sucks that except for the RSC all major chemical societies stumble behind in terms of semantic research. That includes the chemical societies of the leading chemistry nations: Germany, USA, Japan, France, China. Please prove me wrong.

    Please be reminded that the RSC (Royal Society of Chemistry) is based in the UK and Nature Publishing Group is part of Macmillan in the US, which is a group of publishing companies in the United States held by Verlagsgruppe Georg von Holtzbrinck (Germany).

    What an irony…

    11) There are multiple projects and researchers across the globe which are actively involved in semantic research for chemistry, please read my comments with care and yes I am aware of the power law and the long tail.

    Kind regards

    Tobias Kind

    Neil said:

    Hi Tobias,

    Many thanks indeed for your comments – definitely some food for thought! Some of the points you raise we’re aware of and working on, others we’ll have to think about.

    April Lorier said:

    From reading words on slivers of trees to interpreting digits through pixels of light, our viewing of data has certainly evolved! I pray for the continued movement of nature’s elements, and the subsequent energy produced, without which we would all be staring at black screens. Informative post, Neil, and great, comprehensive comment, Tobias!

    Noel O'Boyle said:

    The RSS feed for the Project Prospect enhanced articles contains the InChIs for the molecules. It would be great if Nature could include chemical information similarly in their RSS feed. Making a summary HTML page is all very well, but if you put it in the RSS feed then it’s much more machine-readable.

    I look forward with interest.

    Neil said:

    Thanks for the suggestion Noel – and for the link to your blog.