Crystal clear data

A change in one of our publication policies had been brewing for a while at the journal — and I’m happy to say that it has now been implemented: we have updated our requirements regarding the crystallographic characterization of small molecules. This is reflected in our guide to authors.

Until now we had been asking authors to provide a standard crystallographic information file (CIF) for each new structure characterized by X-ray diffraction analysis. This file doesn’t tell the complete story, though; other experimental and refinement information also exists, such as the structure factors (HKL or FCF files) and RES files.

A few years ago, in an article on their website entitled ‘Publication standards for crystal structures’, the International Union of Crystallography (IUCr) recommended that for each newly determined structure, not only should the CIF file be provided, but also the corresponding structure-factor information — this has long been a requirement for the IUCr’s own journals. The structure factors are important for the structure determination and so should be available during the peer-review process, and may also be used by readers interested in the refinement of the structure once the paper has been published.

Endorsing this stance, the Cambridge Structural Database (CSD) has enabled the deposition of structure factors together with the main CIFs, and has also been making crystallographic information increasingly easy to access. One now only needs to be armed with a CCDC number, a CSD code, or the DOI of the paper in which the structure was reported, to acquire the desired crystallographic details (through https://summary.ccdc.cam.ac.uk/structure-summary-form).

We’re happy to have now adopted this practice. We are asking that manuscripts reporting new crystal structures be accompanied by CIF files and associated information about structure factors.

The best (or at least the easiest) way to do so is to use an up-to-date version of SHELXL (2014 or later), which now embeds the HKL and RES files in the generated CIF (other programs may also do this; I’m not sure). If another program has been used, then the CIF and the structure-factor file (in HKL and/or FCF format) can be submitted as two separate files.
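For anyone who wants to check whether a CIF already carries the embedded files, here is a minimal Python sketch. It assumes the embedded data appear under the data names `_shelx_hkl_file` and `_shelx_res_file`, which is how I understand recent SHELXL versions to label them. Those names and the helper function are my assumptions rather than anything stated in our guide to authors, and a plain text search like this is no substitute for a proper CIF parser or for CheckCIF.

```python
import re

# Assumed data names for the embedded reflection (HKL) and instruction (RES)
# files; this is an assumption about SHELXL's output, not a journal rule.
EMBEDDED_TAGS = ("_shelx_hkl_file", "_shelx_res_file")


def embedded_files_present(cif_text):
    """Return {tag: True/False} for each assumed embedded-file data name.

    A tag counts as present only if it starts a line (ignoring leading
    whitespace), so a mention inside a '#' comment is not picked up.
    """
    found = {}
    for tag in EMBEDDED_TAGS:
        pattern = re.compile(r"^\s*" + re.escape(tag) + r"\b", re.MULTILINE)
        found[tag] = bool(pattern.search(cif_text))
    return found
```

To use it, read the CIF as text (for example with `open(path).read()`) and pass the string in; if either value comes back `False`, the structure factors probably need to be supplied as a separate file.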

Our other requirements haven’t changed: we’re still asking that an ORTEP-style illustration of the structure, with probability ellipsoids, appear in the main Supplementary Information. CIF files, including structure factors, should be run through the IUCr’s free online CheckCIF routine and the output submitted with the manuscript files (this output is used only during the reviewing process; we don’t host it with the published paper). Any A- or B-level alerts that come up should be explained in the Supplementary Information file.

In due course we host all of these crystallographic raw data with the Supplementary Information of the associated paper (see, for example, Nature Chemistry 6, 1079–1083 (2014)), and we hope that this proves useful to the crystallographic community.

Nature Chemistry’s 2014 impact factor citation distribution

As pointed out yesterday in a blog post by Stephen Curry (and indeed in at least one previous blog post), some journals publish their citation distributions (this has also been blogged about by Steve Royle – and probably by many others that I’m not aware of). I’ve been interested in doing this for Nature Chemistry for a while now, but have never quite found the time – but after a brief exchange on Twitter this afternoon, I figured I should run the numbers… (what better way to spend a Friday evening?!).

So, according to Journal Citation Reports (JCR) from Thomson Reuters, the 2014 impact factor (announced in 2015) for Nature Chemistry was 25.325. How do they arrive at this number? Well, they count up how many times articles published in the journal in 2012 and 2013 were cited in 2014 and then divide that total by the number of ‘citable items’ (more on that later) that the journal published in 2012 and 2013. So, according to JCR, 2012/2013 content in Nature Chemistry was cited 6,458 times in 2014 and we published a grand total of 255 citable items in 2012/2013. Divide 6,458 by 255 and you get 25.325. Simple, eh?
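For anyone who wants to reproduce the arithmetic, here is a small Python sketch of the calculation as described above. The function name is just something I made up for illustration; the two numbers are the JCR figures quoted in this post.

```python
def impact_factor(citations, citable_items):
    """Citations in the census year divided by citable items from the two prior years."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# JCR figures quoted above for Nature Chemistry
citations_2014 = 6458     # 2014 citations to 2012/2013 content
citable_2012_2013 = 255   # citable items published in 2012/2013

print(f"{impact_factor(citations_2014, citable_2012_2013):.3f}")  # prints 25.325
```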

Well, no. If you do a Web of Science (All Databases) search for Nature Chemistry for 2012-2013, you find that we actually published 451 items in those 2 years. There were 239 research papers (we call them ‘Articles’), 16 review-type articles (long ones we call ‘Reviews’ and shorter ones we call ‘Perspectives’), as well as Editorials, Commentaries, Research Highlights, News & Views articles and other ‘front-half’ material – all adding up to a total of 451 articles. It is only the Articles, Reviews and Perspectives (255 items) that count as citable items, however. What does this mean? It means that although the bottom half of the impact factor equation described above only includes these article types, citations to any of the journal content (including News & Views, Editorials, Commentaries, etc.) get counted in the top-half of the equation.

If you look at those 451 items in Web of Science, in 2014 they received a total of 6,402 citations (that’s already 56 fewer than the 6,458 used in the JCR impact factor calculation – so those extra 56 must be being pulled in from some other database by JCR). Of those 6,402 citations that are in Web of Science, Articles received 4,852 citations, Reviews/Perspectives received 1,206 citations and all the other front-half articles garnered a total of 344 citations, so the distribution of citations between different content types breaks down like this:

[Figure: breakdown of 2014 citations by content type (Articles, Reviews/Perspectives, other content)]
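The same breakdown can be worked out as percentages in a few lines of Python. This is only a sketch using the Web of Science totals quoted above; the variable names are mine.

```python
# 2014 citations to 2012/2013 content, by content type (Web of Science figures above)
citations_by_type = {
    "Articles": 4852,
    "Reviews/Perspectives": 1206,
    "Other front-half content": 344,
}

total = sum(citations_by_type.values())  # 6402
shares = {kind: 100 * n / total for kind, n in citations_by_type.items()}

for kind, pct in shares.items():
    print(f"{kind}: {pct:.1f}%")

# Citations counted by JCR that do not show up in this Web of Science search
jcr_total = 6458
unaccounted = jcr_total - total  # 56
```

On these numbers, Articles account for roughly three-quarters of the citations and the front-half content for about one in twenty.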

Now, just looking at the Articles and Reviews/Perspectives, we have a total of 255 items with 6,058 citations (we’re ignoring those 344 citations to other stuff) in 2014. That gives you an average of 23.8 citations (6,058 divided by 255) per Article/Review/Perspective. Of course, that is an average, and this is where citation distributions come in. If you list the articles in order from most cited to least cited and then plot article number versus citations, you get something that looks like this:

[Figure: citations received in 2014 versus article number, from most cited to least cited, for all citable items]
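Here is a rough Python sketch of the two steps behind that plot: the mean from the totals quoted above, and the ranking from most cited to least cited. The five example counts are hypothetical and only there to show the ordering; they are not the journal's data.

```python
# Mean 2014 citations per citable item, from the totals quoted above
citable_citations = 4852 + 1206  # Articles + Reviews/Perspectives = 6058
citable_items = 255
mean_citations = citable_citations / citable_items  # about 23.8


def rank_by_citations(counts):
    """Return (rank, citations) pairs, most cited first, with rank starting at 1."""
    ordered = sorted(counts, reverse=True)
    return list(enumerate(ordered, start=1))


# Hypothetical counts for five papers, for illustration only
example_counts = [3, 40, 0, 12, 7]
print(rank_by_citations(example_counts))
# [(1, 40), (2, 12), (3, 7), (4, 3), (5, 0)]
```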

The most cited article is a Review (that was published in 2013) with 354 citations in 2014. Article number 2 on the list is a 2013 Perspective with 171 citations in 2014… and then we head to the end of the list where the 253rd, 254th and 255th articles all received 0 citations in 2014. That’s one way of plotting the data, but perhaps not the most useful. Another way to do it is shown below, whereby articles are put in bins defined by the number of citations received in 2014.

[Figure: citation distribution for all citable items, binned by the number of citations received in 2014]

So that the graph is still meaningful, I lumped all of the 100+ citation papers into one bin at the end (a breakdown of what is included in there is shown on the graph). The official 2014 impact factor (25.3) is highlighted, along with the mean number of citations these article types actually received (23.8 – i.e., not inflated by the 344 citations included in the impact factor calculation that were actually cites to other content) as well as the median value, which is 16. Only 29% of Articles/Reviews/Perspectives (that’s 73 of the 255) received more citations (26 or more) in 2014 than the calculated impact factor of the journal (25.3). The other 71% (182 of the 255) were cited no more than 25 times, which is fewer than the impact factor.
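If you want to try this sort of binning on your own numbers, a minimal Python sketch might look like the one below. The bin width of 5 is my assumption for illustration (I haven't stated the width used in the graph), the function names are made up, and the ten example counts are hypothetical rather than real journal data.

```python
from collections import Counter
from statistics import mean, median


def bin_label(citations, width=5, cap=100):
    """Label of the bin a citation count falls in; counts >= cap share one bin."""
    if citations >= cap:
        return f"{cap}+"
    low = (citations // width) * width
    return f"{low}-{low + width - 1}"


def summarize(counts, threshold):
    """Histogram, mean, median and share of papers cited more than `threshold` times."""
    above = sum(1 for c in counts if c > threshold)
    return {
        "bins": Counter(bin_label(c) for c in counts),
        "mean": mean(counts),
        "median": median(counts),
        "share_above": above / len(counts),
    }


# Hypothetical counts, for illustration only (not the journal's data)
example = [0, 2, 4, 9, 15, 16, 18, 30, 45, 171]
result = summarize(example, threshold=25.3)
print(result["mean"], result["median"], result["share_above"])
```

Even in this toy list, one heavily cited paper pulls the mean well above the median, which is the same skew the real distribution shows.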

It’s well known that review-type articles are typically cited more than research papers (in chemistry at least and probably in other subjects too, I imagine) and so I repeated the analysis with just the research papers (the Articles) and left out the Reviews and Perspectives. The article number vs citations plot now looks like this:

[Figure: citations received in 2014 versus article number, from most cited to least cited, for Articles only]

The shape looks quite similar to the graph further up this post, but note that the scale on the y-axis is quite different. The highest-cited research paper was cited 151 times in 2014, with the 2nd, 3rd, 4th and 5th-placed Articles receiving 114, 113, 105 and 84 citations, respectively. If we plot the citation distribution, we get the following:

[Figure: citation distribution for Articles only, binned by the number of citations received in 2014]

After removing the Reviews and Perspectives from the equation, the mean number of citations received by just the research papers is now 20.3 rather than 23.8 (a drop of 3.5) and the median has dropped from 16 to 15. Only 61 of the 239 Articles (that’s 26%) received more citations (26 or more) in 2014 than the calculated impact factor of the journal; roughly three-quarters of all research papers received fewer. If you consider 20.3 to be the pure Article ‘impact factor’, this is still a very skewed metric, however. Of the 239 Articles, 86 of them (36%) received 21 or more citations in 2014 and the rest were cited 20 times or fewer.
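The comparison between the two sets of numbers can be checked with a short Python sketch. Everything here comes from the figures quoted in this post; the dictionary layout and function names are just my own illustration.

```python
# Figures quoted in this post (2014 citations to 2012/2013 content)
all_citable = {"items": 255, "citations": 6058, "above_if": 73}
articles_only = {"items": 239, "citations": 4852, "above_if": 61}


def per_item_mean(group):
    """Mean citations per item in the group."""
    return group["citations"] / group["items"]


def share_above_if(group):
    """Fraction of items cited more often than the journal impact factor."""
    return group["above_if"] / group["items"]


for name, group in (("All citable items", all_citable), ("Articles only", articles_only)):
    print(f"{name}: mean {per_item_mean(group):.1f}, "
          f"{share_above_if(group):.0%} above the impact factor")

# Share of Articles cited more often than the Article-only mean of 20.3
share_above_article_mean = 86 / 239  # about 36%
```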

When the 2015 impact factors get released in 2016, we’ll run the numbers again and compare the data to what’s above to see if anything has changed all that much.

Avoiding redundant tautologies in scientific writing

This is a guest post from Reuben Hudson at Colby College in response to one of Michelle Francl’s recent Thesis columns.

—————

Chemists communicate with a lexicon rife with double entendres [ref. 1]. Some of our words take on new meanings after appropriation from general vocabulary, and certainly our words cross into the public sphere with a similar alteration of the intended meaning, often resulting in humorous or nonsensical interpretations. Despite our urge for vigorous [ref. 2], concise [refs 3,4], and clearly understandable prose [ref. 5], Michelle Francl [ref. 1] suggests that we not avoid all ambiguous language ‘for it gives chemists a rich set of images to draw on, and as such, we shouldn’t discourage it, for we can’t look for what our language doesn’t let us imagine.’ I agree whole-heartedly with her encouragement to use, when appropriate, single phrases with multiple meanings, and take this opportunity to point out the equally common, seemingly opposite practice in the chemical literature of incorporating multiple, redundant inferences of the same meaning in a single phrase.

Redundancies are a part of quality science. Elegant reproduction can build a compelling argument. Reiteration of a thesis strengthens rhetoric. Unintentionally repeating again the same point, however, is a sign of ineptitude and detracts from effective communication.

Tautologies (redundancies for lack of style) can arise as a result of an incomplete understanding. Such is often the case with bilingual acronyms, where the acronym itself is retained, but the meaning clearly lost in translation, a laughable and excusable miscue. Consider ‘le protocol IP’ from French computer science (internet protocol protocol). Without the crutch of an improper translation, other redundant acronyms become more laughable and less excusable. Biologists first introduced the term, ‘HIV virus’ (human immunodeficiency virus virus), while physicists brought us LASER light (light amplification by stimulated emission of radiation light). Chemists are perhaps the worst when it comes to tautological acronyms. Any student of organic chemistry will remember one of the cornerstone reactions: the SN2 substitution (guess what ‘S’ represents). The CDC coupling reaction (cross dehydrogenative coupling coupling), a new innovation rolled out by green chemists, is a halogen-free means of carbon–carbon bond formation.

To this point, the discussion has ostensibly focused only on redundant acronyms. The careful reader will have also noticed the equally egregious use of tautological phrases within this very post, several of which see frequent use in scientific publications. An innovation is, by definition, something new. It is therefore tautological to say, ‘new innovation.’ An introduction is the first time something is presented. Thus, ‘first introduced’ is redundant for lack of style. Repeat means to say again, so it is superfluous to say, ‘repeat again.’ The title of this post is also tautological.

References

1. Francl, M. Nature Chem. 7, 533–534 (2015).
2. Patience, P. A., Patience, G. S., Boffito, D. C. Can. J. Chem. Eng. 93, 2095–2097 (2015).
3. Hudson, R. J. Chem. Educ. 90, 1580 (2013).
4. Carr, J. M. J. Chem. Educ. 90, 751–754 (2013).
5. Stewart, A. F. et al. J. Chem. Educ. doi:10.1021/acs.jchemed.5b00373 (2015).