Alberto Cairo on storytelling in science communication

Alberto Cairo responds to a Correspondence criticising the use of storytelling techniques in scientific research articles and journalism.

Nature Methods’ August Points of View article by Alberto Cairo and Martin Krzywinski described how to use techniques of storytelling to design better scientific figures. That article prompted a passionate response from Yarden Katz arguing that storytelling has no place in scientific articles. Cairo and Krzywinski respond that their article was overinterpreted. This exchange prompted us to argue in the November Editorial that storytelling serves an important role when used properly.

In this guest post, Alberto Cairo expands on their printed response.

Yarden Katz’s thoughtful response to our short column about visual storytelling techniques in science communication makes many cogent observations. We will use them as a starting point for a deeper discussion of the contents of the column itself.

First of all, Katz reads too much into our words. As explained in our published response to his Correspondence, we did not advocate using storytelling to drive experiments. That is a legitimate concern, but promoting that idea was not our goal, so we won’t comment further on it.

Second, Katz presents an incomplete picture of what storytelling and journalism are. He says that “great storytellers embellish and conceal information as necessary to evoke a response in their audience. Inconvenient truths are swept away while marginalities are amplified or spun to make a point more spectacular.” This rather bold claim may be guilty of the same malady it denounces: to be emotionally powerful, it highlights the worst of storytelling and obscures the best.

It is true that many journalists begin with a preconceived idea (a narrative structure) and then choose the data that best fit it. They cherry-pick evidence to make a stronger and clearer point. They magnify outliers without mentioning the overwhelming prevalence of average values. In a recent long article (1), Christopher Chabris identified this problem in the work of the famous journalist Malcolm Gladwell.

This is not the approach we were trying to explain in our column. Proceeding this way, as Katz wrote, is wrong, and it is as wrong in science as it is in journalistic storytelling.

Moreover, we would like to remind Katz that there is a long-standing tradition in journalism that tries to adhere to standards of truth close to those used in science. Professor Philip Meyer of the University of North Carolina at Chapel Hill defined it forty years ago as “precision journalism”. Precision journalism consists of the use of social science research techniques in news reporting: surveys, statistics, data analysis, visualization and so on. In the best of worlds, all journalism would be based on a careful evaluation of data and evidence, but precision journalism tried to raise the standard of what counts as proper evidence, even under the pressures and tight deadlines that journalists must endure.

That tradition has evolved into several overlapping branches of journalism, including computer-assisted reporting (CAR) and data-driven journalism (2). Its most famous exponent today is Nate Silver, author of the blog FiveThirtyEight, who, using mainly Bayesian techniques, correctly predicted the results of several elections (3).

What is the method of journalists—storytellers—in these areas? They don’t pitch an idea and then try to find the best data to support and embellish it. Ideally, they may begin with a fuzzy notion of what they want to focus on, and then they collect evidence systematically and let stories emerge from it. These stories may be completely opposite to the notion they had in mind at the beginning. Finally, they write those stories or, as we suggested in our column, they visualize them, in many cases with the close advice of experts in the areas they are covering (4). This is the storytelling tradition we were thinking about when writing our column.

Another point that we made is that the techniques described are helpful mainly when researchers need to communicate with non-specialist audiences. Journalists and storytellers are aware that people cannot absorb large amounts of information at once, and that in many cases they lack the background necessary to understand complex scientific research. As we wrote, “inviting readers to draw their own conclusions is risky because even simple messages can hide in simple data sets.”

However, and this is a critical point, nothing prevents researchers or journalists from presenting two or more competing interpretations when these are equally well founded on evidence or when there is great uncertainty. Nor does anything prevent them from first presenting their main conclusions as an evidence-based visual story, a narrative or, at least, a compelling composition (not all information can be framed as a story, after all), and then letting readers who are interested in the many nuances or angles of an investigation access the data gathered and analyzed for it. Data journalists do this today (5).

Any of those approaches would help address the challenge that Katz correctly points out: “complex experiments afford multiple interpretations and so such deviances from the singular narrative must be present somewhere.” Indeed. Just not at the first level of the presentation. To communicate effectively, information needs to be layered and sequenced in a way that audiences can process correctly (6) while respecting all its nuances. For good examples of journalistic work that is both engaging and evidence-based, see the books by David Quammen, Carl Zimmer or David Dobbs.

And it’s not just journalists who embrace this particular kind of storytelling technique. Many scientists do, too. As a recent example, take Michael E. Mann’s The Hockey Stick and the Climate Wars: Dispatches from the Front Lines, a book that presents the evidence for global warming in the form of a narrative that is deep, rich, and captivating at the same time.

I’d like to conclude by quoting the words of Yale University professor Robert P. Abelson that we included in our column. In his most popular book, Statistics as Principled Argument (1995), Abelson wrote that he used to ask his students, “If your study were reported in the newspaper, what would the headline be?” That doesn’t mean that the headline is the only element that should be reported. Rather, it should be the first element reported, followed by a discourse based on (to borrow Katz’s beautiful description) “evidence and arguments that are used—with varying degrees of certainty—to support models and theories.” Such a discourse would be interesting to read while thoroughly respecting the integrity and the complexity of the underlying data. We therefore believe that storytelling, if handled carefully, can be compatible with the framework for presenting scientific results that Katz outlines.

Footnotes
(1) See https://blog.chabris.com/2013/10/why-malcolm-gladwell-matters-and-why.html

(2) The academic literature in communication studies and journalism has not reached agreement on how these categories should be defined. Broadly, CAR focuses on using data and databases to inform traditional reporting (writing and speaking), whereas data-driven journalism also includes designing tools for readers to explore data, such as visualizations and mobile apps.

(3) Silver’s blog used to be hosted by The New York Times. It has recently moved to ESPN.

(4) Journalists are, by tradition and training, jacks-of-all-trades, even those who specialize in research, statistics and computing.

(5) ProPublica and The Texas Tribune, for instance, are two independent, non-profit investigative journalism organizations that frame their projects as stories but usually let readers access the databases they compiled and analyzed.

(6) Multiple recent books warn against the dangers of storytelling, cognitive biases and patternicity, the tendency to see patterns where none exist. Arguably the most popular are Kahneman (2011) and Shermer (2011). However, both authors also concede that we humans love stories, and that we understand complicated information better when it is presented as a story. So why not take advantage of that feature, provided we are conscious of its possible shortcomings?

REFERENCES
Abelson, Robert P. (1995). Statistics as Principled Argument. Psychology Press.

Kahneman, Daniel (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.

Mann, Michael E. (2012). The Hockey Stick and the Climate Wars: Dispatches from the Front Lines. Columbia University Press.

Meyer, Philip (1973). Precision Journalism: A Reporter’s Introduction to Social Science Methods. Indiana University Press.

Shermer, Michael (2011). The Believing Brain: From Ghosts and Gods to Politics and Conspiracies—How We Construct Beliefs and Reinforce Them as Truths. Times Books.

Silver, Nate (2012). The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t. Penguin Press.

The dos and don’ts of communicating with editors and reviewers

Some thoughts and advice from the editors at Nature Methods on communicating with us and our reviewers, particularly on matters of disagreement.

In the more than nine years that we at Nature Methods have been interacting with authors and reviewers, we have experienced a great variety of communication strategies. Some work well…others don’t. In our October Editorial we discuss how failing to word criticism productively can undermine the value of the criticism and short-circuit this critical aspect of scientific discourse.

In the three posts that follow we provide practical advice for communicating with editors and reviewers during three critical steps of the publication process. These are: the cover letter, the rebuttal letter and the appeal letter. We hope you find these guides useful and encourage readers to comment on the points made and suggest dos and don’ts of their own.

How to write a cover letter
How to write a rebuttal letter
How to write an appeal letter

Update: It has been suggested that we write a dos and don’ts for reviewers. We agree this could be just as useful for improving the peer review process, possibly more so, and hope to be able to provide this soon.

A retraction resulting from cell line contamination

After nine years in print, Nature Methods today published its first retraction, one that could have been prevented by cell line authentication. What does this mean for journal-mandated cell line testing?


Two-photon fluorescence image of a live primary gliomasphere from the retracted manuscript.

In a Nature Methods paper published in 2010, Ivan Radovanovic and colleagues described a method to isolate cancer-initiating cells in human glioma without the need for molecular markers. Based on morphology and green autofluorescence, the authors reported that they could use FACS to sort cancer-initiating cells from gliomasphere cultures (which had been derived from primary tumors). They also detected autofluorescence in cells from fresh glioma specimens, but at a much lower level.

Cells from the autofluorescent fraction could self-renew clonogenically in vitro and were tumorigenic when transplanted into mouse brains, the authors reported; in both cases they outperformed non-autofluorescent cells from the rest of the culture or tissue. The origin of this autofluorescent signal was not understood at the time; the authors speculated that it might be related to the unique metabolism of cancer-initiating cells.

It turns out that most of the primary gliomasphere lines (7 out of 10) were contaminated with HEK cells expressing GFP, which led to the retraction of the paper. Using short-tandem-repeat (STR) profiling of two of the lines, the authors determined that the contamination occurred during culture in the lab: samples taken from early passages match the original tissue from which the lines were derived, but later passages do not.

It is hardly surprising that the first retraction in Nature Methods is due to cell line contamination, a well-acknowledged problem. A 2009 Editorial in Nature pointed to the disturbing results of testing by cell repositories, which indicated that 18–36% of cultures were misidentified. It called on repositories to authenticate all of their lines and on major funders to provide testing support to grantees. With that support in place, funders could require cell line validation as a condition of continued funding, and Nature would require that all immortalized lines used in a paper be verified before publication. Unfortunately, it is now 2013 and we are still far from this goal.

But progress is being made. Community-based efforts are alerting researchers to this problem and providing resources to help them avoid being misled by erroneous results caused by cell line contamination. A 2012 Correspondence in Nature by John R. Masters on behalf of the International Cell Line Authentication Committee (ICLAC) pointed to the following resources available to researchers:

Please go to the ICLAC website for the most recent version of each of these documents.

Meanwhile, in early 2013, at the publication end of the process, the Nature journals published coordinated editorials announcing a reproducibility initiative and stating that “…authors will need to […] provide precise characterization of key reagents that may be subject to biological variability, such as cell lines and antibodies.” In practice, the Nature journals currently require all authors to state whether or not testing was done, but require testing only in cases where it makes particular sense.

Advocates of a uniform mandatory testing policy make cogent arguments. First, it would avoid sending a confusing message; second, without testing, researchers cannot be certain that cell identity or mycoplasma contamination is not affecting results; and finally, continuing to publish misidentified cell lines under inaccurate species and tissue designations propagates misinformation.

For the work described in the retracted 2010 manuscript from Radovanovic and colleagues, mandatory testing would certainly have been beneficial. However, for probably most of the work published by Nature Methods, there is no question that testing would have no impact on the reported results. For example, in 2011 and 2012 we published at least 17 manuscripts that reported new fluorescence microscopy methods and used imaging data from cell lines to assess how well the techniques measured fundamental cell properties, such as the appearance and width of actin or microtubule filaments, membrane vesicles or other universal cellular structures. Neither cell line identity nor even mycoplasma contamination would affect the efficacy or conclusions of these measurements. The same is true for the validation and testing of many methods in other research disciplines such as proteomics, genomics and biophysics.

Even if these labs should be performing cell validation and mycoplasma testing routinely as part of proper cell culture procedure, requiring such testing as a condition for publishing all of these studies is unjustified.

But clearly even our most recent efforts to improve compliance with good testing practice will not be sufficient to eliminate cell contamination as a problem in work published in Nature journals. One possible solution would be to require testing by default while permitting authors to argue why, in their case, testing is clearly unnecessary. Editors (possibly with reviewer input) would be the final arbiters and would need to ensure that, although the lines are named and sourced, no species or tissue identifiers are included in the manuscript in the absence of proper validation.

Technology development labs or others that only use cell lines for purposes distinct from biological investigation could continue to avoid testing. But any lab that might potentially use their cell lines to obtain biological results would know that they should institute a proper testing regimen or risk their work not being publishable in a Nature journal.

At this point this is only an idea based on our experience at Nature Methods. We encourage the community to comment and let us know what they think.

Let’s give statistics the attention it deserves

This month we launch a new column ‘Points of Significance’ devoted to statistics, a topic of profound importance for biological research, but one that often doesn’t receive the attention it deserves.

For the past three years Nature Methods has been publishing the Points of View column, one page a month dedicated to practical advice for researchers on how to create accessible and accurate visualizations of their data. The response to the column articles has been fantastic and most recently we organized them by topic here on our blog.

Unfortunately, a truth about data visualization is that no matter how good the visualization, if the experiment wasn’t appropriately designed and the data weren’t analyzed correctly, the resulting visual depiction of the data will be inherently flawed. Nature Methods and the other Nature journals recently made changes to improve data and methods reporting as part of a reproducibility initiative. We feel this is an important first step in improving experimental reproducibility and repeatability, but by the time work is submitted for publication it can be difficult to correct shortcomings in experimental design and analysis.

A population distribution and a distribution of sample means.


In our September issue readers will find a new column, Points of Significance, that we hope will be as useful as the column that preceded it, perhaps more so. Martin Krzywinski, who has been writing the visualization column, is now joined by Naomi Altman, Professor of Statistics at The Pennsylvania State University. Among other things, Naomi will be responsible for ensuring that the information and advice we provide about statistics in every Points of Significance article is accurate.

The column has been expanded from one to two pages and will often have an Excel spreadsheet associated with it. This expansion will help us better communicate information that is less well served by display items. However, as illustrated by the figures in the first article of the column and the accompanying spreadsheet, visual displays will continue to play a vital role due to their strength in providing easily interpretable examples that can often be more readily grasped than mathematical or narrative descriptions.

We will strive to present the material so that each article in the column builds on prior ones. In this spirit, the first article discusses populations and sampling, a foundation for nearly all the topics to follow. The accompanying spreadsheet lets readers experiment with sampling and see for themselves how often values obtained from samples deviate substantially from those of the real population. It can be disconcerting to see how often ‘bad luck’ gives a ‘wrong’ result in one set of measurements while another set gives the ‘right’ result, even though statistical measures suggest that the former is more likely to be ‘correct’ than the latter. This nicely illustrates that statistics cannot tell you whether you are right. That doesn’t mean statistics has limited value. Rather, readers of scientific articles that report statistical results need a healthy grasp of the limitations of statistical analysis, and users of statistics can always learn ways to improve the power of their analyses.
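The column’s own material is an Excel spreadsheet; as a rough stand-in, the Python sketch below (not the column’s spreadsheet) draws many samples from an assumed normal population and records how often a sample mean lands far from the true mean. The population parameters (mean 10, s.d. 2), the sample size of 5, the 1-unit threshold and the function name `sample_mean_deviations` are all hypothetical choices made only for illustration.

```python
import random
import statistics


def sample_mean_deviations(pop_mean=10.0, pop_sd=2.0, n=5, trials=10000,
                           threshold=1.0, seed=1):
    """Hypothetical helper: draw `trials` samples of size `n` from a normal
    population and return the sample means plus the fraction of them that
    miss the true population mean by more than `threshold`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    far_off = sum(abs(m - pop_mean) > threshold for m in means)
    return means, far_off / trials


means, frac = sample_mean_deviations()
# The spread of sample means should be close to pop_sd / sqrt(n).
print(f"SD of sample means: {statistics.stdev(means):.3f} "
      f"(theory: {2.0 / 5 ** 0.5:.3f})")
print(f"Fraction of sample means off by more than 1.0: {frac:.3f}")
```

Under these assumptions theory predicts that roughly a quarter of the sample means (about 26%) miss the population mean by more than 1 unit, even though every sample was drawn correctly; rerunning with a larger `n` shrinks that fraction.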

The “aura of exactitude” that often surrounds statistics is one of the main notions that the Points of Significance column will attempt to dispel, while providing useful pointers on using and evaluating statistical measures. We expect that readers will find the upcoming October Points of Significance article on error bars and confidence intervals with its practical tips on interpreting these graphical elements to be particularly useful almost every time they read a manuscript containing these popular visual representations of uncertainty.

We hope readers enjoy Points of Significance. It is appropriate that the column is debuting during the International Year of Statistics. To reach a wider audience, each article will be free to access for one month after it is published.

Update: All Points of Significance articles are now free access and have been collected together on a dedicated page in the nature.com “Statistics for biologists” resource.

For more on statistics, and particularly statistics training, don’t miss this September’s Editorial.

. . . . . . . .

Update: Below is a continuously updated list of the Points of Significance articles.

Importance of being uncertain – September 2013
How samples are used to estimate population statistics and what this means in terms of uncertainty.
Error Bars – October 2013
The use of error bars to represent uncertainty and advice on how to interpret them.
Significance, P values and t-tests – November 2013
Introduction to the concept of statistical significance and the one-sample t-test.
Power and sample size – December 2013
Using statistical power to optimize study design and sample numbers.
Visualizing samples with box plots – February 2014
Introduction to box plots and their use to illustrate the spread and differences of samples.
Comparing samples—part I – March 2014
How to use the two-sample t-test to compare either uncorrelated or correlated samples.
Comparing samples—part II – April 2014
Adjustment and reinterpretation of P values when large numbers of tests are performed.
Nonparametric tests – May 2014
Use of nonparametric tests to robustly compare skewed or ranked data.
Designing comparative experiments – June 2014
The first of a series of columns that tackle experimental design shows how a paired design achieves sensitivity and specificity requirements despite biological and technical variability.
Analysis of variance and blocking – July 2014
Introduction to ANOVA and the importance of blocking in good experimental design to mitigate experimental error and the impact of factors not under study.
Replication – September 2014
Technical replication reveals technical variation while biological replication is required for biological inference.
Nested designs – October 2014
Use the relative noise contribution of each layer in nested experimental designs to optimally allocate experimental resources using ANOVA.
Two-factor designs – December 2014
It is common in biological systems for multiple experimental factors to produce interacting effects on a system. A study design that accounts for these interactions can increase sensitivity.
Sources of variation – January 2015
To generalize experimental conclusions to a population, it is critical to sample its variation while using experimental control, randomization, blocking and replication to collect replicable and meaningful results.
Split plot design – March 2015
When some experimental factors are harder to vary than others, a split plot design can be efficient for exploring the main (average) effects and interactions of the factors.
Bayes’ theorem – April 2015
Use Bayes’ theorem to combine prior knowledge with observations of a system and make predictions about it.
Bayesian statistics – May 2015
Unlike classical frequentist statistics, Bayesian statistics allows direct inference of the probability that a model is correct and it provides the ability to update this probability as new data is collected.
Sampling distributions and the bootstrap – June 2015
Use the bootstrap method to simulate new samples and assess the precision and bias of sample estimates.
Bayesian networks – September 2015
Model interactions between causes and effects in large networks of causal influences using Bayesian networks, which combine network analysis with Bayesian statistics.
Association, correlation and causation – October 2015
Pairwise dependencies can be characterized using correlation but be aware that correlation only implies association, not causation. Conversely, causation implies association, not correlation.
Simple linear regression – November 2015
Linear regression is a flexible way to predict the values of one variable from the values of another by finding a ‘best line’ through the data points.

Data visualization: A view of every Points of View column

We’ve organized all the Points of View columns on data visualization published in Nature Methods and provide this as a guide to accessing this trove of practical advice on visualizing scientific data.

As of July 30, 2013, Nature Methods has published 35 Points of View columns written by Bang Wong, Martin Krzywinski and their co-authors: Nils Gehlenborg, Cydney Nielsen, Noam Shoresh, Rikke Schmidt Kjærgaard, Erica Savig and Alberto Cairo. As we prepare to launch a new column in our September issue, we felt this would be a good time to collect links to all the Points of View articles in one place, making it easier to navigate this wonderful resource that the authors have provided. For the month of August we will make all the columns free to access so that everyone can benefit from this practical advice on data visualization.

This should not be the end of the Points of View column, though. We will be inviting new visualization experts to author articles on topics that have not been covered so far or that can be expanded on. This page will be updated whenever a new article is published, so stay tuned. If you have a suggestion for a topic you would like to see covered in a future Points of View article, please comment below.

Update of March 28, 2015: A PDF eBook of the 38 Points of View articles published between August 2010 and February 2015 is now available at the Nature Shop for $7.99 under the title “Visual strategies for biological data: the collected Points of View”. The article summaries below provide a nice overview of what is contained in that eBook collection.

. . . . . . . .

Introduction
Visualizing biological data – December 2012
Data visualization is increasingly important, but it requires clear objectives and improved implementation
The overview figure – May 2011
An economical overview figure to convey general concepts helps readers understand a research study

. . . . . . . .

Composition and layout
The design process – December 2011
Use good design to balance self-expression with the need to satisfy an audience in a logical manner
Layout – October 2011
Proper layout reveals the hierarchical relationship of informational elements
Gestalt principles (Part 1) – November 2010
Gestalt principles (Part 2) – December 2010
Exploit perceptual phenomena to meaningfully arrange elements on the page
Negative space – January 2011
Whitespace is a powerful way of improving visual appeal and emphasizing content
Salience to relevance – November 2011
Ensure that viewers notice the right content by making relevant information most noticeable
Elements of visual style – May 2013
Translate the principles of effective writing to the process of figure design
Storytelling – August 2013
Relate your data to the world around them using the age-old custom of telling a story

. . . . . . . .

Using color
Color coding – August 2010
Choose colors appropriately to avoid bias and unwanted artifacts in visuals
Color blindness – June 2011
Make your graphics accessible to those with color vision deficiencies
Avoiding color – July 2011
Improve the overall clarity and utility of data displays by using alternatives to color
Mapping quantitative data to color – August 2012
Color is useful for compact visualizations of large data sets but must highlight salient features
Heat maps – March 2012
Color, clustering and parallel coordinate plots are essential for using heatmaps effectively

. . . . . . . .

Elements of a figure
Typography – April 2011
Choose typefaces, sizes and spacing to clarify the structure and meaning of the text
Axes, ticks and grids – March 2013
Make navigational elements distinct and unobtrusive to maintain visual priority of data
Labels and callouts – April 2013
Figure labels require the same consistency and alignment in their layout as text
Plotting symbols – June 2013
Choose distinct symbols that overlap without ambiguity and communicate relationships in data
Arrows – September 2011
Use well-proportioned arrows sparingly and consistently as a guide through complex information

. . . . . . . .

Plot types
Bar charts and box plots – February 2014
Choose the appropriate plot according to the nature of the data and the task at hand
Sets and intersections – July 2014
Euler and Venn diagrams are appropriate for up to three sets but for greater numbers use more scalable plots
Heat maps – March 2012
Color, clustering and parallel coordinate plots are essential for using heatmaps effectively
Temporal data – February 2015
Use inherent properties of time to create effective visualizations
Unentangling complex plots – July 2015
Carefully designed subplots scaled to the data are often superior to a single complex overview plot
Pathways – January 2016
Apply visual grouping principles to add clarity to information flow in pathway diagrams
Neural circuit diagrams – March 2016
Use alignment and consistency to untangle complex neural circuit diagrams

. . . . . . . .

Improving figure clarity
Simplify to clarify – August 2011
Simplify your presentation to improve clarity
Design of data figures – September 2010
Improve figure decoding by using strong visual cues to encode data
Salience – October 2010
Use salience to differentiate graphical symbols and speed up figure reading
Points of review (Part 1) – February 2011
Examples of figure redesigns
Points of review (Part 2) – March 2011
Simple tips to improve pie chart, scatter plot and color scale data displays

. . . . . . . .

Multidimensional data
Into the third dimension – September 2012
3D visualizations are effective for spatial data but rarely for other data types
Power of the plane – October 2012
Combine 2D plots for effective visualization of multivariate data
Multidimensional data – July 2013
Visually organize complex data by mapping them onto familiar representations of biological systems

. . . . . . . .

Data exploration
Pencil and paper – November 2012
Quick sketches and doodles of data or models aid thinking and the scientific process
Data exploration – January 2012
Create ‘slices’ of data to enhance the process of pattern discovery
Networks – February 2012
Choose your network visualization based on the patterns you are looking for
Heat maps – March 2012
Color, clustering and parallel coordinate plots are essential for using heatmaps effectively
Integrating data – April 2012
Combine visualizations of multiple data types to find correlations and potential relationships
Representing the genome – May 2012
Limit what is displayed based on the question being asked
Managing deep data in genome browsers – June 2012
Compaction and summarization help find patterns in overwhelming data
Representing genomic structural variation – July 2012
Use arcs, color, dot plots and node graphs to show relations between distant genomic positions

. . . . . . . .

Serial dilution woes

A recent report adds further evidence that assays relying on serial dilution and tip-based dispensing could be a source of irreproducibility, particularly in pharmacological assays.

A few days after I wrote the Methagora entry below about our efforts to improve the reproducibility of published research, somebody pointed out a paper published last week in PLOS ONE that compared the results of automated serial dilution and plastic tip-based dispensing using a robotic sample processor with results obtained using an acoustics-based liquid dispenser. The latter uses sound for noncontact liquid dispensing and is implemented in instruments such as those sold by Labcyte Inc., the employer of one of the authors of the manuscript. The dose-response data comparing these two liquid-handling methods, however, were previously published in AstraZeneca patents on pyrimidine derivatives for inhibiting Eph receptors. Those results showed that, for the 14 reported compounds, activities measured with acoustic dispensing were 1.5 to 276.5 times higher than those measured with serial dilution and tip-based dispensing.

What the PLOS ONE authors added to this story, besides promoting the research results to the press, was the computation of pharmacophores based solely on the two sets of activity data. The pharmacophore computed from the acoustic data was structurally similar to pharmacophores computed from x-ray crystallography data (for example, all these compounds contained hydrophobic binding domains) and was able to predict the activity of subsequent chemicals. In contrast, the pharmacophore computed from the serial dilution and tip-based dispensing data was very different, contained no hydrophobic domains, and was non-predictive.

What should one make of this? Well, it seems logical that hydrophobic domains could influence the results of serial dilution and dispensing through plastic tips via adsorptive or other effects. As one person commenting on the PLOS ONE paper states, such effects have been well documented and proper analytical technique calls for experiments to detect them.
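To make the adsorption argument concrete, here is a minimal toy model in Python. It assumes, hypothetically, that every tip-based transfer loses a fixed fraction of the compound; the function name `serial_dilution` and all parameter values are illustrative choices, not data from the PLOS ONE paper or the AstraZeneca patents. Because each well in a serial dilution is prepared from the previous one, a constant per-transfer loss compounds, so the gap between the nominal and the actual concentration grows down the series.

```python
"""Toy model: how a constant per-transfer loss compounds along a serial dilution.

Assumption (hypothetical): each tip-based transfer loses a fixed fraction of the
compound, e.g. by adsorption to plastic. Real losses depend on the compound,
the plastics and the protocol.
"""


def serial_dilution(top_conc: float, factor: float, points: int,
                    loss_per_transfer: float) -> list[tuple[float, float]]:
    """Return (nominal, actual) concentration for each well in the series.

    Well 0 holds the undiluted top concentration; every later well is made by
    one more transfer from the previous well, so the losses multiply.
    """
    series = []
    nominal = actual = top_conc
    for _ in range(points):
        series.append((nominal, actual))
        nominal /= factor
        actual = actual * (1.0 - loss_per_transfer) / factor
    return series


if __name__ == "__main__":
    # Illustrative values only: 10 µM top, 3-fold steps, 10 points, 20% loss per transfer.
    for well, (nom, act) in enumerate(serial_dilution(10.0, 3.0, 10, 0.20)):
        print(f"well {well}: nominal {nom:.4g} uM, actual {act:.4g} uM, "
              f"overstated {nom / act:.2f}x")
```

Under these made-up numbers the last of the ten wells holds about 1.25^9 ≈ 7.5-fold less compound than its label says, which would shift the dose-response curve and make the compound look less potent than it is. In a direct-dispensing scheme, where each well receives compound straight from the stock, such a loss would not accumulate from well to well.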

This all reminds me of marketing for HP’s high-performance dispenser, which also forgoes serial dilution and instead uses inkjet printing technology to dispense undiluted reagents, presumably also via acoustics. HP promotes the increased reliability of this technique for generating dose-response curves, but the company doesn’t highlight the kind of effect documented by the authors of the PLOS ONE paper.

If these results are indicative of the differences between these two types of liquid dispensing, it seems likely that drug companies are aware of them and are adapting their assays and protocols as necessary. But even if this is the case, there appears to be little evidence that academic researchers are worried about this.

In theory, one can certainly see the appeal of contactless dispensing, but more hard data are needed to draw firm conclusions. This will require extensive side-by-side testing of different sample dispensing methods with many different compounds.

At a minimum, researchers need to be cognizant of this potential problem and report how they dispensed their reagents when reporting results from these kinds of pharmacological assays. Better yet, they should repeat key experiments on different days and with different equipment.

Update: I just found out that Derek Lowe has a nice post about this paper over at In the Pipeline

Reporting standards to enhance article reproducibility

Beginning May 1st, Nature Methods will require authors of manuscripts being sent back to peer review to fill out a checklist disclosing technical and statistical information about their submission.

The May Editorial briefly describes why we are using this checklist and provides some details of what is included. Authors can find the checklist that Nature Methods will be using at https://www.nature.com/nmeth/pdf/sm_checklist.pdf, and there is a link to it on the journal homepage. Our checklist is identical to that of most of the other Nature journals except for an added item asking authors to “Identify all custom software or scripts that were required to implement the methodology being described and where in the procedures each was used.” Based on the feedback we have received, missing software or scripts are the problem most often mentioned by people describing challenges in reproducing a method we have published. This reporting requirement is an important step toward addressing this deficiency.

We expect that the addition of these reporting requirements will elicit some grumbling from authors. But based on the experience of Nature Neuroscience, which has been requiring authors to fill out a methods checklist before even the first round of review, we expect authors will come to appreciate the role it serves.

The checklist is only one part of the efforts the Nature journals are making to improve reproducibility. The other journals are also removing formal limits on the length of the Methods section. But since Nature Methods has long imposed no limit on the length of its Methods section, the checklist is the most prominent change for us and our authors.

The May issue also contains other articles relevant to reproducibility. The Correspondence section has a discussion about analyzing the reproducibility of animal experiments. And the May Technology Feature discusses reproducibility in quantitative PCR, a methodology that has suffered from serious problems in this regard due to poor experimental technique and reporting.

For those not tired of reproducibility at this point, Nature also has a Special Focus on Challenges in irreproducible research.

As has been said in the editorials on the subject, this is only a first step toward improving the reproducibility of our published research and we welcome feedback from the community on our efforts.

Return of the Points of View column

Our popular “Points of View” column returns this month after a brief hiatus. Here is a bit of history of the column and an introduction to its new author.

On this day four years ago Sean O’Donoghue contacted Nature Methods about a workshop he was organizing on visualizing biological data. This culminated in a Nature Methods Supplement on Visualizing Biological Data published one year later that coincided with the first VizBi meeting in Heidelberg, Germany.

During this meeting Bang Wong and I hatched the idea of a Nature Methods column that would provide researchers with practical advice on the visual presentation of data. Later that year our August issue featured Bang’s very first Points of View column, “Color coding”. What followed was a labor of love by both Bang and me, with plenty of stress over deadlines, that extended over two years.

The column seemed to fill a need in the community and generated considerable positive feedback, including from authors and reviewers who would sometimes refer to advice from Bang’s columns. At the end of 2012 Bang took a needed break and the column went on hiatus. But in the meantime I had again met someone at a meeting in Germany who was passionately interested in the visual display of data.

The Points of View column returns in our March issue, authored by Martin Krzywinski (staff scientist, creator of the visualization software Circos, and former fashion photographer).

I decided we couldn’t let someone with Martin’s varied experiences debut as the new Points of View columnist without learning a bit more about him, so I asked our Technology Editor, Vivien Marx, to see what she could dig up.

Martin Krzywinski


Current mode: Makes cancer research and genome analysis visual.
Introduction to genomics: Built computing infrastructure at Genome Sciences Center
Past activities (incomplete): fashion photography, computer security, particle physics.
Published information graphics (incomplete): Book covers, American Scientist, EMBO Journal, PNAS, The New York Times, Wired, Condé Nast Portfolio.

Alex the rat


Q: You photographed Alex (2000-2002) and helped her become the poster rat for genome sequencing. For example, she was Genome Research’s rat cover-girl. She frequently rode on your shoulder and seems like a groovy friend.

M.K.: Don’t be fooled by Alex’s visual presentation. She bit me countless times. But what do you expect from a rat? Maybe it is I that never learned.

Q: In addition to photo-shoots with Alex, you have had human fashion models in front of your lens. Fashion is pretty. Why should science be pretty?


Nature journals provide a CC license for community experiments

Nature Methods has long been an advocate of the value of community experiments (or competitions/challenges) to assess and compare the performance of algorithms and software tools. In 2008 we discussed the value of these competitions and advocated that they also be used to assess the performance of less widely used algorithms such as those used for single particle tracking. Such an experiment for assessing single particle tracking was run in 2012, although the results are still awaiting publication.

Publication of such work has often been confined to more specialized journals, but in 2012 Nature Methods started publishing manuscripts emanating from these competitions, beginning with a manuscript assessing the performance of gene regulatory network inference methods based on the results of one of the DREAM5 challenges.

In recognition of the profound value such challenges provide to the wider scientific community, the Nature journals will now publish manuscripts describing the results of these challenges under a Creative Commons Attribution-NonCommercial-ShareAlike Unported license. This is the same license we use for publishing first genome papers, standards papers and white papers. The first example is an Analysis article published in Nature Methods yesterday describing the results of the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment.

Publication of such community experiments will necessarily be highly selective, and likely increasingly so as such challenges become more prevalent, as illustrated by the explosion in the number of Grand Challenges in Medical Image Analysis. But these community experiments provide invaluable information on the performance of methods that are otherwise difficult to compare objectively. We hope that the potential for publication in a Nature journal and the open access provided by a Creative Commons license help encourage broader participation in these efforts and increase the visibility of the results.

Update: February 12
We just published another manuscript describing a community experiment. This Analysis article presents the results of the first FlowCAP challenge that assessed the performance of flow cytometry automated analysis methods.

A different kind of Method of the Year for 2012

Our choices of Method of the Year in prior years have tended to be methods that often didn’t even exist a few years earlier but had quickly bounded onto the scientific stage and attracted the attention of a large portion of the scientific community. Targeted proteomics, our choice for 2012, on the other hand, has existed for years in scaled-down forms using antibody-based methods. Western blotting, immunofluorescence, antibody arrays and similar techniques can all be used to detect and measure targeted subsets of the proteins expressed in cells and tissues.

During this time the workhorse of proteomics, the mass spectrometer, has been used mostly for shotgun proteomics experiments in which the goal was to analyze all the proteins in a sample. But the means to use these machines for targeted detection of defined subsets of proteins and obtain more reproducible measurements than shotgun experiments can typically provide have been around for decades.

Shotgun methods have been mostly confined to specialist laboratories, as many biologists have been intimidated by the complexity of implementing and analyzing these experiments properly. Targeted proteomics, on the other hand, offers a tantalizing opportunity to bring a sampling of the power of mass spectrometry to the wider community of biologists. The assays are simpler, easier to run and well suited to the hypothesis-driven experiments that are the mainstay of biological research.

The ubiquitous Western blot has long filled a central role or functioned as a crucial control in many research studies. Unfortunately, performing a high-quality Western blot can feel a bit like roulette. Sometimes you get a fantastic-looking blot with an accurate antibody, but other times the blot is blank, the bands look like they ran through a carnival ride, or it suffers from any number of other problems. This might prompt people either to look for a goat to appease the Western blot gods or to take unscientific liberties with the presentation of the data to make it look the way they believe it should. It also lessens the likelihood that important replicates are performed or reported.

Targeted mass spectrometry offers the possibility for thousands of labs to move away from, or supplement, Western blots and improve the quality and quantity of their protein measurements. This is not as sexy as next-generation sequencing, super-resolution imaging or optogenetics, some of our prior choices of Method of the Year, but the potential for revolutionizing an arguably mundane but indispensable technique was compelling enough that it played no small role in our decision. Only time will tell what impact the method has, and we eagerly look forward to the answer.