Bring on the box plots

Box plots are excellent for visualizing important core statistics of sample data. We hope that a new online plotting tool, BoxPlotR, will help encourage their wider use in basic biological research.


The same three samples plotted by bar chart with s.e.m. error bars (left) and Tukey-style box plot (right). The box plot more clearly represents the underlying data.

A bar chart is often a person’s first choice of plot type when they want to compare values. This is appropriate when the values arise from counting. But when the value is a mean or median of data points taken from a sample, a bar chart is usually inappropriate. As discussed in our February Editorial and the accompanying Points of View and Points of Significance columns, a “mean-and-error” scatter-type plot or a box plot is more appropriate for sampled data. In summary, we strongly recommend box plots when you have at least five data points; for samples with 3-5 data points, mean-and-error plots are more appropriate.
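To make the contrast concrete, here is a minimal Python sketch of the numbers behind each plot type: the mean and s.e.m. that a mean-and-error plot shows, and the quartiles, whiskers and outliers that a Tukey-style box plot shows. The function names and the sample values are illustrative assumptions, not taken from BoxPlotR, and the "inclusive" quartile method is only one of several conventions in use, so other software may give slightly different box edges.

```python
import math
import statistics


def mean_and_sem(sample):
    """Return (mean, standard error of the mean) for a sample."""
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, sem


def tukey_box_stats(sample):
    """Return the elements of a Tukey-style box plot as a dict.

    Whiskers extend to the most extreme data points that lie within
    1.5 x IQR of the box; points beyond that are reported as outliers.
    """
    data = sorted(sample)
    # Quartile convention is an assumption; plotting tools differ here.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical measurements with one extreme value.
    sample = [2.1, 2.4, 2.5, 2.7, 3.0, 3.2, 9.8]
    print("mean, s.e.m.:", mean_and_sem(sample))
    print("box plot elements:", tukey_box_stats(sample))
```

In this made-up sample the single extreme value pulls the mean upward and widens the error bar, while the box plot keeps the median and quartiles where most of the data lie and reports the extreme value separately as an outlier.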

Box plots are heavily used in biomedical research, in which statisticians have historically had considerable input into study design and analysis. But although similar types and quantities of sample data also appear in basic research (such as that published in Nature journals), box plots are much less common than bar charts in these manuscripts. Last year in Nature Methods, for example, ~80% of sampled data was plotted using bar charts.

Discussions we had with the community suggested that one impediment to using box plots instead of bar charts to graph sample data was limited support for box plots in the plotting programs commonly used by researchers. It also became apparent that some software that did support box plots failed to communicate to users what the different elements of the plot represented. As a result, strangely labeled box plots were showing up in published papers. At NPG we thought it would be useful to provide authors with a simple online tool they could use to generate basic box plots of their data for publication.

The origin of BoxPlotR
At the VizBi 2013 conference in Cambridge, Massachusetts, I mentioned NPG’s desire for such a tool at a breakout session chaired by Martin Krzywinski, in which the participants, including a young researcher named Jan Wildenhain, discussed what the community needed to create better figures. I also happened to mention our interest to Michaela Spitzer while visiting her poster from the Juri Rappsilber and Mike Tyers labs, which showed how the R package ‘shiny’ by RStudio can be used to easily convert code written in R (a popular scripting language for statistics) into a visual application for exploring data.

Later at the conference Jan approached me and said he was intrigued by our desire for someone to design a webtool to create box plots and that he was interested in working on such a project. I happily told him to get in touch with me after the conference so we could discuss it further.

Three weeks after the conference concluded I still hadn’t heard from Jan and was beginning to worry that he had decided not to pursue this. Then, a few days later, I received an email from Jan. Much to my surprise, he provided a link to a highly functional tool that he and Michaela, through their own initiative, had gone ahead and created using shiny and R. What followed was a productive and rewarding period of discussion and development during which Michaela incorporated additional functionality and made selected design changes. The tool appeared so well designed and functional that I encouraged them to submit it to Nature Methods for publication as a Correspondence. After they incorporated additional functionality and changes based on comments raised during peer review, BoxPlotR was ready for publication.


Sample BoxPlotR plots. Top: Simple Tukey-style box plot. Bottom: Tukey-style box plot with notches, means (crosses), 83% confidence intervals (gray bars; representative of p=0.05 significance) and n values.
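For readers curious about where the notches and the 83% confidence intervals in such a plot might come from, the following Python sketch computes both. It rests on stated assumptions rather than on BoxPlotR's source: the notch half-width of 1.58 x IQR / sqrt(n) is the convention used by R's boxplot.stats, and the confidence interval uses a normal quantile in place of a t quantile, which makes it slightly too narrow for small samples. The function names and sample values are hypothetical.

```python
import math
import statistics


def notch_limits(sample):
    """Notch limits around the median: median +/- 1.58 * IQR / sqrt(n).

    This follows the convention of R's boxplot.stats; that BoxPlotR
    draws its notches exactly this way is an assumption.
    """
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(sample))
    return median - half_width, median + half_width


def mean_ci(sample, level=0.83):
    """Confidence interval for the mean using a normal approximation.

    A t quantile would be more accurate for small samples; the normal
    quantile is used here only to keep the sketch dependency-free.
    """
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem


def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical samples for two conditions.
    control = [4.8, 5.0, 5.1, 5.3, 5.4, 5.6, 5.9, 6.0]
    treated = [6.9, 7.1, 7.2, 7.4, 7.5, 7.8, 8.0, 8.1]
    print("notches:", notch_limits(control), notch_limits(treated))
    print("83% CIs overlap:",
          intervals_overlap(mean_ci(control), mean_ci(treated)))
```

As the caption indicates, two 83% intervals that do not overlap correspond roughly to a difference significant at p=0.05; this is a visual rule of thumb, not a substitute for a formal test.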

Launch of BoxPlotR
To accompany the publication and launch of BoxPlotR we thought it would be useful to provide some information and practical advice about box plots to our readers. Nils Gehlenborg, a former author of several Points of View articles with Bang Wong, agreed to resurrect that popular column for our February issue with an article on bar charts and box plots. Similarly, Martin Krzywinski and Naomi Altman agreed to delay our planned Points of Significance article on the two-sample and paired t-test and instead devote an article to box plots.

Seeing how the community responded to our interest in creating an online box plot tool and then working with them on this project has been a great experience. This never would have been possible without the initiative and talent of Jan and Michaela or the support they received from their PIs Mike and Juri. We hope both our authors and others find BoxPlotR useful and we encourage feedback. General comments can be made here on our blog or by emailing the journal. For specific bug reports and feature requests please see the contact information at https://boxplot.tyerslab.com.

People, publishing, and policy: Q&A with Janet Thornton, director of the European Bioinformatics Institute.


Janet Thornton has been named Dame Commander of the Order of the British Empire. She feels it is an important recognition of bioinformatics. Credit: EMBL-EBI

The scientist profiled in the February issue of Nature Methods (the Author File) is Janet Thornton, the director of the European Bioinformatics Institute.

Here, she shares some additional insight about publishing, science policy, and mentoring. What follows is an edited excerpt of her conversation with Nature Methods. Read more here.

VM: In an era of not-so-plentiful funds, ELIXIR (interviewer looks up acronym…), the European life-sciences Infrastructure for biological Information, and other initiatives take you deep into policy-making. Which tends to not resemble a picnic on a sunny Nottinghamshire day. What motivates you?

JT: ELIXIR was launched Dec 18 and now has its own director. It does feel a bit that it’s my child. But it’s a child that has grown up and is really on its way to becoming independent and moving forward to being an independent adult. It’s still got a long way to go. It’s a bit like a teenager, actually. (laughs)

I honestly believe that these initiatives are the best way forward because, despite the setbacks, everyone broadly agrees. So it is a case of getting through the politics and making the science happen. As we know, science has no borders—and all scientists agree with this—so in the end, common sense will win and we can go forward.

VM: You have published around 400 papers. What does a paper mean to you?

JT: Probably for me the most important part of the process of science is publishing a paper. Because it’s the time when you really sort out what matters, why you did it, what you discovered and then you try and make it understandable for other people. And I have to say I get really upset when my papers are rejected.

VM: What types of papers do you enjoy reading?

JT: I love reading good solid papers, which are logical and explain how the results are obtained and why they are important. I used to spend hours in the library, like a detective tracking down information and knowledge.

VM: Rumor has it, you still present posters.

JT: I don’t often present posters but there was one particular occasion when the University of Cambridge organized an event and they asked all the senior staff throughout the university to present posters. That was the last sort of official poster presentation. Of course, my students and post-docs have posters all the time. And I do man those posters as appropriate. It’s fun. You talk about your work.

VM: What is the best way for a scientist to select members most suited to his or her lab?

JT: Five things I look for: a) Bright/clever, b) Committed and interested in a project or area of research, c) Relevant expertise – though this is not the most important thing, d) What does the lab think? e) Would I like to have a meeting at 9am on a Monday morning with this person?

VM: Computational resources in the life sciences are not always appreciated. What do you recommend to scientists keen on being and staying tool-builders and resource-providers?

JT: Find a good place to go to follow your dream; find someone you want to work with and prepare yourself for the future. Not all scientists can be principal investigators (PIs), nor indeed want to be, so the key is to find your own niche.

VM: You studied physics at the University of Nottingham, then shifted to biophysics for your PhD at the National Institute for Medical Research. What do you advise when students of any stripe wonder: ‘Shall I choose physics? Computer science? Biology?’

JT: I am afraid I am biased—go with biology—it is amazing, beautiful, complex, but still an open book with lots to discover. And even if this were not enough, it has so many really important applications —many of the so-called grand challenges that will literally affect the future of this planet and everyone on it.