African astronomy and how one student broke into the field

Africa is investing in a future of astronomy research, but students need access to inspirational lecturers, says Gina Maffey.

Mutie at the Ghana Radio Astronomy Observatory (GRAO) at Kuntunse, Ghana

Isaac Mumo Mutie

What do you do when the degree you want to study is not offered by your university?

You study it anyway.

“I did a lot of personal research online, looking for answers,” says Isaac Mumo Mutie, an astronomy student who studied at the Technical University of Kenya. While Mutie was studying for a Bachelor of Technology in Technical and Applied Physics, Professor Paul Baki introduced him to astronomy, and Mutie would consult with him in his spare time.

“He would ask me ‘why are you interested? This is not part of the curriculum.’ But I insisted.”

TechBlog: Software quality tests yield best practices

[Image: code snippet from RAxML]

{credit}Alexandros Stamatakis/GitHub{/credit}

Life science research increasingly runs on software. A good fraction, perhaps even most of it, is made by academics, for academics: Rough around the edges, perhaps, but effective — not to mention free. But, is it of high quality?

Alexandros Stamatakis decided to find out.

Stamatakis is a computer scientist and bioinformatician at HITS, the Heidelberg Institute for Theoretical Studies in Germany, and a professor of computer science at the Karlsruhe Institute of Technology. His team has been developing and refining software tools for evolutionary biology for more than 15 years, he says, including one called RAxML (from which the code snippet shown above was pulled). Yet after all that time, he says, his code still wasn’t perfect.

“The more I developed it the more bugs I had to fix and the more I started worrying about software quality,” he says.

Not software ‘accuracy’, mind you — when it comes to phylogenetics, it’s difficult to know whether software is providing the correct answer. “You don’t know the ground-truth,” Stamatakis says. Rather, he was curious whether popular tools meet computer-science standards for quality.

To find out, Stamatakis and his team downloaded the code for 16 popular phylogenetic tools (plus, as a control, one from the field of astronomy), which collectively have been cited more than 90,000 times. They then ran those codes — 15 of which were written in C/C++ and the last in Java — through a series of tests.

For instance, they looked at how well software can scale from a desktop computer to a large cluster, something that increasingly is necessary as life science datasets balloon in size. They measured the amount of duplicated code in the software to get a rough indication of maintainability. And they counted the number of so-called ‘assertions’ — logical statements in the code that assert, for instance, that a value falls within a certain range, and that cause the software to terminate should they fail — to obtain a measure of code ‘correctness’.
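To make the idea of an assertion concrete, here is a minimal sketch in Python. It is not taken from RAxML or from any of the tools tested; the function `normalize` and its checks are hypothetical, chosen only to show a value being asserted to fall within a certain range.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1 (hypothetical example)."""
    # Preconditions: the input must be non-empty and non-negative.
    assert len(weights) > 0, "weights must not be empty"
    assert all(w >= 0.0 for w in weights), "weights must be non-negative"

    total = sum(weights)
    assert total > 0.0, "at least one weight must be positive"

    result = [w / total for w in weights]

    # Postconditions: every value lies in [0, 1] and the values sum to ~1.
    assert all(0.0 <= p <= 1.0 for p in result), "value out of range"
    assert abs(sum(result) - 1.0) < 1e-9, "values do not sum to 1"
    return result


if __name__ == "__main__":
    print(normalize([1.0, 1.0, 2.0]))
```

If any check fails, Python raises an `AssertionError`, which stops the program unless something catches it. Note that Python skips `assert` statements when run with the `-O` flag, much as C does when `NDEBUG` is defined, so assertions complement input validation rather than replace it.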

“There have been empirical studies by computer scientists working in the field of software engineering, where they showed that there is a correlation between incorrect code, or code defects, and the number of assertions used — or let’s better say, an anti-correlation,” Stamatakis says.

So, how did the toolset do? Not too well.

As documented in an article published 29 January in Molecular Biology and Evolution, none of the 16 programs in the round-up, including Stamatakis’ own RAxML, aced all the tests. (With 57,233 lines of code, RAxML exhibited both compiler warnings and memory leaks.) But, he stresses, that is neither to denigrate the programmers who wrote those tools — who, after all, were simply trying (and generally succeeding) to solve a particular problem — nor to suggest they do not work properly.

Rather, he says, potential users must exercise caution in using these tools. “They shouldn’t blithely trust software. And they shouldn’t view it as black boxes,” but instead (as he puts it in his article) as “potential Pandora’s boxes”.

Users should also strive to understand what their code is doing, Stamatakis advises. And if unexpected results arise, they should repeat the analysis using a separate tool that performs the same task, to ensure they aren’t chasing digital phantoms.

Stamatakis concludes his article with a series of ‘best practices’ for software developers. These include running tests for memory allocation errors and leaks, using assertions, checking for code compilation warnings using multiple compilers, and minimizing code complexity and duplication — practices that are common in professional software development but less so in the life sciences.
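As a rough illustration of what measuring code duplication can mean, the sketch below counts the fraction of non-blank lines that belong to a block of consecutive lines appearing more than once in a file. This is an assumption-laden toy, not the tool Stamatakis’ team used: the function name `duplicated_fraction` and the three-line window are hypothetical choices, and real duplicate detectors typically compare tokens rather than raw lines.

```python
from collections import Counter


def duplicated_fraction(source, window=3):
    """Return the fraction of non-blank lines inside a repeated block.

    A block is `window` consecutive non-blank lines, compared after
    stripping leading and trailing whitespace (hypothetical heuristic).
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if len(lines) < window:
        return 0.0

    # Every run of `window` consecutive lines, as a hashable tuple.
    windows = [tuple(lines[i:i + window])
               for i in range(len(lines) - window + 1)]
    counts = Counter(windows)

    # Mark each line index covered by a window that occurs more than once.
    duplicated = set()
    for i, w in enumerate(windows):
        if counts[w] > 1:
            duplicated.update(range(i, i + window))

    return len(duplicated) / len(lines)


if __name__ == "__main__":
    sample = "a = 1\nb = 2\nc = a + b\nprint(c)\na = 1\nb = 2\nc = a + b\n"
    print(f"{duplicated_fraction(sample):.2f}")
```

A high fraction suggests copy-pasted logic that could be factored into a shared function, which is one reason duplication serves as a rough proxy for maintainability.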

The tools Stamatakis’ team used to run its tests are freely available, so readers can try them themselves to see how trustworthy their chosen software is.

Journal editors, he says, should consider requiring such tests of any peer-reviewed work, either performed by the authors themselves prior to submission, or by the peer-reviewers. In fact, during our conversation, Stamatakis suggested he might make the toolbox available as a Python script or Docker container, to make it easier for others to adopt. If and when he does, we’ll let you know. In the meantime, caveat emptor!

 

Jeffrey Perkel is Technology Editor, Nature

 

TechBlog: ‘Manubot’ powers a crowdsourced ‘deep-learning’ review

{credit}Alfred Pasieka/SPL/Getty{/credit}

In Nature’s February technology feature on ‘deep learning’, a kind of artificial intelligence whose usage is spiking in life science research, author Sarah Webb points readers to a ‘comprehensive, crowd-sourced’ review of the field.

Available as a preprint on bioRxiv (ETA: and now online in the Journal of the Royal Society Interface), the review is indeed comprehensive: the PDF runs to 123 pages and 552 references, and has been downloaded nearly 27,500 times since May 2017. But it was an intriguing footnote on the article’s title page that really piqued my interest: “Author order was determined with a randomized algorithm”.

TechBlog: eLife replaces commenting system with Hypothesis annotations

{credit}eLife/Hypothesis{/credit}

The next time you feel moved to comment on an article in the open-access online journal eLife, be prepared for a different user experience.

On 31 January, eLife announced it had adopted the open-source annotation service, Hypothesis, replacing its traditional commenting system. That’s the result of a year-long effort between the two services to make Hypothesis more amenable to the scholarly publishing community.

TechBlog: Interactive figures, a mea culpa

{credit}The Project Twins{/credit}

For the 1 February issue of Nature magazine, I wrote a Toolbox article on interactive figures. Unlike static PDFs or JPEGs, these figures allow users to explore the underlying data and code used to create them, for instance to zoom in on a crowded region of interest, or to probe the robustness of a computational model.

It’s an exceptionally broad and growing field of tech development, and my article name-checks more than a dozen tools. Inevitably, omissions were made, one of which was pointed out within hours of the article going live.
