It’s been nearly a decade since Eric Welsh first noticed some weirdness with Microsoft Excel. A senior staff scientist in the Cancer Informatics Core at the H. Lee Moffitt Cancer Center and Research Institute in Tampa, Florida, Welsh was using Microsoft’s venerable spreadsheet application to view mouse and human gene expression data, the better to sort and understand the numbers. But a quick glance revealed the import hadn’t gone exactly as planned. “Excel would screw them up every time,” he says.
How so? When data are imported into Excel, the program works hard to figure out what kind of value each cell holds. Most of the time, Excel is smart enough to do that correctly, and values like ‘BRCA1’ and ‘12345’ are treated as text and integers, respectively, as expected. But “Excel is a little too smart for its own good,” Welsh says. If a cell reads “SEPT7,” the program assumes the author meant to write a date, and converts it automatically. It also sometimes translates what appear to be numbers in scientific notation – say, ‘2310009E13’ – into actual scientific notation (‘2.31E+19’). The problem is, those two terms are neither dates nor numbers – they are proper names, scientifically speaking: gene names, sample identifiers or accession numbers. And when Excel autoconverts them, those names are lost, or at least obscured.
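To make the failure mode concrete, here is a minimal Python sketch that flags identifiers likely to be misread before a file ever reaches a spreadsheet. It does not reproduce Excel's actual type-detection logic, which is not described here. The function name `classify_identifier`, the month list and both patterns are illustrative assumptions, and the heuristics are far from exhaustive.

```python
import re

# Assumption: English month names/abbreviations that a date parser might
# recognize at the start of a gene symbol. Illustrative, not exhaustive.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month prefix followed by a one- or two-digit "day", e.g. SEPT7 or DEC1.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, an 'E', then more digits, e.g. 2310009E13.
SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def classify_identifier(value: str) -> str:
    """Return 'date-like', 'number-like' or 'safe' for one identifier."""
    text = value.strip()
    if DATE_LIKE.match(text):
        return "date-like"
    if SCI_LIKE.match(text):
        return "number-like"
    return "safe"


if __name__ == "__main__":
    for name in ["BRCA1", "12345", "SEPT7", "2310009E13"]:
        print(name, "->", classify_identifier(name))
```

Run against the examples above, the sketch leaves ‘BRCA1’ and ‘12345’ alone and flags ‘SEPT7’ and ‘2310009E13’. That gives a chance to import those columns explicitly as text rather than letting the spreadsheet guess.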

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-80