Research involving vast quantities of data may be changing the image of scientific research, but is it changing the image of scientists too?
Scidata: Publishing Better Science through Better Data competition winner Jonathan Page.
An intrepid, khaki-clad explorer, machete in hand, cutting their way through some undiscovered wilderness. A bespectacled, grey-haired academic in a white coat, supervising some elaborate experiment in a lab, illuminated by glowing lights and flashing buttons. These are the classical images sometimes conjured when the word ‘scientist’ is mentioned.

Horace B. Carpenter as Dr. Meirschultz, a scientist attempting to bring the dead back to life in the 1934 film Maniac
The world in which these stereotypes were born is different to the technologically dominated one in which we live today. The work of a scientist is still the same: studies are devised, data are collected, analyses are performed and conclusions are drawn. The difference, now, is that all of this can be done without leaving the comfort of the office. In a world where enormous quantities of data can be collected and analysed with relative ease, scientists no longer need to swashbuckle their way through vast wildernesses or spend sleepless nights with flashing lights to make discoveries; they can do the same by exploring vast swathes of data instead.
These advances haven’t arrived without their challenges. Terabytes of data can’t be analysed in a traditional notebook. This new scale of data calls for novel approaches, in the form of complex programming languages and statistical methods. Many scientists across all disciplines now need to become wizards in the world of coding, a realm previously reserved for computer scientists and software developers, and spend more hours in the office than in the field or the lab. As a result, data-intensive research is forcing many scientists to rethink the way they carry out their research.
Data-intensive research has clearly changed how many scientists work, but it’s also made it far easier to become a scientist, so much so that many of us don’t realise we’ve become one. Crowdsourcing has become an exceptionally effective way of collecting data, but while those organising it might fit the stereotype of a scientist, the people taking part often don’t. Anyone can take part in these events or activities regardless of their background, meaning that anyone can contribute to a scientific study.
These crowdsourcing efforts take a variety of forms. Some, like Galaxy Zoo, allow users to analyse data already collected (in this case, images of galaxies gathered by astronomers). Others, like The Big Bat Map, use crowdsourcing to collect new data about bat sightings around the UK. Zooniverse, meanwhile, serves as a repository for a huge number of initiatives across a range of disciplines. If we define a scientist as someone who takes part in the scientific process, then anyone who collects or analyses data through these crowdsourcing activities is, surely, also a scientist.
The big-data frontier might force us to change our perspectives and ideas regarding what a scientist looks like, but it doesn’t change what it means to be a scientist. A great deal of research can now be done by scientists who specialise entirely in computer-based work, rather than having to venture into the outdoors or into the lab. More still can be achieved by members of the public from their homes. Scientists still push forward the boundaries between what we do know and what we don’t, uncovering truths by devising studies, collecting data, performing analyses and drawing conclusions. All data-intensive research has changed is how this is done, and who’s doing it.
Jonny Page is a PhD candidate at the University of Oxford, where he’s studying the biomechanics of insect flight. Outside of this, he has interests in data science, programming and science writing.
This piece was selected as one of the winning entries for the Publishing Better Science through Better Data writing competition. Publishing Better Science through Better Data is a free, full-day conference focussing on how early career researchers can best utilise and manage research data. The conference will run on October 26th at Wellcome Collection Building, London.