Put your natural science skills to work in a data science career
Guest contributor Daniel Harris of SoftwareAdvice.com
The explosive economic impact of big data has blurred the line between the business world and the scientific world like never before. A new type of business leader, the data scientist, has evolved as an amphibian, capable of thriving in both worlds, swimming in data lakes to bring useful insights back to the solid ground of business concerns.
Of course, companies have been using business intelligence (BI) tools to analyse their operational and financial performance metrics for decades.
But datasets generated by the web are so large that they must be stored on clusters of servers with thousands of nodes. Traditional methods for analysing these datasets have faltered, necessitating a more scientific approach.
There are currently more opportunities for scientific research at Fortune 500 companies than there are data scientists to fill them. And, whilst department chairs at leading universities may get respect, data scientists get stock options.
In a previous article, we discussed the steps you can take to prepare yourself for a career as a data scientist. Here, we’ll discuss how you can turn a background in academic science to your advantage in the information economy.
Why are big businesses hiring scientists?
In the natural sciences, instruments now generate datasets on the terabyte or petabyte scale. To deal with this flood of data, researchers have trained themselves to use statistical models that separate useful information from all that noise.
Companies like Google and Facebook have found that scientific approaches to extracting meaning from patterns aren’t just limited to clarifying the results of a particle accelerator experiment, for example. These methods can result in quantifiable business benefits when they’re applied to customer information, operational metrics and vast datasets generated by the web.
Doug Laney, VP and distinguished analyst at information technology research and advisory company Gartner, has spent years researching the role of data scientists at major companies.
He explains that the title of data scientist isn’t just hype: “Someone should not be referring to him or herself as a ‘data scientist’ if he or she doesn’t understand or doesn’t follow the scientific method. That’s one of the things that really differentiates a data scientist from a statistician or a BI analyst.”
The scientific method is necessary in data science because, just like in natural science, you don’t know what you’re looking for when you set out to solve a problem. It’s one thing to analyse why revenue dropped in a given financial quarter. It’s something very different to use statistical models to invent products and services that address needs your customers don’t know they have yet.
Which scientific disciplines cultivate the skills needed for data science?
It’s probably clear from the emphasis we’ve already put on statistical models that only scientists working in certain fields are going to acquire the skills needed to solve a big company’s big data problems.
Laney defines big data as being characterised not only by the volume of the information needed to address a business problem, but also by the variety of data sources involved and the velocity at which the data needs to be processed. He observes that scientists dealing with information characterised by the “3 V’s” of big data are particularly suited for careers in data science.
The rapid processing of huge and diverse datasets requires considerable expertise with statistical modelling. This is where data scientists with backgrounds in the natural sciences can really shine.
As Laney explains, “A lot of statisticians and BI analysts are one-trick ponies: they know how to do multivariate regression analysis or how to create pie charts and bar charts, and that’s about it. Data scientists need to be able to look at the data in a variety of ways by applying a wealth of different analytic models, from regression analysis up through pattern matching, machine-learning, scenario analysis, Monte Carlo simulation — you name it.”
Natural sciences with computational branches are ideal incubators for data scientists. There are of course many sub-disciplines that fit this description, but here are a few examples:
|Biology||Population biology, genomics, molecular biology, neurobiology, epidemiology|
|Chemistry||Physical chemistry, nanochemistry, solid-state chemistry, quantum chemistry|
|Physics||Astrophysics, particle physics, statistical mechanics, fluid dynamics|
|Astronomy||Observational astronomy (radio, infrared, etc.)|
|Earth and environmental sciences||Climatology, geology, systems ecology|
So, how do you use your discipline to position yourself as a potential data scientist when applying for jobs?
How do I convince big businesses that I have what it takes?
This is the tricky part.
As Laney explains, “Companies want advanced analytics talent that knows their business.”
While it’s certainly easier to understand the day-to-day operations and revenue goals of a corporation than it is to build models that describe fluid dynamics, for example, actually convincing businesspeople that you can understand what they do is a different story.
Fortunately, there are alternatives to climbing the corporate ladder one rung at a time:
- Pick an industry, stick with it, and learn about it. Specialising in an industry will help you to understand the business problems it faces, and the kinds of data you can use to help resolve these problems. By speaking the lingo of your industry, you can convince business leaders that your analytical skills will pay off in quarterly earnings reports, not just in academic papers.
- Find forums to showcase your skills in solving data science problems. Topcoder and Kaggle are both websites that host programming contests with cash prizes for data scientists. Major companies post their problems and reward the programmers who come up with the best solutions.
Above all, remember that a job as a data scientist will require you to be an evangelist. You’ll have to preach the merits of methodical, scientific exploration of datasets, with all the trial and error that involves, to colleagues who’ve spent their careers thinking about results rather than worrying about whether a hypothesis can be proven. Fortunately, you’re living at a time when the scale of business problems has made stakeholders more eager to hear your message than ever before.
Daniel Harris is a Market Researcher for SoftwareAdvice.com, a site that provides software buyers with access to free research, product comparisons and price quotes. Daniel’s research focuses on business intelligence and analytics applications. You can reach him directly at firstname.lastname@example.org.