Moving from a PhD into data science can be rewarding, but it might be a bit of a culture shock.
Are you one of the many PhDs considering a career in data science? I completed a PhD in neuroscience at Stanford three years ago; now I’m a data scientist at Uber. During my time in industry, I’ve found that the skills we develop in graduate school, such as analytical thinking, statistics, communication skills, and – oh yes – tenacity in the face of adversity, make us a great fit for the role.
But moving to industry from the world of academia can be a culture shock. Here are some of the differences between the two:
Pace
Industry moves at a much faster pace than academia — while academic projects can last years, industry projects typically span weeks or even days. This is especially true of smaller companies and startups with limited financial runways, which need to move fast and iterate quickly. If an analysis or experiment cannot produce results quickly, it might be best to rethink the design, or drop it in favour of something else.
Experiments
In the world outside the controlled environment of the lab, there are numerous challenges with experimentation. To name a few, it might be impractical to split users into treatment and control groups, random sampling might not be possible, or it might be tough to control for extraneous variables. For example, when testing the effects of an outdoor advertising campaign, we cannot randomly choose who sees the ads and who does not. In these cases, quasi-experiments like pre-post tests, where we compare engagement before and after the campaign, might be the best compromise.
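To make the pre-post idea concrete, here is a minimal sketch in Python using only the standard library. It assumes we have the same engagement metric for the same set of units (say, cities) before and after the campaign; the function name and the numbers are hypothetical, made up purely for illustration. It computes a paired t-statistic on the before/after differences.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return the paired t-statistic for after-minus-before differences.

    Assumes both lists hold the same metric for the same units, in the
    same order (hypothetical setup, e.g. one value per city).
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(after, before)]
    # Standard error of the mean difference (sample standard deviation).
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Made-up engagement numbers for four cities, before and after the campaign.
before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
t_stat = paired_t_statistic(before, after)
print(f"paired t-statistic: {t_stat:.2f} (df = {len(before) - 1})")
```

Keep in mind that a pre-post comparison has no control group, so a large t-statistic cannot by itself rule out other explanations such as seasonality or a concurrent product change.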
Methods
In academia, you might earn some points for using cool new algorithms over “boring” tried-and-true techniques. In industry, however, the only thing that matters is the end result. In fact, instead of being more impressive, methods that are hard to understand might end up being harder to trust. Employing complicated methods might also take more time and involve more risk if their effectiveness is unknown.
Compromise
In academia, we need to follow best practices if we hope to pass peer review. In industry, with limited time and resources, some amount of compromise may be necessary to balance research excellence with business needs. If time is short or the experiment is costly, we may have to settle for a smaller sample size and lower statistical power.
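To get a feel for what a smaller sample costs in statistical power, a rough sketch like the following can help. It uses a normal approximation for a two-sided, two-sample comparison of means with equal group sizes, and takes a standardized effect size (Cohen's d) as input; the function name and the example values are my own assumptions, and a dedicated power-analysis tool would be the better choice for real planning.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d);
    both groups are assumed to have n_per_group observations.
    """
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    upper_tail = standard_normal.cdf(shift - z_crit)
    lower_tail = standard_normal.cdf(-shift - z_crit)
    return upper_tail + lower_tail


# Compare a larger and a smaller sample for the same assumed effect size.
for n in (64, 16):
    print(f"n per group = {n}: power is about {approx_power(0.5, n):.2f}")
```

Running the comparison for a couple of candidate sample sizes makes the trade-off explicit before committing to a cheaper, shorter experiment.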
Collaboration
In academia, work usually ends once we publish the results. In industry, after analyses and algorithms are complete, there is still the task of convincing decision-makers to use the insights to drive decisions. While academia already understands research is important, in industry, the rest of the company has to be convinced that experiments are worth the effort. Engineers have to spend effort and time to set up tools that make experimentation and data collection possible. Customer support representatives have to explain why users in A/B tests are seeing a different version of the product. For a data scientist to be effective in industry, the whole team has to embrace a data-driven culture.
Communication
In academia, your audience should understand all the stats jargon you throw at them. In industry, presentations have to be tailored to an audience that might contain both stats experts and people who have never heard of a t-test. Be prepared to explain statistical concepts in layman's terms. On the flip side, I’ve had to pick up industry terms, such as conversion rate, churn, organic vs. paid traffic, KPIs, OKRs, CPM, etc.
Skill set
The term “data scientist” is so vague that expected skill sets range from running analyses in Excel to implementing neural networks in C++. Find out the expectations of the company you’re applying to work at, and see if it’s a good fit for your skill set (or what you’re willing and able to learn). Do they expect you to focus on data analysis, or produce more complicated machine learning algorithms, or set up the entire data pipeline? You might have to very rapidly pick up new skills beyond the statistical packages and programming languages you already know, and this is especially true if you join a startup, where you’d likely have to play many different roles.
Failure
In academia, we’re constantly chasing the elusive p < 0.05, but in industry, failure to reject the null hypothesis is sometimes just as useful as observing a significant result. For example, if an experiment (say, an expensive ad campaign) ends up being unsuccessful, you save your team money, effort, and time by killing it in its infancy rather than rolling it out more broadly. Furthermore, in some cases we want to ensure that there are no significant differences in certain key metrics when testing a feature. Gradual rollouts of new features are common to ensure an absence of unintended adverse effects.
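One way to check that a key metric has not moved during a gradual rollout is to look at a confidence interval for the difference rather than only a p-value. The sketch below assumes the metric is a conversion rate and uses a simple normal-approximation interval; the function name, the counts, and the one-percentage-point margin are all hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_rates_ci(conv_control, n_control, conv_treat, n_treat,
                     confidence=0.95):
    """Normal-approximation CI for (treatment rate - control rate)."""
    p_control = conv_control / n_control
    p_treat = conv_treat / n_treat
    standard_error = sqrt(p_control * (1 - p_control) / n_control
                          + p_treat * (1 - p_treat) / n_treat)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_treat - p_control
    return diff - z * standard_error, diff + z * standard_error


# Made-up counts: 500 conversions out of 10,000 users in each group.
low, high = diff_in_rates_ci(500, 10_000, 500, 10_000)
margin = 0.01  # hypothetical tolerance: one percentage point
within_margin = -margin < low and high < margin
print(f"95% CI for the difference: [{low:.4f}, {high:.4f}]")
print("within tolerance" if within_margin else "cannot rule out a shift")
```

An interval that sits entirely inside the tolerance is stronger evidence of "no meaningful change" than a non-significant test alone, which may simply reflect a small sample.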
Of course, companies differ in their culture and approach to data science, and your experiences may vary. In a way, our journeys from a doctorate to data science are experiments themselves — with some trial and error required.
Grace Tang is a data scientist at Uber, where she leads experimental research advising for the Asia-Pacific region. She obtained her PhD in neuroscience from Stanford University, working as part of the Decision Neuroscience Lab to study how personality traits, emotions, and external stimuli affect decision-making. She was formerly lead data scientist at online real estate platform 99.co.
Dear Grace,
A very inspiring and well-written post. Could you also tell us how you approached the transition from a doctorate in neuroscience to being a data scientist, such as the courses or certifications you acquired? Given the extensive qualification requirements for data science jobs, I wonder what is really needed to make such a transition. I am doing a postdoctorate in plant genomics and want to move to a job in genomics data science, but articles are divided over whether MOOCs can provide the required qualifications. I have adequate skills in R for statistical and computational analysis, and I recently worked with bioinformatics pipelines in Galaxy and with bash on Linux, using Conda, for NGS analysis. I am trying to improve my knowledge of these but need some input on how to choose the right courses for making the job switch. What would you suggest? Thank you.