Moving from a PhD into data science can be rewarding, but might be a bit of a culture shock
Are you one of the many PhDs considering a career in data science? I completed a PhD in neuroscience at Stanford three years ago; now I’m a data scientist at Uber. During my time in industry, I’ve found that the skills we develop in graduate school, such as analytical thinking, statistics, communication skills, and – oh yes – tenacity in the face of adversity, make us a great fit for the role.
But moving to industry from the world of academia can be a culture shock. Here are some of the differences between the two:
Industry moves at a much faster pace than academia — while academic projects can last years, industry projects typically span weeks or even days. This is especially true of smaller companies and startups with limited financial runways, which need to move fast and iterate quickly. If an analysis or experiment cannot produce results quickly, it might be best to rethink the design, or drop it in favour of something else.
In the world outside the controlled environment of the lab, there are numerous challenges with experimentation. To name a few, it might be impractical to split users into treatment and control groups, random sampling might not be possible, or it might be tough to control for extraneous variables. For example, when testing the effects of an outdoor advertising campaign, we cannot randomly choose who sees the ads and who does not. In these cases, quasi-experiments like pre-post tests, where we compare engagement before and after the campaign, might be the best compromise.
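To make the pre-post idea concrete, here is a minimal sketch in Python. The helper name `pre_post_summary` and the daily figures are made up for illustration; they are not from any real campaign. It reports the change in a metric between the two periods, along with a Welch t-statistic as a rough gauge of how large the change is relative to day-to-day noise.

```python
from math import sqrt
from statistics import mean, variance


def pre_post_summary(before, after):
    """Summarize a pre-post comparison (hypothetical helper, illustration only)."""
    mean_before = mean(before)
    mean_after = mean(after)
    lift = mean_after - mean_before
    # Welch standard error: the two periods may have different variances.
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    return {
        "absolute_lift": lift,
        "relative_lift": lift / mean_before,
        "welch_t": lift / se,
    }


# Made-up daily sign-ups for the week before and the week after a campaign.
before = [100, 102, 98, 101, 99]
after = [110, 108, 112, 109, 111]

summary = pre_post_summary(before, after)
print(summary)
```

Because there is no control group, a difference like this cannot by itself separate the campaign's effect from anything else that changed between the two periods, such as seasonality or a product launch.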
In academia, you might earn some points for using cool new algorithms over “boring” tried-and-true techniques. In industry, however, the only thing that matters is the end result. In fact, instead of being more impressive, methods that are hard to understand might end up being harder to trust. Employing complicated methods might also take more time and involve more risk if their effectiveness is unknown.
In academia, we need to follow best practices if we hope to pass peer review. In industry, with limited time and resources, some amount of compromise may be necessary to balance research excellence with business needs. If time is short or the experiment is costly, we may have to settle for a smaller sample size and lower statistical power.
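The trade-off between sample size and power can be estimated before committing to an experiment. The sketch below uses the normal approximation for a two-sided, two-sample comparison of means with equal group sizes; the function names and the example effect size (Cohen's d of 0.5) are my own assumptions for illustration, and a dedicated power-analysis package would give slightly more exact answers.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region under the alternative.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate sample size per group needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


# Hypothetical scenario: we expect a medium effect (d = 0.5).
print(required_n_per_group(0.5))        # users per group for 80% power
print(round(approx_power(0.5, 20), 2))  # power if we can only afford 20 per group
```

Running the numbers this way makes the compromise explicit: the team can see roughly how much power is being given up by stopping early, rather than finding out after the experiment comes back inconclusive.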
In academia, work usually ends once we publish the results. In industry, after analyses and algorithms are complete, there is still the task of convincing decision-makers to use the insights to drive decisions. While academia already understands research is important, in industry, the rest of the company has to be convinced that experiments are worth the effort. Engineers have to spend effort and time to set up tools that make experimentation and data collection possible. Customer support representatives have to explain why users in A/B tests are seeing a different version of the product. For a data scientist to be effective in industry, the whole team has to embrace a data-driven culture.
In academia, your audience should understand all the stats jargon you throw at them. In industry, presentations have to be tailored to an audience that might contain both stats experts and people who have never heard of a t-test. Be prepared to explain statistical concepts in layman’s terms. On the flip side, I’ve had to pick up industry terms such as conversion rate, churn, organic vs. paid traffic, KPIs, OKRs, and CPM.
The term “data scientist” is so vague that expected skill sets range from running analyses in Excel to implementing neural networks in C++. Find out the expectations of the company you’re applying to, and see if the role is a good fit for your skill set (or for what you’re willing and able to learn). Do they expect you to focus on data analysis, produce more complicated machine learning algorithms, or set up the entire data pipeline? You might have to pick up new skills very rapidly, beyond the statistical packages and programming languages you already know. This is especially true if you join a startup, where you’d likely have to play many different roles.
In academia, we’re constantly chasing the elusive p < 0.05, but in industry, failure to reject the null hypothesis is sometimes just as useful as observing a significant result. For example, if an experiment (say, an expensive ad campaign) turns out to be unsuccessful, you save your team money, effort, and time by killing it in its infancy rather than rolling it out more broadly. Furthermore, in some cases we want to ensure that there are no significant differences in certain key metrics when testing a feature. Gradual rollouts of new features are a common way to check for unintended adverse effects.
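As an illustration of checking a key metric during a gradual rollout, here is a small sketch of a two-proportion z-test using only the Python standard library. The function name and the counts are hypothetical, chosen only to show the mechanics.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: users who completed a checkout in the old vs. new version.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

One caveat worth keeping in mind: a non-significant p-value on a small sample may simply reflect low power, so it is weaker evidence of "no difference" than the same result on a large sample.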
Of course, companies differ in their culture and approach to data science, and your experiences may vary. In a way, our journeys from a doctorate to data science are experiments themselves — with some trial and error required.
Grace Tang is a data scientist at Uber, where she leads experimental research advising for the Asia-Pacific region. She obtained her PhD in neuroscience from Stanford University, working as part of the Decision Neuroscience Lab to study how personality traits, emotions, and external stimuli affect decision-making. She was formerly lead data scientist at online real estate platform 99.co.