A commenter on ZDNet captures the skepticism about data science skills demand:
From now on I won't call myself a database designer anymore, I will call myself a data scientist! All I need to do is brush up on some basic statistics and the world is my oyster! If you can't sell your current skills, try them under different marketing - works like a dream every time.
Though the comment reeks of truth, there's more to the picture. We have progressed to a different point in the debate. Here's where we are now:
- near-consensus on the importance of mathematical/'quant' skills
- outspoken (and increasingly common) views that quant skills without business/industry know-how make for an imperfect data scientist at best
- worthy debate about recruiting rock stars versus fostering data science skills across teams
- more actual studies of data science salaries and roles, giving a clearer view of real demand
Let's run through these points (I'll hit them in reverse order).
New data scientist salary data
Burtch Works, an executive recruiting firm, just issued a new study on data scientists that analyzes salary data as well as career level, geography, educational background, gender and industry. The firm claims this is the first study of data scientists' compensation and demographics; it interviewed 171 data scientists (based in the United States) to compile the data.
Burtch Works separates data scientists from other big data professionals based on the data scientists' ability to work with vast amounts of unstructured data. Though I haven't seen that distinction used to separate data scientists before, the report is still useful.
Here are some common themes from the 171 data scientists polled:
- Data scientists are predominantly male: 88% are male, 12% are female.
- Data scientists are (comparatively) young, with a median of nine years of work experience.
- Data scientists often have advanced degrees. 88% have at least a Master’s degree; 46% have a Ph.D.
- The tech industry is the biggest employer of data scientists amongst the survey group, employing 40% of those interviewed.
The percentage of PhDs in the sample group was particularly striking, much higher than what I've seen across IT professionals as a whole. Another interesting note: in 2013, Burtch Works conducted a survey of other big data professionals (data scientists were not included in that survey) - only 20 percent of those in the broader big data survey had PhDs.
As for the compensation data, Burtch Works makes three level distinctions for both individual contributor and manager-level data scientists. Geographic variations were also taken into account. While those levels are helpful for data accuracy, it's not possible to explain it all concisely here. Bottom line:
- Median salaries for individual contributors ranged from $80,000 (1-3 years of experience) to $155,500 (9+ years of experience).
- Median salaries for managers ranged from $140,000 (1-3 direct reports) to $240,000 (10+ direct reports).
KDNuggets issued a 2014 salary survey that included some international variants. The data points weren't exactly the same as those from Burtch Works, but were in the same ballpark. Hands-on data scientists in the U.S. reported average salaries of $118,000; management level came in at $140,000. Based on compiled data from the KDNuggets analytics salary surveys in 2013 and 2012, it does appear that analytics/big data-related compensation continues to trend upward.
The more interesting part is how demand for big data-related skills compares to other IT skills. That gives us a better view of how companies are prioritizing these skills. A Dice.com 2014 salary survey with a large sample size (17,000 practitioners) indicates the extent to which big data skills are commanding a salary premium. A list of the top ten IT skills by salary is a laundry list from the big data technical toolkit:
Source: Dice.com 2014 Salary Survey
In particular, the premium placed on R skills sketches a picture of companies looking to make better sense of data via statistical analysis and the development of analytics apps.
Tech skills are not enough
So we've established some level of demand for data scientists, and some idea of who these practitioners are. But have we done a good job of defining what an ideal data scientist looks like? Not really.
The technical components of a big data professional are pretty well defined by now. This piece on creating an effective Hadoop resume is a good window into the technical tools side of the skill set. Then there is the mathematical/quant side. Nutshell: vast amounts of external data and new tools to parse their meaning require a new/upgraded analytics skill set. Whether that means recruiting rock stars or upskilling internals is a critical issue to tackle.
But there is now an emerging clarity that the business side of the data scientist needs fleshing out. If you don't know an industry inside and out, how much value can you add? As Gartner analyst Svetlana Sicular put it, 'Organisations already have people who know their own data better than mystical Data Scientists… learning Hadoop is easier than learning the company’s business.'
Dan Woods dug into the definition of a well-rounded data scientist with Michael Rappa of the Institute for Advanced Analytics. Here's how Rappa breaks the skills profile down: technical skills, teamwork skills, communication skills, business skills ('with a little bit of empathy'), and tools mastery. That's quite a bit different from a Hadoop- or R-focused skill set. I'll get to team skills in a moment, but let's not allow 'communication skills' to slip by in some vague way - this is not about making a doughnut run for colleagues. As Woods puts it:
This isn’t just about giving a presentation. This is about bridging the 'trust gap' between hard analytics and an MBA’s view of the business. Rappa: 'The data scientist must learn how to communicate in a credible way with decision makers so that they can trust what they’re [being told].'
Now we're on the right track. Oshry Ben-Harush, data scientist at EMC, views the core of his skills as:
- Integration – You need to know more than the science - you must understand the business.
- Rapidly learn and adapt – 'You have to rapidly learn the domain in order to map the business problem to a form that is applicable by these set of algorithms.'
- Simplify your findings – Present your findings in a simple and informative manner: 'Having excellent technical skills isn’t enough because you have to be able to explain it well or it will fall down.'
In the obsessive research I did for this piece, the most human take on data scientists came from an IBM post:
A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It's almost like a Renaissance individual who really wants to learn and bring change to an organization.
Now that sounds like someone I'd want to work with.
There are still plenty of issues to resolve when it comes to data scientists. Just because these folks are commanding high salaries doesn't mean it's time to rush out and hire one. From an enterprise angle, it's far more important to articulate a business problem and then determine the skills gap and how it will be filled.
I didn't get to the issue of team development versus individuals this time around, but the consensus is that no single data scientist can bring the array of skills needed to successfully complete big data projects on their own. That opens up a key issue for companies: team training versus hiring specialists. I'll return to that in the future.
For now, we can safely say that the data scientist is no longer a unicorn. We now know quite a bit about what it costs to obtain one, and even how to evaluate them. But that doesn't mean your company needs one.
Image credit: Funny botanist © Serg Nvns - Fotolia.com
Postscript - a few more educational resources: here's my newsfeed tag of the most interesting big data stories I have come across. KDNuggets compiled a list of formal online big data educational offerings. Plus: a quick rundown of four (free) big data MOOCs. John Foreman's live introduction to data science is also worth a look.