How do we confront the data science skills gap?

Jon Reed, November 14, 2014
The data science skills gap is a real problem. Lessons from the cutting-edge MSA program at NC State provide new answers to chew on. But can big data skills be cultivated at scale?




I'm wary of so-called "skills gaps." Usually, IT "skills gaps" are born of tech hype that never materializes. Economic conditions change. Who gets caught holding the bag? Too often, it's IT professionals who overpaid for certifications that didn't lead to actual jobs. That's not even counting all the online "training" hucksterism that takes advantage of folks in dire professional need.

But data science seems to be an exception. The skills demand for "big data" is backed up not by breathless analyst predictions, but by actual data (see Janine Milne's recent piece, Want to double your pay? Become a Big Data Expert). Previously, I noted a hiring survey where the top 10 IT skills were dominated by big data needs.

Data science skills: we know what we need, but can we train them?

If the big data skills shortfall is real, it's also troubling. Vendors of all flavors are pushing big data tech - but are they pushing customer skills readiness with the same zeal? If not, we know the drill - expect delays due to staffing shortfalls, and add increased likelihood of project failure to the list.

There is some good news - clarity on what companies are looking for in big data professionals has increased. The problem is that the depth of math/tech know-how needed is not a simple retraining exercise. And it's not as straightforward as getting a master's in statistics either. As I documented in my last piece on data science skills, the background companies want combines "quant" skills with business savvy, including the ability to communicate statistical findings to decision makers in a credible way. Crunching numbers without context doesn't matter.

One question I left hanging in my last piece is a vital one: can data scientists function as hired guns, or do they need to be built as teams? It's a question of import because historically, IT skills shortfalls have been addressed by sourcing sought-after subject matter experts, usually on a contract or consulting basis.

Add another pressing question: if data science team skills are necessary, then how can academic institutions produce the next generation of data scientists, given that most schools focus on individualized learning via reports, research and defending that research as an individual?

NC State's MSA: tackling the big data skills problem differently

A Glimpse Into A2014: The Analytics Talent Gap, an event replay from Analytics 2014 Las Vegas, provides fresh answers. North Carolina State's Mike Rappa shares provocative lessons from NC State's cutting-edge Master of Science in Analytics program, now seven years into a history that includes a startling 95 percent employment rate by the time of graduation - with an average starting salary of $97,000.

Rappa paints a stark picture of the challenges schools face in serving the needs of skills-hungry employers. As he explains, undergraduate institutions are shifting towards "a la carte" course enrollment models - a mentality where "students are the customers" and education is consumed like any other form of entertainment.

It's counter-intuitive to expect students to actively pursue tough, rigorous courses in hard subjects that push them to their limits. Then there is the "work as a team" problem. As Rappa says: "Speaking as a professor, teamwork is not really in the DNA of universities. Universities don't understand the degree to which team is [vitally important for data scientists]."

When NC State designed its MSA course, it took a completely different tack: the employer was the customer. Students would dedicate themselves to a fully immersive, ten-month residential experience. Most of the time, they'd be working in small teams, often in direct collaboration with corporations, tackling real-life data sets (90 such projects have been completed to date).

In his presso, Rappa makes the case that there is no one definition of a great data scientist - it varies from industry to industry. But he makes a crucial point: if we insist all data scientists have an MS in Computer Science and a PhD in Statistics, then big data skills development is NOT scalable.

Fortunately, as Rappa puts it, "One PhD goes a long way" on big data teams. Overall, Rappa believes that the core skills of a good data science team can be cultivated in a shorter timeframe. That's the challenge at the core of NC State's MSA curriculum.

Based on analysis of employer job postings, NC State's MSA defined their core skills as:

  • statistical modeling techniques and applied mathematics
  • ability to work in multi-functional, cross-disciplinary teams
  • strong communication skills, including ability to present a business case
  • hands-on skills in relevant big data tools and complex analytics software
  • creative problem solving skills

And, last but not least: employees must be productive from day one. Employers don't need perfect skills, but they won't hire employees who can't hit the ground running. Via the program's team-oriented structure, students acquire these baseline skills by solving real-world problems. Rappa cited a problem brought to NC State by the CIA:

A sick individual shows up and registers (with incomplete information) at a health clinic. The person waits six hours at the clinic and leaves without being seen, infecting the rest of the waiting room in the process. By analyzing a variety of large data sets, from travel data to localized information like shopping transactions, this individual must be identified.

The NC State team solved the problem in three months, rather than the eight months allotted - without a PhD data scientist on the team. Rappa credits the team's chemistry and resourcefulness, which included sourcing outside experts where needed. It wasn't an easy problem - solving it brought inquiries from other intelligence agencies looking for similar insights.

They also surprised experts who did not believe a younger team without PhDs could solve this. As one person told Rappa, "This team should not have been able to solve this problem. There's no PhDs on the team; they don't have enough knowledge with graph theory."

My take

Rappa's experience at NC State has convinced him that the data science talent gap can be solved, but only in a team skills framework:

It's going to be all about teams. It's not going to be about producing more unicorns; it's about producing lots of people with an array of skills... It's a solvable problem. The shortage is solvable, but it's only solvable by trying to produce high-performing team players who have an array of skills.

Rappa's story addresses my lingering question on data science teams versus rock stars. Hands-on, team-based education is the right way to proceed in a corporate context. Rappa also addresses some problems of scale: if such teams can function well without needing PhD-level schooling, then the need for a decade (or so) of schooling goes away.

Where the scale problem is NOT solved is in the limited number of institutions that approach education in this manner. Nor will it be easy to convince students to embrace a data science focus. Rappa's argument of students-as-consumers is correct, and poses the greatest challenge to his own vision.

It might sound odd, given eye-popping entry-level salaries, to imagine students veering away from such curriculums - and yet it's a reality. I'd like to better understand what the "complementary" skills on such a team consist of, something Rappa did not fully address in this presentation (example: break out the skills profile of each member of a successful big data team).

Programs like NC State's MSA advance the skills conversation, but a way to produce data science skills at scale remains elusive. How much of an obstacle that will pose remains to be seen.

Image credit: Search © lassedesignen -
