Women in data science careers - insights and obstacles

Jon Reed, April 29, 2015
I recently stumbled onto a data science event replay from the Berkeley School of Information that sheds valuable light on women in data science careers, as well as insights on project success and pitfalls to avoid.

While adding videos to my data science and analytics talks playlist, I came across a very interesting replay from a Berkeley ISchool event in September 2014 - a data science panel composed entirely of women active in the field. I was surprised that I couldn't find a single piece analyzing the lessons from the talk.

I'm stepping in to fill that void - with some bigger picture stats thrown in. The panel content was a good mix of lessons on data science skill requirements, industry use cases, and guidance on how to avoid the perils of poor execution. Some panelists confirmed my views on building effective data science teams, but there were some surprises as well. I'll analyze those tips first, then I'll share the panel's advice for women in the field.

Data science skills - keys to execution

The panel's views on data science skills covered both academic degrees and the skill needs that became apparent while pushing projects forward.

1. Advanced degrees matter - virtually all the panelists had either master's degrees or PhDs in related fields. Emi Nomura, Data Scientist at Jawbone, said her current team was entirely PhDs. Pinar Donmez, Chief Data Scientist at Kabbage, noted her team was all PhDs except one. However, the panelists agreed that a PhD was not a requirement. As Donmez says, the important thing is years of wrestling with data:

A PhD has been working with data, trying to solve predictive problems for a longer time than Master’s students. You need to have done something to compensate for the loss of the time. That can be compensated through a number of ways. You may have done some internships. You may have gained relevant industry experience. Actually, I would prefer industry experience to a PhD because of some of the reasons that have been mentioned today.

2. Data cleaning is an unavoidable chore, and requires creativity - The time-consuming necessity of data cleansing was a common theme, with moderator Anno Saxenian, Dean of Berkeley's ISchool, referring to a colleague's research that 75-80 percent of a data scientist's time is spent cleaning data. Vesela Gateva, Sr. Data Scientist, Eventbrite, said that she spends 70 percent of her time on data cleaning and extraction. For Nomura, creativity is a big part of her job, even when she's cleaning data:

Everything involves data cleaning. Sometimes that data cleaning is totally different from the last project. The thing I’m doing now: I’m trying to categorize 'What is a hamburger.' Hamburger, double quarter pounders...   It’s a really interesting cleaning problem (that needs creativity).
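To make Nomura's hamburger example concrete, here is a minimal sketch of what a first pass at that kind of categorization might look like. This is not Jawbone's method: the category names, the keyword patterns, and the sample entries are all hypothetical, chosen only to illustrate why free-text food entries need normalizing before they can be grouped.

```python
import re

# Hypothetical keyword rules. These categories and patterns are illustrative
# assumptions, not an actual food taxonomy used by any company.
CATEGORY_RULES = {
    "hamburger": [
        r"\bhamburgers?\b",
        r"\bcheeseburgers?\b",
        r"\bquarter pounders?\b",
        r"\bburgers?\b",
    ],
    "pizza": [r"\bpizzas?\b"],
}


def normalize(entry: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", entry.lower())
    return re.sub(r"\s+", " ", text).strip()


def categorize(entry: str) -> str:
    """Return the first category whose pattern matches, else 'unknown'."""
    text = normalize(entry)
    for category, patterns in CATEGORY_RULES.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return category
    return "unknown"


# Made-up sample entries, in the spirit of "hamburger, double quarter pounders..."
entries = [
    "Hamburger",
    "Double Quarter-Pounder w/ cheese",
    "veggie BURGER",
    "Caesar salad",
]
labels = [categorize(entry) for entry in entries]
```

Simple rules like these break down quickly (is a veggie burger a hamburger?), which is arguably where the creativity Nomura describes comes in.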

3. Setting the right data expectations is an art form - Almost all panelists agreed that managing user/business expectations around data projects requires rigorous communication. It's not necessarily a fun part of the job, but as Gateva says, there's no way around it:

One of my least favorite parts of the job is setting the right expectations and communicating with the other teams and the time it takes to solve a specific problem. You have to sometimes tell people that what they’re expecting is just not realistic. They have no idea what it's like to actually work with the actual data, and how messy the data we have is, and how long it takes to clean it, and to come up with some answers.

4. Asking the right questions can yield business results - The panel emphasized that asking the right questions - both of the business users and of the data sets they are crunching - is the key to producing a good result. Elena Grewal, Data Scientist, Airbnb, shared how rethinking reviews led to a breakthrough:

Airbnb has a review system where a guest reviews the host and the host reviews them back. We got complaints from people, saying that when they had a bad experience, they felt like they couldn’t write an honest review because they were worried that the person would retaliate back against them. We saw that in our data, and so we suggested changing the system so that when I leave a review, you can’t see it until you leave one back for me. It’s a little change but we ran it as an experiment.

It was really interesting to see the results.  One of the unexpected results was that our review rates skyrocketed. People wanted to see what the other person had written about them. All the time, there are these really amazing insights about how people behave and how we can improve experiences. It’s really fun to take the leap from doing the research to actually having an impact.
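For readers wondering how a change in review rates from an experiment like this might be checked, here is a minimal sketch using a two-proportion z-test. Grewal did not describe Airbnb's analysis, so the choice of test is an assumption, and the counts below are invented purely for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two rates (e.g. review rates) with a pooled two-proportion z-test.

    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF, built on math.erf.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return rate_a, rate_b, z, p_value


# Invented counts: (trips that received a review, total trips) per group.
control = (620, 1000)    # old system: reviews visible immediately
treatment = (710, 1000)  # new system: reviews hidden until both are submitted

rate_a, rate_b, z, p_value = two_proportion_z_test(*control, *treatment)
print(f"control={rate_a:.1%} treatment={rate_b:.1%} z={z:.2f} p={p_value:.5f}")
```

A small p-value here would suggest the difference in review rates is unlikely to be chance alone, though a real analysis would also need to consider how users were assigned to each group.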

5. Industry/domain knowledge matters, but other team members can compensate - The audience wanted to know how much domain knowledge impacts the ability to apply data science effectively. This was perhaps the trickiest area for the panel. They largely agreed that domain knowledge matters, but some believed you could either acquire that on the job, or by working closely with industry experts on the project team.

Katharine Matsumoto, Data Scientist in Product Intelligence, pointed out that good project sponsors can bring a domain knowledge view:

The best projects are the ones where you’ve got a great sponsor for that project. Often times, they are the domain expert. It’s a balance between picking up the pieces that you need in order to do the job, but also finding the right people to work with on projects, to make sure they’re impactful, because they’re invested in the project as domain experts.

Advice to aspiring female data scientists

The panel had plenty of hard-won advice for aspiring female data scientists. Many had workplace anecdotes of being the only female on a project. Tips to succeed included:

  • Find the right mentors who can raise you up and guide your skills growth
  • Don't be intimidated by your data science knowledge gaps - we all have them, so dive in
  • Don't get sidetracked if you find you're the only female on the team - follow your natural curiosity
  • The ability to problem solve and ask the right questions trumps perfect knowledge

Gateva's advice reflected the panel's views:

Once you have a very genuine curiosity in a quantitative field or anything science-related, let your curiosity be your main guidance. You shouldn’t think that you’re a woman. I never aspired to be a data scientist. It’s a very recent term. I just ended up being one. All I knew was that I wanted to apply my quantitative skills, solving interesting problems.

Women in general tend to give themselves less credit than they deserve. What women should know is that once they have the curiosity, and the basic fundamentals of probability and statistics, computer science, and machine learning, they can figure out the rest on their own.

Final thoughts

As the panel pointed out, the experience of being the only woman on a data science team is indicative of a vexing and persistent problem. Recently the Berkeley ISchool blog published a women in tech infographic with the kinds of (discouraging) numbers you'd expect, showing a lack of female representation at both the hands-on and management levels. (Note: our own Cath Everett recently posted a strong piece on this topic also: Monster initiative to get more women in tech – can it work?)

Those problems are beyond the scope of this piece, but it's good to bear in mind that motivational advice falls flat without structural changes, a la Marc Benioff's review of all Salesforce employees to (hopefully) correct gender pay discrepancies.

For broader lessons, this panel covered more than I had time to dig into here, but other notable themes included:

  • The importance of data infrastructure experts (and owning your own data, if not your own data warehouse)
  • The need to prioritize effectively and narrow the project scope, in order to compile quick wins, and
  • The value of continuous learning, and the many resources aspiring data scientists can avail themselves of, from industry events to Stack Overflow to their own open source sandboxing.

That's a good note to end on. In a field that isn't easy to break into due to the sophisticated skill combinations required (statistics, programming, modeling, predictive modeling, etc.), it's nice to hear an emphasis on that career lifesaver: insatiable intellectual curiosity.

Full panel replay:

Image credit: feature image is YouTube video screen shot.

Disclosure: Salesforce is a diginomica premier partner as of this writing.
