
Forget orange; for Netflix, Big Data is the new black

Jessica Twentyman, July 10, 2014
Summary:
Big Data is the ‘secret sauce’ that helps Netflix continue to develop and deliver popular content at a price point that appeals to its subscribers.

There’s no data in the world that can guarantee that a new TV show will be a sure-fire hit, but when it comes to launching new content, online streaming service Netflix is able to make some pretty impressive bets, based on a deep, data-driven understanding of its audiences. Forget ‘Orange is the New Black’. At Netflix, Big Data is the new black.

This insight will be vital as the company plans a major assault on new European markets over the course of this year. In May, it announced it was coming to Germany, France, Luxembourg, Belgium, Austria and Switzerland during 2014. Having finished the first quarter of this year with 48 million customers, around 13 million of them outside North America, the company expects to be operating in 46 countries worldwide by the end of the year.

Understanding the viewing habits and preferences of new international audiences is a massive big data challenge, according to Justin Ward, manager of the data science and engineering team at Netflix. But, he adds, his team already has an impressive track record in analytics, with the skills and architecture in place to dig into the two billion hours of viewing that Netflix subscribers notch up each month:

We use a lot of data in our day-to-day decision-making. Data is crucial to our business. We’re known for our great recommendation algorithms that allow us to make suggestions to subscribers about what kinds of content we think they’ll enjoy, based on what they’ve watched already and what people like them have watched.

We also do lots of predictive modelling. We want to make sure we have early feedback on any content that’s bringing us new customers. After one day, we can predict the lifetime value of a customer, based on their activity for that day.

Netflix’s Big Data efforts also enable it to quickly detect and respond to any service-delivery problems, he says:

If we see, for example, a number of sign-ups on a certain kind of device that aren’t interacting with the service very much, and therefore might leave the service, then maybe there’s a problem with our delivery to that kind of device. We can pick out very small anomalies across a very large data set and respond quickly.
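To make Ward's example more concrete, here is a minimal Python sketch of that kind of check. It is an illustration under stated assumptions, not Netflix's method: it assumes a hypothetical feed of (device type, first-day viewing hours) pairs for new sign-ups, and a simple rule of thumb that flags any device type whose average engagement falls below half the median across all device types. The function name, record format and thresholds are all invented for this example.

```python
from collections import defaultdict
from statistics import mean, median


def flag_low_engagement_devices(signups, threshold=0.5, min_signups=3):
    """Return device types whose new sign-ups engage far less than typical.

    signups: iterable of (device_type, hours_watched) pairs; a hypothetical
        record format, not Netflix's actual schema.
    threshold: flag a device type if its mean engagement is below this
        fraction of the median across device types (an assumed rule of thumb).
    min_signups: skip device types with too few sign-ups to judge.
    """
    hours_by_device = defaultdict(list)
    for device, hours in signups:
        hours_by_device[device].append(hours)

    # Average first-day viewing per device type, ignoring sparse groups.
    averages = {
        device: mean(hours)
        for device, hours in hours_by_device.items()
        if len(hours) >= min_signups
    }
    if not averages:
        return []

    # Compare each device type against the typical (median) device type.
    typical = median(averages.values())
    return sorted(
        device for device, avg in averages.items() if avg < threshold * typical
    )


# Invented sample data: new sign-ups on consoles barely watch anything.
signups = [
    ("tv", 2.0), ("tv", 3.0), ("tv", 2.5),
    ("phone", 1.5), ("phone", 2.0), ("phone", 1.0),
    ("console", 0.1), ("console", 0.2), ("console", 0.0),
]
print(flag_low_engagement_devices(signups))  # prints ['console']
```

With these sample figures, consoles average 0.1 hours against a median of 1.5 hours across device types, so they are the only group flagged as a possible delivery problem. A real system working across millions of sign-ups would more likely rely on statistical tests or learned baselines; the fixed ratio here simply keeps the idea visible.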

So how has the data science team at Netflix built the architecture that gives it this deep insight? It's been a long and winding road, says Ward, with plenty of pit stops for re-evaluation along the way:

It used to be so easy - we started small and simple. Our application layer interacted with a single, vertically scaled application database that we would query directly for our analytics. It was beautiful.

Breaking badly?

Beautifully simple, yes - but unsustainable in the face of mounting data volumes. Today, Netflix's data science environment combines a range of traditional business intelligence tools (including a Teradata data warehouse and reporting and visualisation tools from MicroStrategy) with more cutting-edge big data technologies (a Hadoop platform and Hadoop-focused tools such as Hive, Pig and Presto).

All of it, however, runs in the cloud. The Teradata data warehouse, for example, is in Teradata's cloud. The Hadoop environment is based on Amazon's Elastic MapReduce (EMR) distribution of Hadoop and hosted by Amazon Web Services.

Ward says:

Big data means both more and less to a data science team. It's more storage than you're used to handling, so you need to look to alternatives. It's more processing than you've ever had to do before. It's more skills that you need to find correlations and surprises.

But it also means less - less of the tools and approaches that you’re used to. Less structure, for example, which demands more flexible tools and less maturity in those tools, too. A lot of the tools that can handle this kind of analysis don’t work like you might expect them to. It’s a totally new world and you need to explore it in ways that work for you.

That means tolerating some setbacks along the way, he says:

In the big data world, everything that can go wrong, will go wrong. A lot of interesting concessions need to be made in order to get the sheer volume of data stored in ways that you can use it. Sometimes, information comes out of order. Sometimes, information doesn’t make it into the environment at all. You need to be selective: what are the most crucial pieces of information, which are the ones you can’t be without?

These can be tough decisions to make - but Ward is confident that he and his team are making the right choices. They can now look at around 30 million daily ‘plays’, identifying what individual viewers watch and at what time of day, as well as when they pause, rewind and fast forward. They can see what they search for and how they rate the content they view.

In other words, Big Data is the ‘secret sauce’ that helps Netflix continue to develop and deliver popular content at a price point that appeals to its subscribers. Ward says:

We can't afford to deliver everything, all the TV shows and movies in existence, and it wouldn't make business sense to do that. We want to deliver what our customers want to watch. Without Big Data, that wouldn't be possible.
