[sws_grey_box box_size="690"]SUMMARY: Increasing online personalisation presents an ‘echo chamber’ trap to the unwitting online explorer. UK TV broadcaster Channel 4 helps viewers steer around the trap with open source-based predictive analysis.[/sws_grey_box]
At Channel 4, head of data planning and analytics Sanjeevan Bala is on a potentially perilous mission. He wants the UK broadcaster to take its TV audiences on a “journey of discovery” via predictive analysis without allowing them to stumble off into the “echo chamber” along the way. It’s a difficult route to navigate, Bala explains, because the echo chamber is such a tempting location in which to settle.
The term is used by Internet theorists to describe the way that online personalisation means we’re all increasingly served content that we enjoy and that keeps us clicking. Over time, there’s a tendency for us to spend more and more time in the echo chamber, surrounded by comfortable and familiar content, rarely venturing outside to explore uncharted territory. Think of the echo chamber as a soap opera that keeps replaying the same themes over and over.
On the face of it, that should be good news for Channel 4: by using predictive analysis to understand the content its audiences most enjoy, it can continue to deliver that kind of content and generate the advertising revenue that funds new programmes. On the flip side, the echo chamber is a cul-de-sac for both viewers and broadcasters. It’s a comfort zone where viewers’ pre-existing beliefs and tastes are reaffirmed and reinforced. It does little to challenge them, to open their minds to new experiences or to help them to develop new tastes.
Over the longer term, that presents a real risk for broadcasters that need to identify new talent, develop new formats and capture new audiences because ultimately, the content becomes, well, dull. This problem means that a broad-brush approach to personalisation, based on historical data about viewers’ watching habits, can only ever be a short-term solution.
A more nuanced approach, based on predictive analysis and machine learning, is what it will take to guide viewers along the path between comfortable familiarity and new discoveries, Bala says. With this in mind, one of his main priorities over the last couple of years has been to build infrastructure capable of supporting that kind of analysis. When he joined the broadcaster back in 2011, that infrastructure was little more than a vague idea.
“The reason I was so excited by the opportunity was that I was looking at a clean piece of paper, in a sector that was changing rapidly. There was a chance here to design and architect something from scratch, because absolutely nothing was in place: no way of capturing extremely high volumes of data, no way to analyse them, none of the data science skills needed to make sense of data. Nothing.”
The scale of the infrastructure he’s built since then is impressive: it currently holds around 170 terabytes of data, representing somewhere between 70 billion and 80 billion viewer interactions with Channel 4’s online content - both its catch-up and on-demand services - across a wide range of digital devices. And it’s all based in the cloud, on Amazon Web Services (AWS). Bala explains why:
“Because this was an entirely new capability we were bringing into the business, and because of the types of business problems we were looking to tackle, I wasn’t able to reliably explain to our in-house IT team exactly what would be required. Basically, we needed an infrastructure that could evolve over time and that would offer us flexible, elastic resources. Above all, I needed to be able to give our data analysts freedom: I didn’t want them to be constrained by size of machine or the processing power at their disposal. I wanted them to be able to spin up as many or as few clusters as they needed, as and when they needed them. What data scientists tend to do is consider a wide range of tools to solve a particular problem. If you constrain them at an early stage in their thinking with commercial concerns over what tools they can use, then you can negatively impact the final result.”
Open source to the rescue
Amazon’s Hadoop-based Elastic MapReduce (EMR) service also offered his team the chance to load huge volumes of data onto infrastructure very quickly. On top of EMR, he explains, Channel 4 is running the Hive and Pig query languages. On top of these, it runs the R analytics platform. At the very top of the stack, meanwhile, sits D3 for data visualisation. For Bala, open source was the only way to go:
“The breadth and flexibility I wanted to give our data scientists meant that we needed to be able to tap into a wider software development community that was constantly evolving - a community that we could take from, but to which we could also make our own contributions. Some of the challenges we faced, from a technical perspective, were ones that I believed we could solve more quickly in collaboration with others.”
That stands in marked contrast to his previous experiences with proprietary technologies from established analytics vendors, he says.
“With proprietary software, if you’re slightly ahead in your thinking compared to the vendor you’ve bought from, or if the apps you’re looking to develop are not ones they’ve got experience in helping other customers to develop, you’ve got nowhere to turn to for support. With open source, we can tap into a vast support infrastructure and keep moving the stack forwards at all times.”
In the race to keep moving forwards, meanwhile, Channel 4 has run trials of other technologies, most of them open source, and adopted many of them as a result: Mahout for machine learning, Spark and Shark for high-speed in-memory analytics, Amazon Kinesis (an AWS managed service) for real-time data processing and H2O for predictive analytics. Today, predictive analysis beats historical reporting at Channel 4 by a ratio of about 70 percent to 30 percent, Bala reckons. It’s what’s helping the company to build a more complex model for its recommendation engine and also feeds into marketing and sales activities.
“Marketing uses our predictive analysis models to great effect. Understanding the depth of a viewer’s relationship with Channel 4 has doubled everything for them: the open rates of emails, the click-through rates to content, the viewing rates of content. And sales is using our predictive model, too: 15 percent of our digital revenue is now delivered off a predictive model we’ve built for them and that’ll be up to 50 percent by 2016, we think.”
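To make the trade-off between familiarity and discovery concrete, here is a minimal Python sketch of a re-ranking step for a recommendation engine. It is an illustration only, not Channel 4’s actual model: the `Candidate` class, the `rerank` function, the `explore_weight` parameter, the programme titles and the affinity scores are all hypothetical. Each programme’s predicted affinity is blended with a bonus for genres the viewer has not yet watched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A programme that could be recommended (hypothetical structure)."""
    title: str
    genre: str
    affinity: float  # assumed model score in [0, 1]: how much the viewer should like it


def rerank(candidates, watched_genres, explore_weight=0.3):
    """Sort candidates by affinity blended with a novelty bonus.

    explore_weight = 0 reproduces a pure "more of the same" ranking;
    higher values push programmes from unwatched genres up the list.
    """
    def score(candidate):
        novelty = 0.0 if candidate.genre in watched_genres else 1.0
        return (1 - explore_weight) * candidate.affinity + explore_weight * novelty

    return sorted(candidates, key=score, reverse=True)


# Made-up sample data for a viewer who has only ever watched soaps.
candidates = [
    Candidate("Soap A", "soap", 0.9),
    Candidate("Soap B", "soap", 0.8),
    Candidate("Documentary C", "documentary", 0.6),
]
watched = {"soap"}

echo_chamber = [c.title for c in rerank(candidates, watched, explore_weight=0.0)]
discovery = [c.title for c in rerank(candidates, watched, explore_weight=0.3)]

print(echo_chamber)
print(discovery)
```

With `explore_weight` at zero the ranking simply echoes past viewing. Raising it to 0.3 lets the unfamiliar documentary surface ahead of the two soaps, while the soaps keep their relative order.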
Predictive analysis supports predictive modelling
Over time, predictive modelling will be used more to support the creative side of the business - both internally, with the scheduling and commissioning teams, and externally, with independent producers and advertisers. Today, the big challenge for analytics professionals is not how to tackle a problem, but which problem to tackle first, says Bala.
“When I began my career, technology was a major inhibitor of what you could achieve with analytics. Inexpensive compute power wasn’t there, nor was the flexibility. These have largely been removed as constraints - but now we’ve got the opposite problem: you can store as much data as you like in these new environments, but how do you make sense of it all? That demands a more mature approach: what’s the business outcome you want to achieve, what’s the organisational change you need to effect? Start from there, and only then go back to considering what sort of data and infrastructure you need.”