Auto Trader is the UK and Ireland's largest online automotive marketplace. Its digital platforms attract over 50 million visits each month and it lists more than 470,000 cars each day. With this high level of engagement, the company has access to vast swathes of data, which it uses not only to strengthen its relationships with advertising customers, but also to improve the products it builds for end users.
In recent years Auto Trader has shifted its data strategy from on-premise Spark clusters to a data lake and intelligence platform running in a multi-cloud environment. More interestingly, the company has realised that to take full advantage of data internally, it needs to democratise its use for all users. As such, Auto Trader has set up its own data academy to foster a 'culture of data' and spread data skills across the organisation.
Edward Kent, principal developer, data engineering at Auto Trader, spoke with diginomica about the company's journey towards comprehensive data use for a wide variety of internal users. He explained how three years ago Auto Trader began by using totally open source software - a Spark cluster - on premise, but that this quickly proved to have limitations. Kent says:
At that time all of our hosting was on premise and we had two data centres in the UK and we'd not really used any cloud technology, so it felt natural to try and do it all in house. That was a bit of a failure, for a number of reasons, including the operational overheads in managing a Spark cluster and predicting what our storage costs were going to be and then making sure we had the required hardware.
As Auto Trader began to shift to the cloud more broadly as a business, Kent and his team decided to move the company's data platform into a hosted environment too. It started with AWS's Elastic MapReduce (EMR) service, essentially a managed Spark cluster, which marked the beginning of the organisation's shift to the cloud.
It now has a hybrid setup. Auto Trader has a data lake - an unstructured data store - running in AWS S3. On top of that sits Apache Spark, via Databricks, which allows the company's data scientists to run low-level analysis on raw unstructured data. From there, data flows into Google Cloud BigQuery, with Snowplow as an event-tracking framework feeding web analytics in Looker. Kent explains:
One of the challenges that we've had with our data platform is that it's very easy to collect data, but surfacing it is where we initially struggled. We initially used AWS's BI tool, QuickSight, and struggled to gain traction in surfacing data - even just building dashboards was pretty painful with that. So we didn't get very far, and we knew after a bit of use with QuickSight that we needed something else. Looker came about from a recommendation from Snowplow, which had had good success with customers using Looker in conjunction with Snowplow and BigQuery.
The use case
Looker is deployed for a variety of use cases internally at Auto Trader. For example, the data platform is used by the company's retailer development support squad, which is essentially the sales and support team. Kent says:
They're out there dealing with our customers, talking to them day to day. As part of those conversations they find it useful to be able to talk about how their customers are performing. The kind of key metrics we report on are things like advert views, search views. And to be able to talk to our customers about how they're performing relative to their peers. A big use case is a suite of dashboards that supports that team.
It is also used by internal product teams to run A/B tests on how new products are performing, such as Auto Trader's vehicle check tool. Vehicle check brings external checks on a vehicle - such as whether the car has been stolen or has outstanding finance - into the Auto Trader marketplace, making it easier for users. Kent says:
We want to report on a) the proliferation of the product, how many dealers were taking it up, and b) also show there was an uplift in performance for adverts that had that check on vs those that didn't.
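The uplift comparison Kent describes can be sketched as a simple calculation. The records and field names below are illustrative assumptions for the sake of the example, not Auto Trader's actual schema or figures:

```python
# Hypothetical advert-performance records; the fields are assumptions,
# not Auto Trader's real tracking data.
adverts = [
    {"id": 1, "has_vehicle_check": True,  "views": 340},
    {"id": 2, "has_vehicle_check": True,  "views": 310},
    {"id": 3, "has_vehicle_check": False, "views": 250},
    {"id": 4, "has_vehicle_check": False, "views": 270},
]

def mean_views(ads, with_check):
    """Average advert views for listings with or without the vehicle check."""
    views = [a["views"] for a in ads if a["has_vehicle_check"] == with_check]
    return sum(views) / len(views)

checked = mean_views(adverts, True)     # 325.0
unchecked = mean_views(adverts, False)  # 260.0
uplift = (checked - unchecked) / unchecked  # relative uplift vs unchecked ads
print(f"Relative uplift for checked adverts: {uplift:.0%}")  # → 25%
```

In practice this comparison would be expressed as a Looker dashboard over BigQuery rather than ad-hoc scripts, but the underlying metric is the same.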
For us the key metric is usage. Are people coming back? Are they getting value in dashboards that we're giving them? Are they using them as part of their day to day workflow? Number of active users, number of active dashboards, is a good metric.
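The usage metric Kent mentions reduces to counting distinct users and dashboards over a trailing window. A minimal sketch, using hypothetical event records (the schema is an assumption):

```python
from datetime import date, timedelta

# Hypothetical dashboard-view events; names and dates are illustrative.
events = [
    {"user": "alice", "dashboard": "retailer-kpis", "day": date(2019, 6, 3)},
    {"user": "bob",   "dashboard": "retailer-kpis", "day": date(2019, 6, 4)},
    {"user": "alice", "dashboard": "vehicle-check", "day": date(2019, 6, 5)},
    {"user": "carol", "dashboard": "retailer-kpis", "day": date(2019, 5, 1)},
]

def active_counts(events, today, window_days=30):
    """Distinct active users and dashboards seen within the trailing window."""
    cutoff = today - timedelta(days=window_days)
    recent = [e for e in events if e["day"] >= cutoff]
    return len({e["user"] for e in recent}), len({e["dashboard"] for e in recent})

users, dashboards = active_counts(events, today=date(2019, 6, 10))
print(users, dashboards)  # → 2 2  (carol's May visit falls outside the window)
```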
A data academy
Having set up a comprehensive data platform for the organisation, Auto Trader's internal mindset shifted from tooling and tech towards culture and skills. The organisation has established a data academy, inspired by a similar approach taken at Airbnb.
The aim of the academy is to educate and train a variety of users on the value of data, giving them the skills that they need to get intelligence without requiring a highly skilled data expert on their team. Kent explains:
The reasoning behind [the academy] was that it's absolutely not possible to give every single team their own data scientist and own data analyst. If you're scaling data to the point that you want every team to be using data as part of every decision they make, then the only scalable way to do that is to upskill the team themselves in terms of how they use data and how they understand data.
To that end, we've been running a series of courses internally on all aspects of our data platform. Everything from how to write Spark jobs for developers, so existing developers that just need upskilling in lots of technology that we are using. Also, data modelling, that might be aimed at developers, it might also be aimed at a data analyst or people with a SQL background. We have also then been running a series of Looker courses as well, which is effectively focused on using Looker's markup language so that they can write their own data models.
And then the most popular course we've run has been the Looker explorer course, which is on exploring data for our business users. That's aimed at anyone with an interest in data. It's been taken up by people from a wide range of backgrounds within the business.
Kent says that the organisation has seen extremely positive results so far, in terms of the number of people self-serving and exploring data in Looker, as well as writing their own Spark jobs. He adds:
The culture before we started building this data platform was having a centralised team that handled all data requests. So if you needed anything from a new report through to a new underlying table to be built, you would go to the centralised data and insight team. You'd get a backlog of work. The whole idea is we want to empower teams to do this sort of work themselves.
Empower your users to learn the skills that they need. Don't try and centralise knowledge, make sure that it's distributed around the company.