Hotels.com goes all-in on AWS to power decisions with machine learning

Sooraj Shah, June 13, 2018
Expedia has gone all-in on AWS, which means its Hotels.com business unit can power more of its decision-making using Spark and TensorFlow for machine learning

Hotels.com, part of the world’s largest online travel company Expedia, wants to power all of its decisions with machine learning, with the support of Amazon Web Services (AWS). That’s the ambition of the company’s Vice President and Chief Data Science Officer, Matthew Fryer.

Expedia started using AWS back in 2013 to speed up various large-scale projects involving capacity and traffic management. It has since been moving the vast majority of its workloads to AWS, and at AWS re:Invent last year, the company announced that it would be going all-in on AWS. That will include standardizing its use of AWS machine learning technologies across all of its brands, including Hotels.com and HomeAway.

Speaking to diginomica at AWS Summit London last month, Fryer explains that this will mean all of Expedia Group’s applications, websites and products, along with supporting technologies such as those focused on data and machine learning, will move from its data centers onto the AWS technology stack over a period of time. He explains:

The data engineering side were the early adopters of the tech stack, where capabilities of innovation, curiosity and in particular integration and elasticity services were core benefits to the company.

Building on top of those initial benefits, Hotels.com has been using AWS machine learning services to deploy a variety of modules that add more intelligent capabilities to processes such as bidding on search engine marketing, providing post-booking recommendations to travellers, and matching specific hotels with the best prices for each booked trip.

Fryer took to the stage at AWS Summit to state that as the company’s scale is so huge, increasingly the only way it can efficiently operate is to use machine learning with the underlying support of data – or what he called “the rich oil that can power a lot of our services”.

Legacy machine learning

Fryer explains that much of Hotels.com’s business has actually relied on algorithmic processing for many years, even before machine learning became a buzz phrase within the IT industry:

Machine learning is really the core of why we exist –  to help match customers with our partners – such as hotels, flights, cars, location rentals, and give them the best price. To do that requires a machine learning matching algorithm, and we’ve had that for over 20 years.

But the challenge with machine learning in Hotels.com’s business is the speed of innovation and the degree of complexity it can operate at, he adds:

That’s not to say we wouldn’t have continued to innovate on-premise, but AWS allows us to innovate even faster.

In its infancy, Hotels.com had relied on basic database technology, SQL Server, from which the company ran spreadsheet models, before migrating to Hadoop SQL queries.

At the time this made sense: machine learning on a single machine is relatively straightforward thanks to languages like Python and Scala and libraries like Pandas, but this wasn’t the case at higher volumes of data. Fryer explains:

Historically people played Moore’s law with this, where you could get bigger boxes that could get you so far, but as multi-core processors came about, just to scale the opportunity meant you had to go into clusters, which meant the Apache Hadoop stack was needed to start with – at the time it was state of the art.
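The single-machine workflow Fryer contrasts with clustered computing can be illustrated with a minimal Pandas sketch. The data, column names and aggregation here are purely illustrative, not Hotels.com’s actual pipeline:

```python
# Minimal single-machine sketch of the kind of tabular workflow the
# article describes: load bookings into memory, enrich, aggregate.
# All data and column names are hypothetical.
import pandas as pd

bookings = pd.DataFrame({
    "hotel_id": [1, 1, 2, 2, 3],
    "nightly_rate": [120.0, 110.0, 95.0, 100.0, 210.0],
    "nights": [2, 3, 1, 2, 4],
})

# Enrich: total spend per booking, then an average rate per hotel --
# the sort of aggregation that fits comfortably on one machine.
bookings["total"] = bookings["nightly_rate"] * bookings["nights"]
avg_rate = bookings.groupby("hotel_id")["nightly_rate"].mean()

print(avg_rate.to_dict())  # {1: 115.0, 2: 97.5, 3: 210.0}
```

Once the data no longer fits in one machine’s memory, this style of in-memory DataFrame work breaks down, which is exactly the scaling pressure that pushed the company toward clusters.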

Apache Spark and TensorFlow

From there, the company slowly moved closer to what he now considers modern-day machine learning practices.

It’s about being able to create, train and deploy workflows and at the core of that is Apache Spark – it is the bedrock. We started using it on-premise five years ago and then increasingly as we moved to the cloud it has been the bedrock of our operations, and as we move forward technologies like Apache MXNet and TensorFlow will all be core to us.

Apache Spark helped to offer performance benefits over Hadoop. Fryer suggests that whatever question was asked of Hadoop would always take ten minutes to come back with an answer – and this wasn’t because of the infrastructure, but rather because of the way Hadoop was architected. Nothing could help to make it better until Spark came along, offering the company better scalability and performance in a clustered environment.
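The architectural difference Fryer is pointing at can be sketched in a deliberately simplified, single-process way: once a dataset is materialised in memory, follow-up queries skip the expensive load step, whereas a MapReduce-style job re-reads from storage every time. This toy (the simulated load, the names, the counts) is an illustration of the caching idea, not Spark itself:

```python
# Toy illustration of the in-memory advantage: cached queries share one
# load of the data, MapReduce-style queries pay for a fresh load each
# time. The "load" here is a stand-in for a read from distributed storage.
import functools

LOAD_CALLS = {"count": 0}

def load_dataset():
    # Stand-in for an expensive read from disk/HDFS.
    LOAD_CALLS["count"] += 1
    return list(range(1_000))

def mapreduce_style_query(fn):
    # MapReduce-style: every query re-reads the data from storage.
    return fn(load_dataset())

@functools.lru_cache(maxsize=1)
def cached_dataset():
    # Spark-style: the dataset is materialised once and kept in memory.
    return tuple(load_dataset())

def spark_style_query(fn):
    return fn(cached_dataset())

# Two MapReduce-style queries trigger two loads...
mapreduce_style_query(sum)
mapreduce_style_query(len)
# ...while two Spark-style queries share a single cached load.
spark_style_query(sum)
spark_style_query(max)
print(LOAD_CALLS["count"])  # 3: two uncached loads plus one for the cache
```

In real Spark the same idea shows up as persisting an RDD or DataFrame in cluster memory so that iterative and interactive workloads avoid repeated disk I/O.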

Now, geared with Spark, and using Databricks and AWS SageMaker, Hotels.com is able to use the open source machine learning framework TensorFlow more seamlessly, and this has stimulated something that goes beyond mere technological innovation. There’s a data science and AI engineer skills shortage that Fryer believes is made worse because many data scientists are not being used to their full potential:

If you make things easier to use, you increase curiosity and you increase innovation cycles – these technologies are taking a heavy load off.

90% of data scientists’ time is invested in lower value data science tasks such as how to get data, how to enrich data, coping with data quality issues and finding it amongst organisations and enterprises.

If we can change that equation, I will double my data scientists’ productivity and make them much happier.

All-in on AWS

Hotels.com is using almost all of the services AWS provides, and is using a lot of the products in different areas of the business. For example, its service team has used AWS to figure out how it can use chatbots in a smarter way – ensuring that people interacting with the chatbot can speak to someone over the phone if they want to.
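The escalation pattern described above – let the bot answer what it can, but always offer a phone handoff – can be sketched as a simple message router. Everything here is hypothetical (the function, the intents, the replies); it does not reflect Hotels.com’s actual implementation:

```python
# Hypothetical sketch of a chatbot with a phone-agent escape hatch:
# recognise an escalation request first, then handle known intents,
# and fall back gracefully otherwise.
def handle_message(message: str) -> str:
    text = message.lower()
    if "phone" in text or "speak to someone" in text:
        # Escalation path: never trap the traveller in the bot.
        return "handoff: connecting you to an agent by phone"
    if "booking" in text:
        return "bot: here are the details of your booking"
    return "bot: sorry, could you rephrase that?"

print(handle_message("Can I speak to someone?"))
```

The design point is the ordering: the handoff check runs before any intent matching, so a frustrated user can always reach a human regardless of what else their message contains.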

It is also using EC2 P3 instances to analyse unstructured data such as text and images to improve user experience, and is looking into the use of Alexa, so that consumers could communicate with the company just as if they were talking to a travel agent – but with the convenience of existing travel websites.

Perhaps most significantly, the company has worked out how it creates and captures data sources and moves these around the business. Fryer says that Hotels.com is now at a stage where this is easy, thanks in part to Amazon:

If it’s not easy, people talk about priorities, processes and making choices – when it’s easy, you don’t have to make choices, the conversation is different.

One of the main differences in the way data is used in the company has come as a result of machine learning. Fryer explains that data now flows back to the source instead of being analyzed in isolation – and this has given the engineers who track, build and send data into the data stack a completely different experience.

The cost delta has made a big difference too. To build the technology the company currently has on-premise would have cost in the region of seven figures, says Fryer. Hotels.com runs this in the cloud for five figures, and this, he says, has enabled the company’s employees to be curious again, spurring on innovation at the company – a critical component of digital transformation.
