A look at why Transport for London online, a Microsoft house, chose AWS
- Summary:
- Dan Mewett, solution architect for Transport for London, explains that AWS allows him to mix and match solutions and deal with requests on a huge scale.
London's travel network is hugely complicated. Over 30 million journeys are made across the capital every day, using a wide range of transport options, including the tube, buses, rail, bikes and walking.
Even those of us who have lived in London for a number of years, and are very familiar with the transport network, still often need some level of planning assistance to get from one part of the city to another.
And this planning assistance is often required in real time. If I've been to an event in Soho in the morning and need to get back home, my route could be made up of several modes of transport. Depending on the time of day and the level of service on the network, my quickest route home could be just a bus, or the tube and then a bus, or two tube journeys and then a bus, or a tube journey, a train and then a short walk.
How do I decide on the best option? My best bet is Transport for London's website. Although services like Google Maps can also create routes for you, Transport for London is the most reliable option because it takes real-time disruption into account and reroutes accordingly. It's a website I use throughout any day I spend getting around London.
This week at the Amazon Web Services Summit in London, Dan Mewett, solution architect at Transport for London, provided some insight into the complexity of delivering this information to millions of people travelling across the city every day, and gave some reasons why his organisation, traditionally a Microsoft house, went for AWS over Azure. Mewett said:
Our fundamental asset is the Transport for London (TfL) website. What do our customers want and what do I want from AWS as a service? We have got a really complex customer model at TfL, because everybody is different. There's lots of different things that we can model about our customers. Customers want value for money, things to be on time, they want to feel safe, they need reliable information – when is my bus service coming? When is my tube service coming? Is it delayed? Which stations have a toilet? One of the key delivery channels for this website is the TfL website.
Back in 2007 we had a website and it had a customer satisfaction of 70% and it was very good. It had lots of good feedback at the time, it provided all the information people wanted when they needed it. Except it was a little bit old fashioned and in 2012 we decided that we needed to reengineer the website.
The old website had to deal with high volumes and the way that we dealt with high volumes was through a CDN, which meant that the website had to be fairly static. Our ambition for the new website was to make it location capable, so that you can use your mobile phone to deliver services that are near you. We wanted to make it possible to personalise.
Mewett explained that the TfL website gets approximately 3 million page views a day, from about 600,000 unique visitors. As hinted above, one of the main reasons people visit the website is TfL's Journey Planner, which puts commuters on the most viable route across the London network. And, as already highlighted, working out that route can be incredibly complicated.
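At heart, a journey planner of this kind performs a shortest-path search over a graph of stops and links, re-run when disruption changes the cost of a link. The sketch below is a minimal illustration under stated assumptions: the `NETWORK` graph, its stop names and journey times are invented (not TfL data), the `delays` map is a hypothetical stand-in for a live disruption feed, and it uses Dijkstra's algorithm, which is not necessarily what TfL's Journey Planner uses.

```python
import heapq

# Hypothetical toy network: stop -> list of (next_stop, minutes, mode).
# Stop names and times are illustrative only, not TfL data.
NETWORK = {
    "A": [("B", 5, "tube"), ("C", 4, "bus")],
    "B": [("D", 6, "tube")],
    "C": [("D", 9, "bus")],
    "D": [],
}


def plan(network, origin, destination, delays=None):
    """Return (total_minutes, path) for the quickest route, or None if unreachable.

    `delays` maps (from_stop, to_stop) to extra minutes, standing in for a
    hypothetical real-time disruption feed.
    """
    delays = delays or {}
    queue = [(0, origin, [origin])]   # (minutes so far, stop, path taken)
    best = {origin: 0}
    while queue:
        cost, stop, path = heapq.heappop(queue)
        if stop == destination:
            return cost, path
        if cost > best.get(stop, float("inf")):
            continue  # stale queue entry; a quicker way to this stop was found
        for nxt, minutes, _mode in network[stop]:
            new_cost = cost + minutes + delays.get((stop, nxt), 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


print(plan(NETWORK, "A", "D"))                            # (11, ['A', 'B', 'D'])
print(plan(NETWORK, "A", "D", delays={("B", "D"): 10}))   # (13, ['A', 'C', 'D'])
```

With normal service the tube link via B wins; adding a 10-minute delay to that link is enough to push the search onto the bus route via C, which is the kind of rerouting around disruption described above.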
Mewett said that one of the main reasons for choosing AWS was that it was both able to deal with this level of complexity, in real-time, and it allowed the Transport for London engineering team to mix and match technologies onto the platform. He explained:
The tube map is complex enough, but if you were to add that we have over 700 bus routes, plus all the other forms of transport around, London is complicated. What we want people to be able to do through the website is plan their journeys. Because of the size and the number of visitors we have, we need to make sure that we can meet the demand. Our Journey Planner has to deal with 18,000 stops, it has to cope with 700 routes.
So if you are planning a route from A to B, and the main artery for that route has a delay, our Journey Planner will route you around that. In order for that to work, you need to have a hell of a lot of data coming into the system all the time. We deal with 750,000 journey planning requests a day, so again it's quite complicated.
We have to show you our status information across all modes of transport, jam cams, bike statuses, we have to provide this at volume so we have to have some very smart engineering. We use a bunch of AWS technologies for this, but we also use a caching technology called Varnish – and one of the reasons we chose AWS is that it is very good at actually allowing you to build what you need. We considered things like Azure, given that the majority of our stack is Microsoft based, but it can't allow us to mix and match the solutions to deal with these kinds of problems. That's one plus point for AWS.
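Varnish is an HTTP reverse-proxy cache configured in its own VCL language; the Python sketch below is not Varnish, only an illustration of the principle that makes caching pay off at this volume. Many requests for the same status information within a short window are answered from one stored copy, so the expensive backend is hit once per window rather than once per visitor. The 30-second TTL, the `tube-status` key and the `fetch_status` backend are hypothetical.

```python
import time


class TTLCache:
    """Serve a stored value until it is older than ttl_seconds, then refetch."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the example can use a fake clock
        self._store = {}        # key -> (time stored, value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]     # fresh hit: no backend work
        value = fetch()
        self._store[key] = (now, value)
        return value


# Usage with a fake clock and a hypothetical backend call.
fake_now = [0.0]
backend_calls = []


def fetch_status():
    backend_calls.append(fake_now[0])  # record when the backend was hit
    return "Good Service"              # placeholder payload


cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
for t in (0, 10, 29, 31):              # four requests over about half a minute
    fake_now[0] = t
    cache.get("tube-status", fetch_status)

print(backend_calls)  # [0, 31]: one miss at t=0, one refresh after the TTL expires
```

The trade-off is freshness: a longer TTL absorbs more traffic but serves older data, which matters for status information that changes during disruption.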
This real-time complexity is also seen on London's bus network, which consists of over 8,000 bus stops, with any number of routes going through each stop. TfL aims to tell commuters, in real time and while they are standing at the bus stop, which bus they need and when it will arrive. Mewett said that this means Transport for London is dealing with 130,000 predictions every thirty seconds. He said:
The key thing about the delivery of that information is that if you get it with any latency, then the information is no good. If a bus is due in two minutes, but I've told you that two minutes too late, you've missed the bus. So we had to make sure that we can deliver this on our website.
The implementation that we have come up with uses some smart technologies, such as WebSockets, which allows us to push the information to citizens as soon as the information is available. The alternative to this is that you poll me every five minutes. Coming back to AWS, in order to use WebSockets, you can't just use any old type of load balancer, you need to use a TCP load balancer – through AWS we didn't have this problem, we were able to build what we wanted to, and it processes that service for everybody.
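To put those figures in context, 130,000 predictions every 30 seconds is roughly 4,333 updates per second. The sketch below contrasts the push model with polling, assuming Python `asyncio` queues as a stand-in for open WebSocket connections (a real server would write each message to a socket behind a TCP-capable load balancer); `Broadcaster`, `demo` and the message fields are hypothetical. The polling figures assume updates arrive at random times between polls.

```python
import asyncio

PREDICTIONS_PER_CYCLE = 130_000   # figure quoted by Mewett
CYCLE_SECONDS = 30
POLL_INTERVAL_SECONDS = 5 * 60    # the "poll me every five minutes" alternative


class Broadcaster:
    """Push model: each connected client gets a queue, and every new prediction
    is delivered to all of them the moment it is published."""

    def __init__(self):
        self.clients = set()

    def connect(self):
        queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    def disconnect(self, queue):
        self.clients.discard(queue)

    def publish(self, message):
        for queue in self.clients:
            queue.put_nowait(message)  # no waiting for the client to ask


async def demo():
    hub = Broadcaster()
    alice, bob = hub.connect(), hub.connect()  # two clients watching the same stop
    hub.publish({"stop": "hypothetical-stop", "due_in_seconds": 120})
    return await alice.get(), await bob.get()


print(f"{PREDICTIONS_PER_CYCLE / CYCLE_SECONDS:,.0f} predictions per second")  # 4,333
print(f"polling: up to {POLL_INTERVAL_SECONDS} s stale, "
      f"{POLL_INTERVAL_SECONDS / 2:.0f} s on average")
print(asyncio.run(demo()))
```

With push, how stale a prediction is depends on network latency rather than on the polling interval, which is exactly what matters in Mewett's missed-bus example.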
So what's the main reason for TfL going with AWS? Snow days.
Yup, those few days a year when London gets a light frosting of snow were enough to make Transport for London move its online infrastructure into the cloud. Mewett said:
We could have gone down the traditional route, we had a traditional data centre, but we considered the reasons to go with cloud – the main reason, snow day. On a snow day, all of our services go awry. We have major disruption, because all our daily commuters suddenly surge onto the website. I would say that your average person that commutes probably doesn't look at the website on a daily basis. But when you have a severe disruption, they need to know what's running, when the next bus is coming, all these kinds of questions. This creates a situation where we get a huge spike and we need to be able to handle that spike.
The thing is, those services that I've described, journey planning, status boards, those are complex. If I wanted to provision that hardware to enable me to handle that spike, I'd have to buy a huge amount of hardware. So the number one reason that we chose AWS was to deal with a snow day and we wanted to use auto-scaling to meet with demand.
This would unlock a massive amount of cost savings, the cost savings are huge. If you had to have 20 times your current deployment on stand by to deal with a massive spike, which might happen five days a year, the cost savings in terms of physical hardware were phenomenal.
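Mewett's argument can be made concrete with two small calculations. This is an illustrative sketch only: the per-instance capacity of 5 requests per second, the baseline of 10 instances and the fleet limits are hypothetical, while the 750,000 daily journey-planning requests, the 20-times spike and the five days a year come from the figures quoted above. `desired_instances` is a simple proportional sizing rule in the spirit of auto-scaling, not AWS's actual scaling algorithm, and using a daily average ignores peaks within the day.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance, minimum=2, maximum=100):
    """Proportional sizing rule: enough instances for the current load,
    clamped to the fleet's configured minimum and maximum."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(minimum, min(maximum, needed))


def instance_days(baseline, peak_multiplier, peak_days, days_per_year=365):
    """Instance-days per year when (a) provisioning for the peak all year,
    versus (b) scaling up only on the peak days."""
    always_peak = baseline * peak_multiplier * days_per_year
    scaled = (baseline * (days_per_year - peak_days)
              + baseline * peak_multiplier * peak_days)
    return always_peak, scaled


# Average journey-planning load on a normal day, and on a 20x snow day.
normal_rps = 750_000 / 86_400
print(desired_instances(normal_rps, capacity_per_instance=5))       # 2
print(desired_instances(normal_rps * 20, capacity_per_instance=5))  # 35

always, scaled = instance_days(baseline=10, peak_multiplier=20, peak_days=5)
print(always, scaled, f"{1 - scaled / always:.1%} fewer instance-days")  # 73000 4600 93.7% fewer instance-days
```

Instance-days are only a proxy for cost: owned hardware is a capital expense rather than a daily charge, so the real saving depends on pricing. The ratio still shows why keeping 20 times the normal capacity idle for the sake of five days a year is so expensive.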