Both companies have aggressive expansion plans, so it’s never been more important that Just Eat’s website and mobile apps can take the strain on holidays and weekends, when the public’s appetite for pizza, prawn korma and Pad Thai soars.
There’s a lot of complexity, however, behind Just Eat’s customer front-ends, which serve around 19 million customers across 13 countries (10 million of whom are in the UK). According to the company’s director of technology, Richard Haigh, what customers see as ‘the product’ is made up of around 400 discrete services, each typically managed by its own team:
So a service could be something as customer-facing as the iOS app, or it might be more back-end, like a service that talks to order pad systems in restaurants that the staff use to manage incoming orders. Or it could be a data store; we run many large SQL databases and each of these might be worked on as a service by a dedicated team.
However diverse they are, what services tend to have in common are their dependencies; in other words, the vast majority interact with other services. It’s that complex mesh of technologies that makes up the full Just Eat product suite, and understanding what’s happening within that mesh is key to tackling issues before they become apparent to hungry customers, says Haigh.
This is where the company’s Spring 2017 implementation of Application Performance Management tools from AppDynamics comes in. Says Haigh:
We’re clearly dealing with significant technical complexity and scale and part of what my team does is to provide monitoring, logging and alerting services across this infrastructure. So we’re really interested in knowing what ‘normal’ looks like, so that we can recognise ‘abnormal’ when we see it. What AppDynamics has given us is the ability to get a very clear view of what’s going on.
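To make the idea of knowing what ‘normal’ looks like a little more concrete, here is a minimal sketch of baseline-based anomaly detection. It is an illustration only, not AppDynamics’ own baselining method: it assumes a simple rule (flag a reading more than three standard deviations from the recent mean), and the function name, threshold and sample response times are all hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` standard
    deviations away from the mean of `history` (the assumed baseline)."""
    if len(history) < 2:
        # Not enough data to establish what 'normal' looks like.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change counts as abnormal.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


# Hypothetical recent response times for one service, in milliseconds.
response_times_ms = [120, 118, 125, 122, 119, 121, 123, 120]

print(is_abnormal(response_times_ms, 124))  # within the usual range
print(is_abnormal(response_times_ms, 400))  # far outside the usual range
```

A production monitoring tool would typically also account for time-of-day and day-of-week patterns, which matters for a business whose traffic peaks on weekends and holidays.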
Just Eat relies heavily on open source tools to provide monitoring and logging for its 400 services running across thousands of Amazon Web Services (AWS) instances in the cloud, as well as for an alerting system, but AppDynamics sits on top, providing a ‘single pane of glass’ into the entire estate, through dashboards tailored to roles and responsibilities.
At a glance, says Haigh, it’s easy to see how Tier 1 and Tier 2 services are performing. The former are the services that must work for a customer to place an order successfully; the latter provide background support to the restaurants that receive the orders. In other words, these are the services that are most critical to an order being taken and fulfilled. If developing issues are spotted early, Haigh’s team can intervene with a triage service, so that problems are nipped in the bud.
AppDynamics has been so successful at Just Eat, he says, that he’s looking to expand its use:
Whereas we started with basic instrumentation around our core services, we’re now adding other services that we think are important enough to warrant being ‘up in lights’ on our dashboards. We’re also using AppDynamics to develop a number of new dashboard designs so that as well as the high-level view, we can also have dashboards for individual feature teams that really care about a subset of a service, so they can see the nuts and bolts and dependencies, but within the specific context of their particular domain.
It has also helped his team spot dependencies that are no longer required, allowing it to make code changes and simplify the estate as part of general good housekeeping, he says. Finally, knowing whether or not systems and services are under strain provides valuable clues to the cloud-based resources that Just Eat needs to keep running at its busy and not-so-busy times:
It could be that there’s been a lot of marketing around a particular weekend and customer demand is high. I’ll want to know how much stress that’s placing on our infrastructure so that I can scale up our systems. Or it could be that we’re having a quiet day and actually what I’d like to do is scale down, saving money for the business. It’s given me a lot of flexibility here.
But what’s most important, of course, is staying up and running. Using AppDynamics flow maps, my team can very quickly identify if something looks like it might be going wrong, and identify which teams are involved there. We’ve certainly had events where we’ve seen information through AppDynamics early enough and engaged our teams on triage so that it never became a problem for customers.
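The step Haigh describes, going from a service that looks unhealthy to the teams that need to be involved, can be sketched as a walk over a dependency graph. The example below is a rough illustration under stated assumptions, not the AppDynamics flow map API: the service names, the ownership table and both functions are hypothetical.

```python
from collections import deque

# Hypothetical map: each service -> the services it depends on.
DEPENDS_ON = {
    "ios-app": ["order-api"],
    "website": ["order-api", "menu-api"],
    "order-api": ["orders-db", "restaurant-gateway"],
    "menu-api": ["menu-db"],
    "restaurant-gateway": [],
    "orders-db": [],
    "menu-db": [],
}

# Hypothetical ownership: each service -> the team that runs it.
OWNERS = {
    "ios-app": "mobile-team",
    "website": "web-team",
    "order-api": "orders-team",
    "menu-api": "menu-team",
    "restaurant-gateway": "restaurant-team",
    "orders-db": "data-team",
    "menu-db": "data-team",
}


def affected_services(failing, depends_on):
    """Return every service that directly or transitively depends on `failing`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    seen = set()
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def teams_to_engage(failing, depends_on, owners):
    """Return the sorted team names owning the failing service or anything upstream of it."""
    services = affected_services(failing, depends_on) | {failing}
    return sorted({owners[s] for s in services})


print(teams_to_engage("menu-db", DEPENDS_ON, OWNERS))
```

The same inverted graph also hints at housekeeping opportunities: a service that appears in nobody's dependency list and serves no customer-facing role is a candidate for review.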