Cloud services COVID ‘good war’ - three Dynatrace use cases

By Martin Banks March 22, 2021
Real world use cases make the proof points for the management challenges associated with ever more complex cloud services.

Hands break chains of technical debt with bright clouds © Merydolla - shutterstock

Cloud services have had ‘a good war’ when it comes to the COVID pandemic. The ready availability of scalable computing resources was just what was required when organizations were unexpectedly faced with requirements such as providing across-the-board remote working resources.

This has also changed the rules on how cloud services are managed from the user perspective, reckons John Van Siclen, CEO of software intelligence tools provider Dynatrace:

Today, the industry calls it observability, and every modern cloud needs it. The old multi-tool approach is dead. One holistic view of all data is a requirement - no partial view, no sampling, no blind spots. We knew that the volume, velocity and variety of data would simply be too great to deal with solely on dashboards and that dependencies up and down the stack and across a sprawling set of components and services would be measured in the millions per minute.

Mitchells & Butlers

The Dynatrace thesis has been put to the test in the real world by customers such as Mitchells & Butlers (M&B), one of the largest restaurant and bar companies in the UK, with brands ranging from major family restaurants, through cocktail bars, and all the way down to local pubs on the street corners of London.

To support this network, Mark Forrester, M&B’s Digital Readiness Manager, manages the development pipeline, while at the same time continuously looking for new innovations and new technologies that can be brought to the fore:

My role sits between that and the support side. We need to make sure that we have the right monitoring, that we're getting the right metrics, that the support documentation is handed over, that we've got the service contract SLA aligned to whether third party suppliers, all of the digital services and platforms, conform to the same standard. That means that we can monitor across the board to see what's going wrong and be able to identify which supplier has got a problem.

Dynatrace is used to secure Mitchells & Butlers' platform and to allow for understanding more about what's going on underneath the layers, as well as using the API of Davis, the Dynatrace AI engine, to help understand where the system pinch points are.

A lot of additional data was generated that would be useful to other departments, beginning with the marketing team. For example, when they were running an email campaign and generating sales off the positive responses, relevant data was on offer, often in real time, such as the number of potential customer contacts that were bouncing or which parts of the webpage were not constructed correctly. Forrester explains:

We can now visualise with the marketing team about that Return On Investment on an email campaign, the digital assets that were created as part of that cost, the business value or money we can turn that into and what was the actual end result. You can attribute sales to that and then say that there was ‘X’ amount of growth in sales because of this one campaign. Our marketing team understand that now, and that's developed into a new world where they're now coming to us and asking us to help them look at other initiatives that they want to do to help streamline and make that guest journey better.

A major marketing opportunity comes at Christmas, probably the busiest time in the restaurant and bar trade. An obvious stumbling block in the guest journey is when guests find there are no tables available at their favourite restaurant. So why not have the capability to search for available tables at other M&B restaurants in the vicinity? After all, the data is available in real time as a by-product of the prime support and maintenance function.

In that respect, it is important that Dynatrace's ability to monitor and trace the operations of applications and identify problem areas can extend beyond the environment of a single business. Given the number of transactions that occur between business partners collaborating on the delivery of a product or service, and given that a growing proportion of that combined process is now digitized, the primary partner (the one with the customer relationship, and therefore the prime target for any customer complaint) is able to identify where across the whole workflow the problem occurred, what caused it, and what remedial action needs to be taken.

Porsche Informatik

Peter Friedwagner is Head of Infrastructure and Common Platforms at Salzburg-based Porsche Informatik, a leading IT provider of information management services for the automotive retail business, and has been involved in the firm’s migration to Dynatrace from a VMware-based infrastructure:

We started a couple of years ago to modernise our infrastructure with an environment based on Kubernetes, primarily running Red Hat OpenShift. However, this created a bit of a challenge, as distributed applications across micro-services have dependencies which are really cutting across different teams. This leads into challenges to resolve these issues quickly, particularly for problems where several teams are involved. As you know, customers expect around the clock availability.

An example of this is a car configurator, which customers use to individually configure the equipment of a new car. Monitoring it needed to include business KPIs, such as conversion goals, so that the teams could define alerts based on those metrics. This drove the need for integrated monitoring of every component across the full stack, plus the ability to support the autonomous teams responsible for the application development.

These teams are now able to customize the monitoring environment independently, according to their own defined business KPIs. They can run the deployments and set the thresholds for anomaly detection on site. One example relates to inevitable JavaScript errors, where the only question is how many to tolerate. This can now be easily answered with an integrated view across the web services, the database, and the network. There is now a common tool for both infrastructure engineers and application developers. Finally, they can trace the capability of the model for automatic instrumentation and reading a battery of different technologies. According to Friedwagner:

This allows teams to concentrate on what they really need to do, develop and deliver the best possible applications. This also demonstrates how intelligent monitoring really provides us with an understanding of the user experience before users face an issue. We are able to identify the root cause and fix it quickly.

Ultimately, what we want to achieve is have deployment issues automatically detected and rolled back without human intervention in order to make sure related applications are as close as possible to 100% available.

Lockheed Martin

David Walker, Chief Architect and Fellow at Lockheed Martin, told a different tale, of how Lockheed Martin has used Dynatrace to manage licence auditing in a transforming environment. This required Walker and his team to step back and figure out what they could explore through the Dynatrace API:

We came from a legacy approach initially, so we brought everything up into our environment and we started managing things in a very basic, archaic way, in the sense of, Okay, we'll stand this up, we'll deploy this service. And we realised that for us to scale and for us to grow, that we were going to need to really think smarter and harder at this.

The first step was to identify deficiencies, in particular where there was an inability to get the right data to make a business decision. One area of focus was the need to get licence information live for the company's customers so that they could understand what their consumption rates are. This led to a review of the API data extractions to help determine automated activity, which allowed for data to be pulled out and used to calculate and display the results for its users.

This became the use case goal: to determine the licence consumption by customer, a task which required passing a lot of unstructured data through an API command to get it semi-formatted. It would also need to have the host units grouped by their host groups, with all the data visualised in real time. What the team was able to do was develop a live dashboard from Amazon CloudWatch metrics, using serverless AWS Lambda functions to extract the data and export it to CloudWatch.
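To make the grouping step concrete, here is a minimal Python sketch of rolling host units up by host group. It is an illustration only, not Lockheed Martin's implementation: the record layout and the field names `host_group` and `host_units` are assumptions, as is the `ungrouped` bucket for hosts that carry no group.

```python
from collections import defaultdict


def host_units_by_group(hosts):
    """Sum host units per host group.

    `hosts` is a list of dicts using the hypothetical keys "host_group"
    and "host_units" (an assumed layout, not a documented API response).
    Hosts without a group are collected under "ungrouped".
    """
    totals = defaultdict(float)
    for host in hosts:
        group = host.get("host_group") or "ungrouped"
        totals[group] += float(host.get("host_units", 0))
    return dict(totals)


# Hypothetical sample data standing in for the semi-formatted API output.
sample_hosts = [
    {"name": "web-1", "host_group": "customer-a", "host_units": 2.0},
    {"name": "web-2", "host_group": "customer-a", "host_units": 0.5},
    {"name": "db-1", "host_group": "customer-b", "host_units": 4.0},
    {"name": "tmp-1", "host_units": 1.0},
]

print(host_units_by_group(sample_hosts))
```

In a setup like the one described, a function of this kind could run on a schedule inside a serverless function, with each group total then published as a metric for the dashboard to chart.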

According to Walker, Amazon does “a really great job” at monitoring activity over time, and as it keeps going the Machine Learning works better and better:

We're able to tell, right now, if somebody puts something that's outside of our standard deviation, and the amount of licences that it normally sees at ‘X’ period of time, so we're alerted immediately. So we went from no visibility to full visibility for this just by pulling in API's to this.
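The kind of check Walker describes can be approximated with a simple standard deviation test. The Python sketch below is a deliberately simplified stand-in for a managed anomaly detection service, not a description of how that service works; the function name `is_outlier`, the three-sigma default, and the sample figures are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_outlier(history, latest, max_sigma=3.0):
    """Return True if `latest` is more than `max_sigma` sample standard
    deviations away from the mean of `history`.

    `history` holds past licence counts for a comparable period.
    With fewer than two data points no judgement is made.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) > max_sigma * sigma


# Hypothetical licence counts seen at the same hour on previous days.
history = [100, 102, 98, 101, 99]

print(is_outlier(history, 103))
print(is_outlier(history, 140))
```

A fixed threshold like this ignores daily and weekly patterns, which is one reason a learned baseline that improves with more data is attractive for licence consumption that varies by time of day.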

One upcoming project follows on from a recently completed use case of building out a complete customer evaluation environment that automatically starts and ends an internal 30-day customer trial period for licence consumption. The new project looks to bring more automation to customer engagements so that processes can be onboarded through ServiceNow workflows, or even automation design like, but not limited to, auto-creation of standard host groups, management zones, alerting profiles, and dashboards for customers.