Dynatrace gets DAVIS to do what humans can’t in operations management

By Stuart Lauchlan May 8, 2017
Summary:
Here's an example of how – and why – AI can produce something positive and target a focused, definable job that users need done, but humans can’t deliver in the timescale required.


The back end of last year, and the early part of this one, have been a time of mega-hype about the impact of Artificial Intelligence – and there is still plenty of time for many of the direst predictions to come true. But it is also now a time when smaller, more focused applications of AI are starting to appear, and it is here that most of its beneficial possibilities will probably be unearthed.

Take, for example, the recent introduction of the Dynatrace Artificial Virtual Intelligence System, or DAVIS for short. This is an AI-powered digital virtual assistant aimed squarely at IT operations managers who may be looking for a different way of doing application or digital performance management. Users want snappier response times, and there is more data than ever coming out of the monitoring tools. At the same time, the IT operation is expected to handle all of this change and greater complexity with the same or fewer resources.

We felt that as everything is getting a little more complex we need to have the monitoring tools look for the problem patterns in the data, not just detecting problems but then leveraging that information to go through all of the different permutations as to what could be the root cause and offer up answers, rather than just provide data to the operator.

That is how Michael Allen, VP EMEA at Dynatrace, sees DAVIS, at least. But that can make it sound like it is just a 'human-replacement' solution, whereas it does, arguably, go a bit deeper. In practice it could solve an operations management issue that could soon become intractable.

The growing presence of new micro-service hyper-scale, hyper-dynamic platforms may have many benefits, but there is also real potential for inherent problems to develop. This is because virtual machines or containers are moving around. They are not under the control of humans any more, and it takes a lot longer for humans to find out 'where is my app actually running?'. That's when users need these platforms and these applications to adapt, says Allen:

So we built that kind of artificial intelligence capability into the platform and thought: actually, what we've got here is a virtual operator, a virtual assistant, built into the solution.

Calling in

No sooner thought than implemented, as well. Dynatrace has put a voice interface onto DAVIS using Amazon's Alexa technology, together with tools that allow humans to interact with it not only through voice and natural language, but also through solutions like Slack, where written text can be exchanged with what is effectively a DAVIS bot.

It means that an Operations Manager can even 'call in' to the service while driving to work and ask what the current problems are. This does have a measure of the 'dark and controlling' about it, for there is bound to be at least one CIO or IT manager who will turn the fact that it is possible into an obligation.

Setting that thought to one side, DAVIS is well positioned for monitoring and managing the hyperconverged sector. That really does get very granular and impossible for humans to track effectively, no matter how many people are put on the task.

Indeed, it's designed to work with hyperdynamic, hyper web-scale environments. The number of permutations that would need to be analysed to come up with a high degree of probability as to the root cause of a problem is just beyond the reach of what humans can do within a reasonable amount of time. That's the challenge today. I think the complexity has gone beyond human reach.

Here is an example of a relatively small environment of about 142 servers. With 30 microservices having a problem, there can be something like eight billion dependencies, or permutations of dependencies, to evaluate to come out with a single root cause.
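To make the scale of that search concrete, here is a toy sketch in Python. The service graph, names and scoring heuristic are hypothetical illustrations, not Dynatrace's actual algorithm: instead of enumerating every permutation, a causal model can walk the dependency graph and rank each unhealthy service by how many of the other symptoms it could explain.

```python
# Hypothetical microservice dependency graph: service -> services it calls.
DEPENDS_ON = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": ["db"],
    "inventory": ["db"],
    "db": [],
}

def downstream(service):
    """All services reachable from `service` along its call chain."""
    seen, stack = set(), [service]
    while stack:
        for dep in DEPENDS_ON[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def rank_root_causes(unhealthy):
    """Score each unhealthy service by how many of the unhealthy services
    it sits downstream of (plus itself): a shared dependency that can
    'explain' every symptom scores highest."""
    unhealthy = set(unhealthy)
    scores = {
        cand: sum(1 for s in unhealthy if cand == s or cand in downstream(s))
        for cand in unhealthy
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

With `frontend`, `checkout`, `payments` and `db` all unhealthy, the shared dependency `db` explains all four symptoms and is ranked first. A production engine works over vastly larger, constantly changing topologies, which is exactly why Allen argues the job has moved beyond human reach.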

What the system won't do on its own, unless users specifically integrate other applications with DAVIS, is notify those applications so that remedial action can be set in motion; it won't automatically spin up new server instances to self-heal an issue. However, Allen indicated that many customers are integrating the core platform with orchestration engines to do exactly that.

There are a total of 50 different technologies that DAVIS can be integrated with for such purposes, using app extensions within the Dynatrace offering, explains Allen:

For example, it may be that an organisation is using something like ServiceNow for service management, and once the root cause of a problem has been detected, they may want that brought up in ServiceNow for some of the remediation tasks and workflow.
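As a rough sketch of what the ServiceNow hand-off Allen describes can look like (the instance URL, payload fields and severity mapping below are illustrative assumptions; a real deployment would use Dynatrace's own extension rather than hand-rolled code), a detected root cause might be pushed into ServiceNow's standard incident table over its REST Table API:

```python
import json
import urllib.request

def build_incident(problem):
    """Map a detected root-cause problem onto a ServiceNow incident payload.

    Field names follow the standard `incident` table; the urgency
    mapping here is an illustrative assumption.
    """
    return {
        "short_description": f"[DAVIS] {problem['title']}",
        "description": (
            f"Root cause: {problem['root_cause']}\n"
            f"Affected services: {', '.join(problem['affected'])}"
        ),
        "urgency": "1" if problem["impact"] == "user-facing" else "3",
    }

def open_incident(instance_url, auth_header, problem):
    """POST the incident to the ServiceNow Table API and return the record."""
    req = urllib.request.Request(
        f"{instance_url}/api/now/table/incident",
        data=json.dumps(build_incident(problem)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": auth_header},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The point of the pattern is the division of labour: the monitoring engine supplies a single root cause rather than a flood of raw alerts, and the service-management tool owns the remediation workflow from there.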

And the next step, predictive maintenance and management, is also very much part of the plan. According to Allen, the whole goal of Dynatrace with the Artificial Intelligence Engine behind DAVIS is not just to correlate information but to semantically link it:

The beauty of having this model is that you really understand the causal semantics. What it can then start doing, because it understands what's normal and what metrics lead to a problem, is detect degradations and problems that are evolving before they have any end-user impact. It gains you a head start on being aware of them, and because you've been made aware of the root cause, you can prevent it from proliferating any further.
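The "understands what's normal" part can be pictured with a minimal baselining sketch. The z-score threshold and sample data are illustrative assumptions, not how Dynatrace's engine actually models behaviour: a metric is flagged as degrading when recent samples drift well outside its learned baseline, before users feel a hard failure.

```python
from statistics import mean, stdev

def detect_degradation(history, recent, z_threshold=3.0):
    """Flag a metric as degrading when the recent average drifts more than
    z_threshold standard deviations above the learned baseline.

    `history` is a window of normal samples (e.g. response times in ms);
    `recent` is the latest window. Returns (flagged, z_score).
    """
    baseline, spread = mean(history), stdev(history)
    spread = spread or 1e-9          # guard against a perfectly flat baseline
    z = (mean(recent) - baseline) / spread
    return z > z_threshold, z
```

For example, response times that normally hover around 120ms and suddenly average 155ms produce a z-score far above the threshold and get flagged, while ordinary jitter does not. The value of pairing this with the causal model above is that the early warning arrives already attached to a probable root cause.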

This is already having a direct impact for Dynatrace in its own move to a more DevOps delivery style. This matters because, while DAVIS is available as a cloud service, many of the company's enterprise users still seek what they see as the greater comfort of an on-premise operation. The downside of this, of course, would normally be that users have to perform upgrades manually every time there is an update. So the Dynatrace compromise is to update the applications in the cloud and provide users with a service it calls 'managed'. And the reason can be seen in the update numbers:

If you rewind three or four years, we were doing two major releases per year for our customers. Today we're doing 170 deployments a day of Dynatrace into production and one major release every two weeks. So the platform that our customers use to monitor their applications, we also use to monitor our software. We're kind of drinking our own Kool-Aid, because without it we couldn't be deploying so much so frequently.

My take

This is a largely positive use of AI to help users achieve a result that could not be achieved any other way, regardless of the number of people employed. It is also a result that could prove to be an important complement to the complexities of hyperconverged environments.