Keptn - keeping a tight rein on autonomous clouds with a NoOps option

By Martin Banks, January 13, 2020
Summary:
A new tool from Dynatrace claims to automate much of the low-level ‘grunt’ work of managing the delivery of cloud services.

One of the fascinating subtexts of the cloud is that a concept which should, arguably, have removed most of the technical incompatibilities that kept IT systems locked in their silos of exclusivity has instead grown its own layers of technological complexity and mystique, leaving ‘ease of use’ as a fairy tale from another universe.

There are signs that steps are being taken to get around these issues, however, with the term ‘autonomous cloud’ starting to feature more prominently. It would be far too easy, and far too hopeful, for users to read too much into the word ‘autonomous’. Nothing available today is going to take over the job of running everything in a cloud environment. But help is now available for a good deal of the nitty-gritty grunt work of managing the detailed changes that follow any change in workload, allocated resources and similar factors.

This is certainly the role that Dynatrace is pitching at with Keptn, which is a packaging-up of a range of tools used internally by the company to provide automation and orchestration services on behalf of customers. Feedback from those customers suggests to Dynatrace that there is a market for process automation and orchestration in cloud-native environments where continuous service delivery is a major goal.

In short, the goal is to create a NoOps capability, at least for the general user base of the Dynatrace Application Performance Management tools, that captures the lessons the company has learned and the best practices it has developed while building its own NoOps environment.

The result is what Alois Reitbauer, Chief Technical Strategist and Head of the Dynatrace Innovation Lab, calls a simple pluggable control plane that automates the continuous delivery pipeline linking development with production: Keptn. It provides a declarative way to specify multiple continuous delivery pipelines for hundreds of microservices, while automatically generating all the plumbing that underlies them.
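
To make the idea of a declarative pipeline specification more concrete, here is a minimal Python sketch. The spec layout, the stage names and the `expand_pipelines` helper are all hypothetical, invented for illustration; this is not Keptn's actual file format or API. It only shows how a short description of stages might be expanded mechanically into the per-service steps that would otherwise be wired up by hand.

```python
# Hypothetical declarative spec: it states what each stage should do, not how.
PIPELINE_SPEC = {
    "stages": [
        {"name": "dev", "deploy": "direct", "tests": ["functional"]},
        {"name": "staging", "deploy": "blue_green", "tests": ["performance"]},
        {"name": "production", "deploy": "blue_green", "tests": []},
    ]
}


def expand_pipelines(spec, services):
    """Expand one declarative spec into an ordered list of steps per service."""
    plans = {}
    for service in services:
        steps = []
        for stage in spec["stages"]:
            steps.append(f"deploy {service} to {stage['name']} ({stage['deploy']})")
            for test in stage["tests"]:
                steps.append(f"run {test} tests for {service} in {stage['name']}")
            steps.append(f"evaluate {service} in {stage['name']}")
        plans[service] = steps
    return plans


if __name__ == "__main__":
    # Two illustrative service names; the same spec would serve hundreds.
    plans = expand_pipelines(PIPELINE_SPEC, ["cart", "checkout"])
    for step in plans["cart"]:
        print(step)
```

Adding a service here means adding one name to a list, while the 'plumbing' is generated from the shared spec.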

He states that customers can automate operational tasks, such as reacting to failed deployments based on performance and business feedback, or remediating production problems, in a simple and readily maintainable way. Keptn separates process definitions from the actual tool integrations and orchestrates processes at runtime, increasing manageability and adaptability.
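
The separation of process definitions from tool integrations can be illustrated with a small event-driven sketch in Python. Everything in it is an assumption made for illustration: the event names, the `Orchestrator` class, the stand-in test handler and its 500 ms threshold are hypothetical and do not describe Keptn's real interfaces. The process table names no tools, and tools are plugged in separately at runtime.

```python
from collections import defaultdict

# Hypothetical process definition: which event follows which. No tools named here.
PROCESS = {
    "deployment.finished": "tests.triggered",
    "tests.passed": "promotion.triggered",
    "tests.failed": "rollback.triggered",
}


class Orchestrator:
    def __init__(self, process):
        self.process = process
        self.handlers = defaultdict(list)
        self.log = []

    def register(self, event_type, handler):
        """Plug a tool integration in for one event type."""
        self.handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        """Record the event, run its integrations, then follow the process table."""
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            result = handler(payload)
            if result is not None:
                self.emit(result, payload)
        next_event = self.process.get(event_type)
        if next_event is not None:
            self.emit(next_event, payload)


def run_tests(payload):
    # Stand-in for a real test tool; the 500 ms limit is an arbitrary assumption.
    return "tests.passed" if payload["response_ms"] <= 500 else "tests.failed"


def roll_back(payload):
    print(f"rolling back {payload['service']}")


if __name__ == "__main__":
    orchestrator = Orchestrator(PROCESS)
    orchestrator.register("tests.triggered", run_tests)
    orchestrator.register("rollback.triggered", roll_back)
    orchestrator.emit("deployment.finished", {"service": "cart", "response_ms": 900})
    print(orchestrator.log)
```

Swapping the test tool means registering a different handler; the process table stays untouched, which is the manageability argument in miniature.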

The company is following the open source route, via a GitHub repository, to make Keptn available to its user community. It is also open to those who use the Prometheus open source monitoring system. Going open source means that customers and partners can develop their own additions and feed them back into the repository, on the fly if necessary.

NoOps

The rise of the term ‘NoOps’ is likely to attract CIO interest and operations team fear in equal measure as the move towards autonomous cloud services gathers pace. The obvious implication is one of staff rationalisation and all that that entails. It is a subject Reitbauer is happy to address:

It is a term that usually creates a lot of friction, or let's put a positive spin on it, a lot of interest. The whole autonomous cloud project consists of two levels: the enabling of the technology, and obviously Keptn is a key part here, but also the continuing cultural change in an organisation. We want to help people build a system where most of the manual processes they are doing today and have been doing over the last 20/30 years, are automated.

This does play to the argument that, if cloud services are to work effectively, such automation is crucial. There may not be enough people – or enough KPI management skills – in the world to achieve the same results manually. And as Reitbauer observes, the cloud services world is still at the point where much of what constitutes best practice is being developed and learned. Having best practice as part of the automated capability is far more efficient than any staff education programme. A growing amount of the deployment of such processes is already automated, but he sees Keptn as the next step – the automation of when and why such deployment is triggered.

There is obviously a level of AI capability underpinning what Keptn delivers, and here Dynatrace has addressed what could be one of the more unnerving factors for most CIOs - the ability to know what the AI system is doing and why, rather than just find, post hoc, that it has done something. So the company has built in a level of explainability:

The problem with the machine learning approach is that you will get the results, but you don't understand why you're actually getting them. But when you have to take massive decisions about your core business applications you want to know why you should be doing something like scaling them up or changing the configuration. So you want to ask the system to explain back how it got to a certain conclusion.

For this, Dynatrace has adopted the same fundamental approach to AI explainability as that taken by Google. Here, the system quantifies the contribution each data factor has made towards the final decision. This makes human intervention possible by identifying the parameters the AI system used to arrive at a decision. The Dynatrace implementation of the core technology is different, however, as Google is initially pitching at applications such as facial recognition systems.
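
As a rough illustration of what 'quantifying the contribution of each data factor' can mean, here is a minimal Python sketch. It is not the Dynatrace or Google implementation, neither of which is detailed here. It assumes a simple linear scoring model, where each factor's contribution is its weight multiplied by how far the observed value sits from a baseline. The factor names, weights, baseline values and threshold are all made up.

```python
# Hypothetical weights for a linear "should we scale up?" score.
WEIGHTS = {"cpu_util": 2.0, "error_rate": 5.0, "queue_depth": 0.5}
# Hypothetical baseline: typical values under normal load.
BASELINE = {"cpu_util": 0.5, "error_rate": 0.25, "queue_depth": 10.0}


def explain(observed):
    """Return each factor's contribution to the score, relative to the baseline."""
    return {
        name: WEIGHTS[name] * (observed[name] - BASELINE[name])
        for name in WEIGHTS
    }


def decide(observed, threshold=1.0):
    """Return the decision plus the factors ranked by size of contribution."""
    contributions = explain(observed)
    score = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score > threshold, ranked


if __name__ == "__main__":
    scale_up, ranked = decide(
        {"cpu_util": 0.75, "error_rate": 0.25, "queue_depth": 14.0}
    )
    print("scale up:", scale_up)
    for name, contribution in ranked:
        print(f"{name}: {contribution:+.2f}")
```

With a breakdown of this kind, an operator can see which factor drove a recommendation and challenge it before acting, rather than discovering the decision after the fact.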

Reitbauer also expects to see Keptn coupling up with Kubernetes as an increasingly powerful pairing. He sees it raising the level at which operations can be automatically implemented, because users get not only the applications required for a complete service or process but also the data associated with them. This will raise the level of management abstraction at which Keptn can then work.

My take

This marks but a single step along the road to truly autonomous cloud operations, but it does look like one worth taking for those that can, because it requires a fair degree of proprietary affiliation in order to be really useful. That is probably inevitable, and I expect to see other vendors starting to offer parallel capabilities, no doubt with similar levels of potential ‘silo-ization’. But the next move for Dynatrace – the suggestion of a step up to working in conjunction with Kubernetes – could give it the chance to acquire ‘one-ring’ status: the system that binds and orchestrates discrete process automation tools to build the automated full-service delivery environment that cloud is ultimately going to demand.