AIOps has been the target of some skeptical snipes here at diginomica over the past couple years, being rightly criticized as little more than a marketing buzzword designed to make incremental product improvements seem like breakthrough innovations.
We can actually blame the jargon-masters at Gartner for coining the term, but it didn’t take long for vendors to latch onto the hype-worthy appeal of anything including the phrase “Artificial Intelligence” and re-label their products.
I attempted to demystify the concept last November by profiling “a growing number of companies developing intriguing IT operations products that apply data analytics to network, application and cloud/container infrastructure.” I characterized AIOps as:
the application of data aggregation and lakes, statistical techniques and machine learning to infrastructure management enable powerful new capabilities for IT operations and applications teams.
Gartner takes this one giant leap further by dubbing AIOps as a full-fledged platform that,
combine(s) big data and machine learning functionality to support all primary IT operations functions through the scalable ingestion and analysis of the ever-increasing volume, variety and velocity of data generated by IT. The platform enables the concurrent use of multiple data sources, data collection methods, and analytical and presentation technologies.
However, in describing the market, Gartner admits (emphasis added) that “AIOps can enhance a broad range of IT operations processes and tasks” that includes:
- Application performance analysis (APM)
- Anomaly detection (SIEM)
- Event correlation and analysis (incident management)
- IT service management (ITSM)
The diversity of applications and usage scenarios for data-driven AI enhancement demonstrated in the months since my initial article leads me to disagree with Gartner and vendor marketing mavens: AIOps isn’t a platform or product category, but a feature that can infuse many different types of IT operations tasks and software products. In this light, AIOps isn’t something you buy, but a capability you seek when evaluating products. Recent events illustrate that IT organizations face a variety of problems in which the nexus of über-granular data collection and machine learning, aka AI, can improve IT effectiveness and efficiency.
Why and when AI is useful
As Gartner states, AI can be applied to a range of IT functions and related software products. A recent survey by OpsRamp, an IT operations software developer, details the many problems IT practitioners believe that machine intelligence can address, if not outright solve. I’ll make my usual caveat about vendor-sponsored surveys: they are subject to statistically insignificant sample sizes (200 in this case), cherry-picked respondent groups (“all the IT decision makers who participated in the survey had already implemented AIOps solutions in their organization”) and questions handcrafted to elicit a particular response. Hence, I’ll refrain from using reported numbers in favor of ordinal rankings. Nevertheless, the information is useful in illuminating the reasons early adopters have implemented what are leading-edge capabilities that are still on the path to overwrought expectations.
As Gartner’s AIOps categorizations indicate, one of the primary uses for AI in IT operations involves the incident management process and making sense of the overwhelming amount of data flooding operations centers today. Digging deeper, the OpsRamp survey respondents cite these three factors as their top incident management challenges:
- Extracting signal from noise and establishing data accuracy
- Probabilistic, i.e. data-driven, root cause analysis
- Too many routine and redundant tasks
Machine learning that triggers automated workflows can address each of these. Outside of AIOps, the survey reveals that operations teams rely on domain-specific tools that target a particular product or environment (e.g. virtualization stack) and rules-based filtering (e.g. traditional event management software like Splunk, Sumo Logic or the ELK stack).
When evaluating AIOps products, survey respondents, who, remember, are current users of some form of AIOps software, prioritized the following features:
- Inference models, namely the expertise and machine learning algorithms embedded in the product.
- Incident visualization to graphically simplify reams of data.
- Data-agnostic ingestion that supports the myriad of infrastructure products and data formats in the modern data center.
- Integrations ecosystem, namely the ability to consolidate the management and data analysis of multiple components under a single UI.
Although not mentioned in the survey, I would add a related sub-feature:
- An extensible API that supports multiple scripting languages to allow adding features and using the AIOps engine to feed other automation scripts.
To reiterate, AI-enhanced operations software is primarily used to improve routine administrative tasks like event monitoring, alerts and problem diagnosis. According to the survey, respondents value the following features:
- Event context and history in notification systems along with support for online collaboration. I would add that a critical feature of such systems is the ability to correlate events from disparate data streams while filtering duplicates and graphically highlighting those related events.
- An ability to work back from correlated events to trace to the root cause of a problem and illustrate all potential ramifications, not just the system triggering the alert.
- Use of machine learning and predictive analytics to accurately establish normal parameters of operations that account for temporal or other predictable fluctuations and then highlight anomalous behavior indicative of an impending system failure or security breach.
- Programmability that enables automating routine, tedious tasks.
The combination of these features allows early users of AIOps products to resolve problems in half the time it took using traditional methods.
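The event correlation and duplicate filtering described in the first feature above can be illustrated with a minimal sketch. It uses a simple time-window heuristic rather than a learned model, and every name in it (`Event`, `deduplicate`, `correlate`, the stream and host labels, the window lengths) is a hypothetical choice for illustration, not the API of any AIOps product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary start
    source: str       # hypothetical stream name, e.g. "network" or "app"
    host: str
    message: str


def deduplicate(events, window=60.0):
    """Drop repeats of the same (source, host, message) seen within `window` seconds."""
    last_seen = {}
    unique = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.host, ev.message)
        if key in last_seen and ev.timestamp - last_seen[key] <= window:
            last_seen[key] = ev.timestamp  # keep suppressing while repeats continue
            continue
        last_seen[key] = ev.timestamp
        unique.append(ev)
    return unique


def correlate(events, window=120.0):
    """Group events on the same host that arrive within `window` seconds of the previous one."""
    by_host = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_host[ev.host].append(ev)

    incidents = []
    for host_events in by_host.values():
        current = [host_events[0]]
        for ev in host_events[1:]:
            if ev.timestamp - current[-1].timestamp <= window:
                current.append(ev)
            else:
                incidents.append(current)
                current = [ev]
        incidents.append(current)
    return incidents


raw_events = [
    Event(0, "network", "web-1", "link flap"),
    Event(10, "network", "web-1", "link flap"),    # duplicate, filtered out
    Event(30, "app", "web-1", "HTTP 503 spike"),   # different stream, same host
    Event(20, "storage", "db-1", "latency high"),
    Event(500, "app", "web-1", "HTTP 503 spike"),  # much later, a separate incident
]
incidents = correlate(deduplicate(raw_events))
```

Here the network and application events on `web-1` land in one incident even though they come from different streams. A production system would presumably replace the fixed windows and the same-host rule with topology data and learned correlations.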
Impediments to adoption
As with many new technologies, adding features, machine intelligence in this case, to operations software is the easy part. The real work of improving the efficiency of IT operations involves data management, expertise acquisition and cultural adaptation. Respondents to the OpsRamp survey say that data accuracy, which includes collecting, processing, filtering and verifying the data used to feed ML models, is the source of most concern when implementing AIOps systems.
Acquiring the skills necessary to configure, use and interpret AI-enhanced software is a close second on the list of implementation challenges. Whether through retraining or hiring, operations teams need a significant upgrade in technical capabilities before unleashing such systems. Although it doesn’t provide immediate results, retraining is probably the better and less expensive option in the long run since most respondents said it took almost a year to hire ML engineers.
Finally, early AIOps implementors say they faced a significant hurdle in getting operations teams, who generally feared ceding control to algorithms, to trust the system’s analysis and conclusions. Indeed, it’s impossible to implement automatic remediation measures that can correct problems based on machine diagnosis without the humans responsible fully trusting the system.
AIOps is a natural evolution of IT infrastructure and application management software that incorporates machine and deep learning, a class of algorithms with demonstrated excellence at:
- Digesting massive quantities of data to find and tag patterns
- Correlating seemingly unrelated events and features
- Flagging outliers
- Setting baselines for normal operations
- Ascertaining the probabilistically optimal set of steps to fix problems
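To make baselining and outlier flagging concrete, here is a minimal sketch that assumes a single metric sampled by hour of day. It substitutes a per-hour mean and standard deviation for the machine learning models a real product would use, and the function names, sample values and the three-sigma threshold are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(samples):
    """Compute a per-hour-of-day (mean, stdev) baseline.

    `samples` is an iterable of (hour_of_day, value) pairs from a period
    assumed to represent normal operations.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    # stdev needs at least two points, so skip hours with too little history.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(baseline, hour, value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from that hour's mean."""
    if hour not in baseline:
        return False  # no history for this hour, so no basis to flag
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical CPU utilization history: quiet at 03:00, busy at 14:00.
history = [(3, 10), (3, 12), (3, 11), (3, 9),
           (14, 80), (14, 85), (14, 78), (14, 82)]
baseline = build_baseline(history)

night_alert = is_anomalous(baseline, 3, 80)     # 80% at 03:00
afternoon_alert = is_anomalous(baseline, 14, 80)  # 80% at 14:00
```

The same reading of 80 is flagged at 03:00 but not at 14:00, which is the point of a baseline that accounts for predictable fluctuations. A static threshold would either miss the first case or raise a false alarm on the second.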
As such, AIOps is not a platform, but a feature of many products related to IT operations and DevOps. Although Gartner calls these distinct platforms, its own diagram depicts AIOps as a logical addition to monitoring software.
Source: Gartner Market Guide for AIOps Platforms
Although several vendors sell self-described AIOps products, these are destined either to be absorbed as features into existing categories of operations management products or to become elements of comprehensive management suites that address the gamut of infrastructure, application and user experience management. Even cloud services like AWS tout the ability to stitch various services into an AIOps system.
Source: AWS re:Invent 2018 slides, AIOps: steps towards autonomous operations
Many capabilities evolve from products to features. Indeed, the consumer electronics industry is rife with examples of standalone innovations succumbing to product integration, including:
- Digital cameras → phone cameras
- Handheld GPS units → in-car navigation systems
- Standalone set-top boxes → smart TVs
Machine intelligence is merely the latest technology to pervade both enterprise and consumer products and AIOps will eventually be seen as a critical component of IT operations software, not a new category.