Grok combines Machine Learning and the Human Brain to build smarter AIOps

By Jerry Bowles, February 19, 2020
Summary:
Can neuroscience and data science solve the problem of IT operations complexity?  

Code with brain artificial intelligence AIOps concept © Antonov Serg - shutterstock

A few weeks ago I wrote a piece here about Moogsoft, which has been making waves in the service assurance space by applying artificial intelligence and machine learning to the arcane task of keeping critical IT up and running and lessening the business impact of service interruptions. It’s a hot area for startups, and I’ve since gotten article pitches from several other AIOps firms at varying levels of development.

The most intriguing of these is a company called Grok, formed through a partnership between Numenta and Avik Partners. Numenta is a pioneering AI research firm co-founded by Jeff Hawkins and Donna Dubinsky, who are famous for having started two classic mobile computing companies, Palm and Handspring. Avik is a company formed by brothers Casey and Josh Kindiger, two veteran entrepreneurs who have successfully started and grown multiple technology companies in service assurance and automation over the past two decades, most recently Resolve Systems.

 Josh Kindiger told me in a telephone interview how the partnership came about:

Numenta is primarily a research entity started by Jeff and Donna about 15 years ago to support Jeff’s ideas about the intersection of neuroscience and data science.  About five years ago, they developed an algorithm called HTM and a product called Grok for AWS which monitors servers on a network for anomalies. They weren’t interested in developing a company around it but we came along and saw a way to link our deep domain experience in the service management and automation areas with their technology. So, we licensed the name and the technology and built part of our Grok AIOps platform around it.

 Jeff Hawkins has spent most of his post-Palm and Handspring years trying to figure out how the human brain works and then reverse engineering that knowledge into structures that machines can replicate.  His model or theory, called hierarchical temporal memory (HTM), was originally described in his 2004 book On Intelligence written with Sandra Blakeslee.  HTM is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian (in particular, human) brain.  For a little light reading, I recommend a peer-reviewed paper called A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex.  

Grok AIOps also uses traditional machine learning, alongside HTM.  Said Kindiger:

When I came in, the focus was purely on anomaly detection and I immediately engaged with a lot of my old customers, large Fortune 500 companies and very large service providers, and quickly found out that while anomaly detection was extremely important, that first signal wasn't going to be enough. So, we transformed Grok into a platform. And essentially what we do is we apply the correct algorithm, whether it's HTM or something else, to the proper stream: events, logs and performance metrics. Grok can enable predictive, self-healing operations within minutes.

 The Grok AIOps platform uses multiple layers of intelligence to identify issues and support their resolution: 

 Anomaly detection

The HTM algorithm has proven exceptionally good at detecting and predicting anomalies and at reducing noise, often by as much as 90%, by providing the critical context needed to identify incidents before they happen. It can detect anomalies that simple low and high thresholds miss, such as changes in signal frequency that reflect changes in the behavior of the underlying systems. Said Kindiger:

We believe HTM is the leading anomaly detection engine in the market. In fact, it has consistently been the best performing anomaly detection algorithm in the industry resulting in less noise, less false positives and more accurate detection. It is not only best at detecting an anomaly with the smallest amount of noise but it also scales, which is the biggest challenge.
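
To make "beyond low and high thresholds" concrete, here is a minimal Python sketch. It is not HTM and not Grok's code. It swaps in a much simpler stand-in technique, counting how often a metric crosses its own mean, and the function names, window size and tolerance are all hypothetical. The synthetic signal never leaves the range -1 to 1, so a static threshold would stay silent, yet its oscillation becomes four times faster after sample 200.

```python
import math

def crossing_rate(window, center):
    """Fraction of adjacent sample pairs that cross the centre line."""
    crossings = sum(
        1 for a, b in zip(window, window[1:])
        if (a - center) * (b - center) < 0
    )
    return crossings / (len(window) - 1)

def frequency_anomalies(signal, window=50, tolerance=2.0):
    """Return start indices of windows whose crossing rate differs from
    the first (baseline) window by more than a factor of `tolerance`."""
    center = sum(signal) / len(signal)
    baseline = crossing_rate(signal[:window], center)
    flagged = []
    for start in range(window, len(signal) - window + 1, window):
        rate = crossing_rate(signal[start:start + window], center)
        if rate > baseline * tolerance or rate < baseline / tolerance:
            flagged.append(start)
    return flagged

# Synthetic metric (an assumption for illustration): the amplitude is
# unchanged throughout, but the period drops from 25 samples to 6.25.
signal = [math.sin(2 * math.pi * i / 25 + 0.3) for i in range(200)]
signal += [math.sin(2 * math.pi * i / 6.25 + 0.3) for i in range(200, 300)]

flagged = frequency_anomalies(signal)
print(flagged)
```

The flagged windows are the ones covering the faster oscillation, even though no sample ever breaches a high or low limit. A production engine would presumably learn the baseline continuously rather than fix it to the first window, as this sketch does.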

Anomaly clustering

To help reduce noise, Grok clusters anomalies that belong together through the same event or cause.
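
The article does not say how Grok decides which anomalies belong together. As a rough illustration only, the sketch below assumes that anomalies arriving close together in time share a cause, and groups them on that basis. The function name, the 60-second gap and the sample data are all hypothetical.

```python
def cluster_by_time(anomalies, max_gap=60):
    """Group (timestamp_seconds, source) anomalies into clusters.

    A new cluster starts whenever the gap to the previous anomaly
    exceeds `max_gap` seconds. Time proximity is only a proxy for a
    shared cause; it is an assumption of this sketch.
    """
    clusters = []
    for ts, source in sorted(anomalies):
        if clusters and ts - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((ts, source))
        else:
            clusters.append([(ts, source)])
    return clusters

# Hypothetical anomalies: three within a minute of each other, one much later.
anomalies = [(1000, "db-1"), (1020, "api-3"), (1045, "web-2"), (5000, "cache-1")]
clusters = cluster_by_time(anomalies)
for cluster in clusters:
    print(cluster)
```

Four raw anomalies collapse into two groups here, which is the noise-reduction effect in miniature: an operator sees one incident spanning three systems rather than three separate alerts.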

Event and log clustering

Grok ingests all the events and logs from the integrated monitors and then applies event and log clustering algorithms to them, including pattern recognition and dynamic time warping, which also reduce noise.
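
Dynamic time warping is a standard technique for comparing two sequences that have the same shape but are shifted or stretched in time. The sketch below is the textbook dynamic-programming version in Python, not Grok's implementation, and the sample series are invented for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i items of `a` with the first j items of `b`.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Hypothetical per-minute event counts from three monitors.
spike = [0, 1, 2, 1, 0]
delayed_spike = [0, 0, 1, 2, 1, 0]   # same shape, one step later
flat = [2, 2, 2, 2, 2]               # a different pattern

print(dtw_distance(spike, delayed_spike))
print(dtw_distance(spike, flat))
```

The delayed spike scores as a perfect match to the original, while the flat series does not. That is why the technique suits grouping event streams in which the same incident shows up on different systems at slightly different times.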

My take

 IT operations have become almost impossible for humans alone to manage. Many companies struggle to meet the high demand due to increased cloud complexity. Distributed apps make it difficult to track where problems occur during an IT incident. Every minute of downtime directly impacts the bottom line. 

In this environment, AIOps, a relatively new approach to reducing the burden of IT management, looks like a much-needed lifeline. AIOps stands for "Algorithmic IT Operations", and its premise is that algorithms, not humans or traditional statistics, will drive smarter IT decisions and help ensure application efficiency. AIOps platforms reduce the need for human intervention by using ML to set alerts and automation to resolve issues. Over time, AIOps platforms can learn patterns of behavior within distributed cloud systems and predict disasters before they happen.

Grok detects latent issues with cloud apps and services and triggers automations to troubleshoot these problems before requiring further human intervention. Its technology is solid, its owners have lots of experience in the service assurance and automation spaces, and who can resist the story of the first commercial use of an algorithm modeled on the human brain?