Phil Tee is a serial entrepreneur, data scientist and CEO who over the past 30 years has become a legend in the IT service assurance industry, which focuses on keeping critical IT up and running and lessening the business impact of service interruptions.
In 1993, two years after graduating from the University of Sussex, he co-founded Omnibus Transport Technologies Limited to commercialize Netcool, a platform he had created that was later acquired by IBM. That technology continues to be the core of the management support systems in many of the world’s largest networks in the form of IBM Tivoli Netcool. He has led other startups to successful exits: RiverSoft, which went public, and Njini, which was acquired by Riverbed.
Tee founded his latest startup, Moogsoft, in 2011, with cofounder Mike Silvey. The company's solution is based on research the founders initiated at the University of Sussex to determine how data science, machine learning, and natural language processing could be applied to IT operations management as a way to improve service assurance.
Tee told me in a telephone interview last week that they realized fairly early on that combining algorithmic and human intelligence could provide the kind of total visibility into the state and performance of IT systems that successful digital transformations would require as software environments became more complex. Said Tee:
The service assurance industry is basically reactive to changes in digital infrastructure. When companies restructure digitally, it also changes the way in which they manage that infrastructure and guarantee service. The movement toward shifting the workload off of the mainframe and onto a distributed concept of computing has greatly accelerated with the mass adoption of cloud, hybrid cloud, containerization and so on.
This has had an incredibly impactful effect on service assurance because the complexity involved is orders of magnitude more difficult than the traditional enterprise. It has probably been the biggest revolution in my professional lifetime. It was clear that we were approaching a tipping point when humans would need additional automated tools to handle the issues of complexity and scale.
The result of Tee and Silvey’s research was a new operations management product based on AI, a new company called Moogsoft, and the pioneering of a new industry category called AIOps (although it didn’t get a proper name until Gartner named it in 2016; originally Algorithmic IT Operations, it now stands for Artificial Intelligence for IT Operations). Gartner’s official description of AIOps is:
AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies.
AIOps works with existing data sources, such as traditional IT monitoring, log events, application and network performance anomalies, and so on. All data from these source systems are processed by a mathematical model that is able to identify significant events automatically, without requiring laborious manual pre-filtering. A second layer of algorithms analyzes these events to identify clusters of related events that are all symptoms of the same underlying issue. Said Tee:
Software algorithms can process millions of events in just a few milliseconds, but even better, they can derive meaning from large data sets on their own, with and without human input. They can distinguish between incidents that are high entropy, which require immediate attention, and low entropy, which are less serious. By automating the process of analyzing and correlating event data, algorithms enable humans to focus on the tens of incidents vs. millions of events/alerts that overload them every day.
This level of automation means incidents can be detected instantly without requiring humans to manually connect the dots across various tools and silos. AIOps can also automate incident ticketing, notifications, knowledge re-use, and decision support. In fact, what takes humans hours to achieve can be done in milliseconds as alerts unfold in the environment. This real-time insight now allows IT operations to be proactive 24/7.
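To make the two layers described above concrete, here is a minimal Python sketch. It is an illustration only and does not represent Moogsoft's actual algorithms: the `Event` type, the surprisal-based scoring, the time-gap clustering, and the `min_bits` and `max_gap` thresholds are all assumptions chosen for simplicity. The first function keeps only events whose message is rare (high information content), and the second groups the survivors into candidate incidents when they occur close together in time.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; real monitoring feeds carry far more fields."""
    timestamp: float  # seconds since an arbitrary reference point
    source: str       # host or service that emitted the event
    message: str      # normalized event type or message text


def significant_events(events, min_bits=2.0):
    """Layer 1 (assumed approach): keep events whose message is rare.

    Each event is scored by its surprisal, -log2(p), where p is the share of
    all events carrying the same message. Routine, repetitive events score low.
    """
    counts = Counter(e.message for e in events)
    total = len(events)
    return [
        e for e in events
        if -math.log2(counts[e.message] / total) >= min_bits
    ]


def cluster_by_time(events, max_gap=30.0):
    """Layer 2 (assumed approach): group events separated by at most max_gap seconds."""
    clusters = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if clusters and event.timestamp - clusters[-1][-1].timestamp <= max_gap:
            clusters[-1].append(event)
        else:
            clusters.append([event])
    return clusters


# Toy data: eight routine heartbeats plus three unusual events.
events = [Event(float(t), "web-01", "heartbeat ok") for t in range(0, 80, 10)]
events += [
    Event(100.0, "switch-07", "link down"),
    Event(110.0, "db-02", "connection timeout"),
    Event(500.0, "web-01", "disk full"),
]

significant = significant_events(events)
incidents = cluster_by_time(significant)

for number, incident in enumerate(incidents, start=1):
    print(number, [(e.source, e.message) for e in incident])
```

In this toy run, the eight heartbeats are filtered out and the three remaining events collapse into two candidate incidents, which mirrors the reduction from many events to a few incidents that Tee describes. A production system would correlate on much richer signals than message frequency and timing.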
Like most emerging technologies, AIOps has had its share of hype and hiccups on its road to mainstream acceptance. Moogsoft had one of those hiccups in 2018 when it had to lay off about 30 employees. But the market has made a healthy recovery, and Gartner’s 2019 AIOps market report says the technology “has grown to become a near requirement for successful network operations” and predicts that “by 2023, 40% of DevOps teams will augment application and infrastructure monitoring tools with artificial intelligence for IT operations (AIOps) platform capabilities.” Tee said:
The rate of market adoption has accelerated tremendously in the past 18 months…two years and an even faster pace toward profitability. We’re now looking to a double-digit market penetration. Fundamentally, we’ve seen a massive acceleration in the number of enterprises that are initiating discussions with us instead of us having to research and go out and find these opportunities. And, the fact is, it’s still very early in this
Moogsoft’s customers include Qualcomm, Fiserv, Fannie Mae, KeyBank, Northern Trust, Royal Bank of Canada, SuccessFactors and HCL Technologies.
AIOps is an important part of the digital transformation wave that is sweeping through virtually every enterprise right now.
For many, it’s a matter of survival. Be more agile or die. As enterprises seek to achieve agility by moving from monolithic to modular architectures (in the case of software, by adopting a DevOps software development lifecycle methodology), they raise the complexity level of those systems exponentially. This has massively complicated the support of IT environments, creating the demand for continuous operational assurance (all customer-facing services and applications must be ‘always on’) and the need for ‘cross-silo’ situation awareness.
In this environment, AIOps looks to be a lifeboat in the storm. 2020 may be the year when user cases establish whether it really is.